{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "5d24a692", "metadata": {}, "source": [ "# Example of using elementwise activation functions in the CUTLASS Python interface\n", "This notebook walks through a basic example of using the CUTLASS Python interface to declare, compile, and run GEMMs with different epilogues.\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/cutlass/blob/main/examples/python/01_epilogue.ipynb)\n" ] }, { "cell_type": "markdown", "id": "28c916da", "metadata": {}, "source": [ "## Prerequisites for running on Colab\n", "This notebook requires an NVIDIA GPU. If `nvidia-smi` fails, go to Runtime -> Change runtime type -> Hardware accelerator and confirm a GPU is selected." ] }, { "cell_type": "code", "execution_count": null, "id": "0fcea8ea", "metadata": {}, "outputs": [], "source": [ "!#nvidia-smi" ] }, { "cell_type": "markdown", "id": "7ec60b57", "metadata": {}, "source": [ "If running on Colab, you will need to install the CUTLASS Python interface. To do so, uncomment the following line and run the cell:" ] }, { "cell_type": "code", "execution_count": null, "id": "1db9e51c", "metadata": {}, "outputs": [], "source": [ "!#pip install nvidia-cutlass" ] }, { "attachments": {}, "cell_type": "markdown", "id": "962324fd", "metadata": {}, "source": [ "## General setup\n", "We first import various packages needed for the example and construct the input and output tensors that will be used in our example." ] }, { "cell_type": "code", "execution_count": null, "id": "63a70a3c", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "import cutlass\n", "\n", "# This controls whether ther C++ GEMM declaration will be printed at each step. Set to `false` to\n", "# omit this information.\n", "print_module = True\n", "\n", "m = 256\n", "n = m\n", "k = m\n", "\n", "type_A = np.float16\n", "type_B = np.float16\n", "type_C = np.float16\n", "type_D = np.float16\n", "\n", "np.random.seed(1234)\n", "scope_min = -4\n", "scope_max = 4\n", "tensor_A = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, k)).astype(type_A))\n", "tensor_B = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(k, n)).astype(type_B))\n", "tensor_C = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, n)).astype(type_C))\n", "\n", "alpha = np.float16(1.)\n", "beta = np.float16(0.)\n", "\n", "tensor_D = np.zeros(tensor_C.shape).astype(type_D)" ] }, { "cell_type": "markdown", "id": "1eb0d95b", "metadata": {}, "source": [ "## Run a GEMM with an identity activation function\n", "To begin, we simply run a default GEMM with an identity activation function. This performs the well-known operation `D = alpha * (A @ B) + beta * C`. This is the default activation function used, and does not need to be specified." ] }, { "cell_type": "code", "execution_count": null, "id": "8d257833", "metadata": {}, "outputs": [], "source": [ "plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor)\n", "plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)" ] }, { "cell_type": "markdown", "id": "54961694", "metadata": {}, "source": [ "## Run a GEMM with a ReLU element-wise activation function\n", "CUTLASS makes it easy to support other element-wise activation functions. This results in performing an element-wise after the generic linear combination performed in a GEMM. If we call such an activation function `act`, the resulting formulation is:\n", "```\n", "D = alpha * (A @ B) + beta * C\n", "D = act(D)\n", "```\n", "\n", "Here, we will add a ReLU activation function. Given an input `x`, ReLU returns `max(x, 0)`.\n", "\n", "This is easy to do in CUTLASS. One only needs to set the plan's `activation` field." ] }, { "cell_type": "code", "execution_count": null, "id": "5fe49443", "metadata": {}, "outputs": [], "source": [ "tensor_D_relu = np.zeros(tensor_C.shape).astype(type_D)\n", "plan.activation = \"relu\"\n", "plan.run(tensor_A, tensor_B, tensor_C, tensor_D_relu, print_module=print_module)" ] }, { "cell_type": "markdown", "id": "455d0a37", "metadata": {}, "source": [ "We can now verify that the result of the GEMM that used a ReLU activation function:" ] }, { "cell_type": "code", "execution_count": null, "id": "e32e7798", "metadata": {}, "outputs": [], "source": [ "relu_ref = (tensor_D >= 0).astype(type_D) * tensor_D\n", "np.testing.assert_array_equal(relu_ref, tensor_D_relu)" ] }, { "cell_type": "markdown", "id": "cf959171", "metadata": {}, "source": [ "## Other element-wise activation functions\n", "CUTLASS supports a variety of widely-used element-wise activation functions. We can obtain a list of these functions via the `get_activations()` method." ] }, { "cell_type": "code", "execution_count": null, "id": "9e17d730", "metadata": {}, "outputs": [], "source": [ "activations = plan.activations()\n", "for activation in activations:\n", " print(activation)" ] }, { "cell_type": "markdown", "id": "0e4599fa", "metadata": {}, "source": [ "We can then run each of them:" ] }, { "cell_type": "code", "execution_count": null, "id": "9c3598c9", "metadata": {}, "outputs": [], "source": [ "for activation in activations:\n", " print('=============================================================================================')\n", " print(f'Compiling and running activation {activation}')\n", " print('=============================================================================================')\n", " plan.activation = activation\n", " plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)" ] }, { "cell_type": "markdown", "id": "18828622", "metadata": {}, "source": [ "To add an activation with parameter such as `leaky_relu`, a tuple should be provided containing the activation function name and the (or a list of) parameter." ] }, { "cell_type": "code", "execution_count": null, "id": "53108eae", "metadata": {}, "outputs": [], "source": [ "negative_slope = 0.5\n", "plan.activation = (\"leaky_relu\", negative_slope)\n", "plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }