{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quick Start Examples for `constriction`'s Python API\n", "\n", "- **Author:** Robert Bamler, University of Tuebingen\n", "- **Initial Publication Date:** Jan 4, 2022\n", "\n", "This is an interactive jupyter notebook.\n", "You can read this notebook [online](https://github.com/bamler-lab/constriction/blob/main/examples/python/01-hello-world.ipynb) but if you want to execute any code, we recommend to [download](https://raw.githubusercontent.com/bamler-lab/constriction/main/examples/python/01-hello-world.ipynb) it.\n", "\n", "More examples, tutorials, and reference materials are available at ." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Constriction\n", "\n", "Before you start, install `constriction` by executing the following cell, then restart your jupyter kernel:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install --upgrade constriction~=0.3.5 # (this will automatically also install numpy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Don't forget to restart your jupyter kernel now.**\n", "Then test if you can import `constriction`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import constriction # This should produce no output (in particular, no error messages)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1: Hello, World\n", "\n", "The following cell implements a very simple encoding-decoding round trip using `constriction`'s ANS coder.\n", "We'll explain what's going on and also show how to use a different entropy coder below." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original message: [ 6 10 -4 2 -9 41 3 0 2]\n", "compressed representation: [3436391223 862640052]\n", "(in binary: ['0b11001100110100110010101100110111', '0b110011011010101101011110110100'])\n", "Reconstructed message: [ 6 10 -4 2 -9 41 3 0 2]\n" ] } ], "source": [ "import constriction\n", "import numpy as np\n", "\n", "# Define some example message and entropy model:\n", "message = np.array([6, 10, -4, 2, -9, 41, 3, 0, 2 ], dtype=np.int32)\n", "means = np.array([2.5, 13.1, -1.1, -3.0, -6.1, 34.2, 2.8, -6.4, -3.1], dtype=np.float64)\n", "stds = np.array([4.1, 8.7, 6.2, 5.4, 24.1, 12.7, 4.9, 28.9, 4.2], dtype=np.float64)\n", "model_family = constriction.stream.model.QuantizedGaussian(-100, 100) # We'll provide `means` and `stds` when encoding/decoding.\n", "print(f\"Original message: {message}\")\n", "\n", "# Encode the message:\n", "encoder = constriction.stream.stack.AnsCoder()\n", "encoder.encode_reverse(message, model_family, means, stds)\n", "\n", "# Get and print the compressed representation:\n", "compressed = encoder.get_compressed()\n", "print(f\"compressed representation: {compressed}\")\n", "print(f\"(in binary: {[bin(word) for word in compressed]})\")\n", "\n", "# Decode the message:\n", "decoder = constriction.stream.stack.AnsCoder(compressed) # (we could also just reuse `encoder`.)\n", "reconstructed = decoder.decode(model_family, means, stds)\n", "print(f\"Reconstructed message: {reconstructed}\")\n", "assert np.all(reconstructed == message)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What's Going on Here?\n", "\n", "The above example compresses and then decompresses a short example message using one of the entropy coders provided by 
`constriction`.\n", "All messages in `constriction` are sequences of integers (\"symbols\"), represented as a rank-1 numpy array with `dtype=np.int32`.\n", "\n", "The variables `mean` and `stds` define an entropy model (see [explanation below](#Background-Information-on-Entropy-Models)).\n", "In our example, the entropy model for each symbol is a [`QuantizedGaussian`](https://bamler-lab.github.io/constriction/apidoc/python/stream/model.html#constriction.stream.model.QuantizedGaussian) distribution (see [below](#The-Specific-Entropy-Model-Used-Here)), which is a common type of entropy model in novel machine-learning based compression methods.\n", "Other entropy models are supported by `constriction` (see [API documentation](https://bamler-lab.github.io/constriction/apidoc/python/stream/model.html)), including custom models (see [next tutorial in this series](https://github.com/bamler-lab/constriction/blob/main/examples/python/02-custom-entropy-models.ipynb)).\n", "More precisely, the entropy model for the first symbol of the message in the above example is a `QuantizedGaussian` with mean 2.5 and standard deviation 4.1, the model for the second symbol has mean 13.1 and standard deviation 8.7, and so on.\n", "\n", "The next few lines of the above example *encode* the message.\n", "We use an [Asymmetric Numeral Systems (ANS)](https://en.wikipedia.org/wiki/Asymmetric_numeral_systems) entropy coder here, but we show [below](#Example-2:-Switching-Out-the-Entropy-Coding-Algorithm) how we can use a different entropy coder just as well.\n", "The actual encoding procedure happens in the method `encode_reverse`.\n", "The suffix \"_reverse\" is to remind us that ANS operates as a *stack* (i.e., \"last in first out\").\n", "We therefore encode the symbols in reverse order here so that we can subsequently decode them in forward order.\n", "\n", "Next, we obtain the compressed representation and print it.\n", "In `constriction`, compressed data is, by default, represented as an array of unsigned 32-bit integers.\n", "See [below](#Example-3:-Writing-Compressed-Data-to-a-File) for an example that writes compressed data to a file.\n", "\n", "The final four lines of code above *decode* the message from the compressed data and verify its integrity.\n", "We pass `compressed` as an argument to `AnsCoder` here, and we then call `decode` on it (without the suffix \"_reverse\").\n", "\n", "### Background Information on Entropy Models\n", "\n", "The above example uses an entropy model (defined by `means`, `stds`, and `model_family`) for both encoding and decoding.\n", "The entropy *model* and the entropy *coder* together comprise a lossless compression method on which two parties have to agree before they can meaningfully exchange any compressed data.\n", "The entropy *model* is a probability distribution over all conceivable messages.\n", "The job of an entropy *coder* is to come up with an encoding/decoding scheme that minimizes the *expected* bitrate under the entropy model.\n", "Thus, the coder has to assign short compressed representations to the most probable messages under the model at the cost of having to assigning longer compressed representations to less probable messages.\n", "This job is conveniently taken care of by the various entropy coders provided by `constriction`.\n", "\n", "### The Specific Entropy Model Used Here\n", "\n", "In the above example, we use an entropy model that factorizes over all symbols in the message (if you want to model correlations between symbols, you can use autoregressive 
models or the bits-back trick, see section [\"further reading\"](#Further-Reading) below).\n", "The marginal probability distribution for each symbol is a quantized (aka discretized) form of a Gaussian distribution, as it often arises in novel machine-learning based compression methods.\n", "More precisely, we model the probability that the $i$'th symbol $X_i$ of the message takes some integer value $x_i$ as follows,\n", "\\begin{align}\n", " P(X_i \\! = \\! x_i) = \\int_{x_i-\\frac12}^{x_i+\\frac12} f_{\\mathcal N}(\\xi;\\mu_i,\\sigma_i^2) \\,\\text{d}\\xi\n", " \\quad\\forall x_i\\in \\mathbb Z\n", "\\end{align}\n", "where $f_{\\mathcal N}(\\,\\cdot\\,;\\mu_i,\\sigma_i^2)$ is the probability density function of a normal distribution with mean $\\mu_i$ and standard deviation $\\sigma_i$.\n", "The means and standard deviations of our entropy models are assigned to variables `means` and `stds` in the above code example.\n", "\n", "The entropy coder slightly modifies the model by rounding all probabilities $P(X_i \\! = \\! x_i)$ to a fixed-point representation with some finite precision, while enforcing three guarantees:\n", "(i) all integers within the range from `-100` to `100` (defined by our arguments to the constructor, `QuantizedGaussian(-100, 100)`) are guaranteed to have a nonzero probability (so that they can be encoded without error);\n", "(ii) the probabilities within this range are guaranteed to sum *exactly* to one (despite the finite precision), and all integers outside of this range have exactly zero probability and cannot be encoded (and also will never be returned when decoding random compressed data with an `AnsCoder`); and\n", "(iii) the model is *exactly* invertible: encoding and decoding internally evaluate the model's cumulative distribution function and the model's quantile function, and `constriction` ensures (via fixed-point arithmetic) that these two functions are the exact inverse of each other since even tiny rounding errors could otherwise have catastrophic effects in an entropy coder." 
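, "\n", "To make this concrete, note that the integral in the equation above is just a difference of two evaluations of the Gaussian cumulative distribution function, which can be expressed with the error function from Python's standard library. The following sketch evaluates $P(X_1 \\! = \\! 6)$ for the first symbol of our example message (mean 2.5, standard deviation 4.1); the helper `quantized_gaussian_pmf` is our own illustration, not part of `constriction`'s API (which performs the equivalent calculation internally, in the exactly invertible fixed-point arithmetic described above):\n", "\n", "```python\n", "from math import erf, sqrt\n", "\n", "def gaussian_cdf(x, mean, std):\n", "    # Cumulative distribution function of a Gaussian with the given mean and standard deviation.\n", "    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))\n", "\n", "def quantized_gaussian_pmf(x, mean, std):\n", "    # P(X = x): integral of the Gaussian density from x - 1/2 to x + 1/2.\n", "    return gaussian_cdf(x + 0.5, mean, std) - gaussian_cdf(x - 0.5, mean, std)\n", "\n", "print(quantized_gaussian_pmf(6, mean=2.5, std=4.1))  # prints approximately 0.068\n", "```\n", "\n", "A probability of about 0.068 corresponds to an information content of $-\\log_2(0.068) \\approx 3.9$ bits for this symbol; summing the information contents of all nine symbols approximates the length of the compressed representation, up to a small constant overhead of the entropy coder."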
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 2: Switching Out the Entropy Coding Algorithm\n", "\n", "The [above example](#Example-1:-Hello,-World) used Asymmetric Numeral Systems (ANS) for entropy coding.\n", "We can also use a [Range Coder](https://en.wikipedia.org/wiki/Range_coding) instead.\n", "Before you look at the modified example below, try writing it yourself:\n", "\n", "- Start from [example 1 above](#Example-1:-Hello,-World) and replace `stack.AnsCoder` with `queue.RangeEncoder` for the encoder and with `queue.RangeDecoder` for the decoder (Range Coding uses different data structures for encoding and decoding because, in contrast to ANS, you generally lose the ability to encode any additional symbols once you start decoding with a Range Coder).\n", "- Replace `encode_reverse` with `encode` (i.e., drop the suffix \"_reverse\") because range coding operates as a queue (i.e., \"first in first out\").\n", "\n", "Your result should look as follows:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original message: [ 6 10 -4 2 -9 41 3 0 2]\n", "compressed representation: [3400499119 1762784004]\n", "(in binary: ['0b11001010101011110111111110101111', '0b1101001000100011111001100000100'])\n", "Reconstructed message: [ 6 10 -4 2 -9 41 3 0 2]\n" ] } ], "source": [ "import constriction\n", "import numpy as np\n", "\n", "# Define some example message and entropy model:\n", "message = np.array([6, 10, -4, 2, -9, 41, 3, 0, 2 ], dtype=np.int32)\n", "means = np.array([2.5, 13.1, -1.1, -3.0, -6.1, 34.2, 2.8, -6.4, -3.1], dtype=np.float64)\n", "stds = np.array([4.1, 8.7, 6.2, 5.4, 24.1, 12.7, 4.9, 28.9, 4.2], dtype=np.float64)\n", "model_family = constriction.stream.model.QuantizedGaussian(-100, 100) # We'll provide `means` and `stds` when encoding/decoding.\n", "print(f\"Original message: {message}\")\n", "\n", "# Encode the message:\n", "encoder = constriction.stream.queue.RangeEncoder()\n", "encoder.encode(message, model_family, means, stds)\n", "\n", "# Get and print the compressed representation:\n", "compressed = encoder.get_compressed()\n", "print(f\"compressed representation: {compressed}\")\n", "print(f\"(in binary: {[bin(word) for word in compressed]})\")\n", "\n", "# Decode the message:\n", "decoder = constriction.stream.queue.RangeDecoder(compressed)\n", "reconstructed = decoder.decode(model_family, means, stds)\n", "print(f\"Reconstructed message: {reconstructed}\")\n", "assert np.all(reconstructed == message)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 3: More Complex Entropy Models\n", "\n", "In Example 2 above, we changed the entropy coder from ANS to Range Coding but we left the entropy *model* unchanged.\n", "In this example, let's keep the ANS entropy coder but change the entropy model instead.\n", "Rather than modeling each symbol with a Quantized Gaussian distribution, we'll model only the first 6 symbols this way.\n", "For the last 3 symbols, we assume they all drawn from the *same* categorical distribution (we could also use an individual categorical distribution for each symbol, but we want to demonstrate how to encode and decode i.i.d. 
symbols in this example):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original message: [ 6 10 -4 2 -9 41 3 0 2]\n", "compressed representation: [3400506403 2908157178]\n", "(in binary: ['0b11001010101011111001110000100011', '0b10101101010101101111010011111010'])\n", "Reconstructed message: [ 6 10 -4 2 -9 41 3 0 2]\n" ] } ], "source": [ "import constriction\n", "import numpy as np\n", "\n", "# Same message as above, but a complex entropy model consisting of two parts:\n", "message = np.array([6, 10, -4, 2, -9, 41, 3, 0, 2 ], dtype=np.int32)\n", "means = np.array([2.5, 13.1, -1.1, -3.0, -6.1, 34.2], dtype=np.float64)\n", "stds = np.array([4.1, 8.7, 6.2, 5.4, 24.1, 12.7], dtype=np.float64)\n", "model_family1 = constriction.stream.model.QuantizedGaussian(-50, 50)\n", "model2 = constriction.stream.model.Categorical(np.array(\n", " [0.2, 0.1, 0.3, 0.4], dtype=np.float64)) # Specifies Probabilities of the symbols 0, 1, 2, 3.\n", "print(f\"Original message: {message}\")\n", "\n", "# Encode both parts of the message:\n", "encoder = constriction.stream.queue.RangeEncoder()\n", "encoder.encode(message[0:6], model_family1, means, stds)\n", "encoder.encode(message[6:9], model2) # No model parameters provided here since `model2` is already fully parameterized.\n", "\n", "# Get and print the compressed representation:\n", "compressed = encoder.get_compressed()\n", "print(f\"compressed representation: {compressed}\")\n", "print(f\"(in binary: {[bin(word) for word in compressed]})\")\n", "\n", "# Decode the message:\n", "decoder = constriction.stream.queue.RangeDecoder(compressed)\n", "reconstructed1 = decoder.decode(model_family1, means, stds)\n", "reconstructed2 = decoder.decode(model2, 3) # (decodes 3 additional symbols)\n", "reconstructed = np.concatenate((reconstructed1, reconstructed2))\n", "print(f\"Reconstructed message: {reconstructed}\")\n", "assert np.all(reconstructed == message)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We leave it as an exercise to the reader to change the entropy coder in the above example back to an ANS coder. (**Hint:** since ANS operates as a stack, you'll have to encode `message[6:9]` *before* encoding `message[0:6]`.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 4: Writing Compressed Data to a File\n", "\n", "In `constriction`, compressed data is represented by default as an array of unsigned 32-bit integers.\n", "Such data can trivially be written to a file or network socket.\n", "However, make sure you use a well-defined byte order ([\"endianness\"](https://en.wikipedia.org/wiki/Endianness)) so that data saved on one computer architecture can be read on another computer architecture.\n", "Here's Example 1 from above, but this time divided into two parts that only communicate via a file." 
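, "\n", "If you prefer not to query `sys.byteorder` yourself (as the two cells below do), numpy's explicit dtype strings offer an alternative: the dtype `'<u4'` always denotes *little-endian* unsigned 32-bit integers, regardless of the machine you run on. The following sketch is our own variant, not part of `constriction`'s API; it round-trips some compressed words through a byte string with a fixed byte order:\n", "\n", "```python\n", "import numpy as np\n", "\n", "# Example data; in practice, this would come from `encoder.get_compressed()`.\n", "compressed = np.array([3436391223, 862640052], dtype=np.uint32)\n", "\n", "# Serialize with an explicit little-endian dtype ('<u4'), independent of the native byte order:\n", "serialized = compressed.astype('<u4').tobytes()\n", "\n", "# Deserialize and convert back to the native byte order that `constriction` expects:\n", "deserialized = np.frombuffer(serialized, dtype='<u4').astype(np.uint32)\n", "assert np.all(deserialized == compressed)\n", "```\n", "\n", "Either approach works; the `byteswap(inplace=True)` variant used below has the advantage that it doesn't copy the data at all on machines whose native byte order already matches the file format."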
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original message: [ 6 10 -4 2 -9 41 9 69 -6]\n", "Compressed data saved to file \"temporary-demo-file.bin\".\n" ] } ], "source": [ "import constriction\n", "import numpy as np\n", "import sys\n", "\n", "# Define some example message and entropy model:\n", "message = np.array([6, 10, -4, 2, -9, 41, 9, 69, -6 ], dtype=np.int32)\n", "means = np.array([2.5, 13.1, -1.1, -3.0, -6.1, 34.2, 12.8, 56.4, -3.1], dtype=np.float64)\n", "stds = np.array([4.1, 8.7, 6.2, 5.4, 24.1, 12.7, 4.9, 28.9, 4.2], dtype=np.float64)\n", "model_family = constriction.stream.model.QuantizedGaussian(-100, 100) # We'll provide `means` and `stds` when encoding/decoding.\n", "print(f\"Original message: {message}\")\n", "\n", "# Encode the message:\n", "encoder = constriction.stream.stack.AnsCoder()\n", "encoder.encode_reverse(message, model_family, means, stds)\n", "\n", "# Get the compressed representation and save it to a file:\n", "compressed = encoder.get_compressed()\n", "if sys.byteorder != 'little':\n", " # Let's use the convention that we always save data in little-endian byte order.\n", " compressed.byteswap(inplace=True)\n", "compressed.tofile('temporary-demo-file.bin')\n", "print(f'Compressed data saved to file \"temporary-demo-file.bin\".')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Read 2 words of data from \"temporary-demo-file.bin\".\n", "Reconstructed message: [ 6 10 -4 2 -9 41 9 69 -6]\n" ] } ], "source": [ "# Read the compressed representation from the file:\n", "compressed_read = np.fromfile('temporary-demo-file.bin', dtype=np.uint32)\n", "print(f'Read {len(compressed_read)} words of data from \"temporary-demo-file.bin\".')\n", "if sys.byteorder != 'little':\n", " # Turn data into native byte order before passing it to `constriction`\n", " compressed_read.byteswap(inplace=True)\n", "\n", "# Decode the message:\n", "decoder = constriction.stream.stack.AnsCoder(compressed_read)\n", "reconstructed = decoder.decode(model_family, means, stds)\n", "print(f\"Reconstructed message: {reconstructed}\")\n", "assert np.all(reconstructed == message)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Reading\n", "\n", "You now know how to use `constriction`'s Python API for some basic encoding and decoding operations.\n", "The [website](https://bamler-lab.github.io/constriction/) has links to more examples and tutorials.\n", "\n", "If you have a specific question, go to `constriction`'s [Python API documentation](https://bamler-lab.github.io/constriction/apidoc/python/).\n", "\n", "If you're still new to the concept of entropy coding, check out the [teaching material](https://robamler.github.io/teaching/compress21/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0rc2 (main, Sep 12 2022, 16:20:24) [GCC 12.2.0]" }, "vscode": { "interpreter": { "hash": "ead1b95f633dc9c51826328e1846203f51a198c6fb5f2884a80417ba131d4e82" } } }, "nbformat": 4, "nbformat_minor": 2 }