- do start position computation and 0xFF 0x00 escaping on the CPU using SIMD
- use `override` in the shader once naga supports that, and make the shader more flexible
- implement a faster Huffman decoding strategy
- play with loading 256 bytes of the bitstream per thread into LDS, and, if needed, performing more than 1 shader pass (requires using `dispatch_workgroups_indirect`)
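
For reference, the 0xFF 0x00 handling mentioned in the first item can be sketched as a plain scalar loop. This is only a baseline that a SIMD version could be checked against, not the SIMD implementation itself. The function name `unstuff` is hypothetical, and the sketch assumes the input is a single entropy-coded segment with no markers (such as restart markers) that need special treatment.

```rust
/// Scalar sketch of JPEG byte unstuffing (hypothetical helper, not part of this crate).
///
/// In entropy-coded JPEG data, every 0xFF data byte is followed by a stuffed
/// 0x00 byte. This function copies the input and drops those stuffed zeros.
/// Assumption: the input contains no markers; a 0xFF followed by anything
/// other than 0x00 is copied through unchanged.
fn unstuff(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        out.push(byte);
        if byte == 0xFF && input.get(i + 1) == Some(&0x00) {
            // Skip the stuffed zero that follows a 0xFF data byte.
            i += 2;
        } else {
            i += 1;
        }
    }
    out
}
```

Calling `unstuff(&[0x12, 0xFF, 0x00, 0x34])` yields the bytes `0x12, 0xFF, 0x34`. A SIMD variant would typically compare whole vectors against 0xFF and compact the result, but the output should match this loop byte for byte.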