# wgml: cross-platform GPU LLM inference

**/!\ This library is still under heavy development and is still missing many features.**

The goal of **wgml** is to provide composable WGSL shaders and kernels for cross-platform GPU LLM inference.

## Running the models

Currently, the `gpt2` and `llama2` models are implemented. They can be loaded from GGUF files. Support for quantization is very limited (tensors are systematically dequantized upon loading) and mostly untested; see the dequantization sketch at the end of this README.

A very basic execution of these LLMs can be run from the examples.

### Running GPT-2

```shell
cargo run -p wgml --example gpt2 -- your_model_file_path.gguf --prompt "How do I bake a cake?"
```

Note that this runs both the GPU and CPU versions of the transformer.

### Running llama-2

```shell
cargo run -p wgml --example llama2 -- your_model_file_path.gguf
```

Note that this runs both the CPU and GPU versions of the transformer.
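
### Note: what "dequantized upon loading" means

As background for the dequantization note above: GGUF stores quantized tensors in fixed-size blocks. For example, the Q8_0 format packs 32 weights as `i8` values sharing a single `f16` scale. The sketch below illustrates what expanding such a block on load amounts to. The `BlockQ8_0` struct and `dequantize_q8_0` function are hypothetical stand-ins for illustration, not wgml's actual loader types, and the example assumes the `half` crate for `f16` arithmetic.

```rust
use half::f16; // assumed dependency for the f16 scale

/// One Q8_0 block as laid out in a GGUF file: a single f16 scale
/// followed by 32 quantized i8 weights. (Illustrative only.)
struct BlockQ8_0 {
    scale: f16,
    quants: [i8; 32],
}

/// Expand one quantized block to f32, as a loader would when
/// "dequantizing on load": each weight is simply `scale * quant`.
fn dequantize_q8_0(block: &BlockQ8_0) -> [f32; 32] {
    let d = block.scale.to_f32();
    let mut out = [0.0f32; 32];
    for (o, &q) in out.iter_mut().zip(block.quants.iter()) {
        *o = d * q as f32;
    }
    out
}

fn main() {
    let block = BlockQ8_0 {
        scale: f16::from_f32(0.05),
        quants: [10; 32],
    };
    // Every dequantized weight is approximately 0.05 * 10 = 0.5.
    assert!(dequantize_q8_0(&block).iter().all(|w| (w - 0.5).abs() < 1e-3));
}
```

A loader doing this for every block of every quantized tensor trades GPU memory for simplicity: the kernels only ever see plain `f32` tensors, at the cost of losing the memory savings of the quantized format.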