Crates.io | wgml |
lib.rs | wgml |
version | 0.2.0 |
source | src |
created_at | 2024-10-16 13:42:10.891873 |
updated_at | 2024-11-24 16:59:15.509686 |
description | Cross-platform GPU LLM inference. |
homepage | https://wgmath.rs |
repository | https://github.com/dimforge/wgmath |
id | 1411819 |
size | 229,382 |
/!\ This library is under heavy development and is still missing many features.
The goal of wgml is to provide composable WGSL shaders and kernels for cross-platform GPU LLM inference.
Currently, the gpt2 and llama2 models are implemented. They can be loaded from GGUF files. Quantization support
is very limited (tensors are always dequantized upon loading) and somewhat untested. A very basic run
of these LLMs is available through the examples:
cargo run -p wgml --example gpt2 -- your_model_file_path.gguf --prompt "How do I bake a cake?"
Note that this runs both the GPU and CPU versions of the transformer.
cargo run -p wgml --example llama2 -- your_model_file_path.gguf
Note that this also runs both the CPU and GPU versions of the transformer.
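To illustrate what dequantizing a tensor on load means, here is a minimal sketch in plain Rust. It is not wgml's actual loading code. It assumes the Q8_0 block layout used by ggml (a little-endian f16 scale followed by 32 signed 8-bit values), and the names `f16_to_f32` and `dequantize_q8_0` are hypothetical helpers introduced only for this example.

```rust
/// Number of quantized values per Q8_0 block (assumed ggml layout).
const BLOCK_LEN: usize = 32;
/// Bytes per block: a 2-byte f16 scale followed by BLOCK_LEN i8 values.
const BLOCK_BYTES: usize = 2 + BLOCK_LEN;

/// Converts IEEE 754 half-precision bits to f32 (hypothetical helper,
/// written by hand because stable Rust has no built-in f16 type).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = ((bits >> 15) & 1) as u32;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;
    let f32_bits = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign << 31,
        // Subnormal: the value is frac * 2^-24.
        (0, _) => {
            let v = frac as f32 * 2f32.powi(-24);
            return if sign == 1 { -v } else { v };
        }
        // Infinity and NaN.
        (0x1f, _) => (sign << 31) | 0x7f80_0000 | (frac << 13),
        // Normal number: rebias the exponent from 15 to 127.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(f32_bits)
}

/// Expands raw Q8_0 bytes into f32 values: each output is `scale * q`.
/// Returns `None` if the input is not a whole number of blocks.
fn dequantize_q8_0(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % BLOCK_BYTES != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(bytes.len() / BLOCK_BYTES * BLOCK_LEN);
    for block in bytes.chunks_exact(BLOCK_BYTES) {
        let scale = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        for &q in &block[2..] {
            // Reinterpret the byte as a signed 8-bit integer before scaling.
            out.push(scale * (q as i8) as f32);
        }
    }
    Some(out)
}
```

Expanding every block this way trades memory for simplicity: the resulting f32 tensor takes close to four times the space of the Q8_0 data, but the GPU kernels then only need to handle one element type.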