wgml 0.2.0

source: src
created: 2024-10-16
updated: 2024-11-24
description: Cross-platform GPU LLM inference.
homepage: https://wgmath.rs
repository: https://github.com/dimforge/wgmath
size: 229,382



wgml: cross-platform GPU LLM inference

/!\ This library is under heavy development and is still missing many features.

The goal of wgml is to provide composable WGSL shaders and kernels for cross-platform GPU LLM inference.
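To give a flavor of the kind of building block such a pipeline composes, here is a minimal WGSL compute shader applying an element-wise SiLU activation to a storage buffer. This is purely illustrative and is not an actual wgml kernel; the binding layout and workgroup size are arbitrary choices.

```wgsl
// Illustrative only (not an actual wgml kernel): element-wise SiLU
// activation, x * sigmoid(x), over a storage buffer of f32 values.
@group(0) @binding(0) var<storage, read_write> data: array<f32>;

@compute @workgroup_size(64)
fn silu(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i = gid.x;
    if (i < arrayLength(&data)) {
        let x = data[i];
        data[i] = x / (1.0 + exp(-x));
    }
}
```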

Running the models

Currently, the GPT-2 and LLaMA-2 models are implemented. They can be loaded from GGUF files. Support for quantization is very limited (tensors are systematically dequantized upon loading) and somewhat untested. A very basic execution of these LLMs can be run from the examples.

Running GPT-2

cargo run -p wgml --example gpt2 -- your_model_file_path.gguf --prompt "How do I bake a cake?"

Note that this will run both the GPU and CPU versions of the transformer.

Running llama-2

cargo run -p wgml --example llama2 -- your_model_file_path.gguf

Note that this will run both the CPU and GPU versions of the transformer.

