wgml

Crates.io	wgml
lib.rs	wgml
version	0.2.0
created_at	2024-10-16 13:42:10.891873+00
updated_at	2024-11-24 16:59:15.509686+00
description	Cross-platform GPU LLM inference.
homepage	https://wgmath.rs
repository	https://github.com/dimforge/wgmath
id	1411819
size	229,382

README

wgml: cross-platform GPU LLM inference

/!\ This library is under heavy development and still lacks many features.

The goal of wgml is to provide composable WGSL shaders and kernels for cross-platform GPU LLM inference.

Running the models

Currently, the gpt2 and llama2 models are implemented. They can be loaded from GGUF files. Quantization support is very limited (tensors are always dequantized on loading) and somewhat untested. The examples provide a very basic way to run these LLMs.
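To illustrate what dequantizing a tensor on load means, here is a minimal sketch in plain Rust. It is not wgml's loader, and the README does not say which quantization formats wgml handles. Q8_0 is used only as an example, assuming the usual ggml layout of 34-byte blocks: a little-endian f16 scale followed by 32 signed 8-bit values. The function names are hypothetical.

```rust
// Illustrative only: not wgml's API. Assumes the ggml Q8_0 block layout.
const QK8_0: usize = 32; // quantized values per block
const BLOCK_BYTES: usize = 2 + QK8_0; // f16 scale + 32 x i8

/// Converts IEEE 754 half-precision bits to f32 (std has no stable f16 type).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let frac = (bits & 0x3ff) as f32;
    match exp {
        // Zero and subnormals: frac * 2^-24.
        0 => sign * frac * 2f32.powi(-24),
        // Infinity or NaN.
        0x1f => {
            if frac == 0.0 {
                sign * f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal numbers: (1 + frac / 2^10) * 2^(exp - 15).
        _ => sign * (1.0 + frac / 1024.0) * 2f32.powi(exp - 15),
    }
}

/// Expands raw Q8_0 bytes into f32 values, or returns None if the
/// input length is not a whole number of blocks.
fn dequantize_q8_0(data: &[u8]) -> Option<Vec<f32>> {
    if data.len() % BLOCK_BYTES != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(data.len() / BLOCK_BYTES * QK8_0);
    for block in data.chunks_exact(BLOCK_BYTES) {
        let scale = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        for &q in &block[2..] {
            // Reinterpret the byte as a signed value, then rescale.
            out.push(scale * (q as i8) as f32);
        }
    }
    Some(out)
}
```

Expanding everything to f32 up front keeps the compute kernels simple, at the cost of roughly four times the memory of the 8-bit representation.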

Running GPT-2

cargo run -p wgml --example gpt2 -- your_model_file_path.gguf --prompt "How do I bake a cake?"

Note that this runs both the GPU and CPU versions of the transformer.

Running llama-2

cargo run -p wgml --example llama2 -- your_model_file_path.gguf

Note that this runs both the CPU and GPU versions of the transformer.
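Both examples take a path to a GGUF file. As a quick sanity check before running them, the fixed-size header of the file can be inspected. The sketch below is independent of wgml and assumes GGUF version 2 or later, where the header is the 4-byte magic `GGUF`, a little-endian u32 version, then two little-endian u64 counts. The type and function names are hypothetical.

```rust
// Illustrative only: not wgml's API. Assumes the GGUF v2+ header layout.
#[derive(Debug, PartialEq)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

/// Reads a little-endian u64 starting at byte offset `at`.
fn read_u64_le(bytes: &[u8], at: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[at..at + 8]);
    u64::from_le_bytes(buf)
}

/// Parses the first 24 bytes of a GGUF file.
fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 24 {
        return Err("file too short for a GGUF header".to_string());
    }
    if !bytes.starts_with(b"GGUF") {
        return Err("missing GGUF magic".to_string());
    }
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    if version < 2 {
        // Version 1 used 32-bit counts, which this sketch does not handle.
        return Err(format!("unsupported GGUF version {}", version));
    }
    Ok(GgufHeader {
        version,
        tensor_count: read_u64_le(bytes, 8),
        metadata_kv_count: read_u64_le(bytes, 16),
    })
}
```

In practice the first 24 bytes of the model file (read with `std::fs::File` and `read_exact`, for instance) would be passed to `parse_gguf_header`.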
