| Crates.io | drama_llama |
| lib.rs | drama_llama |
| version | 0.5.2 |
| created_at | 2024-04-20 20:13:24.956307+00 |
| updated_at | 2024-06-19 20:30:36.65483+00 |
| description | A library for language modeling and text generation. |
| homepage | |
| repository | https://github.com/mdegans/drama_llama |
| max_upload_size | |
| id | 1214837 |
| size | 3,085,489 (bytes) |
# drama_llama

`drama_llama` is yet another Rust wrapper for llama.cpp. It is a work in progress and not intended for production use. The API will change.
For examples, see the `bin` folder, which contains two example binaries.
Notes, roadmap items, and known issues:

- CUDA support is opt-in via the `cuda` and `cuda_f16` features (see the `Cargo.toml` snippet after this list).
- Code is formatted with `rustfmt`.
- Sampling is configured through `SampleOptions`; its `mode` field will become `modes`, applied one after another until only a single `Candidate` token remains (a sketch of this winnowing follows the list).
- `Modelfile` support (llama.cpp style).
- Longest prefix caching: llama.cpp does not seem to manage a longest prefix cache automatically, so one will have to be written (see the sketch after this list).
- Backends other than llama.cpp (e.g. MLC, TensorRT-LLM, Ollama).
- To use the unrestricted vocabulary, `--vocab unsafe` must be passed as a command-line argument, or `VocabKind::Unsafe` used for an `Engine` constructor.
- Because `mmap` is used, on subsequent process launches the model should already be cached by the OS.
- Docs will not build on docs.rs because llama.cpp's CMakeLists.txt generates code, and writing to the filesystem is not supported there. For the moment, use `cargo doc --open` instead. Others have fixed this by patching llama.cpp in their bindings, but I'm not sure I want to do that for now.
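Enabling CUDA is a Cargo feature toggle. A minimal sketch of the dependency declaration, using the version from the metadata above; whether `cuda_f16` should be combined with or used instead of `cuda` is an assumption, not something the README states:

```toml
[dependencies]
# Build with CUDA support; `cuda_f16` presumably selects f16 CUDA
# kernels instead (assumption based on the feature name).
drama_llama = { version = "0.5.2", features = ["cuda"] }
```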
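The planned `modes` behavior is essentially a winnowing pipeline: each mode prunes the candidate list, and the chain stops once a single token is left. The sketch below illustrates that idea with hypothetical `Candidate` and `SampleMode` types that stand in for the crate's own; none of this is drama_llama's actual API.

```rust
/// Hypothetical stand-in for the crate's `Candidate` type.
#[derive(Debug, Clone)]
struct Candidate {
    token_id: u32,
    logit: f32,
}

/// Hypothetical sampling modes; each one prunes the candidate list.
enum SampleMode {
    /// Keep the `k` most likely candidates.
    TopK(usize),
    /// Keep only candidates within `margin` of the best logit.
    Margin(f32),
    /// Keep the single most likely candidate.
    Greedy,
}

impl SampleMode {
    fn apply(&self, mut candidates: Vec<Candidate>) -> Vec<Candidate> {
        // Sort best-first by logit so pruning can truncate the tail.
        candidates.sort_by(|a, b| b.logit.total_cmp(&a.logit));
        match self {
            SampleMode::TopK(k) => candidates.truncate(*k),
            SampleMode::Margin(margin) => {
                let best = candidates[0].logit;
                candidates.retain(|c| best - c.logit <= *margin);
            }
            SampleMode::Greedy => candidates.truncate(1),
        }
        candidates
    }
}

/// Apply modes one after another until a single candidate remains.
fn sample(mut candidates: Vec<Candidate>, modes: &[SampleMode]) -> Candidate {
    for mode in modes {
        if candidates.len() <= 1 {
            break;
        }
        candidates = mode.apply(candidates);
    }
    // Fall back to the most likely survivor if the modes did not
    // narrow the list all the way down to one token.
    candidates
        .into_iter()
        .max_by(|a, b| a.logit.total_cmp(&b.logit))
        .expect("candidate list must not be empty")
}
```

A caller would then write something like `sample(candidates, &[SampleMode::TopK(40), SampleMode::Greedy])`, with earlier modes doing coarse pruning and a final mode guaranteeing a single survivor.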
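For the longest-prefix-cache roadmap item, the core operation is computing how many leading tokens of a new prompt match the sequence that was already evaluated, so only the tail needs decoding. A minimal sketch under that assumption; the type and method names are illustrative, not from drama_llama or llama.cpp:

```rust
/// Tokens that have already been evaluated, in order.
struct PrefixCache {
    tokens: Vec<u32>,
}

impl PrefixCache {
    /// Length of the longest shared prefix between the cached tokens
    /// and a new prompt. Everything before this index can be reused;
    /// only `prompt[n..]` needs to be evaluated.
    fn longest_prefix(&self, prompt: &[u32]) -> usize {
        self.tokens
            .iter()
            .zip(prompt)
            .take_while(|(cached, new)| cached == new)
            .count()
    }

    /// Record the newly evaluated prompt so later calls can reuse it.
    fn update(&mut self, prompt: &[u32]) {
        self.tokens = prompt.to_vec();
    }
}
```

Mapped onto llama.cpp, the first `n` KV-cache cells would be kept and only the remaining tokens decoded.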