| Crates.io | drama_llama |
| --- | --- |
| lib.rs | drama_llama |
| version | 0.5.2 |
| source | src |
| created_at | 2024-04-20 20:13:24.956307 |
| updated_at | 2024-06-19 20:30:36.65483 |
| description | A library for language modeling and text generation. |
| homepage | |
| repository | https://github.com/mdegans/drama_llama |
| max_upload_size | |
| id | 1214837 |
| size | 3,085,489 |
# drama_llama

`drama_llama` is yet another Rust wrapper for llama.cpp. It is a work in progress and not intended for production use. The API will change.

For examples, see the `bin` folder. There are two example binaries.
## Features

- `cuda` and `cuda_f16` features for CUDA-accelerated builds (e.g. `cargo build --release --features cuda`).
- Code formatted with `rustfmt`.
## Roadmap

- `SampleOptions`: the single `mode` will become `modes`, applied one after another until only a single `Candidate` token remains (a sketch of this chaining follows this list).
- `Modelfile` support.
- A longest prefix cache (`llama.cpp` style). `llama.cpp` does not seem to manage a longest prefix cache automatically, so one will have to be written (a sketch also follows this list).
- Support for backends other than `llama.cpp` (eg. MLC, TensorRT-LLM, Ollama).
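To make the planned chaining concrete, here is a minimal sketch. It is a hypothetical illustration, not `drama_llama`'s actual API: the `Candidate` struct, the `Mode` enum, and the mode names are all stand-ins.

```rust
/// Stand-in for a token candidate: an id plus its logit.
#[derive(Clone, Copy, Debug)]
struct Candidate {
    id: u32,
    logit: f32,
}

/// Hypothetical sampling modes, applied in order.
enum Mode {
    /// Keep only the k highest-logit candidates.
    TopK(usize),
    /// Reduce to the single best candidate.
    Greedy,
}

fn apply(mode: &Mode, mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    // Sort descending by logit so the best candidates come first.
    candidates.sort_by(|a, b| b.logit.total_cmp(&a.logit));
    match mode {
        Mode::TopK(k) => candidates.truncate(*k),
        Mode::Greedy => candidates.truncate(1),
    }
    candidates
}

/// Apply `modes` one after another until only a single candidate remains.
fn sample(modes: &[Mode], mut candidates: Vec<Candidate>) -> Candidate {
    for mode in modes {
        if candidates.len() <= 1 {
            break;
        }
        candidates = apply(mode, candidates);
    }
    candidates[0]
}

fn main() {
    let candidates = vec![
        Candidate { id: 0, logit: 0.1 },
        Candidate { id: 1, logit: 2.3 },
        Candidate { id: 2, logit: 1.7 },
    ];
    // Top-2 keeps ids 1 and 2; greedy then picks id 1.
    println!("{:?}", sample(&[Mode::TopK(2), Mode::Greedy], candidates));
}
```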
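Similarly, the core of a longest prefix cache is finding how many leading tokens of a new prompt match what has already been evaluated, so only the differing suffix needs re-evaluation. A minimal sketch of that matching step (the `i32` token type is an assumption, mirroring llama.cpp's `llama_token`):

```rust
/// Length of the longest shared prefix between the tokens already
/// evaluated (and held in the KV cache) and a new prompt. Tokens before
/// this index can be reused; tokens from it onward must be evaluated.
fn longest_common_prefix(cached: &[i32], prompt: &[i32]) -> usize {
    cached
        .iter()
        .zip(prompt.iter())
        .take_while(|(a, b)| a == b)
        .count()
}

fn main() {
    let cached = [1, 42, 7, 9];
    let prompt = [1, 42, 7, 100, 3];
    // Prints 3: the first three tokens are shared and can stay cached.
    println!("{}", longest_common_prefix(&cached, &prompt));
}
```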
## Known issues

- To generate with the unrestricted vocabulary, `--vocab unsafe` must be passed as a command-line argument, or `VocabKind::Unsafe` used for an `Engine` constructor (a self-contained sketch of the idea follows these notes).
- Because `mmap` is used, on subsequent process launches the model should already be cached by the OS.
- Documentation is not available on docs.rs because `llama.cpp`'s `CMakeLists.txt` generates code, and writing to the filesystem is not supported. For the moment, use `cargo doc --open` instead. Others have fixed this by patching `llama.cpp` in their bindings, but I'm not sure I want to do that for now.
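The vocabulary gating above amounts to filtering candidate tokens against a banned set unless the unsafe vocabulary is selected. Below is a self-contained sketch of that idea only; the types are stand-ins, not the crate's actual `VocabKind` or `Engine`, and the banned-set mechanism is an assumption.

```rust
use std::collections::HashSet;

/// Stand-in for the crate's vocabulary selection (not the real type).
#[derive(PartialEq)]
enum VocabKind {
    Safe,
    Unsafe,
}

/// Drop banned token ids unless the unsafe vocabulary was selected.
fn allowed_tokens(vocab: &VocabKind, candidates: &[u32], banned: &HashSet<u32>) -> Vec<u32> {
    candidates
        .iter()
        .copied()
        .filter(|id| *vocab == VocabKind::Unsafe || !banned.contains(id))
        .collect()
}

fn main() {
    let banned: HashSet<u32> = [7].into_iter().collect();
    let candidates = [1, 7, 9];
    // Safe vocabulary: token 7 is filtered out.
    assert_eq!(allowed_tokens(&VocabKind::Safe, &candidates, &banned), vec![1, 9]);
    // Unsafe vocabulary: everything passes through.
    assert_eq!(allowed_tokens(&VocabKind::Unsafe, &candidates, &banned), vec![1, 7, 9]);
}
```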