yammer
======

Yammer provides asynchronous bindings to the Ollama API and the following CLI tools:

- `shellm`: pass a file (or stdin if no file is given) to the generate endpoint and stream the result.
- `oneshot`: open a temporary file in an editor, pass it to the generate endpoint, and stream the result.
- `prompt`: pass a prompt given on the command line to the generate endpoint and stream the result.
- `chat`: chat with a model using the chat endpoint.
- `chats`: manage chat sessions.

Installation
------------

```sh
$ cargo install yammer
```

Usage
-----

The shellm tool multiplexes files over a model, sending each file (or stdin when no file is given) to the model as its own prompt:

```sh
$ shellm --model llama3.2:3b << EOF
Why is the sky red?
EOF
I'm sorry. The sky is not red.
$ shellm --model llama3.2:3b foo bar
Response to foo...
Response to bar...
```

The oneshot tool is conceptually the same as editing a temporary file and passing it to shellm, once per model listed:

```sh
$ oneshot llama3.2:3b gemma2
Opens $EDITOR with a temporary file. Write your prompt and save the file.
Output of llama3.2:3b...
Output of gemma2...
```

The prompt tool is similar to shellm, but takes prompts on the command line rather than from files:

```sh
$ prompt llama3.2:3b "Why is the sky red?"
I'm sorry. The sky is not red.
```

The chat command starts an interactive chat with a model:

```sh
$ chat
>>> Why is the sky red?
The sky often appears red at sunrise and sunset.
...
>>> :edit
>>> :model llama3.2:3b
>>> :retry
The sky often appears red at sunrise and sunset due to Rayleigh scattering.
...
>>> :param --num-ctx 4096
>>> :exit
```

The chats command is used to manage chat sessions:

```sh
$ chats
recent:
  2024-12-01T18:26 FP8MC gemma2      Why is the sky red?
  2024-12-01T17:34 H5HMV llama3.2:3b Hi there! Tell me about first and follow sets for parsers.
> pin FP8MC
> status
pinned:
  2024-12-01T18:29 FP8MC gemma2      Why is the sky red?
recent:
  2024-12-01T17:34 H5HMV llama3.2:3b Hi there! Tell me about first and follow sets for parsers.
> archive H5HMV
> status
pinned:
  2024-12-01T18:29 FP8MC gemma2      Why is the sky red?
> chat FP8MC
>>> Why is the sky red?
The sky often appears red at sunrise and sunset.
...
>>> exit
> new "Act like Mario, the video game character."
>>> Hi!
Hiya! It'sa me, Mario!
>>> exit
> exit
```
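All of the generation tools above (shellm, oneshot, and prompt) stream their output from Ollama's `/api/generate` endpoint. The sketch below shows roughly what that exchange looks like on the wire; it is a minimal illustration written directly against the documented Ollama HTTP API (using reqwest with its `json` and `stream` features, tokio, futures-util, and serde_json), not yammer's own types:

```rust
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let request = serde_json::json!({
        "model": "llama3.2:3b",
        "prompt": "Why is the sky red?",
        "stream": true,
    });
    let response = reqwest::Client::new()
        .post("http://localhost:11434/api/generate")
        .json(&request)
        .send()
        .await?;
    // Ollama answers with newline-delimited JSON: each object carries a
    // "response" fragment, and the final object has "done": true.
    let mut buffered = Vec::new();
    let mut stream = response.bytes_stream();
    while let Some(chunk) = stream.next().await {
        buffered.extend_from_slice(&chunk?);
        while let Some(newline) = buffered.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = buffered.drain(..=newline).collect();
            if line.iter().all(|b| b.is_ascii_whitespace()) {
                continue;
            }
            let object: serde_json::Value = serde_json::from_slice(&line)?;
            if let Some(fragment) = object["response"].as_str() {
                print!("{fragment}");
            }
        }
    }
    Ok(())
}
```

Judging by the flag names and descriptions under Help below, `-system` supplies the request's `system` field, `-json` its `"format": "json"` setting, `-keep-alive` its `keep_alive`, and the `-param-*` flags its `options` object.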
Help
----

### shellm

```sh
$ shellm --help
USAGE: shellm [OPTIONS] [FILE]

Options:
-h, -help              Print this help menu.
-ollama-host           The host to connect to.
-model                 The model to use from the ollama library.
-suffix                The suffix to append to the response.
-system                The system prompt to use in the template.
-template              The template to use for the prompt.
-json                  Format the response in JSON. You must also ask the model to do so.
-raw                   Whether to bypass formatting of the prompt.
-keep-alive            Duration to keep the model in memory after the call.
-param-mirostat        Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
-param-mirostat-eta    Influences how quickly the algorithm responds to feedback from the generated text.
-param-mirostat-tau    Controls the balance between coherence and diversity of the output.
-param-num-ctx         The number of tokens worth of context to allocate.
-param-repeat-last-n   Sets how far back the model looks to prevent repetition.
-param-repeat-penalty  Sets how strongly to penalize repetitions.
-param-temperature     The temperature of the model.
-param-seed            Sets the random number seed to use for generation.
-param-tfs-z           Tail free sampling is used to reduce the impact of less probable tokens from the output.
-param-num-predict     Maximum number of tokens to predict when generating text.
-param-top-k           Reduces the probability of generating nonsense.
-param-top-p           Works together with top-k.
-param-min-p           Alternative to top-p; aims to ensure a balance of quality and variety.
```

### oneshot

```sh
$ oneshot --help
USAGE: oneshot [OPTIONS] [MODEL]

Options:
-h, -help              Print this help menu.
-ollama-host           The host to connect to.
-suffix                The suffix to append to the response.
-system                The system prompt to use in the template.
-template              The template to use for the prompt.
-json                  Format the response in JSON. You must also ask the model to do so.
-raw                   Whether to bypass formatting of the prompt.
-keep-alive            Duration to keep the model in memory after the call.
-param-mirostat        Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
-param-mirostat-eta    Influences how quickly the algorithm responds to feedback from the generated text.
-param-mirostat-tau    Controls the balance between coherence and diversity of the output.
-param-num-ctx         The number of tokens worth of context to allocate.
-param-repeat-last-n   Sets how far back the model looks to prevent repetition.
-param-repeat-penalty  Sets how strongly to penalize repetitions.
-param-temperature     The temperature of the model.
-param-seed            Sets the random number seed to use for generation.
-param-tfs-z           Tail free sampling is used to reduce the impact of less probable tokens from the output.
-param-num-predict     Maximum number of tokens to predict when generating text.
-param-top-k           Reduces the probability of generating nonsense.
-param-top-p           Works together with top-k.
-param-min-p           Alternative to top-p; aims to ensure a balance of quality and variety.
```

### prompt

```sh
$ prompt --help
USAGE: prompt [OPTIONS] [PROMPT]

Options:
-h, -help              Print this help menu.
-ollama-host           The host to connect to.
-model                 The model to use from the ollama library.
-suffix                The suffix to append to the response.
-system                The system prompt to use in the template.
-template              The template to use for the prompt.
-json                  Format the response in JSON. You must also ask the model to do so.
-raw                   Whether to bypass formatting of the prompt.
-keep-alive            Duration to keep the model in memory after the call.
-param-mirostat        Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
-param-mirostat-eta    Influences how quickly the algorithm responds to feedback from the generated text.
-param-mirostat-tau    Controls the balance between coherence and diversity of the output.
-param-num-ctx         The number of tokens worth of context to allocate.
-param-repeat-last-n   Sets how far back the model looks to prevent repetition.
-param-repeat-penalty  Sets how strongly to penalize repetitions.
-param-temperature     The temperature of the model.
-param-seed            Sets the random number seed to use for generation.
-param-tfs-z           Tail free sampling is used to reduce the impact of less probable tokens from the output.
-param-num-predict     Maximum number of tokens to predict when generating text.
-param-top-k           Reduces the probability of generating nonsense.
-param-top-p           Works together with top-k.
-param-min-p           Alternative to top-p; aims to ensure a balance of quality and variety.
```

### chat

```sh
$ chat --help
USAGE: chat [OPTIONS]

Options:
-h, -help              Print this help menu.
-ollama-host           The host to connect to.
-model                 The model to use from the ollama library.
-keep-alive            Duration to keep the model in memory after the call.
-param-mirostat        Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
-param-mirostat-eta    Influences how quickly the algorithm responds to feedback from the generated text.
-param-mirostat-tau    Controls the balance between coherence and diversity of the output.
-param-num-ctx         The number of tokens worth of context to allocate.
-param-repeat-last-n   Sets how far back the model looks to prevent repetition.
-param-repeat-penalty  Sets how strongly to penalize repetitions.
-param-temperature     The temperature of the model.
-param-seed            Sets the random number seed to use for generation.
-param-tfs-z           Tail free sampling is used to reduce the impact of less probable tokens from the output.
-param-num-predict     Maximum number of tokens to predict when generating text.
-param-top-k           Reduces the probability of generating nonsense.
-param-top-p           Works together with top-k.
-param-min-p           Alternative to top-p; aims to ensure a balance of quality and variety.
```
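The chat and chats tools use Ollama's `/api/chat` endpoint instead, which accepts the accumulated message history rather than a single prompt. Below is a minimal sketch of that request body, again using the documented Ollama field names rather than yammer's own types; the model, messages, and option values are illustrative:

```rust
fn main() {
    // /api/chat replays the conversation so far as a list of messages;
    // a leading system message plays the role that the argument to the
    // chats `new` command does.
    let request = serde_json::json!({
        "model": "llama3.2:3b",
        "messages": [
            { "role": "system", "content": "Act like Mario, the video game character." },
            { "role": "user", "content": "Why is the sky red?" },
            { "role": "assistant", "content": "The sky often appears red at sunrise and sunset." },
            { "role": "user", "content": "Why is that?" }
        ],
        // Counterpart of `:param --num-ctx 4096` in the chat REPL.
        "options": { "num_ctx": 4096 },
        "stream": true
    });
    // POSTed to http://localhost:11434/api/chat; the reply streams
    // newline-delimited JSON objects, each carrying a partial "message".
    println!("{}", serde_json::to_string_pretty(&request).unwrap());
}
```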
### chats

```sh
$ chats
> help
chats
=====

Commands:
status     Show the status of all chats.
archive    Archive a chat.
unarchive  Unarchive a chat.
archived   Show all archived chats.
pin        Pin a chat.
unpin      Unpin a chat.
pinned     Show all pinned chats.
new        Start a new chat.
chat       Continue a chat.
editor     Start a chat with a system message written in $EDITOR.
```

Status
------

Active development.

Documentation
-------------

The latest documentation is always available at [docs.rs](https://docs.rs/yammer/latest/yammer/).