Amsterdam Prompt Gateway
======================

## Features:

- Routing: route requests to suitable LLM providers, possibly using something like https://github.com/lm-sys/RouteLLM
- Monitoring:
    - request / response token usage
    - latency, failure rates
    - allow clients to give feedback on results to track quality
- Tracking: group requests by templates / tags and threads
- Modifying: allow requests to specify what parts of a prompt are variable, store the templates in a database, and experiment with different templates.


## Request API:

The gateway has endpoints that mimic LLM API endpoints, but with additional fields to support the features above.

For example, for the OpenAI API the request body would look like this:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Hello Tinco!",
      "template": "Hello {{ name }}!",
      "template_id": "bla_template-v1.2321beta5",
      "variables": {
        "name": "Tinco"
      }
    }
  ],
  "agent_id": "greeting-agent-v1.1231beta5",
  "run_id": "abcdef123",
  "request_parent_id": "abcdef123",
  "request_id": "abcdef123"
}
```

### Modifications:

- variables + template / template_id: Passing these along allows the gateway to override the default prompt with alternative templates. The `content` property could be dropped when the template and variables are given, but keeping it is probably nicer: the gateway then only adds fields and stays compatible with the OpenAI protocol. Having a `template_id` allows us to easily group requests.
- run_id, request_parent_id, request_id: These fields allow us to establish a context for each request and identify it uniquely.
- agent_id: This field allows us to group requests based on what agent is being run.
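
As an illustration of the template override described above, here is a minimal sketch in Python. It assumes the `{{ name }}` placeholder syntax from the request example; the `render_template` and `resolve_content` helpers and the `overrides` mapping (a stand-in for the template database) are hypothetical names, not part of the gateway.

```python
import re

# Matches placeholders such as {{ name }}. The double-brace syntax comes from
# the request example; the exact grammar is an assumption.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, variables: dict) -> str:
    """Substitute each {{ key }} placeholder with its value from variables."""

    def substitute(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing template variable: {key}")
        return str(variables[key])

    return PLACEHOLDER.sub(substitute, template)


def resolve_content(message: dict, overrides: dict) -> str:
    """Pick the content to send upstream for one message.

    overrides maps a template_id to an alternative template
    (hypothetical stand-in for the template database).
    """
    template = overrides.get(message.get("template_id"), message.get("template"))
    if template is None or "variables" not in message:
        # No template information: fall back to the plain content field.
        return message["content"]
    return render_template(template, message["variables"])
```

With no override registered, the message from the request example resolves to its original content; registering an alternative template under the same `template_id` changes the prompt without any change on the client side.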

### Endpoints

Gateway endpoints start with `/v<version>/<provider>`, for example `/v1/openai/v1/chat/completions`, where the remainder (`/v1/chat/completions`) is the provider's own path. To ensure
compatibility, the requests are proxied to the provider as-is, with the additional fields stripped off.
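
The path handling and field stripping could look roughly like the following Python sketch. The field names are taken from the request example above; the function names `split_gateway_path` and `strip_gateway_fields` are hypothetical, and the actual forwarding of the request to the provider is left out.

```python
# Fields the gateway adds on top of the provider's request format,
# as listed in the request example. Anything else is passed through.
GATEWAY_FIELDS = {"agent_id", "run_id", "request_parent_id", "request_id"}
MESSAGE_FIELDS = {"template", "template_id", "variables"}


def split_gateway_path(path: str) -> tuple:
    """Split a gateway path into (version, provider, upstream path)."""
    parts = path.lstrip("/").split("/", 2)
    if len(parts) < 3 or not parts[0].startswith("v"):
        raise ValueError(f"not a gateway path: {path}")
    version, provider, rest = parts
    return version, provider, "/" + rest


def strip_gateway_fields(body: dict) -> dict:
    """Return a copy of the body with the gateway-only fields removed."""
    cleaned = {k: v for k, v in body.items() if k not in GATEWAY_FIELDS}
    if "messages" in cleaned:
        cleaned["messages"] = [
            {k: v for k, v in message.items() if k not in MESSAGE_FIELDS}
            for message in cleaned["messages"]
        ]
    return cleaned
```

For the example endpoint, `split_gateway_path("/v1/openai/v1/chat/completions")` yields the gateway version `v1`, the provider `openai`, and the upstream path `/v1/chat/completions`. The stripped body contains only fields the provider already understands.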