# Together AI Provider Integration

## Overview

The Together AI provider enables access to multiple open-source models through Together's inference infrastructure.

## Base Configuration

Refer to the provider implementation in `src/providers/together.rs`.

## Configuration

### Request Headers

```http
Authorization: Bearer tok_...
x-provider: together
Content-Type: application/json
```

## Supported Models

### Open Source Models

- Llama 2 (7B, 13B, 70B)
- Mixtral-8x7B
- CodeLlama
- Stable LM
- Yi Series
- Qwen Series

For a complete list of models, refer to the [Together AI Dedicated Models](https://docs.together.ai/docs/dedicated-models) documentation and the Model Catalog section below.

## API Endpoints

### Chat Completions

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-provider: together" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -d '{
    "model": "togethercomputer/llama-2-70b-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

### Streaming

Set `"stream": true` to receive the response as server-sent events:

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-provider: together" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -d '{
    "model": "togethercomputer/llama-2-70b-chat",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
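The curl call above prints the raw server-sent event stream. If you use the OpenAI-compatible SDK setup described under SDK Integration below, the same stream can be consumed incrementally. This is a minimal sketch, assuming the gateway forwards Together's SSE stream unchanged:

```typescript
import OpenAI from 'openai';

// Same gateway configuration as the SDK Integration section below.
const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: "http://localhost:3000/v1/",
  defaultHeaders: { "x-provider": "together" }
});

// With stream: true the SDK returns an async iterable of chunks;
// each chunk carries an incremental delta, not a full message.
const stream = await together.chat.completions.create({
  model: "togethercomputer/llama-2-70b-chat",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```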
## SDK Integration

### Node.js (OpenAI SDK Compatible)

```typescript
import OpenAI from 'openai';

const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: "http://localhost:3000/v1/",
  defaultHeaders: { "x-provider": "together" }
});

const response = await together.chat.completions.create({
  model: "meta-llama/Meta-Llama-3-8B-Instruct-Turbo",
  messages: [{ role: "user", content: "What are some fun things to do in New York?" }]
});

console.log(response.choices[0].message.content);
```

## Error Handling

| Error Code | Description | Solution |
|------------|-------------|----------|
| 401 | Invalid API key | Check your Together API key |
| 429 | Rate limit exceeded | Implement backoff strategy |
| 500 | Server error | Check server logs |

## Best Practices

1. **Model Selection**
   - Choose an appropriate model size
   - Consider inference speed requirements
   - Balance cost vs. performance
2. **Rate Limiting**
   - Implement exponential backoff
   - Monitor usage quotas
   - Use streaming for long responses
3. **Error Handling** (a combined retry-and-backoff sketch follows this list)
   - Implement retry logic
   - Handle timeouts gracefully
   - Log errors appropriately
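The table and list above describe the failure modes; the sketch below shows one way to wire them together. It retries 429 and 5xx responses with exponential backoff and an attempt cap, and rethrows non-retryable errors such as 401 immediately. The delays and attempt count are illustrative, not values prescribed by Together.

```typescript
import OpenAI from 'openai';

const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: "http://localhost:3000/v1/",
  defaultHeaders: { "x-provider": "together" }
});

// Retry transient failures (429 rate limits, 5xx server errors) with
// exponential backoff; anything else (e.g. 401 invalid key) is rethrown.
async function chatWithRetry(
  params: OpenAI.Chat.Completions.ChatCompletionCreateParamsNonStreaming,
  maxAttempts = 5
): Promise<OpenAI.Chat.Completions.ChatCompletion> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await together.chat.completions.create(params);
    } catch (err) {
      const status = err instanceof OpenAI.APIError ? err.status : undefined;
      const retryable = status === 429 || (status !== undefined && status >= 500);
      if (!retryable || attempt >= maxAttempts) throw err;
      // Exponential backoff: 1s, 2s, 4s, ... capped at 30s.
      const delayMs = Math.min(1000 * 2 ** (attempt - 1), 30_000);
      console.warn(`Attempt ${attempt} failed (${status}); retrying in ${delayMs} ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

const reply = await chatWithRetry({
  model: "meta-llama/Meta-Llama-3-8B-Instruct-Turbo",
  messages: [{ role: "user", content: "Hello!" }]
});
console.log(reply.choices[0].message.content);
```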
## Additional Resources

- [Together AI Documentation](https://docs.together.ai)
- [Available Models](https://docs.together.ai/reference/models)
- [Pricing Information](https://www.together.ai/pricing)

## Model Catalog

### Serverless Models

#### Meta

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Llama 3.1 8B Instruct Turbo | `meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo` | 131072 | FP8 |
| Llama 3.1 70B Instruct Turbo | `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo` | 131072 | FP8 |
| Llama 3.1 405B Instruct Turbo | `meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo` | 130815 | FP8 |
| Llama 3 8B Instruct Turbo | `meta-llama/Meta-Llama-3-8B-Instruct-Turbo` | 8192 | FP8 |
| Llama 3 70B Instruct Turbo | `meta-llama/Meta-Llama-3-70B-Instruct-Turbo` | 8192 | FP8 |
| Llama 3.2 3B Instruct Turbo | `meta-llama/Llama-3.2-3B-Instruct-Turbo` | 131072 | FP16 |
| Llama 3 8B Instruct Lite | `meta-llama/Meta-Llama-3-8B-Instruct-Lite` | 8192 | INT4 |
| Llama 3 70B Instruct Lite | `meta-llama/Meta-Llama-3-70B-Instruct-Lite` | 8192 | INT4 |
| Llama 3 8B Instruct Reference | `meta-llama/Llama-3-8b-chat-hf` | 8192 | FP16 |
| Llama 3 70B Instruct Reference | `meta-llama/Llama-3-70b-chat-hf` | 8192 | FP16 |

#### Nvidia

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Llama 3.1 Nemotron 70B | `nvidia/Llama-3.1-Nemotron-70B-Instruct-HF` | 32768 | FP16 |

#### Qwen

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Qwen 2.5 Coder 32B Instruct | `Qwen/Qwen2.5-Coder-32B-Instruct` | 32769 | FP16 |
| Qwen 2.5 7B Instruct Turbo | `Qwen/Qwen2.5-7B-Instruct-Turbo` | 32768 | FP8 |
| Qwen 2.5 72B Instruct Turbo | `Qwen/Qwen2.5-72B-Instruct-Turbo` | 32768 | FP8 |
| Qwen 2 Instruct (72B) | `Qwen/Qwen2-72B-Instruct` | 32768 | FP16 |

#### Microsoft

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| WizardLM-2 8x22B | `microsoft/WizardLM-2-8x22B` | 65536 | FP16 |

#### Google

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Gemma 2 27B | `google/gemma-2-27b-it` | 8192 | FP16 |
| Gemma 2 9B | `google/gemma-2-9b-it` | 8192 | FP16 |
| Gemma Instruct (2B) | `google/gemma-2b-it` | 8192 | FP16 |

#### Databricks

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| DBRX Instruct | `databricks/dbrx-instruct` | 32768 | FP16 |

#### DeepSeek

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| DeepSeek LLM Chat (67B) | `deepseek-ai/deepseek-llm-67b-chat` | 4096 | FP16 |

#### Gryphe

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| MythoMax-L2 (13B) | `Gryphe/MythoMax-L2-13b` | 4096 | FP16 |

#### Mistralai

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Mistral (7B) Instruct | `mistralai/Mistral-7B-Instruct-v0.1` | 8192 | FP16 |
| Mistral (7B) Instruct v0.2 | `mistralai/Mistral-7B-Instruct-v0.2` | 32768 | FP16 |
| Mistral (7B) Instruct v0.3 | `mistralai/Mistral-7B-Instruct-v0.3` | 32768 | FP16 |
| Mixtral-8x7B Instruct (46.7B) | `mistralai/Mixtral-8x7B-Instruct-v0.1` | 32768 | FP16 |
| Mixtral-8x22B Instruct (141B) | `mistralai/Mixtral-8x22B-Instruct-v0.1` | 65536 | FP16 |

#### NousResearch

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | `NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO` | 32768 | FP16 |

#### Together

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| StripedHyena Nous (7B) | `togethercomputer/StripedHyena-Nous-7B` | 32768 | FP16 |

#### Upstage

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Upstage SOLAR Instruct v1 (11B) | `upstage/SOLAR-10.7B-Instruct-v1.0` | 4096 | FP16 |

### Dedicated Models

#### 01.AI

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| 01-ai Yi Chat (34B) | `zero-one-ai/Yi-34B-Chat` | 4096 | FP16 |

#### AllenAI

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| OLMo Instruct (7B) | `allenai/OLMo-7B-Instruct` | 2048 | FP16 |

#### Austism

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Chronos Hermes (13B) | `Austism/chronos-hermes-13b` | 2048 | FP16 |

#### Carson

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| carson ml318br | `carson/ml318br` | 8192 | FP16 |

#### Cognitive Computations

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Dolphin 2.5 Mixtral 8x7B | `cognitivecomputations/dolphin-2.5-mixtral-8x7b` | 32768 | FP16 |

#### Databricks

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| DBRX Instruct | `databricks/dbrx-instruct` | 32768 | FP16 |

#### DeepSeek

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| DeepSeek LLM Chat (67B) | `deepseek-ai/deepseek-llm-67b-chat` | 4096 | FP16 |
| Deepseek Coder Instruct (33B) | `deepseek-ai/deepseek-coder-33b-instruct` | 16384 | FP16 |

#### Garage-bAInd

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Platypus2 Instruct (70B) | `garage-bAInd/Platypus2-70B-instruct` | 4096 | FP16 |

#### Google

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Gemma-2 Instruct (9B) | `google/gemma-2-9b-it` | 8192 | FP16 |
| Gemma-2 Instruct (27B) | `google/gemma-2-27b-it` | 8192 | FP16 |
| Gemma Instruct (2B) | `google/gemma-2b-it` | 8192 | FP16 |
| Gemma Instruct (7B) | `google/gemma-7b-it` | 8192 | FP16 |

#### GradientAI

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Llama-3 70B Instruct Gradient 1048K | `gradientai/Llama-3-70B-Instruct-Gradient-1048k` | 1048576 | FP16 |

#### Gryphe

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| MythoMax-L2 (13B) | `Gryphe/MythoMax-L2-13b` | 4096 | FP16 |
| Gryphe MythoMax L2 Lite (13B) | `Gryphe/MythoMax-L2-13b-Lite` | 4096 | FP16 |

#### Haotian Liu

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| LLaVa-Next (Mistral-7B) | `llava-hf/llava-v1.6-mistral-7b-hf` | 4096 | FP16 |

#### HuggingFace

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Zephyr-7B-β | `HuggingFaceH4/zephyr-7b-beta` | 32768 | FP16 |

#### LM Sys

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Koala (7B) | `togethercomputer/Koala-7B` | 2048 | FP16 |
| Vicuna v1.3 (7B) | `lmsys/vicuna-7b-v1.3` | 2048 | FP16 |
| Vicuna v1.5 16K (13B) | `lmsys/vicuna-13b-v1.5-16k` | 16384 | FP16 |
| Vicuna v1.5 (13B) | `lmsys/vicuna-13b-v1.5` | 4096 | FP16 |

#### Meta

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Code Llama Instruct (34B) | `codellama/CodeLlama-34b-Instruct-hf` | 16384 | FP16 |
| Meta Llama 3.2 90B Vision Instruct Turbo | `meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo` | 131072 | FP16 |
| Meta Llama 3.2 11B Vision Instruct Turbo | `meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo` | 131072 | FP16 |

#### Microsoft

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| WizardLM-2 (8x22B) | `microsoft/WizardLM-2-8x22B` | 65536 | FP16 |

#### Mistralai

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Mistral (7B) Instruct v0.3 | `mistralai/Mistral-7B-Instruct-v0.3` | 32768 | FP16 |

#### NousResearch

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Nous Hermes 2 - Mixtral 8x7B-DPO | `NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO` | 32768 | FP16 |

#### Qwen

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Qwen 2 Instruct (72B) | `Qwen/Qwen2-72B-Instruct` | 32768 | FP16 |

#### Upstage

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| Upstage SOLAR Instruct v1 (11B) | `upstage/SOLAR-10.7B-Instruct-v1.0` | 4096 | FP16 |

#### WizardLM

| Model | API Model | Context Length | Quantization |
|-------|-----------|----------------|--------------|
| WizardLM v1.2 (13B) | `WizardLM/WizardLM-13B-V1.2` | 4096 | FP16 |
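The Llama 3.2 Vision Instruct Turbo entries in the dedicated catalog take multimodal input. Together's API accepts OpenAI-style image content parts, so assuming the gateway forwards message bodies unchanged (not verified here), a vision request might look like this; the image URL is a placeholder:

```typescript
import OpenAI from 'openai';

const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: "http://localhost:3000/v1/",
  defaultHeaders: { "x-provider": "together" }
});

// Vision models accept an array of content parts mixing text and images.
const response = await together.chat.completions.create({
  model: "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Describe this image in one sentence." },
      { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } }
    ]
  }]
});

console.log(response.choices[0].message.content);
```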
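Finally, the tables above are a point-in-time snapshot; model availability changes over time. If the gateway also forwards the OpenAI-compatible `GET /v1/models` endpoint to Together (an assumption; check what `src/providers/together.rs` actually routes), the live list can be queried instead of hard-coding model IDs:

```typescript
import OpenAI from 'openai';

const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: "http://localhost:3000/v1/",
  defaultHeaders: { "x-provider": "together" }
});

// Print whatever models the provider currently exposes.
// Assumes the gateway proxies GET /v1/models to Together.
for await (const model of together.models.list()) {
  console.log(model.id);
}
```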