| Crates.io | file_to_json |
| lib.rs | file_to_json |
| version | 0.1.6 |
| created_at | 2025-11-08 16:16:42.437533+00 |
| updated_at | 2025-11-25 01:15:16.310614+00 |
| description | Convert arbitrary text-based files into JSON using local parsers and an OpenRouter-powered fallback. |
| homepage | |
| repository | https://github.com/tomtang/file_to_json |
| max_upload_size | |
| id | 1923019 |
| size | 8,358,921 |
file_to_json is a Rust library that converts arbitrary text-based files into JSON. It understands a set of common structured formats locally (CSV, JSON, YAML, TOML) and falls back to an OpenRouter-hosted LLM for any formats it does not recognise.
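Unrecognised formats are sent to a configurable OpenRouter model (default: anthropic/claude-3.7-sonnet).
Every conversion returns a serde_json::Value.
Add the crate to your project: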
cargo add file_to_json --git https://github.com/your-org/file_to_json
(Replace the repository URL with where you host the crate.)
This repository uses Git LFS to manage large example files. After cloning, you'll need to:
1. Install Git LFS: brew install git-lfs (macOS) or see git-lfs.github.com
2. git lfs install
3. git lfs pull
See examples/README.md for more details.
use file_to_json::{Converter, FallbackStrategy, OpenRouterConfig};
use std::time::Duration;

fn main() -> Result<(), file_to_json::ConvertError> {
    let config = OpenRouterConfig {
        api_key: "sk-or-...".to_string(),
        model: "anthropic/claude-3.7-sonnet".to_string(),
        timeout: Duration::from_secs(60),
        fallback_strategy: FallbackStrategy::Chunked,
        vision_model: Some("anthropic/claude-3.7-sonnet".to_string()),
        max_image_bytes: 5 * 1024 * 1024, // 5 MiB
    };

    let converter = Converter::new(config)?;
    let value = converter.convert_path("data/sample.csv")?;
    println!("{}", serde_json::to_string_pretty(&value)?);
    Ok(())
}
The OpenRouterConfig struct accepts the following fields:
- api_key – required. Your OpenRouter API key.
- model – optional. Defaults to anthropic/claude-3.7-sonnet.
- timeout – optional. Request timeout duration. Defaults to 60 seconds.
- fallback_strategy – optional. FallbackStrategy::Chunked (default) or FallbackStrategy::CodeGeneration.
- vision_model – optional. Defaults to anthropic/claude-3.5-sonnet. Must support image inputs and JSON output.
- max_image_bytes – optional. Maximum size (bytes) of image payloads; defaults to 5242880 (5 MiB).

Supported images are sent to the vision model, which returns JSON with the fields summary, tags, objects, dominant_colors, and confidence.

The two fallback strategies work as follows:
- chunked (default): splits the input into segments of at most 128 KiB, converts each chunk, and merges the returned JSON (arrays are concatenated, objects are shallow-merged, and mixed types are wrapped in an array). Works best when every chunk shares a compatible structure.
- code: sends the first, middle, and last 10 lines to the model, asks for Python 3 code that can parse the full file, writes that code to a temporary script, and executes it locally to produce JSON (requires python3 on the PATH).

Every conversion returns a serde_json::Value.

Binary files are rejected unless they are supported images (handled by the vision model), can be converted to UTF-8 text (e.g. PDFs via the built-in extractor), or can be handled by the code-generation fallback.
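The merge rule of the chunked strategy described above can be illustrated with a small, dependency-free sketch. This is not the crate's actual implementation: the `Json` enum is a hypothetical stand-in for serde_json::Value, `merge_chunks` is an illustrative name, and two behaviours are assumptions the README does not specify (on duplicate object keys the later chunk wins, and an empty chunk list yields null).

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for serde_json::Value so the sketch needs no crates.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Num(f64),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

// Merge per-chunk results following the rule stated in the README:
// all arrays -> concatenate; all objects -> shallow merge; otherwise wrap
// the chunk results in one array.
fn merge_chunks(mut chunks: Vec<Json>) -> Json {
    if chunks.is_empty() {
        return Json::Null; // assumption: nothing to merge yields null
    }
    if chunks.len() == 1 {
        return chunks.remove(0); // a single chunk needs no merging
    }
    if chunks.iter().all(|c| matches!(c, Json::Array(_))) {
        let mut out = Vec::new();
        for chunk in chunks {
            if let Json::Array(items) = chunk {
                out.extend(items);
            }
        }
        return Json::Array(out);
    }
    if chunks.iter().all(|c| matches!(c, Json::Object(_))) {
        let mut out = BTreeMap::new();
        for chunk in chunks {
            if let Json::Object(map) = chunk {
                // Shallow merge; assumption: later chunks overwrite earlier keys.
                out.extend(map);
            }
        }
        return Json::Object(out);
    }
    // Mixed types: keep every chunk result, wrapped in an array.
    Json::Array(chunks)
}

fn main() {
    // Two array chunks are concatenated in order.
    let rows = merge_chunks(vec![
        Json::Array(vec![Json::Num(1.0), Json::Num(2.0)]),
        Json::Array(vec![Json::Num(3.0)]),
    ]);
    assert_eq!(
        rows,
        Json::Array(vec![Json::Num(1.0), Json::Num(2.0), Json::Num(3.0)])
    );

    // An object followed by a string is a mixed case, so both are wrapped.
    let mut header = BTreeMap::new();
    header.insert("title".to_string(), Json::Str("report".to_string()));
    let mixed = merge_chunks(vec![Json::Object(header.clone()), Json::Str("tail".to_string())]);
    assert_eq!(
        mixed,
        Json::Array(vec![Json::Object(header), Json::Str("tail".to_string())])
    );
    println!("merge sketch ok");
}
```

The sketch shows why the README recommends compatible chunk structures: as soon as one chunk comes back with a different top-level type, the result degrades to an array of per-chunk values instead of a single merged document.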
Running the bundled example on a JPEG:
cargo run --example convert -- ./examples/data/einstein.jpg <API_KEY>
produces structured JSON similar to:
{
  "summary": "A black and white portrait of an elderly person with wild white hair.",
  "tags": ["portrait", "black and white", "historical"],
  "objects": ["face", "hair", "jacket"],
  "dominant_colors": ["black", "white", "grey"],
  "confidence": 0.98
}
Run the test suite with:
cargo test
This project is distributed under the terms of the MIT license.