| Crates.io | datasphere |
| lib.rs | datasphere |
| version | 0.2.0 |
| created_at | 2026-01-07 21:39:14.722595+00 |
| updated_at | 2026-01-10 00:13:29.492127+00 |
| description | Background daemon that distills knowledge from Claude Code sessions into a searchable graph |
| repository | https://github.com/cloud-atlas-ai/datasphere |
| id | 2029057 |
| size | 462,607 |
Named after the AI knowledge network in Dan Simmons' Hyperion Cantos — a vast repository where all information exists and can be accessed.
Background daemon that distills knowledge from Claude Code sessions into a queryable knowledge graph.
Datasphere watches your Claude Code sessions, extracts insights via LLM distillation, and embeds them for semantic search. Think of it as long-term memory for your AI coding sessions.
```
┌─────────────────┐     ┌──────────┐     ┌─────────┐     ┌─────────┐
│ Session JSONL   │────▶│ Distill  │────▶│ Embed   │────▶│ LanceDB │
│ (~/.claude/     │     │ (LLM)    │     │         │     │ (nodes) │
│  projects/)     │     └──────────┘     └─────────┘     └─────────┘
└─────────────────┘
```
```bash
# From crates.io
cargo install datasphere

# Or from source
cargo install --path .

# Verify
ds --help
```
```bash
# One-shot scan of current project's sessions
ds scan

# Start daemon (watches ALL projects continuously)
ds start

# Query the knowledge graph
ds query "how to chunk large texts"

# Check database stats
ds stats
```
| Command | Description |
|---|---|
| `ds scan` | One-shot distillation of sessions (current project) |
| `ds start` | Daemon mode - watches all projects, queues jobs |
| `ds query <text>` | Semantic search of knowledge graph |
| `ds queue` | Show job queue status |
| `ds queue pending` | List pending jobs |
| `ds queue clear` | Clear completed jobs |
| `ds queue nuke` | Delete all jobs |
| `ds stats` | Show database statistics |
| `ds show` | Display stored nodes |
| `ds add <file>` | Add a text file to the graph (no LLM distillation) |
| `ds related <id>` | Find nodes similar to a given node |
| `ds reset` | Delete database and queue (fresh start) |
The mcp/ directory contains a minimal MCP server for Claude Code integration:
```bash
# Install dependencies
cd mcp && npm install

# Add to Claude Code
claude mcp add datasphere -s user -- node /path/to/datasphere/mcp/index.js
```
This exposes datasphere_query and datasphere_related tools that Claude can use to search your knowledge graph during conversations.
The daemon watches ~/.claude/projects/ for changes to session transcript files (.jsonl). Events are queued and processed one at a time with rate limiting.
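The "one at a time with rate limiting" behaviour can be pictured with a small in-memory queue. This is a minimal sketch, not Datasphere's actual implementation: the `JobQueue` type, its method names, and the minimum interval are all hypothetical, and the real queue is persisted to disk rather than held in memory.

```rust
use std::collections::VecDeque;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Hypothetical job queue: hands out one job at a time and enforces a
/// minimum interval between job starts.
pub struct JobQueue {
    pending: VecDeque<PathBuf>,
    min_interval: Duration,
    last_started: Option<Instant>,
}

impl JobQueue {
    pub fn new(min_interval: Duration) -> Self {
        Self {
            pending: VecDeque::new(),
            min_interval,
            last_started: None,
        }
    }

    /// Queue a changed transcript. A path that is already pending is not
    /// queued twice. Returns true if the path was newly queued.
    pub fn enqueue(&mut self, path: PathBuf) -> bool {
        if self.pending.contains(&path) {
            return false;
        }
        self.pending.push_back(path);
        true
    }

    /// Pop the next job if the rate limit allows starting one at `now`.
    pub fn next_ready(&mut self, now: Instant) -> Option<PathBuf> {
        if let Some(last) = self.last_started {
            if now.saturating_duration_since(last) < self.min_interval {
                return None;
            }
        }
        let job = self.pending.pop_front()?;
        self.last_started = Some(now);
        Some(job)
    }

    /// Number of jobs still waiting.
    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

Passing `now` in explicitly keeps the rate-limit logic independent of the wall clock, so a worker loop can call `next_ready(Instant::now())` while tests supply fixed instants.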
Each session is chunked and sent to an LLM for knowledge extraction. The LLM identifies:
Extracted insights are embedded (1536 dimensions) for semantic search.
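To illustrate what semantic search over embeddings means, the sketch below ranks stored vectors by cosine similarity to a query vector. It rests on assumptions: the text does not say which distance metric is used, the vector search itself is presumably delegated to LanceDB, and the function names here are hypothetical. Real embeddings would have 1536 components; the logic is the same at any length.

```rust
/// Cosine similarity between two embedding vectors.
/// Returns None if the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Rank (id, embedding) pairs by similarity to `query`, best first,
/// keeping at most `k` results. Nodes that cannot be scored are skipped.
pub fn top_k<'a>(
    query: &[f32],
    nodes: &'a [(String, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = nodes
        .iter()
        .filter_map(|(id, emb)| cosine_similarity(query, emb).map(|s| (id.as_str(), s)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A brute-force scan like this is only practical for small collections; a vector database exists precisely to avoid comparing the query against every node.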
SimHash is used to detect meaningful changes. Sessions are only re-processed when content changes significantly (Hamming distance > 10 bits).
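The change check can be sketched as a 64-bit SimHash followed by a Hamming-distance comparison. Only the 10-bit threshold comes from the text above; the whitespace tokenization, the use of `DefaultHasher`, and the function names are assumptions made for illustration, and Datasphere's own fingerprinting may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Threshold from the text above: re-process only when more than 10 bits differ.
const CHANGE_THRESHOLD_BITS: u32 = 10;

/// Hash one token to 64 bits. DefaultHasher is not guaranteed to be stable
/// across Rust releases, so persisted fingerprints would need a fixed hash.
fn hash_token(token: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    token.hash(&mut hasher);
    hasher.finish()
}

/// 64-bit SimHash over whitespace-separated tokens (assumed tokenization).
/// Each token votes +1 or -1 on every bit position; the sign of the total
/// decides the corresponding bit of the fingerprint.
pub fn simhash(text: &str) -> u64 {
    let mut votes = [0i32; 64];
    for token in text.split_whitespace() {
        let h = hash_token(token);
        for (bit, vote) in votes.iter_mut().enumerate() {
            if (h >> bit) & 1 == 1 {
                *vote += 1;
            } else {
                *vote -= 1;
            }
        }
    }
    let mut fingerprint = 0u64;
    for (bit, vote) in votes.iter().enumerate() {
        if *vote > 0 {
            fingerprint |= 1u64 << bit;
        }
    }
    fingerprint
}

/// Number of bit positions in which two fingerprints differ.
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// True when the fingerprints differ by more than the threshold.
pub fn changed_significantly(old: u64, new: u64) -> bool {
    hamming(old, new) > CHANGE_THRESHOLD_BITS
}
```

Because similar texts share most tokens, their vote totals mostly agree and their fingerprints differ in few bits, which is why appending a line or two to a long transcript tends to fall under the threshold.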
```
~/.datasphere/
├── db/                      # LanceDB database
│   ├── nodes.lance/         # Knowledge nodes with embeddings
│   └── processed.lance/     # Processing records (deduplication)
└── queue.jsonl              # Persistent job queue
```
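For tooling that needs to locate these files, the layout above can be expressed as a small path helper. This is a sketch only: the `DataPaths` struct and `data_paths` function are hypothetical names, and the home directory is passed in by the caller rather than discovered.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical view of the on-disk layout shown above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DataPaths {
    pub root: PathBuf,
    pub nodes: PathBuf,
    pub processed: PathBuf,
    pub queue: PathBuf,
}

/// Build the paths under `<home>/.datasphere`. Nothing is created on disk.
pub fn data_paths(home: &Path) -> DataPaths {
    let root = home.join(".datasphere");
    let db = root.join("db");
    DataPaths {
        nodes: db.join("nodes.lance"),
        processed: db.join("processed.lance"),
        queue: root.join("queue.jsonl"),
        root,
    }
}
```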
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Required for embeddings |
| `ANTHROPIC_API_KEY` | Required for distillation (or use `claude` CLI auth) |
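A startup check for these variables might look like the following sketch. The function names are hypothetical and this is not taken from the Datasphere source; it treats `OPENAI_API_KEY` as mandatory and only warns about `ANTHROPIC_API_KEY`, on the assumption that `claude` CLI auth may be in use instead.

```rust
use std::env;

pub const OPENAI_API_KEY: &str = "OPENAI_API_KEY";
pub const ANTHROPIC_API_KEY: &str = "ANTHROPIC_API_KEY";

/// Returns the subset of `names` that is unset, empty, or not valid Unicode.
pub fn missing_vars(names: &[&str]) -> Vec<String> {
    names
        .iter()
        .filter(|name| match env::var(**name) {
            Ok(value) => value.trim().is_empty(),
            Err(_) => true,
        })
        .map(|name| (*name).to_string())
        .collect()
}

/// Hypothetical startup check: fail without the embeddings key, warn when
/// the distillation key is absent.
pub fn check_environment() -> Result<(), String> {
    let missing = missing_vars(&[OPENAI_API_KEY]);
    if !missing.is_empty() {
        return Err(format!("missing required variables: {}", missing.join(", ")));
    }
    if !missing_vars(&[ANTHROPIC_API_KEY]).is_empty() {
        eprintln!("ANTHROPIC_API_KEY is not set; assuming claude CLI auth handles distillation");
    }
    Ok(())
}
```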
Currently configuration is via code constants:
```bash
cargo build            # Dev build
cargo test             # Run tests
cargo build --release  # Release build
```
See CLAUDE.md for detailed module structure and design principles.