datasphere

Crates.io: datasphere
lib.rs: datasphere
Version: 0.2.0
Created at: 2026-01-07 21:39:14.722595+00
Updated at: 2026-01-10 00:13:29.492127+00
Description: Background daemon that distills knowledge from Claude Code sessions into a searchable graph
Repository: https://github.com/cloud-atlas-ai/datasphere
ID: 2029057
Size: 462,607
Author: Drazen Urch (durch)

Datasphere

Named after the AI knowledge network in Dan Simmons' Hyperion Cantos — a vast repository where all information exists and can be accessed.

Background daemon that distills knowledge from Claude Code sessions into a queryable knowledge graph.

What It Does

Datasphere watches your Claude Code sessions, extracts insights via LLM distillation, and embeds them for semantic search. Think of it as long-term memory for your AI coding sessions.

┌─────────────────┐     ┌──────────┐     ┌─────────┐     ┌─────────┐
│ Session JSONL   │────▶│ Distill  │────▶│  Embed  │────▶│ LanceDB │
│ (~/.claude/     │     │  (LLM)   │     │         │     │ (nodes) │
│   projects/)    │     └──────────┘     └─────────┘     └─────────┘
└─────────────────┘

Installation

# From crates.io
cargo install datasphere

# Or from source
cargo install --path .

# Verify
ds --help

Quick Start

# One-shot scan of current project's sessions
ds scan

# Start daemon (watches ALL projects continuously)
ds start

# Query the knowledge graph
ds query "how to chunk large texts"

# Check database stats
ds stats

CLI Commands

Command           Description
ds scan           One-shot distillation of sessions (current project)
ds start          Daemon mode - watches all projects, queues jobs
ds query <text>   Semantic search of knowledge graph
ds queue          Show job queue status
ds queue pending  List pending jobs
ds queue clear    Clear completed jobs
ds queue nuke     Delete all jobs
ds stats          Show database statistics
ds show           Display stored nodes
ds add <file>     Add a text file to the graph (no LLM distillation)
ds related <id>   Find nodes similar to a given node
ds reset          Delete database and queue (fresh start)

MCP Server

The mcp/ directory contains a minimal MCP server for Claude Code integration:

# Install dependencies
cd mcp && npm install

# Add to Claude Code
claude mcp add datasphere -s user -- node /path/to/datasphere/mcp/index.js

This exposes datasphere_query and datasphere_related tools that Claude can use to search your knowledge graph during conversations.

How It Works

1. Watching

The daemon watches ~/.claude/projects/ for changes to session transcript files (.jsonl). Change events are queued and processed one at a time, with a fixed delay between jobs (500ms, see Configuration) acting as rate limiting.
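
The queueing behaviour described above can be sketched roughly as follows. This is only an illustration using the standard library: the names is_session_transcript, drain_queue, and the handle callback are hypothetical and not taken from the datasphere source, and the real daemon presumably feeds its queue from a file-watching library rather than by hand.

```rust
use std::collections::VecDeque;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::Duration;

/// Returns true for paths that look like session transcripts (`.jsonl`).
/// Hypothetical helper, not part of datasphere's API.
fn is_session_transcript(path: &Path) -> bool {
    path.extension().and_then(|ext| ext.to_str()) == Some("jsonl")
}

/// Drains the queue in FIFO order, one job at a time. Non-transcript paths
/// are skipped. After each handled job, sleeps `delay` if more entries
/// remain, as a simple form of rate limiting. Returns the number of jobs
/// handled.
fn drain_queue<F>(queue: &mut VecDeque<PathBuf>, delay: Duration, mut handle: F) -> usize
where
    F: FnMut(&Path),
{
    let mut handled = 0;
    while let Some(path) = queue.pop_front() {
        if !is_session_transcript(&path) {
            continue;
        }
        handle(&path);
        handled += 1;
        if !queue.is_empty() {
            thread::sleep(delay);
        }
    }
    handled
}
```

Passing Duration::from_millis(500) as the delay would match the job processing delay listed under Configuration.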

2. Distillation

Each session is chunked and sent to an LLM for knowledge extraction. The LLM identifies:

  • Key insights and learnings
  • Patterns and approaches
  • Decisions and their rationale
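
The README does not say how sessions are chunked. As one plausible approach, the sketch below groups whole transcript lines into chunks under a character budget, so that no JSONL record is split in half. The function name chunk_lines and the max_chars parameter are assumptions made for illustration only.

```rust
/// Groups whole lines into chunks of at most `max_chars` characters each
/// (counting the newlines that join lines inside a chunk). Blank lines are
/// skipped. A single line longer than `max_chars` becomes its own
/// oversized chunk rather than being split.
/// Hypothetical helper, not datasphere's actual chunker.
fn chunk_lines(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize; // length in chars, not bytes

    for line in text.lines() {
        if line.is_empty() {
            continue;
        }
        let line_len = line.chars().count();
        // Flush the current chunk if adding "\n" + line would exceed the budget.
        if !current.is_empty() && current_len + 1 + line_len > max_chars {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if !current.is_empty() {
            current.push('\n');
            current_len += 1;
        }
        current.push_str(line);
        current_len += line_len;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Each resulting chunk would then be sent to the LLM separately. A real implementation might budget by tokens instead of characters.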

3. Embedding

Extracted insights are embedded as 1536-dimensional vectors for semantic search. Embedding requires OPENAI_API_KEY (see Environment Variables).
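
To make the idea of semantic search over embeddings concrete, here is a minimal sketch that ranks stored vectors against a query vector. It assumes cosine similarity as the metric, which the README does not confirm, and it scans every vector in memory, whereas datasphere delegates storage and search to LanceDB. The names cosine_similarity and top_k are hypothetical.

```rust
/// Cosine similarity of two equal-length vectors, or `None` if the lengths
/// differ or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Returns the indices of the `k` vectors most similar to `query`,
/// best match first. Vectors that cannot be compared are ignored.
fn top_k(query: &[f32], nodes: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = nodes
        .iter()
        .enumerate()
        .filter_map(|(i, v)| cosine_similarity(query, v).map(|s| (i, s)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

In this picture, ds query would embed the query text and rank nodes against it, while ds related would use an existing node's vector as the query.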

4. Change Detection

A SimHash fingerprint of each session is used to detect meaningful changes. A session is re-processed only when its new fingerprint differs from the previously recorded one by a Hamming distance of more than 10 bits.
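
The following is a minimal sketch of SimHash-based change detection, not datasphere's actual implementation. It assumes a 64-bit fingerprint, whitespace tokenization, and the standard library's DefaultHasher; all function names and the constant name are hypothetical. Only the 10-bit threshold comes from the README.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Threshold from the README: re-process when more than 10 bits differ.
const CHANGE_THRESHOLD_BITS: u32 = 10;

/// Computes a 64-bit SimHash over whitespace-separated tokens.
/// Each token's hash votes +1 or -1 on every bit position; the fingerprint
/// keeps the bits whose total vote is positive.
/// Note: DefaultHasher's algorithm is not guaranteed stable across Rust
/// releases, so persisted fingerprints would need a fixed hash function.
fn simhash(text: &str) -> u64 {
    let mut weights = [0i32; 64];
    for token in text.split_whitespace() {
        let mut hasher = DefaultHasher::new();
        token.hash(&mut hasher);
        let h = hasher.finish();
        for (bit, weight) in weights.iter_mut().enumerate() {
            if (h >> bit) & 1 == 1 {
                *weight += 1;
            } else {
                *weight -= 1;
            }
        }
    }
    weights
        .iter()
        .enumerate()
        .fold(0u64, |acc, (bit, &w)| if w > 0 { acc | (1u64 << bit) } else { acc })
}

/// Number of bit positions in which the two fingerprints differ.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// True when the fingerprints differ by more than the threshold.
fn changed_significantly(old: u64, new: u64) -> bool {
    hamming_distance(old, new) > CHANGE_THRESHOLD_BITS
}
```

Because similar texts produce fingerprints that differ in only a few bits, small edits to a session (for example, one appended message in a long transcript) tend to stay under the threshold, while substantial new content tends to exceed it.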

Storage

~/.datasphere/
├── db/                    # LanceDB database
│   ├── nodes.lance/       # Knowledge nodes with embeddings
│   └── processed.lance/   # Processing records (deduplication)
└── queue.jsonl            # Persistent job queue
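
As a small illustration of the layout above, the sketch below derives each storage path from a home directory. The StoragePaths struct and storage_paths function are hypothetical names introduced here. Only the directory and file names come from the tree shown.

```rust
use std::path::{Path, PathBuf};

/// Locations of datasphere's on-disk state, following the tree above.
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
struct StoragePaths {
    db: PathBuf,        // LanceDB database directory
    nodes: PathBuf,     // knowledge nodes with embeddings
    processed: PathBuf, // processing records (deduplication)
    queue: PathBuf,     // persistent job queue
}

/// Builds the storage paths under `<home>/.datasphere`.
fn storage_paths(home: &Path) -> StoragePaths {
    let root = home.join(".datasphere");
    let db = root.join("db");
    StoragePaths {
        nodes: db.join("nodes.lance"),
        processed: db.join("processed.lance"),
        queue: root.join("queue.jsonl"),
        db,
    }
}
```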

Environment Variables

Variable           Description
OPENAI_API_KEY     Required for embeddings
ANTHROPIC_API_KEY  Required for distillation (or use claude CLI auth)
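
A startup check for these variables could look like the sketch below. It is an assumption about how such a check might be written, not a description of datasphere's behaviour: the DistillAuth enum and both function names are hypothetical. The lookup is passed in as a closure so the logic can be exercised without touching the real environment.

```rust
use std::env;

/// How distillation would authenticate. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum DistillAuth {
    ApiKey,    // ANTHROPIC_API_KEY is set
    ClaudeCli, // fall back to the claude CLI's own auth
}

/// Checks the two variables using `lookup` to read them.
/// OPENAI_API_KEY is mandatory; ANTHROPIC_API_KEY is optional because
/// claude CLI auth can be used instead.
fn check_env(lookup: impl Fn(&str) -> Option<String>) -> Result<DistillAuth, String> {
    // Treat unset and blank values the same way.
    let present = |name: &str| lookup(name).map_or(false, |v| !v.trim().is_empty());

    if !present("OPENAI_API_KEY") {
        return Err("OPENAI_API_KEY is not set (required for embeddings)".to_string());
    }
    if present("ANTHROPIC_API_KEY") {
        Ok(DistillAuth::ApiKey)
    } else {
        Ok(DistillAuth::ClaudeCli)
    }
}

/// Convenience wrapper that reads the real process environment.
fn check_process_env() -> Result<DistillAuth, String> {
    check_env(|name| env::var(name).ok())
}
```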

Configuration

Configuration is currently set through constants in the code:

  • Job processing delay: 500ms
  • SimHash change threshold: 10 bits

Development

cargo build              # Dev build
cargo test               # Run tests
cargo build --release    # Release build

Architecture

See CLAUDE.md for detailed module structure and design principles.

Commit count: 61
