mecha10-nodes-llm-command

Crates.io: mecha10-nodes-llm-command
lib.rs: mecha10-nodes-llm-command
version: 0.1.39
created_at: 2025-11-25 02:47:51.368456+00
updated_at: 2026-01-09 17:09:03.337274+00
description: Natural language command parsing via LLM APIs (OpenAI, Claude, Ollama)
repository: https://github.com/mecha-industries/mecha10
id: 1949107
size: 120,927
owner: Peter C (PeterChauYEG)

README

LLM Command Node

Natural language command parsing via LLM APIs (OpenAI, Claude, Ollama) with dashboard control.

Quick Start

  1. Copy .env.example to .env and add your API key:

    cp .env.example .env
    # Edit .env and set: OPENAI_API_KEY=sk-...
    
  2. Start development mode:

    mecha10 dev
    
  3. Open the dashboard at http://localhost:3000/dashboard/robot-control

  4. Send commands via the AI Command Control panel!

Overview

The LLM Command node allows users to control robots using natural language commands. It leverages large language models (LLMs) to parse commands and convert them into structured actions.

Features

  • Multi-Provider Support: OpenAI, Claude (Anthropic), and local Ollama
  • Command Parsing: Converts natural language into structured robot actions
  • Action Routing: Publishes to appropriate topics (motor commands, navigation goals, behaviors)
  • Vision Queries: Uses object detection data to answer "what do you see?" questions
  • Behavior Interruption: Automatically pauses autonomous behaviors when user commands are issued
  • Auto-Resume: Configurable automatic resumption of behaviors after timeout
  • Dashboard Integration: Real-time command input and response display
  • Error Handling: Clear error messages and timeout handling

Configuration

The node is configured via configs/*/llm-command.toml (or through mecha10.json):

# LLM Provider Configuration
provider = "openai"  # Options: "openai", "claude", "local"
model = "gpt-4o-mini"
temperature = 0.7
max_tokens = 500
vision_enabled = false

# Topic Configuration
[topics]
command_in = "/ai/command"
response_out = "/ai/response"
camera_in = "/robot/sensors/camera/rgb"
nav_goal_out = "/nav/goal"
motor_cmd_out = "/motor/cmd_vel"
behavior_out = "/behavior/execute"

# Behavior Interrupt Configuration
[behavior_interrupt]
enabled = true
mode = "interrupt_with_auto_resume"  # Options: "disabled", "interrupt_only", "interrupt_with_auto_resume"
timeout_secs = 30  # Auto-resume timeout (for interrupt_with_auto_resume mode)
await_completion = false
control_topic = "/behavior/control"

Behavior Interrupt Configuration

When the LLM issues motor or navigation commands, it can automatically interrupt autonomous behaviors:

  • enabled: Enable/disable behavior interruption (default: true)
  • mode: Interrupt behavior (options below):
    • "disabled": Never interrupt behavior tree
    • "interrupt_only": Interrupt but don't auto-resume (manual resume required)
    • "interrupt_with_auto_resume": Interrupt and automatically resume after timeout
  • timeout_secs: Seconds before auto-resume (default: 30)
  • await_completion: Wait for command completion before resuming (not yet implemented)
  • control_topic: Topic for behavior control commands (default: "/behavior/control")

Environment Variables

Recommended: Use a .env file in your project root

Copy .env.example to .env and add your API keys:

# Copy the example file
cp .env.example .env

# Edit .env and add your API key
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

The .env file is automatically loaded by mecha10 dev and passed to all nodes.

Alternative: Set environment variables directly

# For OpenAI
export OPENAI_API_KEY="sk-..."

# For Claude
export ANTHROPIC_API_KEY="sk-ant-..."

# For local Ollama (no key needed)
# Ensure Ollama is running on localhost:11434

Topics

Input Topics

  • /ai/command (CommandMessage): Natural language command from user
    {
      "text": "move forward",
      "timestamp": 1234567890,
      "user_id": "optional_user_id"
    }
    

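Outside the dashboard, a command can also be injected directly on this topic. Below is a minimal sketch using the Python redis client, assuming the Redis pub/sub transport that the Testing section drives with redis-cli (localhost:6379) and the CommandMessage shape shown above:

# Sketch only: assumes the Redis pub/sub transport exercised by redis-cli in
# the Testing section (localhost:6379); not part of the crate itself.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)

command = {
    "text": "move forward",
    "timestamp": int(time.time()),
    "user_id": None,  # optional per the CommandMessage schema above
}

# Publish the natural language command to the node's input topic.
r.publish("/ai/command", json.dumps(command))
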
Output Topics

  • /ai/response (ResponseMessage): LLM response with action feedback

    {
      "text": "Moving the robot forward",
      "timestamp": 1234567890,
      "action_taken": true,
      "error": null
    }
    
  • /motor/cmd_vel (MotorCommand): Motor velocity commands

    {
      "linear": 0.5,
      "angular": 0.0,
      "timestamp": 1234567890
    }
    
  • /nav/goal (NavigationGoal): Navigation waypoint goals

    {
      "x": 5.0,
      "y": 3.0,
      "theta": 0.0,
      "timestamp": 1234567890
    }
    
  • /behavior/execute (BehaviorCommand): Behavior execution commands

    {
      "name": "follow_person",
      "params": null,
      "timestamp": 1234567890
    }
    

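To watch these outputs during development, you can subscribe to the same channels the Testing section reads with redis-cli. A minimal sketch, again assuming the default Redis transport on localhost:6379:

# Sketch only: mirrors the redis-cli SUBSCRIBE commands from the Testing
# section; assumes the default Redis transport on localhost:6379.
import json

import redis

r = redis.Redis(host="localhost", port=6379)
pubsub = r.pubsub()
pubsub.subscribe("/ai/response", "/motor/cmd_vel")

for message in pubsub.listen():
    if message["type"] != "message":
        continue  # skip subscribe confirmations
    payload = json.loads(message["data"])
    print(message["channel"].decode(), "->", payload)
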
Command Examples

Motor Commands

  • "move forward"{"action": "motor", "linear": 0.5, "angular": 0.0}
  • "turn left"{"action": "motor", "linear": 0.0, "angular": 0.5}
  • "stop"{"action": "motor", "linear": 0.0, "angular": 0.0}

Navigation Commands

  • "go to x:5 y:3"{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}
  • "move to the door" → Extracts coordinates and navigates

Behavior Commands

  • "follow that person"{"action": "behavior", "name": "follow_person"}
  • "patrol the area"{"action": "behavior", "name": "patrol"}

Vision Queries

The node subscribes to /vision/detections from the object-detector node and uses this data to answer vision questions:

  • "what do you see?" → "I see a person (95% confidence) and a car (87% confidence)"
  • "is there a person in front of me?" → "Yes, I detect 1 person with 95% confidence"
  • "how many cars?" → "I see 2 cars: car (87% confidence) and car (82% confidence)"
  • "describe what's visible" → Natural language description based on detections

How it works:

  1. Object detector node continuously publishes detections to /vision/detections
  2. LLM command node stores the latest detections
  3. When a vision query is detected, detections are formatted as text context
  4. LLM analyzes the detections and provides a natural language response
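
The /vision/detections message format is defined by the object-detector node and is not documented here, so the sketch below uses a hypothetical list of label/confidence pairs purely to illustrate step 3: turning structured detections into the kind of text context the example answers above are built from.

# Illustrative sketch of step 3 only: the detection fields (label, confidence)
# are an assumption, not the documented /vision/detections schema.
def format_detections(detections):
    """Turn structured detections into a text context for the LLM prompt."""
    if not detections:
        return "No objects are currently detected."
    parts = [
        f"{d['label']} ({d['confidence'] * 100:.0f}% confidence)"
        for d in detections
    ]
    return "I see " + ", ".join(parts) + "."

# Example: yields "I see person (95% confidence), car (87% confidence)."
print(format_detections([
    {"label": "person", "confidence": 0.95},
    {"label": "car", "confidence": 0.87},
]))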

Benefits over vision APIs:

  • Much cheaper - No image tokens, just structured detection data
  • Faster - No need to encode/send images
  • More accurate - Uses specialized YOLO model for detection

Behavior Interruption

The LLM command node intelligently manages the interaction between user commands and autonomous behaviors:

How It Works

  1. Automatic Interruption: When the LLM parses a motor or navigation command, it interrupts the behavior tree
  2. User Priority: Direct user commands always take priority over autonomous behaviors
  3. Auto-Resume: After a timeout (configurable), the behavior tree automatically resumes
  4. Manual Resume: Users can manually re-enable behaviors via the dashboard

Interrupt Modes

Disabled (mode = "disabled")

  • Behavior tree is never interrupted by LLM commands
  • User commands may be overridden by autonomous behaviors
  • Use when you want autonomous behaviors to have priority

Interrupt Only (mode = "interrupt_only")

  • Behavior tree is paused when motor/navigation commands are issued
  • No automatic resumption - requires manual re-enable from dashboard
  • Use when you want explicit control over behavior resumption

Interrupt with Auto-Resume (mode = "interrupt_with_auto_resume")

  • Behavior tree is paused when motor/navigation commands are issued
  • Automatically resumes after timeout_secs (default: 30s)
  • Use for seamless switching between manual and autonomous control

Example Scenario

1. Robot is running "patrol" behavior (autonomous)
2. User says: "stop" via LLM command
   → Behavior tree is interrupted
   → Motor command published: {linear: 0.0, angular: 0.0}
3. Robot stops and remains idle
4. After 30 seconds (timeout):
   → Behavior tree automatically resumes
   → Robot continues patrolling

Control Messages

The system uses enhanced BehaviorControl messages:

{
  "action": "interrupt",
  "source": "llm-command",
  "duration_secs": 30,
  "timestamp": 1234567890
}

Actions:

  • interrupt: Pause behavior tree (from LLM command)
  • resume: Resume behavior tree (manual or auto)
  • enable: Enable behavior tree (from dashboard)
  • disable: Disable behavior tree (from dashboard)
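
These messages can also be injected by hand, for example to resume a paused behavior tree without the dashboard. A minimal sketch, assuming the Redis transport used in the Testing section and the default control_topic; the field layout follows the example message above, while the source label and the null duration_secs for resume are assumptions:

# Sketch only: publishes a manual BehaviorControl message over the Redis
# transport shown in the Testing section (localhost:6379 assumed), on the
# default control_topic.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)

control = {
    "action": "resume",          # interrupt | resume | enable | disable
    "source": "manual",          # assumption: identifies who sent it
    "duration_secs": None,       # assumption: not used for resume
    "timestamp": int(time.time()),
}

r.publish("/behavior/control", json.dumps(control))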

System Prompt

The default system prompt guides the LLM to parse commands into structured JSON actions:

You are a helpful robot assistant. Parse user commands and respond with structured actions.

For navigation commands (e.g., "go to the door", "move to coordinates"), extract the goal and respond with JSON:
{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}

For motor commands (e.g., "move forward", "turn left", "stop"), respond with JSON:
{"action": "motor", "linear": 0.5, "angular": 0.0}

For behavior commands (e.g., "follow that person", "patrol the area"), respond with JSON:
{"action": "behavior", "name": "follow_person"}

For vision queries (e.g., "what do you see?"), describe what's visible in the camera feed.

For general questions, respond conversationally.

You can customize this prompt in the configuration.
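
The routing step that follows boils down to: try to parse the model's reply as JSON and publish it to the topic that matches its action field, otherwise treat it as a conversational answer. Below is a rough Python illustration of that dispatch (not the crate's actual Rust implementation), using only the action shapes and topic names documented above.

# Rough illustration of the parse-and-route step, not the crate's actual
# implementation. Action shapes and topic names come from the sections above.
import json

ACTION_TOPICS = {
    "motor": "/motor/cmd_vel",
    "navigate": "/nav/goal",
    "behavior": "/behavior/execute",
}

def route(llm_reply: str):
    """Return (topic, payload) for a structured action, or None for plain chat."""
    try:
        action = json.loads(llm_reply)
    except json.JSONDecodeError:
        return None  # conversational reply, goes to /ai/response only
    topic = ACTION_TOPICS.get(action.get("action"))
    if topic is None:
        return None
    return topic, action

# Example: a motor command matching the system prompt above.
print(route('{"action": "motor", "linear": 0.5, "angular": 0.0}'))
# -> ('/motor/cmd_vel', {'action': 'motor', 'linear': 0.5, 'angular': 0.0})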

Dashboard Integration

The dashboard provides a user-friendly interface for:

  1. Command Input: Text field for natural language commands
  2. Command History: Shows past commands with status indicators
  3. Response Display: Shows LLM responses and action feedback
  4. Status Badges: Connection status and processing indicators

Access the dashboard at http://localhost:3000/dashboard/robot-control

Architecture

┌─────────────────┐
│   Dashboard UI  │
│ (Command Input) │
└────────┬────────┘
         │ publishes
         ▼
    /ai/command
         │
         ▼
┌──────────────────┐
│   LLM Command    │
│      Node        │
│                  │
│  ┌────────────┐  │
│  │ LlmNode    │  │
│  │ (mecha10-  │  │
│  │  ai-llm)   │  │
│  └────────────┘  │
│         │        │
│    Parse JSON    │
│         │        │
└─────────┼────────┘
          │
    ┌─────┴─────┬──────────────┬───────────────┐
    ▼           ▼              ▼               ▼
/motor/cmd_vel /nav/goal  /behavior/execute /ai/response
    │           │              │               │
    ▼           ▼              ▼               ▼
┌────────┐ ┌──────────┐ ┌─────────────┐ ┌─────────┐
│ Motor  │ │Navigation│ │  Behavior   │ │Dashboard│
│ Driver │ │  Stack   │ │  Executor   │ │   UI    │
└────────┘ └──────────┘ └─────────────┘ └─────────┘

Dependencies

  • mecha10-core: Framework core (Context, Topic, Message)
  • mecha10-ai-llm: LLM integration library (providers, LlmNode)
  • tokio: Async runtime
  • serde/serde_json: Serialization
  • anyhow: Error handling
  • reqwest: HTTP client (for API calls)

Running

The node is launched automatically by mecha10 dev when included in mecha10.json.

To run manually:

cargo run -p mecha10-nodes-llm-command

Testing

Test the node with simulation:

  1. Start control plane and simulation:

    docker compose up -d
    mecha10 dev
    
  2. Send a test command via dashboard or Redis CLI:

    redis-cli PUBLISH "/ai/command" '{"text":"move forward","timestamp":1234567890}'
    
  3. Subscribe to response topic:

    redis-cli SUBSCRIBE "/ai/response"
    redis-cli SUBSCRIBE "/motor/cmd_vel"
    
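The same round trip can be scripted as a quick smoke test. A minimal sketch with the Python redis client, assuming the Redis instance that redis-cli talks to above (localhost:6379): it publishes one command and waits briefly for a reply on /ai/response.

# Smoke-test sketch: assumes the Redis instance used by redis-cli above
# (localhost:6379); publishes one command and waits for the node's reply.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)
pubsub = r.pubsub()
pubsub.subscribe("/ai/response")

r.publish("/ai/command", json.dumps({"text": "move forward",
                                     "timestamp": int(time.time())}))

deadline = time.time() + 15  # allow for LLM API latency
while time.time() < deadline:
    message = pubsub.get_message(timeout=1.0)
    if message and message["type"] == "message":
        print("response:", json.loads(message["data"]))
        break
else:
    print("no response within 15s")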

Limitations

  • Image-based vision queries not yet supported: answers currently come from structured /vision/detections data; raw camera frame integration is pending
  • No conversation context: Each command is processed independently
  • API rate limits: Subject to provider rate limits (OpenAI, Claude)
  • Network latency: Response time depends on LLM API latency

Future Enhancements

  • Vision query support (integrate camera feed)
  • Conversation context (multi-turn dialogue)
  • Voice input integration
  • Command validation and safety checks
  • Multi-language support
  • Offline fallback mode

License

MIT
