mecha10-nodes-llm-command

Crates.io: mecha10-nodes-llm-command
lib.rs: mecha10-nodes-llm-command
version: 0.1.39
created_at: 2025-11-25 02:47:51.368456+00
updated_at: 2026-01-09 17:09:03.337274+00
description: Natural language command parsing via LLM APIs (OpenAI, Claude, Ollama)
repository: https://github.com/mecha-industries/mecha10
id: 1949107
size: 120,927
owner: Peter C (PeterChauYEG)

README

LLM Command Node

Natural language command parsing via LLM APIs (OpenAI, Claude, Ollama) with dashboard control.

Quick Start

  1. Copy .env.example to .env and add your API key:

    cp .env.example .env
    # Edit .env and set: OPENAI_API_KEY=sk-...
    
  2. Start development mode:

    mecha10 dev
    
  3. Open the dashboard at http://localhost:3000/dashboard/robot-control

  4. Send commands via the AI Command Control panel!

Overview

The LLM Command node allows users to control robots using natural language commands. It leverages large language models (LLMs) to parse commands and convert them into structured actions.

Features

  • Multi-Provider Support: OpenAI, Claude (Anthropic), and local Ollama
  • Command Parsing: Converts natural language into structured robot actions
  • Action Routing: Publishes to appropriate topics (motor commands, navigation goals, behaviors)
  • Vision Queries: Uses object detection data to answer "what do you see?" questions
  • Behavior Interruption: Automatically pauses autonomous behaviors when user commands are issued
  • Auto-Resume: Configurable automatic resumption of behaviors after timeout
  • Dashboard Integration: Real-time command input and response display
  • Error Handling: Clear error messages and timeout handling

Configuration

The node is configured via configs/*/llm-command.toml (or through mecha10.json):

# LLM Provider Configuration
provider = "openai"  # Options: "openai", "claude", "local"
model = "gpt-4o-mini"
temperature = 0.7
max_tokens = 500
vision_enabled = false

# Topic Configuration
[topics]
command_in = "/ai/command"
response_out = "/ai/response"
camera_in = "/robot/sensors/camera/rgb"
nav_goal_out = "/nav/goal"
motor_cmd_out = "/motor/cmd_vel"
behavior_out = "/behavior/execute"

# Behavior Interrupt Configuration
[behavior_interrupt]
enabled = true
mode = "interrupt_with_auto_resume"  # Options: "disabled", "interrupt_only", "interrupt_with_auto_resume"
timeout_secs = 30  # Auto-resume timeout (for interrupt_with_auto_resume mode)
await_completion = false
control_topic = "/behavior/control"

Behavior Interrupt Configuration

When the LLM issues motor or navigation commands, it can automatically interrupt autonomous behaviors:

  • enabled: Enable/disable behavior interruption (default: true)
  • mode: Interrupt behavior (options below):
    • "disabled": Never interrupt behavior tree
    • "interrupt_only": Interrupt but don't auto-resume (manual resume required)
    • "interrupt_with_auto_resume": Interrupt and automatically resume after timeout
  • timeout_secs: Seconds before auto-resume (default: 30)
  • await_completion: Wait for command completion before resuming (not yet implemented)
  • control_topic: Topic for behavior control commands (default: "/behavior/control")

Environment Variables

Recommended: Use a .env file in your project root

Copy .env.example to .env and add your API keys:

# Copy the example file
cp .env.example .env

# Edit .env and add your API key
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

The .env file is automatically loaded by mecha10 dev and passed to all nodes.

Alternative: Set environment variables directly

# For OpenAI
export OPENAI_API_KEY="sk-..."

# For Claude
export ANTHROPIC_API_KEY="sk-ant-..."

# For local Ollama (no key needed)
# Ensure Ollama is running on localhost:11434

Topics

Input Topics

  • /ai/command (CommandMessage): Natural language command from user
    {
      "text": "move forward",
      "timestamp": 1234567890,
      "user_id": "optional_user_id"
    }
    

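Outside the dashboard, a command can also be injected directly on this topic. Below is a minimal sketch using the Python redis client, assuming the Redis pub/sub transport that the Testing section drives with redis-cli (localhost:6379) and the CommandMessage shape shown above:

# Sketch only: assumes the Redis pub/sub transport exercised by redis-cli in
# the Testing section (localhost:6379); not part of the crate itself.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)

command = {
    "text": "move forward",
    "timestamp": int(time.time()),
    "user_id": None,  # optional per the CommandMessage schema above
}

# Publish the natural language command to the node's input topic.
r.publish("/ai/command", json.dumps(command))
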
Output Topics

  • /ai/response (ResponseMessage): LLM response with action feedback

    {
      "text": "Moving the robot forward",
      "timestamp": 1234567890,
      "action_taken": true,
      "error": null
    }
    
  • /motor/cmd_vel (MotorCommand): Motor velocity commands

    {
      "linear": 0.5,
      "angular": 0.0,
      "timestamp": 1234567890
    }
    
  • /nav/goal (NavigationGoal): Navigation waypoint goals

    {
      "x": 5.0,
      "y": 3.0,
      "theta": 0.0,
      "timestamp": 1234567890
    }
    
  • /behavior/execute (BehaviorCommand): Behavior execution commands

    {
      "name": "follow_person",
      "params": null,
      "timestamp": 1234567890
    }
    

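To watch these outputs during development, you can subscribe to the same channels the Testing section reads with redis-cli. A minimal sketch, again assuming the default Redis transport on localhost:6379:

# Sketch only: mirrors the redis-cli SUBSCRIBE commands from the Testing
# section; assumes the default Redis transport on localhost:6379.
import json

import redis

r = redis.Redis(host="localhost", port=6379)
pubsub = r.pubsub()
pubsub.subscribe("/ai/response", "/motor/cmd_vel")

for message in pubsub.listen():
    if message["type"] != "message":
        continue  # skip subscribe confirmations
    payload = json.loads(message["data"])
    print(message["channel"].decode(), "->", payload)
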
Command Examples

Motor Commands

  • "move forward"{"action": "motor", "linear": 0.5, "angular": 0.0}
  • "turn left"{"action": "motor", "linear": 0.0, "angular": 0.5}
  • "stop"{"action": "motor", "linear": 0.0, "angular": 0.0}

Navigation Commands

  • "go to x:5 y:3"{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}
  • "move to the door" → Extracts coordinates and navigates

Behavior Commands

  • "follow that person"{"action": "behavior", "name": "follow_person"}
  • "patrol the area"{"action": "behavior", "name": "patrol"}

Vision Queries

The node subscribes to /vision/detections from the object-detector node and uses this data to answer vision questions:

  • "what do you see?" → "I see a person (95% confidence) and a car (87% confidence)"
  • "is there a person in front of me?" → "Yes, I detect 1 person with 95% confidence"
  • "how many cars?" → "I see 2 cars: car (87% confidence) and car (82% confidence)"
  • "describe what's visible" → Natural language description based on detections

How it works:

  1. Object detector node continuously publishes detections to /vision/detections
  2. LLM command node stores the latest detections
  3. When a vision query is detected, detections are formatted as text context
  4. LLM analyzes the detections and provides a natural language response
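
The /vision/detections message format is defined by the object-detector node and is not documented here, so the sketch below uses a hypothetical list of label/confidence pairs purely to illustrate step 3: turning structured detections into the kind of text context the example answers above are built from.

# Illustrative sketch of step 3 only: the detection fields (label, confidence)
# are an assumption, not the documented /vision/detections schema.
def format_detections(detections):
    """Turn structured detections into a text context for the LLM prompt."""
    if not detections:
        return "No objects are currently detected."
    parts = [
        f"{d['label']} ({d['confidence'] * 100:.0f}% confidence)"
        for d in detections
    ]
    return "I see " + ", ".join(parts) + "."

# Example: yields "I see person (95% confidence), car (87% confidence)."
print(format_detections([
    {"label": "person", "confidence": 0.95},
    {"label": "car", "confidence": 0.87},
]))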

Benefits over vision APIs:

  • Much cheaper - No image tokens, just structured detection data
  • Faster - No need to encode/send images
  • More accurate - Uses specialized YOLO model for detection

Behavior Interruption

The LLM command node intelligently manages the interaction between user commands and autonomous behaviors:

How It Works

  1. Automatic Interruption: When the LLM parses a motor or navigation command, it interrupts the behavior tree
  2. User Priority: Direct user commands always take priority over autonomous behaviors
  3. Auto-Resume: After a timeout (configurable), the behavior tree automatically resumes
  4. Manual Resume: Users can manually re-enable behaviors via the dashboard

Interrupt Modes

Disabled (mode = "disabled")

  • Behavior tree is never interrupted by LLM commands
  • User commands may be overridden by autonomous behaviors
  • Use when you want autonomous behaviors to have priority

Interrupt Only (mode = "interrupt_only")

  • Behavior tree is paused when motor/navigation commands are issued
  • No automatic resumption - requires manual re-enable from dashboard
  • Use when you want explicit control over behavior resumption

Interrupt with Auto-Resume (mode = "interrupt_with_auto_resume")

  • Behavior tree is paused when motor/navigation commands are issued
  • Automatically resumes after timeout_secs (default: 30s)
  • Use for seamless switching between manual and autonomous control

Example Scenario

1. Robot is running "patrol" behavior (autonomous)
2. User says: "stop" via LLM command
   → Behavior tree is interrupted
   → Motor command published: {linear: 0.0, angular: 0.0}
3. Robot stops and remains idle
4. After 30 seconds (timeout):
   → Behavior tree automatically resumes
   → Robot continues patrolling

Control Messages

The system uses enhanced BehaviorControl messages:

{
  "action": "interrupt",
  "source": "llm-command",
  "duration_secs": 30,
  "timestamp": 1234567890
}

Actions:

  • interrupt: Pause behavior tree (from LLM command)
  • resume: Resume behavior tree (manual or auto)
  • enable: Enable behavior tree (from dashboard)
  • disable: Disable behavior tree (from dashboard)
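
These messages can also be injected by hand, for example to resume a paused behavior tree without the dashboard. A minimal sketch, assuming the Redis transport used in the Testing section and the default control_topic; the field layout follows the example message above, while the source label and the null duration_secs for resume are assumptions:

# Sketch only: publishes a manual BehaviorControl message over the Redis
# transport shown in the Testing section (localhost:6379 assumed), on the
# default control_topic.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)

control = {
    "action": "resume",          # interrupt | resume | enable | disable
    "source": "manual",          # assumption: identifies who sent it
    "duration_secs": None,       # assumption: not used for resume
    "timestamp": int(time.time()),
}

r.publish("/behavior/control", json.dumps(control))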

System Prompt

The default system prompt guides the LLM to parse commands into structured JSON actions:

You are a helpful robot assistant. Parse user commands and respond with structured actions.

For navigation commands (e.g., "go to the door", "move to coordinates"), extract the goal and respond with JSON:
{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}

For motor commands (e.g., "move forward", "turn left", "stop"), respond with JSON:
{"action": "motor", "linear": 0.5, "angular": 0.0}

For behavior commands (e.g., "follow that person", "patrol the area"), respond with JSON:
{"action": "behavior", "name": "follow_person"}

For vision queries (e.g., "what do you see?"), describe what's visible in the camera feed.

For general questions, respond conversationally.

You can customize this prompt in the configuration.
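
The routing step that follows boils down to: try to parse the model's reply as JSON and publish it to the topic that matches its action field, otherwise treat it as a conversational answer. Below is a rough Python illustration of that dispatch (not the crate's actual Rust implementation), using only the action shapes and topic names documented above.

# Rough illustration of the parse-and-route step, not the crate's actual
# implementation. Action shapes and topic names come from the sections above.
import json

ACTION_TOPICS = {
    "motor": "/motor/cmd_vel",
    "navigate": "/nav/goal",
    "behavior": "/behavior/execute",
}

def route(llm_reply: str):
    """Return (topic, payload) for a structured action, or None for plain chat."""
    try:
        action = json.loads(llm_reply)
    except json.JSONDecodeError:
        return None  # conversational reply, goes to /ai/response only
    topic = ACTION_TOPICS.get(action.get("action"))
    if topic is None:
        return None
    return topic, action

# Example: a motor command matching the system prompt above.
print(route('{"action": "motor", "linear": 0.5, "angular": 0.0}'))
# -> ('/motor/cmd_vel', {'action': 'motor', 'linear': 0.5, 'angular': 0.0})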

Dashboard Integration

The dashboard provides a user-friendly interface for:

  1. Command Input: Text field for natural language commands
  2. Command History: Shows past commands with status indicators
  3. Response Display: Shows LLM responses and action feedback
  4. Status Badges: Connection status and processing indicators

Access the dashboard at http://localhost:3000/dashboard/robot-control

Architecture

┌─────────────────┐
│   Dashboard UI  │
│ (Command Input) │
└────────┬────────┘
         │ publishes
         ▼
    /ai/command
         │
         ▼
┌──────────────────┐
│   LLM Command    │
│      Node        │
│                  │
│  ┌────────────┐  │
│  │ LlmNode    │  │
│  │ (mecha10-  │  │
│  │  ai-llm)   │  │
│  └────────────┘  │
│         │        │
│    Parse JSON    │
│         │        │
└─────────┼────────┘
          │
    ┌─────┴─────┬──────────────┬───────────────┐
    ▼           ▼              ▼               ▼
/motor/cmd_vel /nav/goal  /behavior/execute /ai/response
    │           │              │               │
    ▼           ▼              ▼               ▼
┌────────┐ ┌──────────┐ ┌─────────────┐ ┌─────────┐
│ Motor  │ │Navigation│ │  Behavior   │ │Dashboard│
│ Driver │ │  Stack   │ │  Executor   │ │   UI    │
└────────┘ └──────────┘ └─────────────┘ └─────────┘

Dependencies

  • mecha10-core: Framework core (Context, Topic, Message)
  • mecha10-ai-llm: LLM integration library (providers, LlmNode)
  • tokio: Async runtime
  • serde/serde_json: Serialization
  • anyhow: Error handling
  • reqwest: HTTP client (for API calls)

Running

The node is launched automatically by mecha10 dev when included in mecha10.json.

To run manually:

cargo run -p mecha10-nodes-llm-command

Testing

Test the node with simulation:

  1. Start control plane and simulation:

    docker compose up -d
    mecha10 dev
    
  2. Send a test command via dashboard or Redis CLI:

    redis-cli PUBLISH "/ai/command" '{"text":"move forward","timestamp":1234567890}'
    
  3. Subscribe to response topic:

    redis-cli SUBSCRIBE "/ai/response"
    redis-cli SUBSCRIBE "/motor/cmd_vel"
    
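The same round trip can be scripted as a quick smoke test. A minimal sketch with the Python redis client, assuming the Redis instance that redis-cli talks to above (localhost:6379): it publishes one command and waits briefly for a reply on /ai/response.

# Smoke-test sketch: assumes the Redis instance used by redis-cli above
# (localhost:6379); publishes one command and waits for the node's reply.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)
pubsub = r.pubsub()
pubsub.subscribe("/ai/response")

r.publish("/ai/command", json.dumps({"text": "move forward",
                                     "timestamp": int(time.time())}))

deadline = time.time() + 15  # allow for LLM API latency
while time.time() < deadline:
    message = pubsub.get_message(timeout=1.0)
    if message and message["type"] == "message":
        print("response:", json.loads(message["data"]))
        break
else:
    print("no response within 15s")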

Limitations

  • Image-based vision queries not yet supported: answers currently come from structured /vision/detections data; raw camera frame integration is pending
  • No conversation context: Each command is processed independently
  • API rate limits: Subject to provider rate limits (OpenAI, Claude)
  • Network latency: Response time depends on LLM API latency

Future Enhancements

  • Vision query support (integrate camera feed)
  • Conversation context (multi-turn dialogue)
  • Voice input integration
  • Command validation and safety checks
  • Multi-language support
  • Offline fallback mode

License

MIT
