change to: 'SDK runs inside each container'

2025-11-27 10:51:43 +01:00
commit 03484cf5b7
parent cc3148adc4
2 changed files with 230 additions and 159 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,71 @@
+# Agent Orchestra
+
+Multi-agent system with Claude agents communicating via Slack.
+Each agent runs inside its own Docker container with the Claude Agent SDK.
+
+## Quick Start
+```bash
+# Start all services
+docker-compose up -d
+
+# Or for local development:
+uv sync
+cp .env.example .env  # Fill in ANTHROPIC_API_KEY, Slack tokens, Gitea tokens
+uv run python -m orchestra.main
+```
+
+## Architecture
+
+- **Orchestrator**: Lightweight service that routes Slack messages to agents via HTTP
+- **Agent Containers**: Each runs Claude SDK + HTTP API for receiving messages
+- **Tools**: Built-in (Read/Write/Bash run in container) + custom MCP tools
+- **Permissions**: PreToolUse hooks enforce agent-specific restrictions
+- **Communication**: Orchestrator → HTTP → Agent containers
+
+```
+Orchestrator (Slack listener) --HTTP--> Agent Containers (SDK + HTTP API)
+```
+
+## Claude Agent SDK Usage
+
+Each agent container runs `ClaudeSDKClient` for persistent conversations.
+The orchestrator communicates with agents via HTTP API.
+
+```python
+# In agent container
+from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions
+
+async with ClaudeSDKClient(options) as client:
+    await client.query(message)
+    async for msg in client.receive_response():
+        process(msg)
+```
+
+Custom tools use `@tool` decorator and `create_sdk_mcp_server()`.
+
+## Key Files
+- `config/orchestra.yml` - Global config (Slack, agent endpoints)
+- `config/agents/*.yml` - Agent definitions (tools, permissions, prompts)
+- `src/orchestra/core/orchestrator.py` - Slack listener, HTTP routing
+- `src/orchestra/agent/agent.py` - Agent service with SDK + HTTP API
+- `src/orchestra/tools/` - Custom MCP tool implementations
+
+## Agent Permissions
+
+Each agent has:
+- `allowed_tools` / `disallowed_tools` - Tool access
+- `permissions.filesystem` - Path restrictions
+- `permissions.git` - Branch push/merge restrictions
+
+Enforced via PreToolUse hooks that check before execution.
+
+## Testing
+```bash
+uv run pytest
+uv run pytest tests/test_agent.py -v
+```
+
+## Common Tasks
+- **Add agent**: Create YAML in config/agents/, add to docker-compose
+- **Add tool**: Use @tool decorator in src/orchestra/tools/, register in server
+- **Debug agent**: Check container logs: `docker-compose logs agent-dev`
--- a/PLAN.md
+++ b/PLAN.md
@@ -39,7 +39,7 @@ A multi-agent system where Claude-powered agents collaborate via Slack, each run

 ### Architecture Decision: Orchestrator vs. Per-Container SDK

-**Option A: Orchestrator runs SDK (chosen)**
+**Option A: Orchestrator runs SDK**
 ```
 ┌─────────────────────────────────────────┐
 │            Orchestrator                  │
@@ -58,21 +58,31 @@ A multi-agent system where Claude-powered agents collaborate via Slack, each run
 └─────────────────┘
 ```

-**Option B: SDK runs inside each container**
+**Option B: SDK runs inside each container (chosen)**
 ```
+┌─────────────────────┐
+│    Orchestrator     │
+│  - Slack listener   │
+│  - Message routing  │
+│  - HTTP API         │
+└─────────────────────┘
+         │ HTTP
+         ▼
 ┌─────────────────┐     ┌─────────────────┐
 │ Agent Container │     │ Agent Container │
 │ - SDK Client    │     │ - SDK Client    │
 │ - Claude CLI    │     │ - Claude CLI    │
 │ - Node.js       │     │ - Node.js       │
+│ - Workspace     │     │ - Workspace     │
 └─────────────────┘     └─────────────────┘
 ```

-We choose **Option A** because:
-- Simpler container images (no Node.js/CLI required)
-- Centralized orchestration and routing
-- SDK's built-in tools (Bash, Write) can still operate on container filesystems via mounted volumes
-- Easier to manage API keys and rate limits
+We choose **Option B** because:
+- True isolation: each agent's SDK tools operate directly on its own container filesystem
+- Simpler architecture: no `docker exec` complexity or volume mounting tricks
+- Better scalability: agents can run on different hosts
+- Cleaner permission model: container boundaries enforce filesystem isolation
+- More realistic: tools like Bash run in the agent's actual environment

 ### Setting Up SDK with Agent Options

@@ -155,20 +165,23 @@ async with ClaudeSDKClient(options) as client:
 │  ┌─────────────────────────────────────────────────────────────┐│
 │  │                    Orchestrator Service                      ││
 │  │  - Slack event listener (Socket Mode)                       ││
+│  │  - HTTP API for agent communication                         ││
+│  │  - Message routing to agent containers                      ││
 │  │  - Agent lifecycle management                               ││
-│  │  - Message routing                                          ││
-│  │  - MCP server exposure                                      ││
 │  └─────────────────────────────────────────────────────────────┘│
-│                              │                                   │
+│                              │ HTTP                              │
 │         ┌────────────────────┼────────────────────┐             │
 │         ▼                    ▼                    ▼             │
 │  ┌─────────────┐     ┌─────────────┐      ┌─────────────┐       │
 │  │  CEO Agent  │     │  PM Agent   │      │  Dev Agent  │       │
 │  │  Container  │     │  Container  │      │  Container  │       │
 │  │             │     │             │      │             │       │
+│  │ - SDK Client│     │ - SDK Client│      │ - SDK Client│       │
+│  │ - Claude CLI│     │ - Claude CLI│      │ - Claude CLI│       │
+│  │ - Node.js   │     │ - Node.js   │      │ - Node.js   │       │
 │  │ - memory/   │     │ - memory/   │      │ - memory/   │       │
-│  │ - tools     │     │ - tools     │      │ - tools     │       │
 │  │ - git clone │     │ - git clone │      │ - git clone │       │
+│  │ - HTTP API  │     │ - HTTP API  │      │ - HTTP API  │       │
 │  └─────────────┘     └─────────────┘      └─────────────┘       │
 │         │                    │                    │             │
 │         └────────────────────┼────────────────────┘             │
@@ -177,7 +190,6 @@ async with ClaudeSDKClient(options) as client:
 │                    │  Shared Volume  │                          │
 │                    │  - /repos/      │                          │
 │                    │  - /projects/   │                          │
-│                    │  - /memory/     │                          │
 │                    └─────────────────┘                          │
 └─────────────────────────────────────────────────────────────────┘
 ```
@@ -192,8 +204,8 @@ agent-orchestra/
 ├── README.md
 ├── pyproject.toml
 ├── docker-compose.yml
-├── Dockerfile.orchestrator      # Main service (runs SDK)
-├── Dockerfile.workspace         # Lightweight workspace containers
+├── Dockerfile.orchestrator      # Lightweight routing service
+├── Dockerfile.agent             # Full agent container (SDK, Node.js, CLI)
 │
 ├── config/
 │   ├── orchestra.yml            # Global config (Slack, Gitea, Docker)
@@ -632,33 +644,35 @@ can_delegate_to:

 ```python
 """
-Main orchestrator that:
-- Loads agent configs from YAML
-- Manages agent container lifecycle
-- Routes Slack messages to appropriate agents
-- Handles agent-to-agent communication
+Lightweight orchestrator that:
+- Listens to Slack events (Socket Mode)
+- Routes messages to agent containers via HTTP
+- Manages container lifecycle (start/stop)
+- Does NOT run the Claude SDK (agents do)
 """

 class Orchestrator:
    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    async def route_message(self, event: SlackEvent) -> None: ...
-    async def spawn_agent(self, agent_id: str) -> AgentContainer: ...
-    async def get_agent(self, agent_id: str) -> Agent: ...
+    async def send_to_agent(self, agent_id: str, message: str) -> str: ...
+    async def start_agent_container(self, agent_id: str) -> None: ...
+    async def stop_agent_container(self, agent_id: str) -> None: ...
 ```

-### 2. Agent (`src/orchestra/core/agent.py`)
+### 2. Agent (`src/orchestra/agent/agent.py`)

-Uses `ClaudeSDKClient` from the Claude Agent SDK for persistent conversation sessions.
-Each agent maintains its own client instance for continuous context.
+Each agent runs inside its own Docker container with the Claude Agent SDK.
+The agent exposes an HTTP API for the orchestrator to send messages.

 ```python
 """
-Wraps Claude Agent SDK with:
-- Persistent conversation via ClaudeSDKClient
+Agent service running inside container:
+- Exposes HTTP API for receiving messages from orchestrator
+- Uses ClaudeSDKClient for persistent conversation
 - Custom tools via SDK MCP servers (in-process)
-- Memory management
-- Slack integration
+- Tools operate directly on container filesystem
+- Memory management local to container
 """
 from claude_agent_sdk import (
    ClaudeSDKClient, 
@@ -671,14 +685,16 @@ from claude_agent_sdk import (
    HookMatcher
 )

-class Agent:
+class AgentService:
+    """HTTP service that wraps the Claude SDK client."""
    id: str
    config: AgentConfig
    client: ClaudeSDKClient  # Persistent session
    memory: MemoryManager
+    app: FastAPI  # HTTP server for orchestrator communication

    async def start(self) -> None:
-        """Initialize the ClaudeSDKClient with agent-specific options."""
+        """Initialize SDK client and start HTTP server."""
        # Create in-process MCP server for custom tools
        tools_server = create_sdk_mcp_server(
            name=f"{self.id}-tools",
@@ -701,13 +717,29 @@ class Agent:
                }
            },
            permission_mode="acceptEdits",  # Auto-accept in containers
-            cwd=f"/workspace/{self.id}",
+            cwd="/workspace",  # Agent's local workspace
            hooks=self._build_hooks()
        )

        self.client = ClaudeSDKClient(options)
        await self.client.connect()

+        # Start HTTP server for orchestrator
+        await self._start_http_server()
+
+    async def _start_http_server(self) -> None:
+        """Build the FastAPI app that receives messages from the orchestrator (serving it, e.g. with uvicorn, is not shown)."""
+        self.app = FastAPI()
+
+        @self.app.post("/message")
+        async def handle_message(request: MessageRequest) -> MessageResponse:
+            response = await self.process_message(request.text, request.context)
+            return MessageResponse(text=response)
+
+        @self.app.get("/health")
+        async def health():
+            return {"status": "ok", "agent_id": self.id}
+
    async def process_message(self, message: str, context: dict) -> str:
        """Send message and collect response. Client maintains conversation history."""
        await self.client.query(message)
@@ -722,7 +754,7 @@ class Agent:
        return response_text

    async def stop(self) -> None:
-        """Disconnect the client."""
+        """Disconnect the client and stop HTTP server."""
        await self.client.disconnect()
    
    def _build_tools(self) -> list:
@@ -1213,65 +1245,96 @@ services:
    environment:
      - SLACK_APP_TOKEN=${SLACK_APP_TOKEN}
      - SLACK_BOT_TOKEN=${SLACK_BOT_TOKEN}
+    volumes:
+      - ./config:/app/config:ro
+    networks:
+      - orchestra-net
+    depends_on:
+      - agent-ceo
+      - agent-pm
+      - agent-dev
+      - agent-techlead
+
+  # Full agent containers (SDK, Node.js, CLI)
+  agent-ceo:
+    build:
+      context: .
+      dockerfile: Dockerfile.agent
+    environment:
+      - AGENT_ID=ceo
+      - AGENT_PORT=8001
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GITEA_URL=${GITEA_URL}
      - GITEA_API_TOKEN=${GITEA_API_TOKEN}
    volumes:
-      - ./config:/app/config:ro
-      - ./data:/app/data
-      - /var/run/docker.sock:/var/run/docker.sock  # For container management
-    networks:
-      - orchestra-net
-    depends_on:
-      - workspace-ceo
-      - workspace-pm
-      - workspace-dev
-      - workspace-techlead
-
-  # Lightweight workspace containers (no SDK, just filesystem)
-  workspace-ceo:
-    build:
-      context: .
-      dockerfile: Dockerfile.workspace
-    volumes:
+      - ./config/agents/ceo.yml:/app/config/agent.yml:ro
      - ./data/workspaces/ceo:/workspace
      - ./data/repos:/repos
      - ./data/memory/ceo:/memory
      - ./data/projects:/projects
+    ports:
+      - "8001:8001"
    networks:
      - orchestra-net

-  workspace-pm:
+  agent-pm:
    build:
      context: .
-      dockerfile: Dockerfile.workspace
+      dockerfile: Dockerfile.agent
+    environment:
+      - AGENT_ID=product_manager
+      - AGENT_PORT=8002
+      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
+      - GITEA_URL=${GITEA_URL}
+      - GITEA_API_TOKEN=${GITEA_API_TOKEN}
    volumes:
+      - ./config/agents/product_manager.yml:/app/config/agent.yml:ro
      - ./data/workspaces/product_manager:/workspace
      - ./data/repos:/repos
      - ./data/memory/product_manager:/memory
      - ./data/projects:/projects
+    ports:
+      - "8002:8002"
    networks:
      - orchestra-net

-  workspace-dev:
+  agent-dev:
    build:
      context: .
-      dockerfile: Dockerfile.workspace
+      dockerfile: Dockerfile.agent
+    environment:
+      - AGENT_ID=developer
+      - AGENT_PORT=8003
+      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
+      - GITEA_URL=${GITEA_URL}
+      - GITEA_API_TOKEN=${GITEA_API_TOKEN}
    volumes:
+      - ./config/agents/developer.yml:/app/config/agent.yml:ro
      - ./data/workspaces/developer:/workspace
      - ./data/repos:/repos
      - ./data/memory/developer:/memory
+    ports:
+      - "8003:8003"
    networks:
      - orchestra-net

-  workspace-techlead:
+  agent-techlead:
    build:
      context: .
-      dockerfile: Dockerfile.workspace
+      dockerfile: Dockerfile.agent
+    environment:
+      - AGENT_ID=tech_lead
+      - AGENT_PORT=8004
+      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
+      - GITEA_URL=${GITEA_URL}
+      - GITEA_API_TOKEN=${GITEA_API_TOKEN}
    volumes:
+      - ./config/agents/tech_lead.yml:/app/config/agent.yml:ro
      - ./data/workspaces/tech_lead:/workspace
      - ./data/repos:/repos:ro  # Read-only for tech lead
      - ./data/memory/tech_lead:/memory
+    ports:
+      - "8004:8004"
    networks:
      - orchestra-net

@@ -1288,10 +1351,32 @@ networks:
 ```dockerfile
 FROM python:3.12-slim

-# Install Node.js (required for Claude Agent SDK)
+# Lightweight orchestrator - no Node.js/SDK needed
+RUN apt-get update && apt-get install -y \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Python dependencies
+WORKDIR /app
+COPY pyproject.toml .
+RUN pip install uv && uv sync
+
+COPY src/orchestra/ ./src/orchestra/
+COPY config/ ./config/
+
+CMD ["uv", "run", "python", "-m", "orchestra.main"]
+```
+
+### Dockerfile.agent
+
+```dockerfile
+FROM python:3.12-slim
+
+# Full agent container with Claude SDK and tools
 RUN apt-get update && apt-get install -y \
    curl \
    git \
+    jq \
    && curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs \
    && rm -rf /var/lib/apt/lists/*
@@ -1304,105 +1389,18 @@ WORKDIR /app
 COPY pyproject.toml .
 RUN pip install uv && uv sync

-COPY src/ ./src/
-COPY config/ ./config/
-
-CMD ["uv", "run", "python", "-m", "orchestra.main"]
-```
-
-### Dockerfile.workspace
-
-```dockerfile
-FROM python:3.12-slim
-
-# Lightweight container with common dev tools
-RUN apt-get update && apt-get install -y \
-    git \
-    curl \
-    jq \
-    && rm -rf /var/lib/apt/lists/*
+# Copy agent service code
+COPY src/orchestra/agent/ ./src/orchestra/agent/
+COPY src/orchestra/tools/ ./src/orchestra/tools/

 # Create workspace structure
 RUN mkdir -p /workspace /repos /memory

-WORKDIR /workspace
+# Expose HTTP API port (8001 is the CEO agent's port; docker-compose sets AGENT_PORT to 8001-8004 per agent)
+EXPOSE 8001

-# Keep container running (orchestrator executes commands via docker exec)
-CMD ["tail", "-f", "/dev/null"]
-```
-
----
-
-## CLAUDE.md (for Claude Code)
-
-```markdown
-# Agent Orchestra
-
-Multi-agent system with Claude agents communicating via Slack.
-Uses the Claude Agent SDK (claude-agent-sdk) for agent execution.
-
-## Quick Start
-```bash
-# Prerequisites: Node.js, Claude Code CLI
-npm install -g @anthropic-ai/claude-code
-
-# Install Python dependencies
-uv sync
-cp .env.example .env  # Fill in ANTHROPIC_API_KEY, Slack tokens, Gitea tokens
-uv run python -m orchestra.main
-```
-
-## Architecture
-
-- **Orchestrator**: Routes Slack messages to agents, manages lifecycle
-- **Agents**: Each agent uses ClaudeSDKClient for persistent conversations
-- **Tools**: Built-in (Read/Write/Bash) + custom MCP tools (Slack/Tasks/Gitea)
-- **Permissions**: PreToolUse hooks enforce agent-specific restrictions
-- **Containers**: Each agent has isolated Docker container for workspace
-
-## Claude Agent SDK Usage
-
-We use `ClaudeSDKClient` (not `query()`) because agents need persistent
-conversation context. Key patterns:
-
-```python
-from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions
-
-async with ClaudeSDKClient(options) as client:
-    await client.query(message)
-    async for msg in client.receive_response():
-        process(msg)
-```
-
-Custom tools use `@tool` decorator and `create_sdk_mcp_server()`.
-
-## Key Files
-- `config/orchestra.yml` - Global config (Slack, Gitea, Docker)
-- `config/agents/*.yml` - Agent definitions (tools, permissions, prompts)
-- `src/orchestra/core/orchestrator.py` - Message routing, lifecycle
-- `src/orchestra/core/agent.py` - ClaudeSDKClient wrapper
-- `src/orchestra/tools/` - Custom MCP tool implementations
-- `src/orchestra/tools/permissions.py` - PreToolUse hook for restrictions
-
-## Agent Permissions
-
-Each agent has:
-- `allowed_tools` / `disallowed_tools` - Tool access
-- `permissions.filesystem` - Path restrictions
-- `permissions.git` - Branch push/merge restrictions
-
-Enforced via PreToolUse hooks that check before execution.
-
-## Testing
-```bash
-uv run pytest
-uv run pytest tests/test_agent.py -v
-```
-
-## Common Tasks
-- **Add agent**: Create YAML in config/agents/, add to docker-compose
-- **Add tool**: Use @tool decorator in src/orchestra/tools/, register in server
-- **Debug agent**: Logs in stdout, use `--debug` for SDK verbose output
+# Run agent service
+CMD ["uv", "run", "python", "-m", "orchestra.agent.main"]
 ```

 ---
@@ -1425,6 +1423,8 @@ dependencies = [
    "aiohttp>=3.9.0",
    "httpx>=0.26.0",
    "anyio>=4.0.0",
+    "fastapi>=0.109.0",            # HTTP API for agent containers
+    "uvicorn>=0.27.0",             # ASGI server for FastAPI
 ]

 [project.optional-dependencies]
@@ -1438,7 +1438,7 @@ dev = [

 **Prerequisites:**
 - Python 3.10+
-- Node.js (required by Claude Agent SDK)
-- Claude Code CLI 2.0.0+: `npm install -g @anthropic-ai/claude-code`
+- Node.js (required by Claude Agent SDK in agent containers)
+- Claude Code CLI 2.0.0+: `npm install -g @anthropic-ai/claude-code` (in agent containers)

 ---
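
Note on the routing path introduced by this commit: the diff declares `Orchestrator.send_to_agent(agent_id, message) -> str` and an agent-side `POST /message` endpoint, but does not show the call between them. A minimal sketch of that call follows, under stated assumptions. The `AGENT_ENDPOINTS` table, the `build_request` helper, and the timeout value are hypothetical. The hostnames and ports mirror the service names and `AGENT_PORT` values in the docker-compose hunk, and the JSON fields (`text`, `context`) mirror how `handle_message` reads `MessageRequest` and fills `MessageResponse`. The sketch uses only the standard library to stay dependency-free; the project itself lists `httpx` and `aiohttp`, which would be the more natural choice.

```python
from __future__ import annotations

import asyncio
import json
import urllib.request

# Hypothetical routing table: agent id -> base URL on the compose network.
# Hostnames and ports follow the docker-compose services in this commit.
AGENT_ENDPOINTS = {
    "ceo": "http://agent-ceo:8001",
    "product_manager": "http://agent-pm:8002",
    "developer": "http://agent-dev:8003",
    "tech_lead": "http://agent-techlead:8004",
}


def build_request(
    agent_id: str,
    message: str,
    context: dict | None = None,
    endpoints: dict[str, str] = AGENT_ENDPOINTS,
) -> urllib.request.Request:
    """Build the POST /message request for one agent (raises KeyError if unknown)."""
    base_url = endpoints[agent_id]
    body = json.dumps({"text": message, "context": context or {}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/message",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _post(request: urllib.request.Request, timeout: float) -> str:
    """Blocking HTTP call; returns the `text` field of the agent's JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]


async def send_to_agent(
    agent_id: str,
    message: str,
    context: dict | None = None,
    timeout: float = 300.0,  # assumed value; agent turns can be slow
) -> str:
    """Forward a Slack message to an agent container and return its reply."""
    request = build_request(agent_id, message, context)
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(_post, request, timeout)
```

Inside the orchestrator's event loop this would be used as `reply = await send_to_agent("developer", "Run the test suite")`. Keeping request construction in a separate pure function makes the routing logic checkable without any running containers.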