change to: 'SDK runs inside each container'

Pierre Wessman 2025-11-27 10:51:43 +01:00
parent cc3148adc4
commit 03484cf5b7
2 changed files with 230 additions and 159 deletions

CLAUDE.md Normal file

@ -0,0 +1,71 @@
# Agent Orchestra
Multi-agent system with Claude agents communicating via Slack.
Each agent runs inside its own Docker container with the Claude Agent SDK.
## Quick Start
```bash
# Start all services
docker-compose up -d
# Or for local development:
uv sync
cp .env.example .env # Fill in ANTHROPIC_API_KEY, Slack tokens, Gitea tokens
uv run python -m orchestra.main
```
## Architecture
- **Orchestrator**: Lightweight service that routes Slack messages to agents via HTTP
- **Agent Containers**: Each runs Claude SDK + HTTP API for receiving messages
- **Tools**: Built-in (Read/Write/Bash run in container) + custom MCP tools
- **Permissions**: PreToolUse hooks enforce agent-specific restrictions
- **Communication**: Orchestrator → HTTP → Agent containers
```
Orchestrator (Slack listener) --HTTP--> Agent Containers (SDK + HTTP API)
```
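The routing step above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the `AGENT_ENDPOINTS` table and the `build_message_request` helper are hypothetical, the service names and ports are taken from the docker-compose sketch in PLAN.md, and the `text`/`context` body fields are assumed from the `/message` handler described there.

```python
import json
from typing import Optional
from urllib.request import Request

# Assumed mapping: service names and ports mirror the docker-compose sketch in PLAN.md.
AGENT_ENDPOINTS = {
    "ceo": "http://agent-ceo:8001",
    "product_manager": "http://agent-pm:8002",
    "developer": "http://agent-dev:8003",
    "tech_lead": "http://agent-techlead:8004",
}


def build_message_request(agent_id: str, text: str, context: Optional[dict] = None) -> Request:
    """Build (but do not send) the POST /message request for one agent."""
    base_url = AGENT_ENDPOINTS[agent_id]  # raises KeyError for unknown agents
    body = json.dumps({"text": text, "context": context or {}}).encode("utf-8")
    return Request(
        f"{base_url}/message",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the routing table easy to test without running containers.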
## Claude Agent SDK Usage
Each agent container runs `ClaudeSDKClient` for persistent conversations.
The orchestrator communicates with agents via HTTP API.
```python
# In agent container
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions
async with ClaudeSDKClient(options) as client:
    await client.query(message)
    async for msg in client.receive_response():
        process(msg)
```
Custom tools use `@tool` decorator and `create_sdk_mcp_server()`.
## Key Files
- `config/orchestra.yml` - Global config (Slack, agent endpoints)
- `config/agents/*.yml` - Agent definitions (tools, permissions, prompts)
- `src/orchestra/core/orchestrator.py` - Slack listener, HTTP routing
- `src/orchestra/agent/agent.py` - Agent service with SDK + HTTP API
- `src/orchestra/tools/` - Custom MCP tool implementations
## Agent Permissions
Each agent has:
- `allowed_tools` / `disallowed_tools` - Tool access
- `permissions.filesystem` - Path restrictions
- `permissions.git` - Branch push/merge restrictions
Enforced via PreToolUse hooks that check before execution.
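The kind of check such a hook might perform can be sketched as a pure function. This is an assumption-laden illustration: `AgentPermissions`, `writable_paths`, and `check_tool_use` are hypothetical names, the `file_path` input key for `Write`/`Edit` is assumed, and wiring the result into an actual PreToolUse hook is left out.

```python
import posixpath
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class AgentPermissions:
    # Field names follow the config keys listed above; the exact shape is an assumption.
    allowed_tools: set = field(default_factory=set)
    disallowed_tools: set = field(default_factory=set)
    writable_paths: tuple = ("/workspace",)  # hypothetical stand-in for permissions.filesystem


def check_tool_use(perms: AgentPermissions, tool_name: str, tool_input: dict) -> tuple:
    """Return (allowed, reason) for a single tool call."""
    if tool_name in perms.disallowed_tools:
        return False, f"{tool_name} is disallowed for this agent"
    if perms.allowed_tools and tool_name not in perms.allowed_tools:
        return False, f"{tool_name} is not in allowed_tools"
    if tool_name in {"Write", "Edit"}:
        # Normalize first so that ".." segments cannot escape a writable root.
        target = PurePosixPath(posixpath.normpath(tool_input.get("file_path", "")))
        if not any(target.is_relative_to(root) for root in perms.writable_paths):
            return False, f"{target} is outside the writable paths"
    return True, "ok"
```

In this sketch an explicit deny wins over an allow, and relative paths are rejected because they cannot be proven to sit under a writable root.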
## Testing
```bash
uv run pytest
uv run pytest tests/test_agent.py -v
```
## Common Tasks
- **Add agent**: Create YAML in config/agents/, add to docker-compose
- **Add tool**: Use @tool decorator in src/orchestra/tools/, register in server
- **Debug agent**: Check container logs: `docker logs agent-dev`
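
Alongside container logs, the agent's `/health` endpoint can be probed. The sketch below is illustrative only: `is_agent_healthy` is a hypothetical helper, the response shape (`status`, `agent_id`) is assumed from the handler described in PLAN.md, and the HTTP call is injected as `fetch` so the logic can be exercised without a running container.

```python
import json
from typing import Callable


def is_agent_healthy(agent_id: str, fetch: Callable[[str], bytes], base_url: str) -> bool:
    """Return True when GET {base_url}/health reports status ok for agent_id."""
    try:
        payload = json.loads(fetch(f"{base_url}/health"))
    except (OSError, ValueError):
        # Connection failures (OSError) and malformed JSON (ValueError) both count as unhealthy.
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("status") == "ok" and payload.get("agent_id") == agent_id
```

With the port mapping from the compose file, a real probe could pass `fetch=lambda url: urllib.request.urlopen(url, timeout=5).read()` and `base_url="http://localhost:8003"` for the developer agent.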

PLAN.md

@ -39,7 +39,7 @@ A multi-agent system where Claude-powered agents collaborate via Slack, each run
### Architecture Decision: Orchestrator vs. Per-Container SDK
**Option A: Orchestrator runs SDK (chosen)**
**Option A: Orchestrator runs SDK**
```
┌─────────────────────────────────────────┐
│ Orchestrator │
@ -58,21 +58,31 @@ A multi-agent system where Claude-powered agents collaborate via Slack, each run
└─────────────────┘
```
**Option B: SDK runs inside each container**
**Option B: SDK runs inside each container (chosen)**
```
┌─────────────────────┐
│ Orchestrator │
│ - Slack listener │
│ - Message routing │
│ - HTTP API │
└─────────────────────┘
│ HTTP
┌─────────────────┐ ┌─────────────────┐
│ Agent Container │ │ Agent Container │
│ - SDK Client │ │ - SDK Client │
│ - Claude CLI │ │ - Claude CLI │
│ - Node.js │ │ - Node.js │
│ - Workspace │ │ - Workspace │
└─────────────────┘ └─────────────────┘
```
We choose **Option A** because:
- Simpler container images (no Node.js/CLI required)
- Centralized orchestration and routing
- SDK's built-in tools (Bash, Write) can still operate on container filesystems via mounted volumes
- Easier to manage API keys and rate limits
We choose **Option B** because:
- True isolation: each agent's SDK tools operate directly on their container filesystem
- Simpler architecture: no `docker exec` complexity or volume mounting tricks
- Better scalability: agents can run on different hosts
- Cleaner permission model: container boundaries enforce filesystem isolation
- More realistic: tools like Bash run in the agent's actual environment
### Setting Up SDK with Agent Options
@ -155,20 +165,23 @@ async with ClaudeSDKClient(options) as client:
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Orchestrator Service ││
│ │ - Slack event listener (Socket Mode) ││
│ │ - HTTP API for agent communication ││
│ │ - Message routing to agent containers ││
│ │ - Agent lifecycle management ││
│ │ - Message routing ││
│ │ - MCP server exposure ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ │ HTTP │
│ ┌────────────────────┼────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ CEO Agent │ │ PM Agent │ │ Dev Agent │ │
│ │ Container │ │ Container │ │ Container │ │
│ │ │ │ │ │ │ │
│ │ - SDK Client│ │ - SDK Client│ │ - SDK Client│ │
│ │ - Claude CLI│ │ - Claude CLI│ │ - Claude CLI│ │
│ │ - Node.js │ │ - Node.js │ │ - Node.js │ │
│ │ - memory/ │ │ - memory/ │ │ - memory/ │ │
│ │ - tools │ │ - tools │ │ - tools │ │
│ │ - git clone │ │ - git clone │ │ - git clone │ │
│ │ - HTTP API │ │ - HTTP API │ │ - HTTP API │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └────────────────────┼────────────────────┘ │
@ -177,7 +190,6 @@ async with ClaudeSDKClient(options) as client:
│ │ Shared Volume │ │
│ │ - /repos/ │ │
│ │ - /projects/ │ │
│ │ - /memory/ │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
@ -192,8 +204,8 @@ agent-orchestra/
├── README.md
├── pyproject.toml
├── docker-compose.yml
├── Dockerfile.orchestrator # Main service (runs SDK)
├── Dockerfile.workspace # Lightweight workspace containers
├── Dockerfile.orchestrator # Lightweight routing service
├── Dockerfile.agent # Full agent container (SDK, Node.js, CLI)
├── config/
│ ├── orchestra.yml # Global config (Slack, Gitea, Docker)
@ -632,33 +644,35 @@ can_delegate_to:
```python
"""
Main orchestrator that:
- Loads agent configs from YAML
- Manages agent container lifecycle
- Routes Slack messages to appropriate agents
- Handles agent-to-agent communication
Lightweight orchestrator that:
- Listens to Slack events (Socket Mode)
- Routes messages to agent containers via HTTP
- Manages container lifecycle (start/stop)
- Does NOT run the Claude SDK (agents do)
"""
class Orchestrator:
    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    async def route_message(self, event: SlackEvent) -> None: ...
    async def spawn_agent(self, agent_id: str) -> AgentContainer: ...
    async def get_agent(self, agent_id: str) -> Agent: ...
    async def send_to_agent(self, agent_id: str, message: str) -> str: ...
    async def start_agent_container(self, agent_id: str) -> None: ...
    async def stop_agent_container(self, agent_id: str) -> None: ...
```
### 2. Agent (`src/orchestra/core/agent.py`)
### 2. Agent (`src/orchestra/agent/agent.py`)
Uses `ClaudeSDKClient` from the Claude Agent SDK for persistent conversation sessions.
Each agent maintains its own client instance for continuous context.
Each agent runs inside its own Docker container with the Claude Agent SDK.
The agent exposes an HTTP API for the orchestrator to send messages.
```python
"""
Wraps Claude Agent SDK with:
- Persistent conversation via ClaudeSDKClient
Agent service running inside container:
- Exposes HTTP API for receiving messages from orchestrator
- Uses ClaudeSDKClient for persistent conversation
- Custom tools via SDK MCP servers (in-process)
- Memory management
- Slack integration
- Tools operate directly on container filesystem
- Memory management local to container
"""
from claude_agent_sdk import (
ClaudeSDKClient,
@ -671,14 +685,16 @@ from claude_agent_sdk import (
HookMatcher
)
class Agent:
class AgentService:
    """HTTP service that wraps the Claude SDK client."""
    id: str
    config: AgentConfig
    client: ClaudeSDKClient # Persistent session
    memory: MemoryManager
    app: FastAPI # HTTP server for orchestrator communication
    async def start(self) -> None:
        """Initialize the ClaudeSDKClient with agent-specific options."""
        """Initialize SDK client and start HTTP server."""
        # Create in-process MCP server for custom tools
        tools_server = create_sdk_mcp_server(
            name=f"{self.id}-tools",
@ -701,13 +717,29 @@ class Agent:
}
},
            permission_mode="acceptEdits", # Auto-accept in containers
            cwd=f"/workspace/{self.id}",
            cwd="/workspace", # Agent's local workspace
            hooks=self._build_hooks()
        )
        self.client = ClaudeSDKClient(options)
        await self.client.connect()
        # Start HTTP server for orchestrator
        await self._start_http_server()
    async def _start_http_server(self) -> None:
        """Start FastAPI server to receive messages from orchestrator."""
        self.app = FastAPI()
        @self.app.post("/message")
        async def handle_message(request: MessageRequest) -> MessageResponse:
            response = await self.process_message(request.text, request.context)
            return MessageResponse(text=response)
        @self.app.get("/health")
        async def health():
            return {"status": "ok", "agent_id": self.id}
    async def process_message(self, message: str, context: dict) -> str:
        """Send message and collect response. Client maintains conversation history."""
        await self.client.query(message)
@ -722,7 +754,7 @@ class Agent:
        return response_text
    async def stop(self) -> None:
        """Disconnect the client."""
        """Disconnect the client and stop HTTP server."""
        await self.client.disconnect()
    def _build_tools(self) -> list:
@ -1213,65 +1245,96 @@ services:
environment:
- SLACK_APP_TOKEN=${SLACK_APP_TOKEN}
- SLACK_BOT_TOKEN=${SLACK_BOT_TOKEN}
volumes:
- ./config:/app/config:ro
networks:
- orchestra-net
depends_on:
- agent-ceo
- agent-pm
- agent-dev
- agent-techlead
# Full agent containers (SDK, Node.js, CLI)
agent-ceo:
build:
context: .
dockerfile: Dockerfile.agent
environment:
- AGENT_ID=ceo
- AGENT_PORT=8001
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- GITEA_URL=${GITEA_URL}
- GITEA_API_TOKEN=${GITEA_API_TOKEN}
volumes:
- ./config:/app/config:ro
- ./data:/app/data
- /var/run/docker.sock:/var/run/docker.sock # For container management
networks:
- orchestra-net
depends_on:
- workspace-ceo
- workspace-pm
- workspace-dev
- workspace-techlead
# Lightweight workspace containers (no SDK, just filesystem)
workspace-ceo:
build:
context: .
dockerfile: Dockerfile.workspace
volumes:
- ./config/agents/ceo.yml:/app/config/agent.yml:ro
- ./data/workspaces/ceo:/workspace
- ./data/repos:/repos
- ./data/memory/ceo:/memory
- ./data/projects:/projects
ports:
- "8001:8001"
networks:
- orchestra-net
workspace-pm:
agent-pm:
build:
context: .
dockerfile: Dockerfile.workspace
dockerfile: Dockerfile.agent
environment:
- AGENT_ID=product_manager
- AGENT_PORT=8002
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- GITEA_URL=${GITEA_URL}
- GITEA_API_TOKEN=${GITEA_API_TOKEN}
volumes:
- ./config/agents/product_manager.yml:/app/config/agent.yml:ro
- ./data/workspaces/product_manager:/workspace
- ./data/repos:/repos
- ./data/memory/product_manager:/memory
- ./data/projects:/projects
ports:
- "8002:8002"
networks:
- orchestra-net
workspace-dev:
agent-dev:
build:
context: .
dockerfile: Dockerfile.workspace
dockerfile: Dockerfile.agent
environment:
- AGENT_ID=developer
- AGENT_PORT=8003
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- GITEA_URL=${GITEA_URL}
- GITEA_API_TOKEN=${GITEA_API_TOKEN}
volumes:
- ./config/agents/developer.yml:/app/config/agent.yml:ro
- ./data/workspaces/developer:/workspace
- ./data/repos:/repos
- ./data/memory/developer:/memory
ports:
- "8003:8003"
networks:
- orchestra-net
workspace-techlead:
agent-techlead:
build:
context: .
dockerfile: Dockerfile.workspace
dockerfile: Dockerfile.agent
environment:
- AGENT_ID=tech_lead
- AGENT_PORT=8004
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- GITEA_URL=${GITEA_URL}
- GITEA_API_TOKEN=${GITEA_API_TOKEN}
volumes:
- ./config/agents/tech_lead.yml:/app/config/agent.yml:ro
- ./data/workspaces/tech_lead:/workspace
- ./data/repos:/repos:ro # Read-only for tech lead
- ./data/memory/tech_lead:/memory
ports:
- "8004:8004"
networks:
- orchestra-net
@ -1288,10 +1351,32 @@ networks:
```dockerfile
FROM python:3.12-slim
# Install Node.js (required for Claude Agent SDK)
# Lightweight orchestrator - no Node.js/SDK needed
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
WORKDIR /app
COPY pyproject.toml .
RUN pip install uv && uv sync
COPY src/orchestra/ ./src/orchestra/
COPY config/ ./config/
CMD ["uv", "run", "python", "-m", "orchestra.main"]
```
### Dockerfile.agent
```dockerfile
FROM python:3.12-slim
# Full agent container with Claude SDK and tools
RUN apt-get update && apt-get install -y \
curl \
git \
jq \
&& curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
&& apt-get install -y nodejs \
&& rm -rf /var/lib/apt/lists/*
@ -1304,105 +1389,18 @@ WORKDIR /app
COPY pyproject.toml .
RUN pip install uv && uv sync
COPY src/ ./src/
COPY config/ ./config/
CMD ["uv", "run", "python", "-m", "orchestra.main"]
```
### Dockerfile.workspace
```dockerfile
FROM python:3.12-slim
# Lightweight container with common dev tools
RUN apt-get update && apt-get install -y \
git \
curl \
jq \
&& rm -rf /var/lib/apt/lists/*
# Copy agent service code
COPY src/orchestra/agent/ ./src/orchestra/agent/
COPY src/orchestra/tools/ ./src/orchestra/tools/
# Create workspace structure
RUN mkdir -p /workspace /repos /memory
WORKDIR /workspace
# Expose HTTP API port
EXPOSE 8001
# Keep container running (orchestrator executes commands via docker exec)
CMD ["tail", "-f", "/dev/null"]
```
---
## CLAUDE.md (for Claude Code)
```markdown
# Agent Orchestra
Multi-agent system with Claude agents communicating via Slack.
Uses the Claude Agent SDK (claude-agent-sdk) for agent execution.
## Quick Start
```bash
# Prerequisites: Node.js, Claude Code CLI
npm install -g @anthropic-ai/claude-code
# Install Python dependencies
uv sync
cp .env.example .env # Fill in ANTHROPIC_API_KEY, Slack tokens, Gitea tokens
uv run python -m orchestra.main
```
## Architecture
- **Orchestrator**: Routes Slack messages to agents, manages lifecycle
- **Agents**: Each agent uses ClaudeSDKClient for persistent conversations
- **Tools**: Built-in (Read/Write/Bash) + custom MCP tools (Slack/Tasks/Gitea)
- **Permissions**: PreToolUse hooks enforce agent-specific restrictions
- **Containers**: Each agent has isolated Docker container for workspace
## Claude Agent SDK Usage
We use `ClaudeSDKClient` (not `query()`) because agents need persistent
conversation context. Key patterns:
```python
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions
async with ClaudeSDKClient(options) as client:
    await client.query(message)
    async for msg in client.receive_response():
        process(msg)
```
Custom tools use `@tool` decorator and `create_sdk_mcp_server()`.
## Key Files
- `config/orchestra.yml` - Global config (Slack, Gitea, Docker)
- `config/agents/*.yml` - Agent definitions (tools, permissions, prompts)
- `src/orchestra/core/orchestrator.py` - Message routing, lifecycle
- `src/orchestra/core/agent.py` - ClaudeSDKClient wrapper
- `src/orchestra/tools/` - Custom MCP tool implementations
- `src/orchestra/tools/permissions.py` - PreToolUse hook for restrictions
## Agent Permissions
Each agent has:
- `allowed_tools` / `disallowed_tools` - Tool access
- `permissions.filesystem` - Path restrictions
- `permissions.git` - Branch push/merge restrictions
Enforced via PreToolUse hooks that check before execution.
## Testing
```bash
uv run pytest
uv run pytest tests/test_agent.py -v
```
## Common Tasks
- **Add agent**: Create YAML in config/agents/, add to docker-compose
- **Add tool**: Use @tool decorator in src/orchestra/tools/, register in server
- **Debug agent**: Logs in stdout, use `--debug` for SDK verbose output
# Run agent service
CMD ["uv", "run", "python", "-m", "orchestra.agent.main"]
```
---
@ -1425,6 +1423,8 @@ dependencies = [
"aiohttp>=3.9.0",
"httpx>=0.26.0",
"anyio>=4.0.0",
"fastapi>=0.109.0", # HTTP API for agent containers
"uvicorn>=0.27.0", # ASGI server for FastAPI
]
[project.optional-dependencies]
@ -1438,7 +1438,7 @@ dev = [
**Prerequisites:**
- Python 3.10+
- Node.js (required by Claude Agent SDK)
- Claude Code CLI 2.0.0+: `npm install -g @anthropic-ai/claude-code`
- Node.js (required by Claude Agent SDK in agent containers)
- Claude Code CLI 2.0.0+: `npm install -g @anthropic-ai/claude-code` (in agent containers)
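
A startup check for these prerequisites could look like the sketch below. It is a rough illustration under stated assumptions: `missing_prerequisites` is a hypothetical helper, the CLI is assumed to be on `PATH` as `claude`, and only presence is checked, not the 2.0.0+ version requirement.

```python
import shutil
import sys
from typing import Callable, List, Optional


def missing_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
    python_version: tuple = sys.version_info[:2],
) -> List[str]:
    """List prerequisites that appear to be missing in the current environment."""
    missing = []
    if python_version < (3, 10):
        missing.append("python>=3.10")
    # Binary names are assumptions: "node" for Node.js, "claude" for the Claude Code CLI.
    for binary in ("node", "claude"):
        if which(binary) is None:
            missing.append(binary)
    return missing
```

An agent container entrypoint could call `missing_prerequisites()` and exit early with a clear message if the list is non-empty.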
---