Building Multi-Agent Systems with Claude 3 Opus for Complex Tasks


The era of the "single prompt" solution is ending. As we move from simple chatbots to autonomous systems capable of executing complex enterprise workflows—like automated due diligence, legacy code migration, or dynamic market analysis—the limitation isn't the model's knowledge, but its architecture.

To solve multi-step, ambiguous problems, we must move toward Multi-Agent Systems (MAS). In these architectures, a highly intelligent "Orchestrator" breaks down high-level goals into sub-tasks and delegates them to specialized agents.

Currently, Claude 3 Opus stands as the premier choice for this "Orchestrator" role due to its superior reasoning, long-context recall, and ability to follow complex chain-of-thought instructions without hallucinating. While the industry anticipates the arrival of next-generation models (often speculated as "Claude 4"), the architectural patterns we build today with Opus are the foundation for those future capabilities.

In this article, we will engineer a robust multi-agent system using Python and the Anthropic API, designed for CTOs and Senior Engineers ready to move beyond proof-of-concept.

The Architecture: The Hub-and-Spoke "Supervisor" Pattern

For complex tasks, a flat structure in which every agent talks to every other agent often leads to infinite loops and state drift. Instead, we use a Hub-and-Spoke (Supervisor) pattern.

  • The Brain (Supervisor): Powered by Claude 3 Opus. It holds the "Global State" and the "Plan." It does not execute tools (like scraping or database writes) directly unless critical. Its job is to think, critique, and route.
  • The Workers (Sub-Agents): Powered by faster, cost-effective models like Claude 3.5 Sonnet or Haiku. These agents possess specific "Tools" (functions) and are myopic—they only care about their specific sub-task (e.g., "Scrape this URL" or "Run this SQL query").
  • Shared State: A persistent JSON or Pydantic object that tracks the history of actions, results, and the current plan.
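
As a concrete illustration, the Shared State can be modeled with Pydantic (v2 syntax). The field names below are illustrative choices for this article, not anything required by the Anthropic API:

from typing import List
from pydantic import BaseModel, Field

class AgentAction(BaseModel):
    """One delegated step and its outcome."""
    agent: str        # e.g. "researcher" or "writer"
    task: str         # the sub-task description sent to the worker
    result: str = ""  # what the worker returned

class SharedState(BaseModel):
    """Global state the Supervisor reads and updates on every turn."""
    goal: str                                      # the user's original request
    plan: List[str] = Field(default_factory=list)  # remaining sub-tasks
    history: List[AgentAction] = Field(default_factory=list)

    def to_context(self) -> str:
        """Serialize to JSON for injection into the Supervisor's prompt."""
        return self.model_dump_json(indent=2)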

Technical Implementation

We will build a system where an Orchestrator (Opus) manages a research workflow. It will delegate tasks to a "Search Agent" and a "Writer Agent."

1. Prerequisites and Setup

We rely on the native anthropic SDK and pydantic for strict type validation—a non-negotiable for production Product Engineering.

pip install anthropic pydantic

2. Defining the State and Tools

First, we define the structure of our messages and the tools our agents can use.

import os
import json
from typing import List, Dict, Any, Optional
from pydantic import BaseModel, Field
from anthropic import Anthropic

# Initialize Client
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

# Define a tool for our workers
def web_search_tool(query: str):
    # In production, replace with SerpAPI or similar
    return f"Simulated search results for: {query}"

TOOLS_DEFINITION = [
    {
        "name": "web_search",
        "description": "Searches the web for information.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"]
        }
    }
]

3. The Opus Orchestrator (The "Brain")

The critical engineering challenge is the system prompt. We must force Claude 3 Opus to act strictly as a manager, not a worker, and pin its output to a strict JSON contract that the runtime can route on. (For longer prompts, XML tags, which Claude handles well, are a useful way to mark boundaries; here a plain-text contract is enough.)

ORCHESTRATOR_SYSTEM_PROMPT = """
You are the Chief Orchestrator of a research team.
Your Goal: Answer the user's complex question by delegating tasks to your workers.

You have access to the following workers:
1. 'researcher': Can search the internet.
2. 'writer': Can compile information into a summary.

Instructions:
1. Analyze the user's request.
2. Break it down into step-by-step sub-tasks.
3. Respond with ONLY a JSON object containing the keys "next_action" and "payload".
   - If you need information, set "next_action" to "researcher".
   - If you have enough information, set "next_action" to "writer".
   - If the task is done, set "next_action" to "FINISH" and put the final answer in "payload".

Do not perform the search yourself. DELEGATE.
"""

def get_orchestrator_decision(messages: List[Dict]) -> Dict:
    """
    Asks Claude 3 Opus to decide the next step.
    """
    response = client.messages.create(
        model="claude-3-opus-20240229", # Using Opus for high-level reasoning
        max_tokens=1024,
        system=ORCHESTRATOR_SYSTEM_PROMPT,
        messages=messages
    )
    
    # In a real app, use robust JSON parsing/validation here
    try:
        decision = json.loads(response.content[0].text)
        return decision
    except json.JSONDecodeError:
        # Fallback logic or retry mechanism
        return {"next_action": "ERROR", "payload": "Failed to parse JSON"}

4. The Worker Agents (The Execution Layer)

For the workers, we don't need the massive reasoning cost of Opus. We can use Claude 3.5 Sonnet, which is faster and highly capable of tool execution.

def run_worker_agent(agent_name: str, task_description: str) -> str:
    """
    Executes a specific task using a sub-agent (Sonnet).
    """
    if agent_name == "researcher":
        # This agent is 'bound' to the web_search tool
        response = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=1024,
            tools=TOOLS_DEFINITION,
            system=f"You are a specialist {agent_name}. Perform the requested task using your tools.",
            messages=[{"role": "user", "content": task_description}]
        )
        
        # Handle the tool-use response from the model
        if response.stop_reason == "tool_use":
            tool_use = next(block for block in response.content if block.type == "tool_use")
            # Execute the function (simulated)
            result = web_search_tool(tool_use.input["query"])
            return f"Search Result: {result}"
        # If the model answered directly without calling a tool, return its text
        return response.content[0].text

    elif agent_name == "writer":
        # Pure generation task
        response = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=1024,
            system="You are a technical writer. Summarize the provided context.",
            messages=[{"role": "user", "content": task_description}]
        )
        return response.content[0].text
        
    return "Unknown Agent"

5. The Execution Loop (The "Runtime")

Finally, we need a while loop to drive the lifecycle of the request. This keeps the system "stateful" across turns, a requirement for Intelligent AI Workflows.

def run_multi_agent_system(user_query: str):
    conversation_history = [{"role": "user", "content": user_query}]
    
    print(f"Starting Task: {user_query}")
    
    while True:
        # 1. Orchestrator decides
        decision = get_orchestrator_decision(conversation_history)
        action = decision.get("next_action")
        payload = decision.get("payload")
        
        print(f"Orchestrator Decision: {action}")
        
        if action == "FINISH":
            print("Final Answer:", payload)
            break
            
        if action == "ERROR":
            print("Orchestration Error")
            break
            
        # 2. Worker executes
        worker_result = run_worker_agent(action, payload)
        
        # 3. Update State (Context)
        # Record the decision (assistant turn) and the worker's result (user turn)
        # so Opus sees what happened, while keeping the roles alternating
        conversation_history.append({
            "role": "assistant",
            "content": json.dumps(decision)
        })
        conversation_history.append({
            "role": "user",
            "content": f"Result from {action}: {worker_result}"
        })
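
To run the system end to end (the query below is just an arbitrary example, and the script assumes ANTHROPIC_API_KEY is set in your environment):

if __name__ == "__main__":
    run_multi_agent_system(
        "Compare the managed Kubernetes offerings of the major cloud providers "
        "and summarize the trade-offs for a CTO."
    )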

Advanced Considerations for CTOs

Handling State and Context Drift

In production, you cannot simply append messages forever; the context window (even Opus's 200k tokens) will fill up, increasing latency and cost.

Strategy: Implement a "Summarizer" step. Every 5 turns, have a separate Sonnet call condense the conversation_history into a bulleted list of "Facts Known". Pass only the "Facts Known" and the "Current Objective" to the Orchestrator.
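
A sketch of that summarizer step, assuming the conversation_history format used in the runtime above (the model choice, prompt wording, and five-turn threshold are all knobs to tune):

def compress_history(conversation_history: List[Dict], objective: str) -> List[Dict]:
    """Condense the transcript into 'Facts Known' so the Orchestrator's context stays small."""
    transcript = "\n".join(str(m["content"]) for m in conversation_history)
    summary = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        system="Condense this agent transcript into a bulleted list of facts. No commentary.",
        messages=[{"role": "user", "content": transcript}],
    )
    facts = summary.content[0].text
    # Restart the history with only the distilled facts and the current objective
    return [{
        "role": "user",
        "content": f"Facts Known:\n{facts}\n\nCurrent Objective: {objective}"
    }]

Inside the execution loop, you would call this every fifth iteration and replace conversation_history with its return value before asking the Orchestrator for its next decision.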

Error Recovery and Self-Correction

Agents fail. A web scraper might return a 403; a database query might throw a syntax error.

Pattern: Do not crash the application. Feed the error message (e.g., "Tool execution failed: Timeout") back to the Orchestrator. Opus is smart enough to read the error and issue a new command (e.g., "Try a different search term" or "Use a different URL"). This "Self-Healing" loop is the defining characteristic of Custom AI Agent Development.
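
One way to wire that in is to wrap the worker call in the execution loop with a try/except and feed the failure back as context instead of raising (the error wording below is our own convention):

        # Inside the execution loop, replacing the bare worker call:
        try:
            worker_result = run_worker_agent(action, payload)
        except Exception as exc:
            # Surface the failure to the Orchestrator instead of crashing; on the
            # next turn it can choose a different query, tool, or agent.
            worker_result = f"Tool execution failed: {exc}. Suggest an alternative approach."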

Cost vs. Accuracy

The "Orchestrator" pattern allows you to optimize costs. You pay for the intelligence of Opus only for the high-level planning (routing), while 90% of the token volume (reading docs, scraping, summarizing) is handled by the cheaper Sonnet or Haiku models.

Conclusion

Building multi-agent systems is no longer about prompt engineering; it is a software architecture discipline. It requires robust state management, clear interface definitions, and a strategic mix of models.

While we await the release of models like Claude 4, the patterns established here—Orchestration, Delegation, and Self-Correction—are future-proof. They allow you to swap in more powerful "brains" as they become available without rewriting your entire infrastructure.

If your organization is looking to scale its engineering capabilities to build these Custom AI Agents or complex Product Engineering solutions, partnering with a specialized team can accelerate your roadmap. 4Geeks Teams offers the on-demand, senior engineering talent required to turn these architectural concepts into deployed, revenue-generating software.
