Building a proof-of-concept multi-agent system takes an afternoon. Building one that runs reliably in production takes weeks of hard-won lessons.
This post covers the architecture patterns I've developed after shipping a multi-agent investment advisory system - what actually breaks, and how to design around it. The tooling is LangGraph, but the patterns are transferable.
Why Multi-Agent, and Why It's Hard
Single-agent systems hit a ceiling when a task requires genuinely different capabilities: fetching live data, running statistical analysis, and writing a structured report all in one pass. Decomposing the task into specialized agents, each with a narrower scope and the right tools, produces better outputs than cramming everything into one agent.
The problem is that multi-agent systems have more failure modes:
- An agent calls a tool with malformed arguments
- An agent loops because it can't detect it's done
- State becomes inconsistent across agent handoffs
- One slow agent blocks the entire pipeline
LangGraph's state machine model addresses most of these - but only if you design your graph carefully.
The State Schema Is Everything
Every agent in LangGraph shares a state object. How you shape that object is the single most important design decision.
Define your state schema explicitly using TypedDict (or Pydantic). Don't be tempted to use a loose dict - you'll regret it.
from typing import TypedDict, Annotated, List
import operator

class InvestmentState(TypedDict):
    query: str
    ticker: str
    market_data: dict                        # populated by DataAgent
    analysis: str                            # populated by AnalysisAgent
    recommendation: str                      # populated by ReasoningAgent
    messages: Annotated[List, operator.add]  # accumulated message history
    error: str | None                        # error surface from any agent
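The Annotated[List, operator.add] pattern registers operator.add as the reducer for messages: the list a node returns is concatenated onto the existing history instead of replacing it. Keys without a reducer are overwritten by the latest update. This is critical for tracing execution and debugging failures.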
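To make the append-versus-overwrite behaviour concrete, here is a minimal plain-Python sketch of the merge semantics. It is not LangGraph's implementation: REDUCERS and apply_update are hypothetical names that only mimic what the annotation declares.

```python
import operator
from typing import Any, Callable

# Hypothetical stand-in for the reducers declared via Annotated[...] in the schema.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"messages": operator.add}


def apply_update(state: dict, update: dict) -> dict:
    """Merge one node's partial update into the shared state."""
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS and key in merged:
            # Keys with a reducer are combined with the existing value.
            merged[key] = REDUCERS[key](merged[key], value)
        else:
            # Keys without a reducer are simply overwritten.
            merged[key] = value
    return merged


state = {"ticker": "AAPL", "messages": ["user: analyse AAPL"], "analysis": ""}
state = apply_update(state, {"messages": ["data_agent: fetched data"]})
state = apply_update(state, {"analysis": "uptrend", "messages": ["analysis_agent: done"]})
```

After both updates, messages holds all three entries in order, while analysis holds only the latest value.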
Agent Decomposition: Narrow Responsibilities
Resist the temptation to build a "general" agent. Specialized agents with narrow scopes are more predictable and easier to test in isolation.
For the investment system, we have three agents:
DataAgent - fetches market data. Its only job is retrieving structured data given a ticker. It should not interpret or summarize.
def data_agent(state: InvestmentState) -> InvestmentState:
    ticker = state["ticker"]
    data = fetch_market_data(ticker)  # your data source
    return {"market_data": data}
AnalysisAgent - receives raw market data, applies statistical analysis, writes a structured analysis string. No data fetching, no recommendations.
ReasoningAgent - receives the structured analysis and writes a final recommendation. It knows nothing about the underlying data retrieval.
This decomposition means each agent has a clearly testable contract: given input X, produce output Y. If the AnalysisAgent breaks, you don't need to look at the DataAgent.
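A contract like this can be checked with an ordinary unit test. The sketch below assumes a toy analysis_agent and a market_data shape of {"close": [...]}; both are illustrative stand-ins, not the production agent.

```python
from statistics import mean


def analysis_agent(state: dict) -> dict:
    """Hypothetical AnalysisAgent: reads market_data, writes only `analysis`."""
    closes = state["market_data"]["close"]  # assumed shape: {"close": [floats]}
    change_pct = (closes[-1] - closes[0]) / closes[0] * 100
    summary = f"mean_close={mean(closes):.2f}; change_pct={change_pct:.1f}"
    return {"analysis": summary}


def test_analysis_agent_contract() -> None:
    # Canned input: no DataAgent, no network, no LLM involved.
    state = {"ticker": "TEST", "market_data": {"close": [100.0, 110.0, 120.0]}}
    update = analysis_agent(state)
    # The agent may only touch the key it owns.
    assert set(update) == {"analysis"}
    assert update["analysis"] == "mean_close=110.00; change_pct=20.0"


test_analysis_agent_contract()
```

Because the test feeds in a hand-written state, a failure here points at the AnalysisAgent alone.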
Graph Topology: Linear vs. Conditional
LangGraph lets you define explicit edges - which nodes can transition to which.
For our pipeline, a linear graph works fine as a starting point:
from langgraph.graph import StateGraph, END
builder = StateGraph(InvestmentState)
builder.add_node("data_agent", data_agent)
builder.add_node("analysis_agent", analysis_agent)
builder.add_node("reasoning_agent", reasoning_agent)
builder.set_entry_point("data_agent")
builder.add_edge("data_agent", "analysis_agent")
builder.add_edge("analysis_agent", "reasoning_agent")
builder.add_edge("reasoning_agent", END)
graph = builder.compile()
But linear graphs are fragile - any failure halts the pipeline. Add conditional edges with a router function to handle errors:
def route_from_data(state: InvestmentState) -> str:
    if state.get("error"):
        return END  # or a dedicated error-handling node
    return "analysis_agent"

builder.add_conditional_edges(
    "data_agent",
    route_from_data,
    {"analysis_agent": "analysis_agent", END: END}
)
Tool-Calling: Structured Outputs Are Non-Negotiable
Unstructured tool calls are the #1 source of agent failures I've seen. If a tool expects a bare symbol in ticker: str and the agent passes ticker: "AAPL stock", the value is still a string, so nothing complains until the lookup fails at runtime.
Enforce structured outputs using Pydantic:
from pydantic import BaseModel
from langchain_core.tools import tool

class MarketDataInput(BaseModel):
    ticker: str
    period: str = "1y"

@tool(args_schema=MarketDataInput)
def fetch_market_data(ticker: str, period: str = "1y") -> dict:
    """Fetch historical market data for a given ticker symbol."""
    # implementation
    ...
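When you bind this tool to an OpenAI model with model.bind_tools([fetch_market_data]), the model receives your schema as the tool's JSON definition and is steered toward producing a matching payload. The arguments are then validated against MarketDataInput when the tool is invoked, before your function body runs - so a malformed call fails loudly at the boundary instead of silently.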
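A type annotation alone will not reject a value like "AAPL stock", since it is still a str. One way to tighten the schema is a format check. The sketch below is stdlib-only; the regex and the normalize_ticker name are assumptions, not part of the original system. With Pydantic v2, a function like this could be attached to MarketDataInput through a field_validator on ticker.

```python
import re

# Assumed ticker format: 1-5 uppercase letters, optionally followed by a
# dot or hyphen and a 1-2 letter share-class suffix (e.g. BRK.B).
TICKER_RE = re.compile(r"[A-Z]{1,5}([.-][A-Z]{1,2})?")


def normalize_ticker(raw: str) -> str:
    """Return a cleaned ticker, or raise ValueError with an actionable message."""
    candidate = raw.strip().upper()
    if not TICKER_RE.fullmatch(candidate):
        raise ValueError(
            f"invalid ticker {raw!r}: expected a bare symbol such as 'AAPL'"
        )
    return candidate
```

The error message is written for the model as much as for a human: when the validation error is fed back into the conversation, it says what a valid argument looks like.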
Handling Failures Gracefully
The naive approach: let exceptions propagate and crash the graph. This is fine for development.
For production, catch errors at the node level and surface them in state:
def data_agent(state: InvestmentState) -> InvestmentState:
    try:
        # fetch_market_data is the @tool defined above, so call it through .invoke()
        data = fetch_market_data.invoke({"ticker": state["ticker"]})
        return {"market_data": data, "error": None}
    except Exception as e:
        return {"market_data": {}, "error": f"DataAgent failed: {e}"}
Your router function then checks state["error"] and routes accordingly - either to a retry node, a fallback, or early termination with a user-facing error message. The LLM at the end of the pipeline can still produce a partial response (e.g., "I couldn't fetch live data - here's my analysis based on last known values") rather than a blank failure.
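Here is a rough sketch of a router that chooses between continuing, retrying, and giving up. It assumes an extra attempts counter in state and a report_error node, neither of which appears in the schema above; MAX_ATTEMPTS is likewise an arbitrary budget.

```python
MAX_ATTEMPTS = 3  # assumed budget: total tries before giving up


def record_failure(state: dict, message: str) -> dict:
    """Build the partial update a failing DataAgent would return."""
    return {
        "market_data": {},
        "error": message,
        "attempts": state.get("attempts", 0) + 1,  # hypothetical state key
    }


def route_after_data(state: dict) -> str:
    """Pick the next node name from the error and attempts fields."""
    if not state.get("error"):
        return "analysis_agent"
    if state.get("attempts", 0) < MAX_ATTEMPTS:
        return "data_agent"  # loop back and try the fetch again
    return "report_error"  # hypothetical node that writes a user-facing message
```

Keeping the counter in state rather than in a closure makes the retry budget show up in traces and lets it survive checkpointing.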
Observability
Complex graphs are opaque without logging. At minimum, log:
- Which node is executing
- Input state hash (for deduplication)
- Output state diff
- Execution time per node
LangSmith integrates directly with LangGraph and, once its tracing environment variables are set, gives you a visual trace without extra instrumentation. For self-hosted setups, add a logging wrapper around each node:
import functools, logging, time

def traced_node(fn):
    @functools.wraps(fn)
    def wrapper(state):
        start = time.perf_counter()  # monotonic clock, safe for measuring durations
        result = fn(state)
        logging.info(f"{fn.__name__} completed in {time.perf_counter()-start:.2f}s")
        return result
    return wrapper

@traced_node
def data_agent(state: InvestmentState) -> InvestmentState:
    ...
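The wrapper above only covers timing. A fuller sketch that also logs an input hash and the keys a node changed might look like the following; state_hash, traced_node_verbose, and the logger name are hypothetical, and the hash assumes the state is JSON-serializable (anything else falls back to str()).

```python
import functools
import hashlib
import json
import logging
import time

logger = logging.getLogger("graph.trace")  # hypothetical logger name


def state_hash(state: dict) -> str:
    """Short, stable fingerprint of a state dict (key order does not matter)."""
    payload = json.dumps(state, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def traced_node_verbose(fn):
    @functools.wraps(fn)
    def wrapper(state):
        start = time.perf_counter()
        logger.info("%s start input_hash=%s", fn.__name__, state_hash(state))
        try:
            update = fn(state)
        except Exception:
            # Log the failure with its traceback, then let it propagate.
            logger.exception(
                "%s raised after %.2fs", fn.__name__, time.perf_counter() - start
            )
            raise
        # Keys whose value differs from the input state: the "output diff".
        changed = sorted(k for k, v in update.items() if state.get(k) != v)
        logger.info(
            "%s done in %.2fs changed=%s",
            fn.__name__, time.perf_counter() - start, changed,
        )
        return update
    return wrapper
```

Logging only the changed keys keeps log volume manageable; the full values are better left to a trace store.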
Key Takeaways
- Design your state schema first - it's the contract between all agents
- Keep agent responsibilities narrow - one agent, one clear job
- Structured tool inputs with Pydantic - eliminate the largest class of runtime failures
- Conditional edges + error state - route around failures instead of crashing
- Observability is not optional - you need full trace visibility when something goes wrong in production
The gap between a demo multi-agent system and a production-reliable one is almost entirely in these operational details, not the underlying LLM capabilities.