Designing Reliable Multi-Agent Systems with LangGraph

Building a proof-of-concept multi-agent system takes an afternoon. Building one that runs reliably in production takes weeks of hard-won lessons.

This post covers the architecture patterns I've developed after shipping a multi-agent investment advisory system - what actually breaks, and how to design around it. The tooling is LangGraph, but the patterns are transferable.

Why Multi-Agent, and Why It's Hard

Single-agent systems hit a ceiling when a task requires genuinely different capabilities: fetching live data, running statistical analysis, and writing a structured report, all in one pass. Decomposing the task into specialized agents, each with a narrower scope and the right tools, produces better outputs than cramming everything into one agent.

The problem is that multi-agent systems have more failure modes:

An agent calls a tool with malformed arguments
An agent loops because it can't detect it's done
State becomes inconsistent across agent handoffs
One slow agent blocks the entire pipeline

LangGraph's state machine model addresses most of these - but only if you design your graph carefully.

The State Schema Is Everything

Every node in a LangGraph graph reads from and writes to a shared state object. How you shape that object is the single most important design decision.

Define your state schema explicitly using TypedDict (or Pydantic). Don't be tempted to use a loose dict: it hides typos in key names and leaves no record of which agent owns which field.

from typing import TypedDict, Annotated, List
import operator

class InvestmentState(TypedDict):
    query: str
    ticker: str
    market_data: dict        # populated by DataAgent
    analysis: str            # populated by AnalysisAgent
    recommendation: str      # populated by ReasoningAgent
    messages: Annotated[List, operator.add]  # accumulated message history
    error: str | None        # error surfaced by any agent (str | None needs Python 3.10+)

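The Annotated[List, operator.add] annotation registers operator.add as the reducer for messages: when a node returns a list under that key, LangGraph concatenates it onto the existing list instead of replacing it. Keys without a reducer, such as market_data, are simply overwritten by the latest write. The accumulated history is critical for tracing execution and debugging failures.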
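To make the reducer behavior concrete, here is a simplified model of how a node's partial update might be merged into state. It is a standard-library sketch of the semantics, not LangGraph's actual implementation; DemoState and apply_update are hypothetical names used only for illustration.

```python
import operator
from typing import Annotated, List, TypedDict, get_type_hints


class DemoState(TypedDict):
    ticker: str                                     # no reducer: last write wins
    messages: Annotated[List[str], operator.add]    # reducer: lists are concatenated


def apply_update(state: dict, update: dict, schema: type) -> dict:
    """Merge a partial update into state, honoring Annotated reducers."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types expose their extra arguments as __metadata__
        metadata = getattr(hints.get(key), "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"ticker": "AAPL", "messages": ["user: analyze AAPL"]}
state = apply_update(state, {"messages": ["data_agent: fetched market data"]}, DemoState)
state = apply_update(state, {"ticker": "MSFT"}, DemoState)
```

After both updates, messages holds both entries in order while ticker has been replaced, which is the distinction to keep in mind when deciding which fields need a reducer.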

Agent Decomposition: Narrow Responsibilities

Resist the temptation to build a "general" agent. Specialized agents with narrow scopes are more predictable and easier to test in isolation.

For the investment system, we have three agents:

DataAgent - fetches market data. Its only job is retrieving structured data given a ticker. It should not interpret or summarize.

def data_agent(state: InvestmentState) -> dict:
    ticker = state["ticker"]
    data = fetch_market_data(ticker)  # your data source
    # Return only the keys this node owns; LangGraph merges them into the shared state
    return {"market_data": data}

AnalysisAgent - receives raw market data, applies statistical analysis, writes a structured analysis string. No data fetching, no recommendations.

ReasoningAgent - receives the structured analysis and writes a final recommendation. It knows nothing about the underlying data retrieval.

This decomposition means each agent has a clearly testable contract: given input X, produce output Y. If the AnalysisAgent breaks, you don't need to look at the DataAgent.
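As an illustration of that contract, the sketch below tests an analysis node against a hand-built state, with no DataAgent and no network involved. The body of analysis_agent, the "close" key inside market_data, and the summary format are all assumptions made for the example, not the system's real implementation.

```python
import statistics


def analysis_agent(state: dict) -> dict:
    """Turn raw closing prices into a short structured summary string."""
    # Assumption: market_data carries a "close" list of floats, oldest first,
    # with at least two entries (statistics.mean raises on an empty list).
    closes = state["market_data"]["close"]
    returns = [later / earlier - 1 for earlier, later in zip(closes, closes[1:])]
    mean_return = statistics.mean(returns)
    volatility = statistics.stdev(returns) if len(returns) > 1 else 0.0
    summary = (
        f"periods={len(returns)}; "
        f"mean_return={mean_return:.4f}; "
        f"volatility={volatility:.4f}"
    )
    return {"analysis": summary}


# Contract check: given input X (a fixture), the node produces output Y.
fixture = {"ticker": "TEST", "market_data": {"close": [100.0, 110.0, 99.0]}}
result = analysis_agent(fixture)
```

Because the node only returns the analysis key, a test can also assert that it never touches market_data or recommendation.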

Graph Topology: Linear vs. Conditional

LangGraph lets you define explicit edges - which nodes can transition to which.

For our pipeline, a linear graph works fine as a starting point:

from langgraph.graph import StateGraph, END

builder = StateGraph(InvestmentState)

builder.add_node("data_agent", data_agent)
builder.add_node("analysis_agent", analysis_agent)
builder.add_node("reasoning_agent", reasoning_agent)

builder.set_entry_point("data_agent")
builder.add_edge("data_agent", "analysis_agent")
builder.add_edge("analysis_agent", "reasoning_agent")
builder.add_edge("reasoning_agent", END)

graph = builder.compile()

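But linear graphs are fragile: an unhandled exception in any node halts the whole pipeline, and there is no way to branch on a failure. Replace the fixed data_agent -> analysis_agent edge with a conditional edge driven by a router function: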

def route_from_data(state: InvestmentState) -> str:
    if state.get("error"):
        return END  # or a dedicated error-handling node
    return "analysis_agent"

builder.add_conditional_edges(
    "data_agent",
    route_from_data,
    {"analysis_agent": "analysis_agent", END: END}
)

Tool-Calling: Structured Outputs Are Non-Negotiable

Unstructured tool calls are the #1 source of agent failures I've seen. If a tool expects ticker: str and the agent passes ticker: "AAPL stock", the value is still a string, so the call type-checks and the failure only shows up at runtime, deep inside the tool.

Enforce structured outputs using Pydantic:

from pydantic import BaseModel
from langchain_core.tools import tool

class MarketDataInput(BaseModel):
    ticker: str
    period: str = "1y"

@tool(args_schema=MarketDataInput)
def fetch_market_data(ticker: str, period: str = "1y") -> dict:
    """Fetch historical market data for a given ticker symbol."""
    # implementation
    ...

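When you bind this tool to an OpenAI model with model.bind_tools([fetch_market_data]), the model receives the JSON schema derived from MarketDataInput and is steered toward producing a matching payload. That is not a hard guarantee on its own, so the second line of defense matters: the tool validates incoming arguments against args_schema before your function body runs, and a malformed call raises a validation error instead of failing silently.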
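Note that a bare str annotation accepts "AAPL stock" just as happily as "AAPL", so the schema needs a constraint on the value, not only on the type. In Pydantic this would typically be a pattern constraint or a validator on the field. The sketch below swaps in a standard-library dataclass to show the same validate-before-execute idea without depending on a particular Pydantic version; MarketDataArgs, safe_fetch, fake_fetch, and the ticker pattern are hypothetical.

```python
import re
from dataclasses import dataclass

# Illustrative only: real symbols vary by exchange (dots, digits, suffixes).
TICKER_PATTERN = re.compile(r"[A-Z]{1,5}")


@dataclass(frozen=True)
class MarketDataArgs:
    ticker: str
    period: str = "1y"

    def __post_init__(self) -> None:
        if not isinstance(self.ticker, str) or not TICKER_PATTERN.fullmatch(self.ticker):
            raise ValueError(f"invalid ticker: {self.ticker!r}")


def safe_fetch(raw_args: dict, fetch) -> dict:
    """Validate arguments first; only call the data source if they pass."""
    try:
        args = MarketDataArgs(**raw_args)
    except (TypeError, ValueError) as exc:  # TypeError covers missing or extra keys
        return {"market_data": {}, "error": f"bad tool arguments: {exc}"}
    return {"market_data": fetch(args.ticker, args.period), "error": None}


calls = []


def fake_fetch(ticker, period):
    """Stand-in data source that records how it was called."""
    calls.append((ticker, period))
    return {"ticker": ticker, "period": period}


ok = safe_fetch({"ticker": "AAPL"}, fake_fetch)
rejected = safe_fetch({"ticker": "AAPL stock"}, fake_fetch)
```

The rejected call never reaches fake_fetch, which is the property to preserve when the same check lives in the tool's args_schema.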

Handling Failures Gracefully

The naive approach: let exceptions propagate and crash the graph. This is fine for development.

For production, catch errors at the node level and surface them in state:

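def data_agent(state: InvestmentState) -> dict:
    try:
        # Assumes fetch_market_data is the @tool defined above, which is called via .invoke()
        data = fetch_market_data.invoke({"ticker": state["ticker"]})
        return {"market_data": data, "error": None}
    except Exception as e:  # broad on purpose: any failure becomes state, not a crash
        return {"market_data": {}, "error": f"DataAgent failed: {e}"}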

Your router function then checks state["error"] and routes accordingly - either to a retry node, a fallback, or early termination with a user-facing error message. The LLM at the end of the pipeline can still produce a partial response (e.g., "I couldn't fetch live data - here's my analysis based on last known values") rather than a blank failure.
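One way the retry option could look is a router that consults an attempt counter kept in state. This is a sketch under assumptions: the attempts key is not part of the InvestmentState shown earlier, and MAX_ATTEMPTS, the error_handler node name, and the small driver loop standing in for the graph are all hypothetical.

```python
MAX_ATTEMPTS = 3  # hypothetical retry budget


def route_after_data(state: dict) -> str:
    """Pick the next node name from the error and attempt count in state."""
    if not state.get("error"):
        return "analysis_agent"
    if state.get("attempts", 0) < MAX_ATTEMPTS:
        return "data_agent"      # retry the same node
    return "error_handler"       # budget spent: explain the failure to the user


def make_data_agent(fetch):
    """Build a data node around an injected fetch function."""
    def data_agent(state: dict) -> dict:
        attempts = state.get("attempts", 0) + 1
        try:
            return {"market_data": fetch(state["ticker"]), "error": None, "attempts": attempts}
        except Exception as exc:
            return {"market_data": {}, "error": f"DataAgent failed: {exc}", "attempts": attempts}
    return data_agent


def flaky_fetch(failures: int):
    """Fake data source that fails a fixed number of times, then succeeds."""
    remaining = {"n": failures}

    def fetch(ticker):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TimeoutError("upstream timed out")
        return {"ticker": ticker}
    return fetch


def run_until_routed(state: dict, node):
    """Tiny stand-in for the graph loop: run the node until the router moves on."""
    while True:
        state = {**state, **node(state)}
        next_node = route_after_data(state)
        if next_node != "data_agent":
            return state, next_node


recovered, next_node = run_until_routed({"ticker": "AAPL"}, make_data_agent(flaky_fetch(2)))
exhausted, final_node = run_until_routed({"ticker": "AAPL"}, make_data_agent(flaky_fetch(10)))
```

Keeping the counter in state rather than in a module-level variable means the retry budget is scoped to a single run and shows up in any state trace.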

Observability

Complex graphs are opaque without logging. At minimum, log:

Which node is executing
Input state hash (for deduplication)
Output state diff
Execution time per node

LangSmith integrates directly with LangGraph and, once tracing is enabled through its environment variables, gives you a visual trace without extra instrumentation in your code. For self-hosted setups, add a logging wrapper around each node:

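import functools
import logging
import time

logger = logging.getLogger(__name__)

def traced_node(fn):
    """Log start, failure, and wall-clock duration of a node."""
    @functools.wraps(fn)
    def wrapper(state):
        logger.info("%s started", fn.__name__)
        start = time.perf_counter()  # monotonic clock, safe for measuring durations
        try:
            return fn(state)
        except Exception:
            logger.exception("%s raised", fn.__name__)
            raise
        finally:
            logger.info("%s finished in %.2fs", fn.__name__, time.perf_counter() - start)
    return wrapper

@traced_node
def data_agent(state: InvestmentState) -> dict:
    ...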
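The wrapper above covers the node name and timing. The other two items on the list, the input state hash and the output diff, could be computed with helpers along these lines. This is a minimal sketch: state_hash and changed_keys are hypothetical names, and the JSON-based fingerprint assumes state values are JSON-serializable or have a meaningful string form.

```python
import hashlib
import json


def state_hash(state: dict) -> str:
    """Stable short fingerprint of a state dict, independent of key order."""
    # default=str is a lossy fallback for values json cannot encode (e.g. datetimes)
    payload = json.dumps(state, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def changed_keys(before: dict, update: dict) -> list:
    """Keys in a node's update whose value differs from the prior state."""
    missing = object()  # sentinel so that a new key always counts as changed
    return sorted(k for k, v in update.items() if before.get(k, missing) != v)


before = {"ticker": "AAPL", "market_data": {}, "error": None}
update = {"market_data": {"close": [1.0, 2.0]}, "error": None}
fingerprint = state_hash(before)
diff = changed_keys(before, update)
```

Logging the fingerprint before the node runs and the changed keys after it keeps log lines short while still letting you spot duplicate inputs and unexpected writes.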

Key Takeaways

Design your state schema first - it's the contract between all agents
Keep agent responsibilities narrow - one agent, one clear job
Structured tool inputs with Pydantic - eliminate the largest class of runtime failures
Conditional edges + error state - route around failures instead of crashing
Observability is not optional - you need full trace visibility when something goes wrong in production

The gap between a demo multi-agent system and a production-reliable one is almost entirely in these operational details, not the underlying LLM capabilities.