Designing Reliable Multi-Agent Systems with LangGraph

April 1, 2025

Building a proof-of-concept multi-agent system takes an afternoon. Building one that runs reliably in production takes weeks of hard-won lessons.

This post covers the architecture patterns I've developed after shipping a multi-agent investment advisory system - what actually breaks, and how to design around it. The tooling is LangGraph, but the patterns are transferable.

Why Multi-Agent, and Why It's Hard

Single-agent systems hit a ceiling when a task requires genuinely different capabilities, such as fetching live data, running statistical analysis, and writing a structured report, all in one pass. Decomposing the task into specialized agents, each with a narrower scope and the right tools, tends to produce better outputs than cramming everything into one agent.

The problem is that multi-agent systems have more failure modes:

  • An agent calls a tool with malformed arguments
  • An agent loops because it can't detect it's done
  • State becomes inconsistent across agent handoffs
  • One slow agent blocks the entire pipeline

LangGraph's state machine model addresses most of these - but only if you design your graph carefully.

The State Schema Is Everything

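Every node in a LangGraph graph reads from and writes to a shared state object. How you define that object's schema is the single most important design decision.

Define your state schema explicitly using TypedDict (or Pydantic). Don't be tempted to use a loose dict - it hides typos in key names and leaves it unclear which agent owns which field.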

from typing import TypedDict, Annotated, List
import operator

class InvestmentState(TypedDict):
    query: str
    ticker: str
    market_data: dict        # populated by DataAgent
    analysis: str            # populated by AnalysisAgent
    recommendation: str      # populated by ReasoningAgent
    messages: Annotated[List, operator.add]  # accumulated message history
    error: str | None        # error surface from any agent

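The Annotated[List, operator.add] pattern attaches a reducer to messages: the list each node returns is concatenated onto the existing history instead of replacing it. Fields without a reducer are simply overwritten by the latest write. The accumulated history is critical for tracing execution and debugging failures.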
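To make the merge behavior concrete, here is a minimal pure-Python imitation of reducer semantics. It is not LangGraph's implementation; DemoState and apply_update are hypothetical names used only for illustration.

```python
import operator
from typing import Annotated, List, TypedDict, get_type_hints


class DemoState(TypedDict):
    ticker: str
    messages: Annotated[List, operator.add]


def apply_update(state: dict, update: dict, schema=DemoState) -> dict:
    """Merge a node's partial update into state, imitating reducer semantics."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] hints carry the reducer in __metadata__.
        metadata = getattr(hints.get(key), "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)  # list + list
        else:
            merged[key] = value  # no reducer: last write wins
    return merged


state = {"ticker": "AAPL", "messages": ["user: analyze AAPL"]}
state = apply_update(state, {"messages": ["data_agent: done"]})
state = apply_update(state, {"ticker": "MSFT"})
```

After the two updates, messages holds both entries in order, while ticker has been replaced outright.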

Agent Decomposition: Narrow Responsibilities

Resist the temptation to build a "general" agent. Specialized agents with narrow scopes are more predictable and easier to test in isolation.

For the investment system, we have three agents:

DataAgent - fetches market data. Its only job is retrieving structured data given a ticker. It should not interpret or summarize.

def data_agent(state: InvestmentState) -> InvestmentState:
    ticker = state["ticker"]
    data = fetch_market_data(ticker)  # your data source
    # Return only the keys this node updates; LangGraph merges them into state.
    return {"market_data": data}

AnalysisAgent - receives raw market data, applies statistical analysis, writes a structured analysis string. No data fetching, no recommendations.

ReasoningAgent - receives the structured analysis and writes a final recommendation. It knows nothing about the underlying data retrieval.

This decomposition means each agent has a clearly testable contract: given input X, produce output Y. If the AnalysisAgent breaks, you don't need to look at the DataAgent.
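As a rough illustration of that contract, the sketch below tests an AnalysisAgent in isolation against a hand-built state. The agent body, the "closes" key, and the output format are all assumptions made for the example, not the production implementation.

```python
from statistics import mean


def analysis_agent(state: dict) -> dict:
    """Hypothetical AnalysisAgent: reads market_data, writes only `analysis`."""
    closes = state["market_data"]["closes"]
    # Simple returns between consecutive closing prices.
    returns = [(b - a) / a for a, b in zip(closes, closes[1:])]
    return {"analysis": f"mean_daily_return={mean(returns):.4f}"}


def test_analysis_agent_contract():
    # No DataAgent involved: the input state is built by hand.
    state = {"ticker": "TEST", "market_data": {"closes": [100.0, 110.0, 121.0]}}
    update = analysis_agent(state)
    assert set(update) == {"analysis"}  # touches only its own field
    assert update["analysis"] == "mean_daily_return=0.1000"


test_analysis_agent_contract()
```

Because the node is a plain function from state to a partial update, the test needs neither a compiled graph nor a live data source.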

Graph Topology: Linear vs. Conditional

LangGraph lets you define explicit edges - which nodes can transition to which.

For our pipeline, a linear graph works fine as a starting point:

from langgraph.graph import StateGraph, END

builder = StateGraph(InvestmentState)

builder.add_node("data_agent", data_agent)
builder.add_node("analysis_agent", analysis_agent)
builder.add_node("reasoning_agent", reasoning_agent)

builder.set_entry_point("data_agent")
builder.add_edge("data_agent", "analysis_agent")
builder.add_edge("analysis_agent", "reasoning_agent")
builder.add_edge("reasoning_agent", END)

graph = builder.compile()

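But linear graphs are fragile - any unhandled failure halts the pipeline. Replace the fixed data_agent -> analysis_agent edge with a conditional edge and a router function to handle errors (if you keep both, analysis_agent still runs unconditionally):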

def route_from_data(state: InvestmentState) -> str:
    if state.get("error"):
        return END  # or a dedicated error-handling node
    return "analysis_agent"

builder.add_conditional_edges(
    "data_agent",
    route_from_data,
    {"analysis_agent": "analysis_agent", END: END}
)

Tool-Calling: Structured Outputs Are Non-Negotiable

Unstructured tool calls are the #1 source of agent failures I've seen. If a tool expects a bare ticker symbol and the agent passes ticker: "AAPL stock", the call fails at runtime, typically deep inside the data source rather than at the tool boundary.

Enforce structured outputs using Pydantic:

from pydantic import BaseModel
from langchain_core.tools import tool

class MarketDataInput(BaseModel):
    ticker: str
    period: str = "1y"

@tool(args_schema=MarketDataInput)
def fetch_market_data(ticker: str, period: str = "1y") -> dict:
    """Fetch historical market data for a given ticker symbol."""
    # implementation
    ...

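When you bind this tool to an OpenAI model with model.bind_tools([fetch_market_data]), the model receives the tool's JSON schema and is steered toward producing a payload that matches it. The arguments are then validated against MarketDataInput when the tool is invoked, before the function body runs, so a malformed call fails loudly instead of silently. Type validation alone won't reject a well-typed but wrong value such as "AAPL stock"; that requires a constraint on the field itself.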
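One way to catch values like "AAPL stock" at the boundary is a pattern constraint on the field. The sketch below assumes Pydantic v2 (where Field accepts pattern) and an illustrative ticker format of one to five uppercase letters; StrictMarketDataInput and validate_args are hypothetical names.

```python
from pydantic import BaseModel, Field, ValidationError


class StrictMarketDataInput(BaseModel):
    # Assumed format: 1-5 uppercase letters. Real symbols can include
    # dots or digits, so adjust the pattern to your data source.
    ticker: str = Field(pattern=r"^[A-Z]{1,5}$")
    period: str = "1y"


def validate_args(raw_args: dict):
    """Return (parsed_args, None) on success or (None, error_message)."""
    try:
        return StrictMarketDataInput(**raw_args), None
    except ValidationError as exc:
        return None, str(exc)


ok, err = validate_args({"ticker": "AAPL"})
bad, bad_err = validate_args({"ticker": "AAPL stock"})
```

The second call is rejected before any data source is touched, and the error message names the offending field, which could be fed back to the model for a corrected call.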

Handling Failures Gracefully

The naive approach: let exceptions propagate and crash the graph. This is fine for development.

For production, catch errors at the node level and surface them in state:

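def data_agent(state: InvestmentState) -> InvestmentState:
    try:
        # If fetch_market_data is the @tool-decorated version above,
        # call fetch_market_data.invoke({"ticker": state["ticker"]}) instead.
        data = fetch_market_data(state["ticker"])
        return {"market_data": data, "error": None}
    except Exception as e:  # broad on purpose: any failure becomes state
        return {"market_data": {}, "error": f"DataAgent failed: {e}"}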

Your router function then checks state["error"] and routes accordingly - either to a retry node, a fallback, or early termination with a user-facing error message. The LLM at the end of the pipeline can still produce a partial response (e.g., "I couldn't fetch live data - here's my analysis based on last known values") rather than a blank failure.
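A router with a bounded retry budget might look like the sketch below. The retries field and the retry_data and report_error node names are hypothetical; they would need to be added to the state schema and the graph.

```python
MAX_RETRIES = 2  # assumed budget; tune for your data source


def route_after_data(state: dict) -> str:
    """Pick the next node name from the error field and retry count."""
    if not state.get("error"):
        return "analysis_agent"
    if state.get("retries", 0) < MAX_RETRIES:
        # Hypothetical node that increments `retries` and re-runs the fetch.
        return "retry_data"
    # Hypothetical node that writes a user-facing error message.
    return "report_error"
```

Capping retries in state keeps a persistently failing data source from turning into an infinite loop, which is one of the failure modes listed at the start.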

Observability

Complex graphs are opaque without logging. At minimum, log:

  • Which node is executing
  • Input state hash (for deduplication)
  • Output state diff
  • Execution time per node

LangSmith integrates directly with LangGraph and, once tracing is enabled and an API key is configured, gives you a visual trace without extra instrumentation in your code. For self-hosted setups, add a logging wrapper around each node:

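import functools
import logging
import time

def traced_node(fn):
    @functools.wraps(fn)
    def wrapper(state):
        start = time.perf_counter()  # monotonic clock, suited to measuring durations
        try:
            return fn(state)
        finally:
            # Runs on success and on exception, so failed nodes are timed too.
            logging.info("%s finished in %.2fs", fn.__name__, time.perf_counter() - start)
    return wrapper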

@traced_node
def data_agent(state: InvestmentState) -> InvestmentState:
    ...
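The wrapper above covers node name and timing. The input hash and output diff from the list could be sketched as follows; state_hash and state_diff are hypothetical helpers, and the JSON-based hashing assumes state values are JSON-encodable or have a meaningful string form.

```python
import hashlib
import json

_MISSING = object()  # sentinel so an absent key differs from a None value


def state_hash(state: dict) -> str:
    """Stable short hash of a state dict, usable as a deduplication key."""
    # sort_keys makes the hash independent of key order; default=str is a
    # lossy fallback for values JSON cannot encode (e.g. datetimes).
    payload = json.dumps(state, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def state_diff(before: dict, update: dict) -> dict:
    """Keys in a node's partial update whose values differ from the input."""
    return {k: v for k, v in update.items() if before.get(k, _MISSING) != v}
```

Both can be called inside the wrapper before and after fn(state). For fields with a reducer, such as messages, the update contains only the newly appended items, so the diff reports exactly what the node added.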

Key Takeaways

  • Design your state schema first - it's the contract between all agents
  • Keep agent responsibilities narrow - one agent, one clear job
  • Structured tool inputs with Pydantic - eliminate the largest class of runtime failures
  • Conditional edges + error state - route around failures instead of crashing
  • Observability is not optional - you need full trace visibility when something goes wrong in production

The gap between a demo multi-agent system and a production-reliable one is almost entirely in these operational details, not the underlying LLM capabilities.