
LangChain vs CrewAI vs Raw SDK: Which AI Agent Framework in 2026?


Every few months someone asks me: "Should I use LangChain or CrewAI?" And every time, my answer starts the same way: "What are you building?"

The AI agent framework landscape has shifted dramatically since 2024. Some tools that were essential then are now unnecessary. Others that barely existed have become production staples. The right choice depends entirely on your use case, team size, and how much abstraction you actually want.

I have built production agents with most of these frameworks. This post is the honest comparison I wish I had when I started — no marketing language, just code and tradeoffs.


What Changed from 2024 to 2026

The framework landscape in early 2024 was chaotic. LangChain was the default choice for everything, CrewAI was a promising experiment, and most teams were still figuring out if agents were production-ready at all.

Two years later, the picture is clearer:

Native tool calling killed simple chains. When OpenAI and Anthropic shipped native function/tool calling in their APIs, the primary value proposition of LangChain's chain abstraction evaporated. You no longer need a framework to connect an LLM to a tool — it is a first-class API feature.

LangGraph emerged as the real product. The LangChain team pivoted hard toward LangGraph, a graph-based state machine for agent workflows. This turned out to be the right bet. LangGraph solves a genuinely hard problem: managing complex, stateful multi-step agent workflows with branching, cycles, and human-in-the-loop patterns.

CrewAI found its niche. CrewAI's role-based approach works well for prototyping and for systems where you think in terms of "a team of agents." It is fast to set up but has real limitations at scale.

Vercel AI SDK became the default for web apps. If you are building a Next.js or React application with AI features, the Vercel AI SDK's streaming-first approach and React hooks are hard to beat.

Raw SDKs got better. The Anthropic and OpenAI SDKs now support tool calling, streaming, structured output, and even basic agent loops out of the box. For many use cases, you genuinely do not need anything else.

If you are new to AI Engineering, understanding these primitives before adopting a framework will save you significant debugging time later.


Framework-by-Framework Breakdown

1. Raw SDKs (Anthropic / OpenAI)

The simplest approach: call the API directly, handle tool calls in a loop, and manage state yourself.

When to use it: Single-agent systems, prototypes, when you want full control, when debugging matters more than development speed.

Here is a complete agent loop in about 50 lines of TypeScript:

import Anthropic from '@anthropic-ai/sdk';
 
const client = new Anthropic();
 
const tools: Anthropic.Tool[] = [
  {
    name: 'get_weather',
    description: 'Get current weather for a city',
    input_schema: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
];
 
// Placeholder tool executor: wire this to your real implementations
function executeTool(name: string, input: unknown): string {
  if (name === 'get_weather') {
    const { city } = input as { city: string };
    return JSON.stringify({ city, temperature: 72, condition: 'sunny' });
  }
  throw new Error(`Unknown tool: ${name}`);
}
 
async function agentLoop(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: userMessage },
  ];
 
  while (true) {
    const response = await client.messages.create({
      model: 'claude-sonnet-4-6-20250514',
      max_tokens: 1024,
      tools,
      messages,
    });
 
    // If no tool use, we are done
    if (response.stop_reason === 'end_turn') {
      return response.content;
    }
 
    // Execute tool calls and feed results back
    messages.push({ role: 'assistant', content: response.content });
    const toolResults = response.content
      .filter((block): block is Anthropic.ToolUseBlock => block.type === 'tool_use')
      .map((toolUse) => ({
        type: 'tool_result' as const,
        tool_use_id: toolUse.id,
        content: executeTool(toolUse.name, toolUse.input),
      }));
 
    messages.push({ role: 'user', content: toolResults });
  }
}

That is a production-quality agent loop. No framework, no dependencies beyond the SDK. You control every aspect of the flow — retries, error handling, logging, token tracking. I covered the failure modes of loops like this in production agent architecture lessons.

The Python equivalent is equally concise:

import anthropic
 
client = anthropic.Anthropic()
 
def execute_tool(name: str, tool_input: dict) -> str:
    # Placeholder: route tool calls to real implementations
    if name == "get_weather":
        return f"Weather in {tool_input['city']}: 72F, sunny"
    raise ValueError(f"Unknown tool: {name}")
 
def agent_loop(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    tools = [{"name": "get_weather", "description": "Get weather for a city",
              "input_schema": {"type": "object",
                               "properties": {"city": {"type": "string"}},
                               "required": ["city"]}}]
 
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6-20250514", max_tokens=1024,
            tools=tools, messages=messages,
        )
        if response.stop_reason == "end_turn":
            return response.content[0].text
 
        messages.append({"role": "assistant", "content": response.content})
        tool_results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": execute_tool(block.name, block.input)}
            for block in response.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})

Pros: Full control, minimal dependencies, easy to debug, no framework lock-in, smallest possible surface area for bugs.

Cons: You build everything yourself — state persistence, multi-agent coordination, complex branching. That is fine for simple agents and becomes painful for complex workflows.
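A concrete example of what "build everything yourself" means: retry logic. Here is a minimal exponential-backoff wrapper of the kind you would put around the messages.create call above. This is a sketch with my own names; the official SDKs also ship built-in retry options, so treat this as illustration, not a required pattern.

```python
import random
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(), retrying on exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Backoff: base, 2x base, 4x base, ... plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Usage: response = with_retries(lambda: client.messages.create(...))
```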


2. LangGraph

LangGraph models agent workflows as directed graphs. Nodes are functions (agent steps), edges define transitions, and state flows through the graph. It is the most powerful framework for complex, stateful workflows.

When to use it: Multi-agent systems, complex branching logic, human-in-the-loop workflows, production systems that need checkpointing and replay.

Here is a simple two-node graph — a researcher that gathers information and a writer that produces output:

from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated
from langchain_anthropic import ChatAnthropic
import operator
 
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    research_complete: bool
 
model = ChatAnthropic(model="claude-sonnet-4-6-20250514")
 
def researcher(state: AgentState) -> AgentState:
    """Gather information using tools."""
    response = model.invoke(
        [{"role": "system", "content": "You are a research agent. Find relevant facts."}]
        + state["messages"]
    )
    return {"messages": [response], "research_complete": True}
 
def writer(state: AgentState) -> AgentState:
    """Produce final output from research."""
    response = model.invoke(
        [{"role": "system", "content": "You are a writer. Summarize the research clearly."}]
        + state["messages"]
    )
    return {"messages": [response]}
 
def should_write(state: AgentState) -> str:
    return "writer" if state.get("research_complete") else "researcher"
 
# Build the graph
graph = StateGraph(AgentState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_edge(START, "researcher")
graph.add_conditional_edges("researcher", should_write)
graph.add_edge("writer", END)
 
app = graph.compile()
result = app.invoke({"messages": [{"role": "user", "content": "Write about quantum computing"}]})

The graph-based approach shines when workflows get complex. Adding a "reviewer" node that can loop back to the writer, or a "human approval" gate before publishing, is just adding nodes and edges. LangGraph also supports checkpointing — you can persist state, resume interrupted workflows, and replay for debugging.
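If the graph abstraction feels opaque, this dependency-free toy executor (my own code, not LangGraph's API) shows the same control flow: nodes are functions over a state dict, and a router decides what runs next.

```python
def run_graph(nodes, router, state, start, end="END"):
    """Tiny graph executor: run a node, merge its output into state,
    ask the router for the next node, stop at the end sentinel."""
    current = start
    while current != end:
        state = {**state, **nodes[current](state)}
        current = router(current, state)
    return state

# Two stub nodes mirroring the researcher/writer graph above
nodes = {
    "researcher": lambda s: {"facts": ["fact A", "fact B"], "research_complete": True},
    "writer": lambda s: {"draft": f"Summary of {len(s['facts'])} facts"},
}

def router(current, state):
    if current == "researcher":
        return "writer" if state.get("research_complete") else "researcher"
    return "END"

result = run_graph(nodes, router, {}, start="researcher")
```

Everything LangGraph adds on top of this skeleton (checkpointing, streaming, interrupts) is about making that loop survivable in production.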

Pros: Excellent for complex workflows, built-in state management and checkpointing, clear visual representation of flow, strong typing with TypedDict, good production tooling with LangSmith.

Cons: Steep learning curve, the graph abstraction can feel over-engineered for simple agents, Python-first (TypeScript support exists but is less mature), heavy dependency tree.


3. CrewAI

CrewAI takes a role-based approach: you define agents with roles, goals, and backstories, then organize them into a crew that executes tasks.

When to use it: Rapid prototyping, role-based multi-agent systems, when your mental model naturally maps to "a team of specialists."

from crewai import Agent, Task, Crew
 
# Assumes search_tool and scrape_tool are defined (e.g. via the crewai_tools package)
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive, accurate information about the given topic",
    backstory="You are an experienced research analyst with 15 years in technology research.",
    tools=[search_tool, scrape_tool],
    llm="anthropic/claude-sonnet-4-6-20250514",
    verbose=True,
)
 
writer = Agent(
    role="Technical Writer",
    goal="Transform research into clear, engaging technical content",
    backstory="You are a technical writer who makes complex topics accessible.",
    llm="anthropic/claude-sonnet-4-6-20250514",
    verbose=True,
)
 
research_task = Task(
    description="Research the current state of quantum computing in 2026.",
    expected_output="A detailed research report with key findings and sources.",
    agent=researcher,
)
 
writing_task = Task(
    description="Write a blog post based on the research findings.",
    expected_output="A 1000-word blog post suitable for a technical audience.",
    agent=writer,
    context=[research_task],
)
 
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    verbose=True,
)
 
result = crew.kickoff()

CrewAI's API is intuitive — you think in terms of people and tasks, which maps well to how many teams reason about their AI workflows. The "backstory" field is essentially a character prompt that shapes agent behavior.
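Under the hood, those fields are compiled into a system prompt that is sent on every turn. This is not CrewAI's exact template, but the effect is roughly:

```python
def build_system_prompt(role, goal, backstory):
    # Illustrative template only; CrewAI's actual prompt wording differs
    return f"You are {role}. {backstory}\nYour personal goal is: {goal}"

prompt = build_system_prompt(
    role="Senior Research Analyst",
    goal="Find comprehensive, accurate information about the given topic",
    backstory="You are an experienced research analyst with 15 years in technology research.",
)
```

Every token of that prompt is resent on each agent turn, which is where the token overhead discussed later in this post comes from.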

Pros: Fastest time-to-prototype for multi-agent systems, intuitive role-based mental model, good built-in tool library, active community.

Cons: Limited control over execution flow, debugging can be opaque, the role/backstory pattern adds token overhead, less suited for complex conditional workflows, production hardening requires significant customization.

For solo developers looking to build AI-powered workflows, I covered practical patterns that complement CrewAI in building agentic workflows as a solo developer.


4. AutoGen (Microsoft)

AutoGen models agents as participants in a conversation. Agents send messages to each other, and the conversation protocol drives the workflow. Microsoft has invested heavily in AutoGen 0.4+, which introduced a more modular, event-driven architecture.

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient
 
model_client = OpenAIChatCompletionClient(model="gpt-4o")
 
researcher = AssistantAgent(
    name="researcher",
    model_client=model_client,
    system_message="You research topics thoroughly and share your findings.",
)
 
writer = AssistantAgent(
    name="writer",
    model_client=model_client,
    system_message="You write clear technical content based on research findings.",
)
 
team = RoundRobinGroupChat([researcher, writer], max_turns=4)
# team.run is a coroutine: call it from an async function, or wrap with asyncio.run(...)
result = await team.run(task="Write about quantum computing advances in 2026")

AutoGen is strong for research-oriented use cases and conversational agent systems. The 0.4 rewrite improved modularity significantly. However, the API has changed multiple times, and the ecosystem is less stable than LangGraph.
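The round-robin protocol itself is simple. A dependency-free sketch of the turn-taking (my own code, not AutoGen's internals):

```python
def round_robin(agents, task, max_turns):
    """Agents take turns appending messages until the turn budget is spent."""
    transcript = [("user", task)]
    for turn in range(max_turns):
        name, respond = agents[turn % len(agents)]
        transcript.append((name, respond(transcript)))
    return transcript

# Stub agents standing in for the researcher/writer pair above
agents = [
    ("researcher", lambda t: "findings: ..."),
    ("writer", lambda t: "draft based on the findings"),
]
transcript = round_robin(agents, "Write about quantum computing", max_turns=4)
```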

Pros: Strong conversational multi-agent patterns, good for research workflows, Microsoft backing, supports code execution sandboxes.

Cons: API instability between versions, steeper learning curve than CrewAI, smaller community than LangGraph, documentation can lag behind code changes.


5. Vercel AI SDK

The Vercel AI SDK is designed for web applications. It provides streaming-first primitives, React hooks, and server actions that make building AI-powered UIs straightforward.

When to use it: Next.js or React applications with AI features, chat interfaces, any web app where you need streaming responses and tool calling with a good UI experience.

import { generateText, tool } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';
 
const result = await generateText({
  model: anthropic('claude-sonnet-4-6-20250514'),
  tools: {
    getWeather: tool({
      description: 'Get the weather for a city',
      parameters: z.object({
        city: z.string().describe('The city to get weather for'),
      }),
      execute: async ({ city }) => {
        // Call weather API
        return { temperature: 72, condition: 'sunny', city };
      },
    }),
  },
  maxSteps: 5, // Allow up to 5 tool-calling rounds
  prompt: 'What is the weather in London and Tokyo?',
});
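Conceptually, maxSteps caps a tool-resolution loop. Here is a dependency-free sketch of that semantics (Python for brevity, with my own names; this is not the SDK's actual internals):

```python
def generate_with_steps(model_call, execute_tool, prompt, max_steps=5):
    """Bounded agent loop: resolve tool calls until the model answers
    in plain text or the step budget runs out."""
    messages = [("user", prompt)]
    for _ in range(max_steps):
        kind, payload = model_call(messages)    # ("text", ...) or ("tool", ...)
        messages.append(("assistant", payload))
        if kind == "text":
            return payload                      # final answer
        messages.append(("tool_result", execute_tool(payload)))
    return None  # budget exhausted before a final answer

# Fake model: one tool round, then a final answer
def fake_model(messages):
    if any(role == "tool_result" for role, _ in messages):
        return ("text", "It is sunny in London")
    return ("tool", "get_weather:London")

answer = generate_with_steps(fake_model, lambda call: "sunny", "Weather in London?")
```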

The maxSteps parameter is key — it turns a single generateText call into an agent loop that automatically handles tool calls for up to N rounds. On the frontend, the useChat hook handles streaming, message history, and tool call rendering:

// React component
import { useChat } from '@ai-sdk/react';
 
export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
    maxSteps: 5,
  });
 
  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}

Pros: Best-in-class streaming support, excellent React/Next.js integration, provider-agnostic (swap between Anthropic, OpenAI, Google with one line), strong TypeScript types, the maxSteps agent loop is simple and effective.

Cons: Web-focused (not ideal for backend-only agents), limited multi-agent orchestration, less control over the agent loop internals, tied to the Vercel ecosystem for the best experience.


Head-to-Head Comparison

| Dimension | Raw SDK | LangGraph | CrewAI | AutoGen | Vercel AI SDK |
|---|---|---|---|---|---|
| Architecture | Imperative loop | State machine graph | Role-based crews | Conversational | Streaming primitives |
| Learning curve | Low (just the API) | High (graph concepts) | Low-medium | Medium-high | Low (if you know React) |
| Production readiness | High (you own it) | High | Medium | Medium | High (for web apps) |
| Tool calling | Native API support | Built-in + custom | Built-in library | Built-in | Zod-typed tools |
| Multi-agent | DIY | Excellent | Good (role-based) | Good (conversational) | Limited |
| Community size | Massive (SDK users) | Large | Medium-large | Medium | Large (web devs) |
| Token efficiency | Best (no overhead) | Good | Lower (backstory tokens) | Moderate | Good |
| Debugging | Full control | LangSmith integration | Verbose logs | Basic logging | Dev tools integration |
| Language support | TS + Python | Python-first, TS growing | Python | Python | TypeScript-first |
| Best for | Simple agents, full control | Complex stateful workflows | Quick multi-agent prototypes | Research, conversation agents | Web apps with AI |

Latency and Token Overhead

This matters more than most comparisons acknowledge. Every framework adds some overhead:

| Framework | Extra tokens per request | Typical latency overhead |
|---|---|---|
| Raw SDK | 0 | 0ms |
| LangGraph | ~50-100 (state serialization prompts) | 10-30ms (graph routing) |
| CrewAI | ~200-500 (role/backstory per agent) | 20-50ms (crew coordination) |
| AutoGen | ~100-300 (conversation protocol) | 15-40ms (message routing) |
| Vercel AI SDK | ~10-20 (tool schema formatting) | 5-10ms (middleware) |

For a single request, these differences are negligible. At scale — thousands of agent invocations per hour — token overhead compounds. CrewAI's backstory pattern, for instance, adds 200-500 tokens per agent per turn. With four agents running ten turns, that is 8,000-20,000 extra tokens per workflow execution.
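It is worth making that arithmetic explicit:

```python
def workflow_overhead(agents, turns, tokens_per_agent_turn):
    # Role/backstory tokens resent on every agent turn across the workflow
    return agents * turns * tokens_per_agent_turn

low = workflow_overhead(agents=4, turns=10, tokens_per_agent_turn=200)
high = workflow_overhead(agents=4, turns=10, tokens_per_agent_turn=500)
# low == 8000, high == 20000 extra tokens per workflow execution
```

Multiply that by thousands of executions per hour and the overhead shows up directly on your API bill.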


Decision Framework

Here is how I think about framework selection. Start from your requirements and follow the path:

Step 1: How many agents do you need?

  • Single agent with tools --> Raw SDK or Vercel AI SDK
  • Multiple agents coordinating --> Go to Step 2

Step 2: How complex is the workflow?

  • Linear pipeline (Agent A then Agent B then Agent C) --> CrewAI
  • Complex branching with conditionals and loops --> LangGraph
  • Conversational (agents discuss and refine) --> AutoGen

Step 3: What is your runtime?

  • Web application with UI --> Vercel AI SDK (even for multi-step tool calling)
  • Backend service or CLI --> Raw SDK, LangGraph, or CrewAI
  • Research/experimentation --> CrewAI or AutoGen

Step 4: What is your team's experience?

  • Python team --> LangGraph or CrewAI
  • TypeScript/React team --> Vercel AI SDK or Raw SDK
  • Mixed --> Raw SDK (both languages are well-supported)
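The four steps collapse into a small function (labels and signature are mine):

```python
def pick_framework(num_agents, workflow="linear", runtime="backend"):
    """Mirror the decision steps above.
    workflow: "linear" | "branching" | "conversational"
    runtime:  "web" | "backend" | "research"
    """
    if runtime == "web":
        return "Vercel AI SDK"      # streaming UI beats everything else here
    if num_agents <= 1:
        return "Raw SDK"            # single agent with tools
    if workflow == "branching":
        return "LangGraph"          # conditionals, loops, checkpoints
    if workflow == "conversational":
        return "AutoGen"            # agents discuss and refine
    return "CrewAI"                 # linear multi-agent pipeline
```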

My Recommendation by Use Case

Solo developer building a product

Start with the raw SDK. You will learn faster, debug easier, and avoid framework-specific gotchas. If you need multi-agent capabilities later, you can add them incrementally. CrewAI is a reasonable second choice if you want multi-agent out of the box and are comfortable with Python.

Enterprise with complex workflows

LangGraph. The graph-based state machine is the right abstraction for workflows with approval gates, conditional branching, human-in-the-loop steps, and error recovery. The checkpointing system means you can resume failed workflows without re-running everything. LangSmith provides the observability layer enterprises need.

Web application with AI features

Vercel AI SDK. The streaming primitives, React hooks, and provider-agnostic design make it the obvious choice. You get a good agent loop with maxSteps, typed tools with Zod, and the frontend integration is seamless. You can always drop down to the raw SDK for backend-only agent tasks.

Research and experimentation

CrewAI or AutoGen. CrewAI gets you a working multi-agent system in minutes. AutoGen is better if your agents need to have extended conversations to refine output. Both are good for exploring ideas before committing to a production architecture.

High-performance, high-volume systems

Raw SDK with custom orchestration. When you are processing thousands of agent requests per hour and every token counts, framework overhead is real. Build exactly what you need, optimize the parts that matter, and keep the dependency tree minimal.


The Build vs. Buy Framework Tradeoff

The "should I use a framework" question is really a build-vs-buy decision. Here is how I frame it:

Build (raw SDK) when:

  • Your agent workflow is straightforward (single agent, linear pipeline)
  • Debugging and observability are critical
  • You need to minimize dependencies in production
  • Your team has strong engineering fundamentals
  • Token cost is a primary concern

Buy (framework) when:

  • You need multi-agent orchestration with complex state
  • Time-to-prototype matters more than long-term maintenance
  • The framework's abstractions match your mental model
  • You want built-in integrations (tools, memory, vector stores)
  • Your team is less experienced with LLM APIs

The worst outcome is adopting a framework that does not match your needs and fighting its abstractions. I have seen teams spend weeks debugging LangChain chains when the equivalent raw SDK code would have been 50 lines. I have also seen teams build their own multi-agent orchestrator from scratch when LangGraph would have saved them months.

The framework should solve a problem you actually have, not a problem you think you might have.


What I Actually Use

Here is my honest current stack:

For most projects: raw Anthropic SDK. The majority of what I build is single-agent systems with tool calling. The agent loop pattern I showed above handles 80% of my use cases. I know every line of code, I can debug it easily, and there are no surprises.

For complex multi-agent workflows: LangGraph. When I need agents that coordinate, branch, loop, and checkpoint, LangGraph is the right tool. The graph abstraction clicks once you internalize it, and the production features (persistence, streaming, human-in-the-loop) are genuinely useful.

For web applications: Vercel AI SDK. When I am building a Next.js app with chat or AI features, the Vercel AI SDK's React hooks and streaming primitives are exactly right. I do not want to build my own streaming infrastructure.

What I stopped using: LangChain's core chain abstractions. The original chain pattern — LLMChain, SequentialChain, TransformChain — is no longer necessary for new projects. Native tool calling in the LLM APIs does what chains used to do, with less abstraction and better control. LangGraph is the valuable part of the LangChain ecosystem now.

What I use for experimentation: CrewAI. When I want to quickly test a multi-agent idea — "would a researcher + analyst + writer crew produce better results?" — CrewAI gets me there in 20 minutes. It is not what I deploy to production, but it is excellent for validation.


Where This Is Heading

The framework landscape is consolidating. I expect three things to happen by the end of 2026:

  1. Raw SDKs will absorb more agent primitives. Anthropic's SDK already supports basic agent loops. Expect built-in multi-turn tool calling, simple orchestration patterns, and better streaming. Frameworks will need to provide more differentiated value.

  2. LangGraph will become the de facto standard for complex agent orchestration. Its graph-based approach is the right level of abstraction for production multi-agent systems. The competitor to watch is the Anthropic Agent SDK, which could simplify things further.

  3. Framework-agnostic tooling will grow. Observability tools like Langfuse, evaluation frameworks, and tool registries will work across frameworks. The choice of agent framework will matter less as the ecosystem matures.

Pick the tool that matches your current problem. Do not over-engineer. Start simple, add complexity when you need it, and always make sure you understand what is happening under the framework's abstractions.

The best framework is the one that disappears — letting you focus on the problem you are actually solving.