
AI Agent vs Chatbot: What's Actually Different (With Code Examples)


Every product launch in 2026 seems to include the phrase "AI agent." Customer support chatbot? Agent. Form that auto-fills fields? Agent. A wrapper around a single API call to Claude? Apparently also an agent.

The term has lost all meaning in marketing. But in engineering, the distinction between a chatbot and an agent is precise, structural, and matters for how you architect your system. I have built both in production, and the difference comes down to one thing: who controls the loop.

This post walks through four patterns — from a basic chatbot to a full agent — with runnable TypeScript code for each. By the end, you will know exactly where your system sits on the spectrum and whether you actually need an agent.


The Four Patterns

Here is the progression, from simplest to most complex:

  1. Chatbot — single request, single response
  2. Chatbot + RAG — retrieve context, then respond
  3. Tool-calling chatbot — LLM can invoke functions, but the human drives each turn
  4. Agent — autonomous loop where the LLM decides what to do next

Each builds on the previous. Let me show you the code for each.


Pattern 1: The Chatbot

A chatbot is a single request-response cycle. The user sends a message, the LLM responds. No tools, no retrieval, no loops.

import Anthropic from '@anthropic-ai/sdk';
 
const client = new Anthropic();
 
async function chatbot(userMessage: string): Promise<string> {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6-20250514',
    max_tokens: 1024,
    system: 'You are a helpful customer support assistant for Acme Corp.',
    messages: [
      { role: 'user', content: userMessage }
    ]
  });
 
  return response.content[0].type === 'text'
    ? response.content[0].text
    : '';
}
 
// Usage
const answer = await chatbot('What is your return policy?');
console.log(answer);

That is it. One API call. The model answers from its training data and the system prompt. If the answer is wrong or outdated, there is no mechanism to fix it — the model only knows what it was trained on plus whatever you put in the system prompt.

When this is enough: FAQ bots, content generation, brainstorming, simple Q&A where the system prompt can contain all necessary context. You would be surprised how many "AI agent" products are actually this.


Pattern 2: Chatbot + RAG

The chatbot's knowledge is limited to its training data. RAG (Retrieval-Augmented Generation) fixes this by fetching relevant documents before the LLM responds.

import Anthropic from '@anthropic-ai/sdk';
 
const client = new Anthropic();
 
interface Document {
  content: string;
  source: string;
  score: number;
}
 
// Placeholder declarations for your vector DB client and embedding
// function — swap in your real implementations
declare const vectorDb: {
  search(params: { query: number[]; topK: number; minScore: number }): Promise<Document[]>;
};
declare function embed(text: string): Promise<number[]>;
 
async function searchKnowledgeBase(query: string): Promise<Document[]> {
  // In production: vector search against your embeddings database
  // e.g., Pinecone, Weaviate, pgvector
  const results = await vectorDb.search({
    query: await embed(query),
    topK: 5,
    minScore: 0.7
  });
  return results;
}
 
async function chatbotWithRAG(userMessage: string): Promise<string> {
  // Step 1: Retrieve relevant context
  const docs = await searchKnowledgeBase(userMessage);
  const context = docs
    .map(d => `[Source: ${d.source}]\n${d.content}`)
    .join('\n\n---\n\n');
 
  // Step 2: Generate response with context
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6-20250514',
    max_tokens: 1024,
    system: `You are a customer support assistant. Answer based on the provided context.
If the context does not contain the answer, say so — do not make things up.
 
Context:
${context}`,
    messages: [
      { role: 'user', content: userMessage }
    ]
  });
 
  return response.content[0].type === 'text'
    ? response.content[0].text
    : '';
}

The flow is still linear: retrieve, then respond. The LLM does not decide what to retrieve or whether to retrieve at all — your code always runs the search step. This is a critical distinction. The retrieval logic is hard-coded, not model-driven.
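To make the hard-coded retrieval point concrete, here is a self-contained sketch where a toy keyword-overlap scorer stands in for vector search. `Doc`, `scoreOverlap`, and `retrieveTopK` are illustrative names, and a real system would use embeddings rather than term overlap — but the control flow is the point: your code always runs retrieval before the model ever sees the question.

```typescript
// A toy stand-in for vector search: scores documents by keyword overlap.
// Illustrative only — the pipeline shape (retrieve first, then call the
// LLM) is fixed by this code, not chosen by the model.
interface Doc {
  content: string;
  source: string;
}

function scoreOverlap(query: string, doc: Doc): number {
  const queryTerms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const docTerms = doc.content.toLowerCase().split(/\W+/).filter(Boolean);
  const hits = docTerms.filter(t => queryTerms.has(t)).length;
  return hits / Math.max(docTerms.length, 1);
}

function retrieveTopK(query: string, corpus: Doc[], topK: number): Doc[] {
  return [...corpus]
    .map(doc => ({ doc, score: scoreOverlap(query, doc) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ doc }) => doc);
}

// The hard-coded linear pipeline: retrieve, then (in the real system)
// pass the winning documents to the LLM as context.
const corpus: Doc[] = [
  { content: 'Returns are accepted within 30 days of purchase.', source: 'returns.md' },
  { content: 'Shipping takes 3-5 business days.', source: 'shipping.md' }
];

const top = retrieveTopK('when are returns accepted', corpus, 1);
console.log(top[0].source); // returns.md — the returns document scores highest
```

Nothing here asks the model whether retrieval is needed; that decision only becomes model-driven in Pattern 3.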

When this is enough: Knowledge bases, documentation search, customer support with a known corpus, any case where "look up relevant info then answer" is the complete workflow.


Pattern 3: Tool-Calling Chatbot

Now things get interesting. Instead of hard-coding the retrieval step, we give the LLM tools and let it decide which ones to use. For a deep dive on making tool calls reliable, see my post on tool calling patterns for reliable AI agents.

import Anthropic from '@anthropic-ai/sdk';
 
const client = new Anthropic();
 
// Placeholder for your data layer — swap in your real DB client
declare const db: {
  orders: { search(input: Record<string, unknown>): Promise<unknown> };
  inventory: { check(sku: string): Promise<unknown> };
};
 
const tools: Anthropic.Tool[] = [
  {
    name: 'search_orders',
    description: 'Search for customer orders by email or order ID',
    input_schema: {
      type: 'object' as const,
      properties: {
        email: { type: 'string', description: 'Customer email address' },
        order_id: { type: 'string', description: 'Order ID' }
      },
      required: []
    }
  },
  {
    name: 'check_inventory',
    description: 'Check current inventory for a product SKU',
    input_schema: {
      type: 'object' as const,
      properties: {
        sku: { type: 'string', description: 'Product SKU' }
      },
      required: ['sku']
    }
  }
];
 
async function executeTool(
  name: string,
  input: Record<string, unknown>
): Promise<string> {
  switch (name) {
    case 'search_orders':
      return JSON.stringify(await db.orders.search(input));
    case 'check_inventory':
      return JSON.stringify(await db.inventory.check(input.sku as string));
    default:
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}
 
async function toolCallingChatbot(userMessage: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: userMessage }
  ];
 
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6-20250514',
    max_tokens: 1024,
    tools,
    messages
  });
 
  // If the model wants to use a tool, execute it and return result
  if (response.stop_reason === 'tool_use') {
    const toolUse = response.content.find(
      (block): block is Anthropic.ToolUseBlock => block.type === 'tool_use'
    );
 
    if (toolUse) {
      const result = await executeTool(
        toolUse.name,
        toolUse.input as Record<string, unknown>
      );
 
      // Send tool result back to get the final response
      messages.push({ role: 'assistant', content: response.content });
      messages.push({
        role: 'user',
        content: [{
          type: 'tool_result',
          tool_use_id: toolUse.id,
          content: result
        }]
      });
 
      const finalResponse = await client.messages.create({
        model: 'claude-sonnet-4-6-20250514',
        max_tokens: 1024,
        tools,
        messages
      });
 
      const textBlock = finalResponse.content.find(
        (block): block is Anthropic.TextBlock => block.type === 'text'
      );
      return textBlock?.text ?? '';
    }
  }
 
  const textBlock = response.content.find(
    (block): block is Anthropic.TextBlock => block.type === 'text'
  );
  return textBlock?.text ?? '';
}

The model now decides whether to call a tool and which tool to call. But notice: this code handles exactly one tool call. If the model needs to check inventory and then search orders, it cannot do both in sequence without the user sending another message. The human is still driving the loop.

When this is enough: Most chatbot products. If the user is present and can send follow-up messages, a tool-calling chatbot covers a wide range of use cases without the complexity of a full agent.


Pattern 4: The Agent

An agent is a loop. The LLM calls tools, observes results, decides what to do next, and continues until the task is complete — without waiting for human input between steps.

This is the structural difference. Not the tools. Not the model. The loop.

import Anthropic from '@anthropic-ai/sdk';
 
// Reuses `client` and `executeTool` from Pattern 3
 
interface AgentConfig {
  maxIterations: number;
  tools: Anthropic.Tool[];
  system: string;
}
 
async function agent(
  task: string,
  config: AgentConfig
): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: task }
  ];
 
  for (let i = 0; i < config.maxIterations; i++) {
    const response = await client.messages.create({
      model: 'claude-sonnet-4-6-20250514',
      max_tokens: 4096,
      system: config.system,
      tools: config.tools,
      messages
    });
 
    // If the model is done (no more tool calls), return the response
    if (response.stop_reason === 'end_turn') {
      const textBlock = response.content.find(
        (block): block is Anthropic.TextBlock => block.type === 'text'
      );
      return textBlock?.text ?? '';
    }
 
    // Otherwise, execute all tool calls and continue the loop
    if (response.stop_reason === 'tool_use') {
      messages.push({ role: 'assistant', content: response.content });
 
      const toolResults: Anthropic.ToolResultBlockParam[] = [];
 
      for (const block of response.content) {
        if (block.type === 'tool_use') {
          const result = await executeTool(
            block.name,
            block.input as Record<string, unknown>
          );
          toolResults.push({
            type: 'tool_result',
            tool_use_id: block.id,
            content: result
          });
        }
      }
 
      messages.push({ role: 'user', content: toolResults });
    } else {
      // Any other stop reason (e.g. max_tokens) means re-sending the
      // same request would spin until the iteration limit — bail out.
      return `Agent stopped: unexpected stop_reason ${response.stop_reason}`;
    }
  }
 
  return 'Agent reached maximum iterations without completing the task.';
}
 
// Usage: the agent autonomously handles a multi-step task
const result = await agent(
  'Find the order for customer jane@example.com, check if all items are in stock, and draft a shipping update email.',
  {
    maxIterations: 10,
    tools: [...orderTools, ...inventoryTools, ...emailTools],
    system: 'You are a fulfillment agent. Complete the task using the available tools.'
  }
);

Look at what changed. The for loop is the entire difference. The model keeps running — calling tools, observing results, deciding what to do next — until it either finishes the task or hits the iteration limit. No human in the loop between steps.

That single for loop is why agents are both powerful and dangerous. Powerful because they can handle multi-step workflows autonomously. Dangerous because a confused model can burn through API calls, take incorrect actions, or loop forever. For lessons on what goes wrong in production, see my post on production agent architecture failures and lessons.


Side-by-Side Comparison

| Dimension | Chatbot | Chatbot + RAG | Tool-Calling Chatbot | Agent |
| --- | --- | --- | --- | --- |
| Control flow | Linear | Linear | Single tool turn | Autonomous loop |
| Who drives the loop | Human | Human | Human | The LLM |
| External data access | None (system prompt only) | Retrieval (hard-coded) | Tools (model-chosen) | Tools (model-chosen) |
| Multi-step reasoning | No | No | Across human turns | Yes, autonomous |
| State changes | Read-only | Read-only | Per human turn | Autonomous writes |
| Failure blast radius | One bad response | One bad response | One bad tool call | Cascading failures |
| Token cost per interaction | Low (1 API call) | Medium (1 call + retrieval) | Medium (2-3 calls) | High (5-20+ calls) |
| Implementation complexity | Trivial | Low | Medium | High |
| Observability needs | Basic logging | Retrieval metrics | Tool call tracking | Full trace + cost monitoring |
| Testing approach | Prompt evals | Retrieval accuracy + evals | Tool execution tests | End-to-end scenario tests |

When to Use Which: A Decision Framework

I use this mental model when starting a new AI feature:

Start with the simplest pattern that works. Then upgrade only when you hit a limitation.

Use a chatbot when:

  • All necessary context fits in the system prompt
  • The task is single-turn (ask a question, get an answer)
  • Incorrect answers are low-stakes (brainstorming, drafts, suggestions)
  • You want low latency and low cost

Use RAG when:

  • You have a knowledge base that changes frequently
  • Answers need to be grounded in specific documents
  • The context is too large for a system prompt
  • You need source attribution

Use tool-calling when:

  • The model needs access to live data (databases, APIs)
  • Different user queries require different data sources
  • The user is present and can guide multi-step workflows
  • You want model-driven data access but human-driven control flow

Use an agent when:

  • The task requires multiple dependent steps
  • The model needs to observe results before deciding next steps
  • A human would need to do research, analysis, and synthesis
  • The workflow involves state changes (creating records, sending emails)
  • The user gives a goal, not step-by-step instructions

Real examples:

  • "Summarize this document" — Chatbot. One input, one output.
  • "Answer questions about our product docs" — RAG. Needs retrieval, but the flow is linear.
  • "Look up this customer's order status" — Tool-calling chatbot. Needs live data, but it is one step.
  • "Process this refund: find the order, verify the return, update inventory, notify the customer, and send a confirmation email" — Agent. Multiple dependent steps with state changes.
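The decision framework can be compressed into a small helper. This is a hypothetical sketch, not a library API — `TaskProfile`, `choosePattern`, and the branch order are my own encoding of the questions above, and real decisions involve more nuance than four booleans.

```typescript
// A hypothetical helper encoding the decision framework. Illustrative
// names and logic — not a real API.
interface TaskProfile {
  needsExternalData: boolean; // live databases, APIs, documents
  corpusIsStatic: boolean;    // known document set vs arbitrary sources
  multiStep: boolean;         // dependent steps the model must sequence
  humanPresent: boolean;      // can a user drive follow-up turns?
}

type Pattern = 'chatbot' | 'rag' | 'tool-calling' | 'agent';

function choosePattern(t: TaskProfile): Pattern {
  // No external data needed: a system prompt and one call suffice.
  if (!t.needsExternalData) return 'chatbot';
  // Multi-step work with no human between steps: the model must drive.
  if (t.multiStep && !t.humanPresent) return 'agent';
  // A fixed corpus and a linear lookup: hard-coded retrieval is enough.
  if (t.corpusIsStatic && !t.multiStep) return 'rag';
  // Otherwise: model-chosen tools, human-driven control flow.
  return 'tool-calling';
}

// The four real examples from the text:
choosePattern({ needsExternalData: false, corpusIsStatic: false, multiStep: false, humanPresent: true }); // 'chatbot'
choosePattern({ needsExternalData: true, corpusIsStatic: true, multiStep: false, humanPresent: true });   // 'rag'
choosePattern({ needsExternalData: true, corpusIsStatic: false, multiStep: false, humanPresent: true });  // 'tool-calling'
choosePattern({ needsExternalData: true, corpusIsStatic: false, multiStep: true, humanPresent: false });  // 'agent'
```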

The Spectrum

In practice, most production systems are not cleanly one pattern. They sit on a spectrum.

The most common architecture I see in production is what I call a "bounded agent" — a system with the agent loop but with strict guardrails:

  • Maximum iteration limits (usually 5-15 steps)
  • Tool-level permissions (read-only tools vs. write tools with confirmation)
  • Cost caps per execution
  • Human-in-the-loop checkpoints for high-stakes actions
  • Timeout limits per tool and per overall execution
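The guardrails above can be sketched as a wrapper around the agent loop. To keep the control flow visible, the LLM call is mocked out as a pre-scripted list of decisions — `Step`, `runBoundedAgent`, and the cost model are illustrative stand-ins, not a real SDK surface.

```typescript
// A sketch of the "bounded agent": iteration limit, cost cap, and a
// confirmation gate on write tools. The LLM is mocked as a scripted
// list of decisions so the guardrail logic is easy to see.
type Step =
  | { kind: 'tool'; name: string; write: boolean }
  | { kind: 'done'; answer: string };

interface Bounds {
  maxIterations: number;
  maxCostUsd: number;
  confirmWrites: (toolName: string) => boolean; // human-in-the-loop gate
}

function runBoundedAgent(
  steps: Step[],          // mock: pre-scripted model decisions
  costPerStepUsd: number, // mock: flat cost per LLM call
  bounds: Bounds
): string {
  let spent = 0;
  for (let i = 0; i < bounds.maxIterations; i++) {
    spent += costPerStepUsd;
    if (spent > bounds.maxCostUsd) return 'aborted: cost cap exceeded';

    const step = steps[i];
    if (!step) return 'aborted: model produced no decision';
    if (step.kind === 'done') return step.answer;

    // Write tools require explicit confirmation; read tools run freely.
    if (step.write && !bounds.confirmWrites(step.name)) {
      return `aborted: write tool ${step.name} not confirmed`;
    }
    // ...execute the tool and feed the result back (omitted in this mock)
  }
  return 'aborted: iteration limit reached';
}

// A read, then a completion — finishes normally under these bounds.
const outcome = runBoundedAgent(
  [{ kind: 'tool', name: 'search_orders', write: false }, { kind: 'done', answer: 'shipped' }],
  0.01,
  { maxIterations: 5, maxCostUsd: 1, confirmWrites: () => true }
);
console.log(outcome); // shipped
```

The shape matters more than the specifics: every abort path returns a distinct, loggable reason, which is exactly what you want when debugging a run that went sideways.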

This is not a pure chatbot. It is not a fully autonomous agent. It is somewhere in between, and that is usually the right place to be. The AI engineering guide for 2026 covers more on how these patterns fit into the broader architecture of production AI systems.


Common Mistakes

After building several of these systems and reviewing others, I see the same mistakes repeatedly.

Mistake 1: Building an Agent When a Chatbot Would Do

I have seen teams spend months building an agent with tool calling, memory, and orchestration — for a use case that is fundamentally "take user input, generate text output." If the task does not require multiple steps, external data, or state changes, an agent adds cost, latency, and failure modes with no benefit.

The fix: Before building an agent, ask: "Can this be solved with a well-crafted system prompt and one API call?" If yes, start there.

Mistake 2: Building a Chatbot When You Need an Agent

The opposite mistake. A team builds a chatbot for a task that requires the model to look up information, make decisions based on that information, and take actions. They end up hard-coding every possible path, creating a brittle decision tree disguised as an AI feature.

The fix: If you find yourself writing if/else chains to handle what the model should do after getting data, you probably need to let the model drive the loop.

Mistake 3: No Iteration Limit on the Agent Loop

I have seen agent loops that burn through $50 in API calls because the model got confused and kept calling the same tool in a loop. The maxIterations guard in the code above is not optional — it is the most important safety mechanism in your agent.

The fix: Always set a maximum iteration count. Start conservative (5-10) and increase based on observed behavior.

Mistake 4: Treating Tool Calls as Reliable

Tool calls from an LLM are suggestions, not guaranteed-correct function invocations. The model can hallucinate tool names, provide invalid arguments, or call the wrong tool entirely. Every tool call needs validation, error handling, and retry logic. I covered this extensively in tool calling patterns for reliable AI agents.
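A minimal validation layer in front of `executeTool` catches the common failure modes before they reach your data layer. The schema shape below mirrors the tool definitions from Pattern 3; `validateToolCall` itself is a sketch, not a complete JSON Schema validator.

```typescript
// Validate a model-proposed tool call against a registry of schemas
// before dispatching it. Sketch only — a production system would use a
// real JSON Schema validator.
interface ToolSchema {
  name: string;
  required: string[];
  properties: Record<string, { type: string }>;
}

type ValidationResult =
  | { ok: true }
  | { ok: false; error: string };

function validateToolCall(
  name: string,
  input: Record<string, unknown>,
  registry: Map<string, ToolSchema>
): ValidationResult {
  const schema = registry.get(name);
  // The model can hallucinate tool names — never dispatch on an unknown one.
  if (!schema) return { ok: false, error: `unknown tool: ${name}` };

  for (const field of schema.required) {
    if (!(field in input)) {
      return { ok: false, error: `missing required field: ${field}` };
    }
  }
  for (const [key, value] of Object.entries(input)) {
    const spec = schema.properties[key];
    if (!spec) return { ok: false, error: `unexpected field: ${key}` };
    if (spec.type === 'string' && typeof value !== 'string') {
      return { ok: false, error: `field ${key} must be a string` };
    }
  }
  return { ok: true };
}

const registry = new Map<string, ToolSchema>([
  ['check_inventory', { name: 'check_inventory', required: ['sku'], properties: { sku: { type: 'string' } } }]
]);

validateToolCall('check_inventory', { sku: 'ABC-123' }, registry); // { ok: true }
validateToolCall('check_stock', { sku: 'ABC-123' }, registry);     // unknown tool — rejected
```

On a failed validation, return the error message to the model as a `tool_result` rather than crashing: models usually correct themselves on the next turn when told exactly what was wrong.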

Mistake 5: Skipping Observability

A chatbot that gives a bad answer is easy to debug — check the prompt, check the response. An agent that takes 12 steps and produces a bad outcome requires tracing every tool call, every intermediate result, and every decision point. If you are building agents without structured logging and tracing from day one, you are building a black box.
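A per-step trace can be as simple as a wrapper around each tool execution. The record shape here is illustrative — in production you would ship these events to a tracing backend rather than hold them in memory, and the wrapper would be async since real tool calls are.

```typescript
// Record every tool execution in an agent run: which tool, with what
// input, how long it took, and whether it failed. Synchronous to keep
// the sketch small — real tool calls would be awaited in the same
// try/catch.
interface TraceEvent {
  step: number;
  tool: string;
  input: unknown;
  durationMs: number;
  error?: string;
}

class AgentTrace {
  private events: TraceEvent[] = [];

  record<T>(step: number, tool: string, input: unknown, fn: () => T): T {
    const start = Date.now();
    try {
      const result = fn();
      this.events.push({ step, tool, input, durationMs: Date.now() - start });
      return result;
    } catch (err) {
      this.events.push({
        step,
        tool,
        input,
        durationMs: Date.now() - start,
        error: err instanceof Error ? err.message : String(err)
      });
      throw err; // still surface the failure to the agent loop
    }
  }

  summary(): { steps: number; failures: number } {
    return {
      steps: this.events.length,
      failures: this.events.filter(e => e.error !== undefined).length
    };
  }
}
```

Wrapping `executeTool` in something like `trace.record(...)` inside the agent loop turns a 12-step black box into a replayable sequence of decisions.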


The Bottom Line

The difference between a chatbot and an agent is not about branding. It is about architecture:

  • Chatbot: human drives the loop, model responds
  • Agent: model drives the loop, human sets the goal

Start with the simplest pattern that solves your problem. Move up the spectrum only when you hit a real limitation, not because "agent" sounds better in a pitch deck. The right architecture is the one that delivers value without unnecessary complexity — and in my experience, that is a tool-calling chatbot more often than most teams want to admit.