
AI Agent vs Chatbot: What's Actually Different (With Code Examples)


Every product launch in 2026 seems to include the phrase "AI agent." Customer support chatbot? Agent. Form that auto-fills fields? Agent. A wrapper around a single API call to Claude? Apparently also an agent.

The term has lost all meaning in marketing. But in engineering, the distinction between a chatbot and an agent is precise, structural, and matters for how you architect your system. I have built both in production, and the difference comes down to one thing: who controls the loop.

This post walks through four patterns — from a basic chatbot to a full agent — with runnable TypeScript code for each. By the end, you will know exactly where your system sits on the spectrum and whether you actually need an agent.


The Four Patterns

Here is the progression, from simplest to most complex:

  1. Chatbot — single request, single response
  2. Chatbot + RAG — retrieve context, then respond
  3. Tool-calling chatbot — LLM can invoke functions, but the human drives each turn
  4. Agent — autonomous loop where the LLM decides what to do next

Each builds on the previous. Let me show you the code for each.


Pattern 1: The Chatbot

A chatbot is a single request-response cycle. The user sends a message, the LLM responds. No tools, no retrieval, no loops.

import Anthropic from '@anthropic-ai/sdk';
 
const client = new Anthropic();
 
async function chatbot(userMessage: string): Promise<string> {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6-20250514',
    max_tokens: 1024,
    system: 'You are a helpful customer support assistant for Acme Corp.',
    messages: [
      { role: 'user', content: userMessage }
    ]
  });
 
  return response.content[0].type === 'text'
    ? response.content[0].text
    : '';
}
 
// Usage
const answer = await chatbot('What is your return policy?');
console.log(answer);

That is it. One API call. The model answers from its training data and the system prompt. If the answer is wrong or outdated, there is no mechanism to fix it — the model only knows what it was trained on plus whatever you put in the system prompt.

When this is enough: FAQ bots, content generation, brainstorming, simple Q&A where the system prompt can contain all necessary context. You would be surprised how many "AI agent" products are actually this.


Pattern 2: Chatbot + RAG

The chatbot's knowledge is limited to its training data. RAG (Retrieval-Augmented Generation) fixes this by fetching relevant documents before the LLM responds.

import Anthropic from '@anthropic-ai/sdk';
 
const client = new Anthropic();
 
interface Document {
  content: string;
  source: string;
  score: number;
}
 
// Placeholder declarations for your vector DB client and embedding
// function — swap in your real implementations
declare const vectorDb: {
  search(params: { query: number[]; topK: number; minScore: number }): Promise<Document[]>;
};
declare function embed(text: string): Promise<number[]>;
 
async function searchKnowledgeBase(query: string): Promise<Document[]> {
  // In production: vector search against your embeddings database
  // e.g., Pinecone, Weaviate, pgvector
  const results = await vectorDb.search({
    query: await embed(query),
    topK: 5,
    minScore: 0.7
  });
  return results;
}
 
async function chatbotWithRAG(userMessage: string): Promise<string> {
  // Step 1: Retrieve relevant context
  const docs = await searchKnowledgeBase(userMessage);
  const context = docs
    .map(d => `[Source: ${d.source}]\n${d.content}`)
    .join('\n\n---\n\n');
 
  // Step 2: Generate response with context
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6-20250514',
    max_tokens: 1024,
    system: `You are a customer support assistant. Answer based on the provided context.
If the context does not contain the answer, say so — do not make things up.
 
Context:
${context}`,
    messages: [
      { role: 'user', content: userMessage }
    ]
  });
 
  return response.content[0].type === 'text'
    ? response.content[0].text
    : '';
}

The flow is still linear: retrieve, then respond. The LLM does not decide what to retrieve or whether to retrieve at all — your code always runs the search step. This is a critical distinction. The retrieval logic is hard-coded, not model-driven.
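To make the hard-coded retrieval point concrete, here is a self-contained sketch where a toy keyword-overlap scorer stands in for vector search. `Doc`, `scoreOverlap`, and `retrieveTopK` are illustrative names, and a real system would use embeddings rather than term overlap — but the control flow is the point: your code always runs retrieval before the model ever sees the question.

```typescript
// A toy stand-in for vector search: scores documents by keyword overlap.
// Illustrative only — the pipeline shape (retrieve first, then call the
// LLM) is fixed by this code, not chosen by the model.
interface Doc {
  content: string;
  source: string;
}

function scoreOverlap(query: string, doc: Doc): number {
  const queryTerms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const docTerms = doc.content.toLowerCase().split(/\W+/).filter(Boolean);
  const hits = docTerms.filter(t => queryTerms.has(t)).length;
  return hits / Math.max(docTerms.length, 1);
}

function retrieveTopK(query: string, corpus: Doc[], topK: number): Doc[] {
  return [...corpus]
    .map(doc => ({ doc, score: scoreOverlap(query, doc) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ doc }) => doc);
}

// The hard-coded linear pipeline: retrieve, then (in the real system)
// pass the winning documents to the LLM as context.
const corpus: Doc[] = [
  { content: 'Returns are accepted within 30 days of purchase.', source: 'returns.md' },
  { content: 'Shipping takes 3-5 business days.', source: 'shipping.md' }
];

const top = retrieveTopK('when are returns accepted', corpus, 1);
console.log(top[0].source); // returns.md — the returns document scores highest
```

Nothing here asks the model whether retrieval is needed; that decision only becomes model-driven in Pattern 3.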

When this is enough: Knowledge bases, documentation search, customer support with a known corpus, any case where "look up relevant info then answer" is the complete workflow.


Pattern 3: Tool-Calling Chatbot

Now things get interesting. Instead of hard-coding the retrieval step, we give the LLM tools and let it decide which ones to use. For a deep dive on making tool calls reliable, see my post on tool calling patterns for reliable AI agents.

import Anthropic from '@anthropic-ai/sdk';
 
const client = new Anthropic();
 
// Placeholder for your data layer — swap in your real DB client
declare const db: {
  orders: { search(input: Record<string, unknown>): Promise<unknown> };
  inventory: { check(sku: string): Promise<unknown> };
};
 
const tools: Anthropic.Tool[] = [
  {
    name: 'search_orders',
    description: 'Search for customer orders by email or order ID',
    input_schema: {
      type: 'object' as const,
      properties: {
        email: { type: 'string', description: 'Customer email address' },
        order_id: { type: 'string', description: 'Order ID' }
      },
      required: []
    }
  },
  {
    name: 'check_inventory',
    description: 'Check current inventory for a product SKU',
    input_schema: {
      type: 'object' as const,
      properties: {
        sku: { type: 'string', description: 'Product SKU' }
      },
      required: ['sku']
    }
  }
];
 
async function executeTool(
  name: string,
  input: Record<string, unknown>
): Promise<string> {
  switch (name) {
    case 'search_orders':
      return JSON.stringify(await db.orders.search(input));
    case 'check_inventory':
      return JSON.stringify(await db.inventory.check(input.sku as string));
    default:
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}
 
async function toolCallingChatbot(userMessage: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: userMessage }
  ];
 
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6-20250514',
    max_tokens: 1024,
    tools,
    messages
  });
 
  // If the model wants to use a tool, execute it and return result
  if (response.stop_reason === 'tool_use') {
    const toolUse = response.content.find(
      (block): block is Anthropic.ToolUseBlock => block.type === 'tool_use'
    );
 
    if (toolUse) {
      const result = await executeTool(
        toolUse.name,
        toolUse.input as Record<string, unknown>
      );
 
      // Send tool result back to get the final response
      messages.push({ role: 'assistant', content: response.content });
      messages.push({
        role: 'user',
        content: [{
          type: 'tool_result',
          tool_use_id: toolUse.id,
          content: result
        }]
      });
 
      const finalResponse = await client.messages.create({
        model: 'claude-sonnet-4-6-20250514',
        max_tokens: 1024,
        tools,
        messages
      });
 
      const textBlock = finalResponse.content.find(
        (block): block is Anthropic.TextBlock => block.type === 'text'
      );
      return textBlock?.text ?? '';
    }
  }
 
  const textBlock = response.content.find(
    (block): block is Anthropic.TextBlock => block.type === 'text'
  );
  return textBlock?.text ?? '';
}

The model now decides whether to call a tool and which tool to call. But notice: this code handles exactly one tool call. If the model needs to check inventory and then search orders, it cannot do both in sequence without the user sending another message. The human is still driving the loop.

When this is enough: Most chatbot products. If the user is present and can send follow-up messages, a tool-calling chatbot covers a wide range of use cases without the complexity of a full agent.


Pattern 4: The Agent

An agent is a loop. The LLM calls tools, observes results, decides what to do next, and continues until the task is complete — without waiting for human input between steps.

This is the structural difference. Not the tools. Not the model. The loop.

import Anthropic from '@anthropic-ai/sdk';
 
// Reuses `client` and `executeTool` from Pattern 3
 
interface AgentConfig {
  maxIterations: number;
  tools: Anthropic.Tool[];
  system: string;
}
 
async function agent(
  task: string,
  config: AgentConfig
): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: task }
  ];
 
  for (let i = 0; i < config.maxIterations; i++) {
    const response = await client.messages.create({
      model: 'claude-sonnet-4-6-20250514',
      max_tokens: 4096,
      system: config.system,
      tools: config.tools,
      messages
    });
 
    // If the model is done (no more tool calls), return the response
    if (response.stop_reason === 'end_turn') {
      const textBlock = response.content.find(
        (block): block is Anthropic.TextBlock => block.type === 'text'
      );
      return textBlock?.text ?? '';
    }
 
    // Otherwise, execute all tool calls and continue the loop
    if (response.stop_reason === 'tool_use') {
      messages.push({ role: 'assistant', content: response.content });
 
      const toolResults: Anthropic.ToolResultBlockParam[] = [];
 
      for (const block of response.content) {
        if (block.type === 'tool_use') {
          const result = await executeTool(
            block.name,
            block.input as Record<string, unknown>
          );
          toolResults.push({
            type: 'tool_result',
            tool_use_id: block.id,
            content: result
          });
        }
      }
 
      messages.push({ role: 'user', content: toolResults });
    } else {
      // Any other stop reason (e.g. max_tokens) means re-sending the
      // same request would spin until the iteration limit — bail out.
      return `Agent stopped: unexpected stop_reason ${response.stop_reason}`;
    }
  }
 
  return 'Agent reached maximum iterations without completing the task.';
}
 
// Usage: the agent autonomously handles a multi-step task
const result = await agent(
  'Find the order for customer jane@example.com, check if all items are in stock, and draft a shipping update email.',
  {
    maxIterations: 10,
    tools: [...orderTools, ...inventoryTools, ...emailTools],
    system: 'You are a fulfillment agent. Complete the task using the available tools.'
  }
);

Look at what changed. The for loop is the entire difference. The model keeps running — calling tools, observing results, deciding what to do next — until it either finishes the task or hits the iteration limit. No human in the loop between steps.

That single for loop is why agents are both powerful and dangerous. Powerful because they can handle multi-step workflows autonomously. Dangerous because a confused model can burn through API calls, take incorrect actions, or loop forever. For lessons on what goes wrong in production, see my post on production agent architecture failures and lessons.


Side-by-Side Comparison

| Dimension | Chatbot | Chatbot + RAG | Tool-Calling Chatbot | Agent |
| --- | --- | --- | --- | --- |
| Control flow | Linear | Linear | Single tool turn | Autonomous loop |
| Who drives the loop | Human | Human | Human | The LLM |
| External data access | None (system prompt only) | Retrieval (hard-coded) | Tools (model-chosen) | Tools (model-chosen) |
| Multi-step reasoning | No | No | Across human turns | Yes, autonomous |
| State changes | Read-only | Read-only | Per human turn | Autonomous writes |
| Failure blast radius | One bad response | One bad response | One bad tool call | Cascading failures |
| Token cost per interaction | Low (1 API call) | Medium (1 call + retrieval) | Medium (2-3 calls) | High (5-20+ calls) |
| Implementation complexity | Trivial | Low | Medium | High |
| Observability needs | Basic logging | Retrieval metrics | Tool call tracking | Full trace + cost monitoring |
| Testing approach | Prompt evals | Retrieval accuracy + evals | Tool execution tests | End-to-end scenario tests |

When to Use Which: A Decision Framework

I use this mental model when starting a new AI feature:

Start with the simplest pattern that works. Then upgrade only when you hit a limitation.

Use a chatbot when:

  • All necessary context fits in the system prompt
  • The task is single-turn (ask a question, get an answer)
  • Incorrect answers are low-stakes (brainstorming, drafts, suggestions)
  • You want low latency and low cost

Use RAG when:

  • You have a knowledge base that changes frequently
  • Answers need to be grounded in specific documents
  • The context is too large for a system prompt
  • You need source attribution

Use tool-calling when:

  • The model needs access to live data (databases, APIs)
  • Different user queries require different data sources
  • The user is present and can guide multi-step workflows
  • You want model-driven data access but human-driven control flow

Use an agent when:

  • The task requires multiple dependent steps
  • The model needs to observe results before deciding next steps
  • A human would need to do research, analysis, and synthesis
  • The workflow involves state changes (creating records, sending emails)
  • The user gives a goal, not step-by-step instructions

Real examples:

  • "Summarize this document" — Chatbot. One input, one output.
  • "Answer questions about our product docs" — RAG. Needs retrieval, but the flow is linear.
  • "Look up this customer's order status" — Tool-calling chatbot. Needs live data, but it is one step.
  • "Process this refund: find the order, verify the return, update inventory, notify the customer, and send a confirmation email" — Agent. Multiple dependent steps with state changes.
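The decision framework can be compressed into a small helper. This is a hypothetical sketch, not a library API — `TaskProfile`, `choosePattern`, and the branch order are my own encoding of the questions above, and real decisions involve more nuance than four booleans.

```typescript
// A hypothetical helper encoding the decision framework. Illustrative
// names and logic — not a real API.
interface TaskProfile {
  needsExternalData: boolean; // live databases, APIs, documents
  corpusIsStatic: boolean;    // known document set vs arbitrary sources
  multiStep: boolean;         // dependent steps the model must sequence
  humanPresent: boolean;      // can a user drive follow-up turns?
}

type Pattern = 'chatbot' | 'rag' | 'tool-calling' | 'agent';

function choosePattern(t: TaskProfile): Pattern {
  // No external data needed: a system prompt and one call suffice.
  if (!t.needsExternalData) return 'chatbot';
  // Multi-step work with no human between steps: the model must drive.
  if (t.multiStep && !t.humanPresent) return 'agent';
  // A fixed corpus and a linear lookup: hard-coded retrieval is enough.
  if (t.corpusIsStatic && !t.multiStep) return 'rag';
  // Otherwise: model-chosen tools, human-driven control flow.
  return 'tool-calling';
}

// The four real examples from the text:
choosePattern({ needsExternalData: false, corpusIsStatic: false, multiStep: false, humanPresent: true }); // 'chatbot'
choosePattern({ needsExternalData: true, corpusIsStatic: true, multiStep: false, humanPresent: true });   // 'rag'
choosePattern({ needsExternalData: true, corpusIsStatic: false, multiStep: false, humanPresent: true });  // 'tool-calling'
choosePattern({ needsExternalData: true, corpusIsStatic: false, multiStep: true, humanPresent: false });  // 'agent'
```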

The Spectrum

In practice, most production systems are not cleanly one pattern. They sit on a spectrum.

The most common architecture I see in production is what I call a "bounded agent" — a system with the agent loop but with strict guardrails:

  • Maximum iteration limits (usually 5-15 steps)
  • Tool-level permissions (read-only tools vs. write tools with confirmation)
  • Cost caps per execution
  • Human-in-the-loop checkpoints for high-stakes actions
  • Timeout limits per tool and per overall execution
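The guardrails above can be sketched as a wrapper around the agent loop. To keep the control flow visible, the LLM call is mocked out as a pre-scripted list of decisions — `Step`, `runBoundedAgent`, and the cost model are illustrative stand-ins, not a real SDK surface.

```typescript
// A sketch of the "bounded agent": iteration limit, cost cap, and a
// confirmation gate on write tools. The LLM is mocked as a scripted
// list of decisions so the guardrail logic is easy to see.
type Step =
  | { kind: 'tool'; name: string; write: boolean }
  | { kind: 'done'; answer: string };

interface Bounds {
  maxIterations: number;
  maxCostUsd: number;
  confirmWrites: (toolName: string) => boolean; // human-in-the-loop gate
}

function runBoundedAgent(
  steps: Step[],          // mock: pre-scripted model decisions
  costPerStepUsd: number, // mock: flat cost per LLM call
  bounds: Bounds
): string {
  let spent = 0;
  for (let i = 0; i < bounds.maxIterations; i++) {
    spent += costPerStepUsd;
    if (spent > bounds.maxCostUsd) return 'aborted: cost cap exceeded';

    const step = steps[i];
    if (!step) return 'aborted: model produced no decision';
    if (step.kind === 'done') return step.answer;

    // Write tools require explicit confirmation; read tools run freely.
    if (step.write && !bounds.confirmWrites(step.name)) {
      return `aborted: write tool ${step.name} not confirmed`;
    }
    // ...execute the tool and feed the result back (omitted in this mock)
  }
  return 'aborted: iteration limit reached';
}

// A read, then a completion — finishes normally under these bounds.
const outcome = runBoundedAgent(
  [{ kind: 'tool', name: 'search_orders', write: false }, { kind: 'done', answer: 'shipped' }],
  0.01,
  { maxIterations: 5, maxCostUsd: 1, confirmWrites: () => true }
);
console.log(outcome); // shipped
```

The shape matters more than the specifics: every abort path returns a distinct, loggable reason, which is exactly what you want when debugging a run that went sideways.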

This is not a pure chatbot. It is not a fully autonomous agent. It is somewhere in between, and that is usually the right place to be. The AI engineering guide for 2026 covers more on how these patterns fit into the broader architecture of production AI systems.


Common Mistakes

After building several of these systems and reviewing others, I see the same mistakes repeatedly.

Mistake 1: Building an Agent When a Chatbot Would Do

I have seen teams spend months building an agent with tool calling, memory, and orchestration — for a use case that is fundamentally "take user input, generate text output." If the task does not require multiple steps, external data, or state changes, an agent adds cost, latency, and failure modes with no benefit.

The fix: Before building an agent, ask: "Can this be solved with a well-crafted system prompt and one API call?" If yes, start there.

Mistake 2: Building a Chatbot When You Need an Agent

The opposite mistake. A team builds a chatbot for a task that requires the model to look up information, make decisions based on that information, and take actions. They end up hard-coding every possible path, creating a brittle decision tree disguised as an AI feature.

The fix: If you find yourself writing if/else chains to handle what the model should do after getting data, you probably need to let the model drive the loop.

Mistake 3: No Iteration Limit on the Agent Loop

I have seen agent loops that burn through $50 in API calls because the model got confused and kept calling the same tool in a loop. The maxIterations guard in the code above is not optional — it is the most important safety mechanism in your agent.

The fix: Always set a maximum iteration count. Start conservative (5-10) and increase based on observed behavior.

Mistake 4: Treating Tool Calls as Reliable

Tool calls from an LLM are suggestions, not guaranteed-correct function invocations. The model can hallucinate tool names, provide invalid arguments, or call the wrong tool entirely. Every tool call needs validation, error handling, and retry logic. I covered this extensively in tool calling patterns for reliable AI agents.
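A minimal validation layer in front of `executeTool` catches the common failure modes before they reach your data layer. The schema shape below mirrors the tool definitions from Pattern 3; `validateToolCall` itself is a sketch, not a complete JSON Schema validator.

```typescript
// Validate a model-proposed tool call against a registry of schemas
// before dispatching it. Sketch only — a production system would use a
// real JSON Schema validator.
interface ToolSchema {
  name: string;
  required: string[];
  properties: Record<string, { type: string }>;
}

type ValidationResult =
  | { ok: true }
  | { ok: false; error: string };

function validateToolCall(
  name: string,
  input: Record<string, unknown>,
  registry: Map<string, ToolSchema>
): ValidationResult {
  const schema = registry.get(name);
  // The model can hallucinate tool names — never dispatch on an unknown one.
  if (!schema) return { ok: false, error: `unknown tool: ${name}` };

  for (const field of schema.required) {
    if (!(field in input)) {
      return { ok: false, error: `missing required field: ${field}` };
    }
  }
  for (const [key, value] of Object.entries(input)) {
    const spec = schema.properties[key];
    if (!spec) return { ok: false, error: `unexpected field: ${key}` };
    if (spec.type === 'string' && typeof value !== 'string') {
      return { ok: false, error: `field ${key} must be a string` };
    }
  }
  return { ok: true };
}

const registry = new Map<string, ToolSchema>([
  ['check_inventory', { name: 'check_inventory', required: ['sku'], properties: { sku: { type: 'string' } } }]
]);

validateToolCall('check_inventory', { sku: 'ABC-123' }, registry); // { ok: true }
validateToolCall('check_stock', { sku: 'ABC-123' }, registry);     // unknown tool — rejected
```

On a failed validation, return the error message to the model as a `tool_result` rather than crashing: models usually correct themselves on the next turn when told exactly what was wrong.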

Mistake 5: Skipping Observability

A chatbot that gives a bad answer is easy to debug — check the prompt, check the response. An agent that takes 12 steps and produces a bad outcome requires tracing every tool call, every intermediate result, and every decision point. If you are building agents without structured logging and tracing from day one, you are building a black box.
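A per-step trace can be as simple as a wrapper around each tool execution. The record shape here is illustrative — in production you would ship these events to a tracing backend rather than hold them in memory, and the wrapper would be async since real tool calls are.

```typescript
// Record every tool execution in an agent run: which tool, with what
// input, how long it took, and whether it failed. Synchronous to keep
// the sketch small — real tool calls would be awaited in the same
// try/catch.
interface TraceEvent {
  step: number;
  tool: string;
  input: unknown;
  durationMs: number;
  error?: string;
}

class AgentTrace {
  private events: TraceEvent[] = [];

  record<T>(step: number, tool: string, input: unknown, fn: () => T): T {
    const start = Date.now();
    try {
      const result = fn();
      this.events.push({ step, tool, input, durationMs: Date.now() - start });
      return result;
    } catch (err) {
      this.events.push({
        step,
        tool,
        input,
        durationMs: Date.now() - start,
        error: err instanceof Error ? err.message : String(err)
      });
      throw err; // still surface the failure to the agent loop
    }
  }

  summary(): { steps: number; failures: number } {
    return {
      steps: this.events.length,
      failures: this.events.filter(e => e.error !== undefined).length
    };
  }
}
```

Wrapping `executeTool` in something like `trace.record(...)` inside the agent loop turns a 12-step black box into a replayable sequence of decisions.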


The Bottom Line

The difference between a chatbot and an agent is not about branding. It is about architecture:

  • Chatbot: human drives the loop, model responds
  • Agent: model drives the loop, human sets the goal

Start with the simplest pattern that solves your problem. Move up the spectrum only when you hit a real limitation, not because "agent" sounds better in a pitch deck. The right architecture is the one that delivers value without unnecessary complexity — and in my experience, that is a tool-calling chatbot more often than most teams want to admit.