Tool Calling Patterns for Reliable AI Agents: Error Handling and Fallbacks
Tool calling is the part of LLM applications where the model makes decisions that have real-world effects. When it goes wrong (and it will), the agent either produces incorrect results silently, hangs waiting for a tool that will never return, or crashes with an unhandled exception.
None of those are acceptable in production. This post covers the patterns I've settled on for making tool calls in LLM agents reliable enough to deploy without losing sleep.
The Tool Call Lifecycle
Before patterns, the lifecycle. A tool call in an agent loop works like this:
- The LLM decides to call a tool and returns a structured request (tool name + arguments)
- Your code receives the tool call, validates the arguments, and executes the tool
- The tool result (or error) is returned to the LLM as a tool result message
- The LLM continues reasoning with the tool result in context
Failures can happen at every step: the model can request a non-existent tool, provide invalid arguments, your tool execution can fail, the result can be too large for the context window, or the whole thing can time out.
Pattern 1: Tool Registry with Validation
Don't execute arbitrary tool calls from the model. All tools should be registered in a central registry, and every tool call should be validated against the registry before execution.
interface Tool<TInput, TOutput> {
name: string
description: string
inputSchema: z.ZodSchema<TInput>
execute: (input: TInput) => Promise<TOutput>
timeout?: number // ms, default 30_000
retries?: number // default 2
}
class ToolRegistry {
private tools = new Map<string, Tool<unknown, unknown>>()
register<TInput, TOutput>(tool: Tool<TInput, TOutput>): void {
this.tools.set(tool.name, tool as Tool<unknown, unknown>)
}
async execute(toolName: string, rawArgs: unknown): Promise<ToolResult> {
const tool = this.tools.get(toolName)
if (!tool) {
return {
success: false,
error: `Unknown tool: ${toolName}. Available tools: ${[...this.tools.keys()].join(', ')}`,
}
}
const parsed = tool.inputSchema.safeParse(rawArgs)
if (!parsed.success) {
return {
success: false,
error: `Invalid arguments for ${toolName}: ${parsed.error.message}`,
}
}
return this.executeWithRetry(tool, parsed.data)
}
}

The key properties here: unknown tools return a descriptive error (not a crash), invalid arguments return a descriptive error the model can use to correct itself, and execution goes through retry logic.
Pattern 2: Retry with Exponential Backoff
Most transient tool failures (network errors, rate limits, temporary service unavailability) are recoverable with a simple retry. Build this into the execution layer, not into each individual tool.
async function executeWithRetry<TInput, TOutput>(
tool: Tool<TInput, TOutput>,
input: TInput,
maxRetries = tool.retries ?? 2
): Promise<ToolResult> {
let lastError: Error | undefined
for (let attempt = 0; attempt <= maxRetries; attempt++) {
if (attempt > 0) {
// Exponential backoff: 1s, 2s, 4s
const delayMs = Math.min(1000 * 2 ** (attempt - 1), 10_000)
await sleep(delayMs)
}
try {
// Race the tool against a timeout. Note the losing timer is never
// cancelled, and the tool itself keeps running — real code should also
// pass an AbortSignal so the tool can stop work on timeout.
const result = await Promise.race([
tool.execute(input),
sleep(tool.timeout ?? 30_000).then(() => {
throw new Error(`Tool ${tool.name} timed out after ${tool.timeout ?? 30_000}ms`)
}),
])
return { success: true, output: result }
} catch (error) {
lastError = error instanceof Error ? error : new Error(String(error))
// Don't retry on validation errors or explicitly non-retryable errors
if (error instanceof ValidationError || error instanceof NonRetryableError) {
break
}
console.warn(`Tool ${tool.name} attempt ${attempt + 1} failed:`, lastError.message)
}
}
return {
success: false,
error: lastError?.message ?? 'Unknown error',
exhaustedRetries: true,
}
}

The timeout race is important: tools that hang indefinitely will freeze your entire agent. Set sensible per-tool timeouts and enforce them.
Retries multiply your latency. Three retries with exponential backoff can add 7+ seconds to a single tool call. In multi-step agents, this compounds. Be conservative with retry counts for synchronous, user-facing operations. More aggressive retries are fine for background jobs.
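The retry loop assumes a `sleep` helper, which isn't shown; here's a minimal version, along with the delay schedule the backoff formula produces. The 1s + 2s + 4s schedule for three retries is where the "7+ seconds" figure comes from:

```typescript
// sleep helper assumed by the retry loop above.
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Delay before retry attempt n (1-based), matching the loop's formula:
// min(1000 * 2^(n-1), 10_000)
function backoffDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** (attempt - 1), 10_000)
}
```

Attempts 1, 2, 3 wait 1000ms, 2000ms, and 4000ms respectively; from attempt 5 onward the 10-second cap kicks in.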
Pattern 3: Returning Errors to the Model
When a tool fails after exhausting retries, you have two options: throw an exception and abort the agent run, or return the error to the model as a tool result and let it decide what to do.
In most cases, returning the error to the model is the better choice. The model may be able to:
- Try a different approach to get the information it needed
- Inform the user that a specific capability is unavailable
- Complete the task with reduced functionality
async function runAgentLoop(messages: Message[]): Promise<string> {
while (true) {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages,
tools: registry.getToolDefinitions(),
})
const choice = response.choices[0]
const message = choice.message
// finish_reason lives on the choice object, not the message
if (choice.finish_reason === 'stop') {
return message.content ?? ''
}
if (choice.finish_reason === 'tool_calls' && message.tool_calls) {
messages.push(message) // add assistant message with tool_calls
for (const toolCall of message.tool_calls) {
const result = await registry.execute(
toolCall.function.name,
JSON.parse(toolCall.function.arguments)
)
// Return both successes and failures to the model
messages.push({
role: 'tool',
tool_call_id: toolCall.id,
content: result.success
? JSON.stringify(result.output)
: `Error: ${result.error}. ${result.exhaustedRetries ? 'All retries exhausted.' : ''}`,
})
}
}
}
}

The model receives the error message as a tool result and can reason about it. Often it will adapt, far more gracefully than a thrown exception would allow.
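Concretely, after an exhausted-retries failure the model sees a tool message like the one below. The tool name, error text, and call id here are hypothetical, but the message shape matches what the loop above constructs:

```typescript
// Hypothetical failed result for a tool named fetch_weather.
const result = {
  success: false as const,
  error: 'fetch_weather failed: ECONNREFUSED',
  exhaustedRetries: true,
}

// The tool message the model receives, built the same way as in the loop.
const toolMessage = {
  role: 'tool' as const,
  tool_call_id: 'call_abc123', // hypothetical id from the assistant message
  content: `Error: ${result.error}. ${result.exhaustedRetries ? 'All retries exhausted.' : ''}`,
}
// content: "Error: fetch_weather failed: ECONNREFUSED. All retries exhausted."
```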
Pattern 4: Parallel Tool Execution
When the model requests multiple tool calls in a single response (which GPT-4 and Claude both do), execute them in parallel rather than sequentially. Sequential execution of parallel-safe tool calls is a common performance bottleneck.
if (message.tool_calls && message.tool_calls.length > 0) {
messages.push(message)
// Execute all tool calls in parallel
const toolResults = await Promise.allSettled(
message.tool_calls.map(async toolCall => ({
toolCallId: toolCall.id,
result: await registry.execute(
toolCall.function.name,
JSON.parse(toolCall.function.arguments)
),
}))
)
// Add all results to messages, pairing by index so every tool_call_id
// gets a response — the API rejects a request whose assistant message
// has a tool_call with no matching tool message
toolResults.forEach((settled, i) => {
const toolCallId = message.tool_calls![i].id
if (settled.status === 'fulfilled') {
const { result } = settled.value
messages.push({
role: 'tool',
tool_call_id: toolCallId,
content: result.success
? JSON.stringify(result.output)
: `Error: ${result.error}`,
})
} else {
// Rejected promise (e.g. JSON.parse threw on malformed arguments):
// still report it to the model rather than dropping it silently
messages.push({
role: 'tool',
tool_call_id: toolCallId,
content: `Error: ${String(settled.reason)}`,
})
}
})
}

Three tool calls that each take 500ms take 500ms total in parallel, not 1500ms sequentially.
Pattern 5: Tool Output Size Limits
Tool outputs that are too large for the context window will silently corrupt your agent's context or cause API errors. Implement output truncation at the tool registry level.
function truncateToolOutput(output: string, maxChars = 8000): string {
if (output.length <= maxChars) return output
const truncated = output.slice(0, maxChars)
return `${truncated}\n\n[OUTPUT TRUNCATED: ${output.length - maxChars} characters omitted. Request more specific data to avoid truncation.]`
}

The truncation message tells the model what happened and gives it a hint about how to get better results. Don't truncate silently.
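One variant worth considering (a sketch, not from the original post): keep both the head and the tail of the output. For logs and API responses, the end often contains the error or summary the model actually needs:

```typescript
// Head-and-tail truncation: preserve the start and end of the output,
// cutting from the middle instead of dropping the tail entirely.
function truncateHeadTail(output: string, maxChars = 8000): string {
  if (output.length <= maxChars) return output
  const half = Math.floor(maxChars / 2)
  const omitted = output.length - 2 * half
  return `${output.slice(0, half)}\n[OUTPUT TRUNCATED: ${omitted} characters omitted from the middle.]\n${output.slice(-half)}`
}
```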
The Complete Tool Execution Flow
LLM returns tool_calls
↓
For each tool call:
→ Validate tool exists in registry
→ Validate arguments against schema
→ Execute with timeout and retry
→ Truncate output if needed
→ Return success or descriptive error
↓
All tool results added to messages
↓
LLM continues with full tool context

Every step in this flow either produces a useful result or produces a descriptive error that the model can reason about. There are no silent failures, no crashes, no hung processes.
Tool calling reliability is unglamorous work. There's no demo-worthy feature here, just agents that keep working when external systems misbehave. But that's what production means: systems that hold up when things go wrong, not just when they go right.