LLM Structured Output Validation: From Fragile Strings to Typed Schemas
Every LLM integration starts the same way: you ask the model to "respond in JSON" and then write a JSON.parse() call to get the data out. This works in development. It breaks in production.
The model occasionally returns trailing commas, wraps the JSON in a markdown code block, adds an explanation before the JSON, or returns a slightly different structure than you expected. Your parser throws, your application breaks, and you're left adding increasingly desperate string cleanup code around every LLM call.
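The fragility is easy to reproduce: JSON.parse throws on exactly the near-JSON that models emit. A minimal illustration (the literal strings stand in for real model replies):

```typescript
// Strings that stand in for common near-JSON model responses.
const trailingComma = '{"title": "Hello",}'
const prosePrefixed = 'Here is the JSON: {"title": "Hello"}'

// JSON.parse throws a SyntaxError on both inputs above;
// tryParse surfaces that failure as null instead of an exception.
function tryParse(raw: string): unknown {
  try {
    return JSON.parse(raw)
  } catch {
    return null
  }
}
```

Swallowing the exception is not a fix, of course; it just moves the problem from a crash to a mysterious null downstream.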
There is a better way. This post covers three levels of structured output enforcement: client-side validation with Zod, API-level schema enforcement with typed parsing, and tool calling as a validation pattern, along with when to use each.
Level 1: Client-Side Validation with Zod
The minimum viable improvement over raw JSON.parse is validating the parsed output against a schema. Zod is the right tool for this in TypeScript.
import OpenAI from 'openai'
import { z } from 'zod'
const openai = new OpenAI() // reads OPENAI_API_KEY from the environment
const ArticleSummarySchema = z.object({
title: z.string().min(1),
summary: z.string().min(10).max(500),
keyPoints: z.array(z.string()).min(1).max(5),
sentiment: z.enum(['positive', 'negative', 'neutral']),
confidence: z.number().min(0).max(1),
})
type ArticleSummary = z.infer<typeof ArticleSummarySchema>
async function summarizeArticle(content: string): Promise<ArticleSummary> {
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `You are a content analyzer. Respond with valid JSON only, no additional text.
Schema: { title, summary, keyPoints: string[], sentiment: "positive"|"negative"|"neutral", confidence: 0-1 }`,
},
{ role: 'user', content },
],
})
const rawContent = response.choices[0].message.content
if (!rawContent) throw new Error('Empty response from model')
// Clean common LLM response artifacts
const cleaned = rawContent
.replace(/^```json\n?/, '')
.replace(/\n?```$/, '')
.trim()
const parsed = JSON.parse(cleaned)
return ArticleSummarySchema.parse(parsed) // throws ZodError if schema mismatch
}
This is better than raw JSON.parse, but it still has the string-cleaning problem: even with the best prompting, the model can return invalid JSON. You're one bad response away from a thrown exception.
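A common mitigation at this level is a bounded retry loop that feeds the validation failure back to the model. A minimal sketch, with the model call injected as a plain function so the loop is testable offline (parseWithRetry and the correction-prompt wording are illustrative, not part of any SDK):

```typescript
// Retries a model call, feeding each validation failure back as a correction
// prompt. `callModel` is any function returning raw text; `validate` returns
// the parsed value or throws with a message describing what was wrong.
async function parseWithRetry<T>(
  callModel: (prompt: string) => Promise<string>,
  validate: (raw: string) => T,
  prompt: string,
  maxAttempts = 3,
): Promise<T> {
  let lastError = ''
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const fullPrompt =
      attempt === 0
        ? prompt
        : `${prompt}\n\nYour previous reply was invalid: ${lastError}\nRespond with corrected JSON only.`
    const raw = await callModel(fullPrompt)
    try {
      return validate(raw)
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err)
    }
  }
  throw new Error(`No valid response after ${maxAttempts} attempts: ${lastError}`)
}
```

With the Level 1 code, validate would be raw => ArticleSummarySchema.parse(JSON.parse(raw)), and callModel would wrap the chat completion request. Cap the attempts: each retry is another paid API call.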
Level 2: API-Level Structured Outputs (OpenAI)
OpenAI's structured outputs feature enforces your schema at the API level, before the response reaches your application. The model's output is constrained by the schema: it cannot return something that doesn't match.
import { z } from 'zod'
import { zodResponseFormat } from 'openai/helpers/zod'
const ArticleSummarySchema = z.object({
title: z.string(),
summary: z.string(),
keyPoints: z.array(z.string()),
sentiment: z.enum(['positive', 'negative', 'neutral']),
confidence: z.number(),
})
async function summarizeArticle(content: string) {
const response = await openai.beta.chat.completions.parse({
model: 'gpt-4o-2024-08-06', // structured outputs require this model or later
messages: [
{ role: 'system', content: 'You are a content analyzer.' },
{ role: 'user', content },
],
response_format: zodResponseFormat(ArticleSummarySchema, 'article_summary'),
})
// response.choices[0].message.parsed is already typed as z.infer<typeof ArticleSummarySchema>
const summary = response.choices[0].message.parsed
if (!summary) throw new Error('Model refused to return structured output')
return summary
}
With zodResponseFormat, the SDK converts your Zod schema to JSON Schema, the API constrains the model's output to match it, and you get the parsed, typed result back directly. No string cleaning, no JSON.parse, and no separate Zod validation step on your end: the SDK handles it.
JSON mode (response_format: { type: "json_object" }) only guarantees syntactically valid JSON, not that the JSON matches your schema. Structured outputs with a schema guarantee both. Use structured outputs whenever you need a specific shape, not just valid JSON.
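If you're not using the Zod helper, the same request can be built by hand with the raw json_schema response format. A sketch of the request shape, with a hand-written JSON Schema mirroring the article-summary example above:

```typescript
// Hand-built equivalent of zodResponseFormat(ArticleSummarySchema, 'article_summary').
// strict: true is what upgrades "valid JSON" to "JSON matching this schema".
const responseFormat = {
  type: 'json_schema' as const,
  json_schema: {
    name: 'article_summary',
    strict: true,
    schema: {
      type: 'object',
      properties: {
        title: { type: 'string' },
        summary: { type: 'string' },
        keyPoints: { type: 'array', items: { type: 'string' } },
        sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
        confidence: { type: 'number' },
      },
      required: ['title', 'summary', 'keyPoints', 'sentiment', 'confidence'],
      additionalProperties: false,
    },
  },
}
```

Note the constraints strict mode imposes: every property must appear in required, and additionalProperties must be false. The Zod helper takes care of this translation for you.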
Level 3: Tool Calling as a Validation Pattern
For cases where the model needs to reason before producing structured output, tool calling is more reliable than direct structured output. You define the expected output as a tool, and ask the model to "call" it with the data.
const tools = [
{
type: 'function' as const,
function: {
name: 'record_analysis',
description: 'Record the analysis results for the provided content',
parameters: {
type: 'object',
properties: {
classification: {
type: 'string',
enum: ['bug_report', 'feature_request', 'question', 'other'],
},
priority: {
type: 'string',
enum: ['low', 'medium', 'high', 'critical'],
},
summary: { type: 'string', maxLength: 200 },
tags: {
type: 'array',
items: { type: 'string' },
maxItems: 5,
},
},
required: ['classification', 'priority', 'summary', 'tags'],
additionalProperties: false,
},
},
},
]
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages,
tools,
tool_choice: { type: 'function', function: { name: 'record_analysis' } },
})
const toolCall = response.choices[0].message.tool_calls?.[0]
if (!toolCall) throw new Error('No tool call in response')
const args = JSON.parse(toolCall.function.arguments)
// args follows the JSON Schema defined above
The tool_choice parameter forces the model to call the named tool: it cannot return a regular message instead. The arguments are shaped by your JSON Schema (for a hard guarantee, set strict: true on the function definition; note that strict mode supports only a subset of JSON Schema keywords, so constraints like maxLength remain advisory). This pattern is particularly useful when you want the model to reason through the problem conversationally and then produce structured output at the end.
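Even when the API constrains generation, a runtime check on your side is cheap insurance against schema drift or model changes. A minimal hand-rolled guard for the parsed arguments (a sketch; in a Zod codebase you'd define a matching schema and call safeParse instead):

```typescript
type Analysis = {
  classification: 'bug_report' | 'feature_request' | 'question' | 'other'
  priority: 'low' | 'medium' | 'high' | 'critical'
  summary: string
  tags: string[]
}

// Narrow unknown parsed arguments to Analysis, mirroring the JSON Schema above.
function isAnalysis(value: unknown): value is Analysis {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  const tags = v.tags
  return (
    ['bug_report', 'feature_request', 'question', 'other'].includes(v.classification as string) &&
    ['low', 'medium', 'high', 'critical'].includes(v.priority as string) &&
    typeof v.summary === 'string' &&
    Array.isArray(tags) &&
    tags.every((t) => typeof t === 'string')
  )
}
```

Calling isAnalysis(args) after the JSON.parse turns a silent shape mismatch into an explicit, handleable failure.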
Handling Validation Failures Gracefully
Even with API-level constraints, structured output calls can fail. The model may refuse to return structured output for certain inputs (especially if the content violates content policies). Build explicit handling for this.
async function safeSummarize(content: string): Promise<ArticleSummary | null> {
try {
const response = await openai.beta.chat.completions.parse({
model: 'gpt-4o-2024-08-06',
messages: [
{ role: 'system', content: 'You are a content analyzer.' },
{ role: 'user', content },
],
response_format: zodResponseFormat(ArticleSummarySchema, 'summary'),
})
const parsed = response.choices[0].message.parsed
if (response.choices[0].message.refusal) {
console.warn('Model refused structured output:', response.choices[0].message.refusal)
return null
}
return parsed
} catch (error) {
if (error instanceof z.ZodError) {
console.error('Schema validation failed:', error.errors)
return null
}
throw error // Re-throw unexpected errors
}
}
The .refusal field tells you explicitly when the model declined to produce structured output, which is distinct from a parsing error or an API error. Handle all three cases.
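If returning null loses too much information for callers, a small discriminated union makes the three cases explicit (a sketch; the type and field names here are illustrative):

```typescript
type StructuredResult<T> =
  | { ok: true; data: T }
  | { ok: false; reason: 'refusal' | 'invalid_schema' | 'api_error'; detail: string }

// Map the outcomes safeSummarize distinguishes onto the union.
// `parsed` and `refusal` mirror the fields on response.choices[0].message.
function toResult<T>(parsed: T | null, refusal: string | null): StructuredResult<T> {
  if (refusal) return { ok: false, reason: 'refusal', detail: refusal }
  if (parsed === null) return { ok: false, reason: 'invalid_schema', detail: 'no parsed payload' }
  return { ok: true, data: parsed }
}
```

Callers can then switch on reason and decide whether to retry, skip, or surface the failure, instead of guessing what a bare null meant.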
Anthropic: Tool Calling for Structure
Anthropic's Claude doesn't yet have a direct equivalent to OpenAI's structured outputs, but tool calling achieves the same result. Define your expected output as a tool and force the model to call it.
const response = await anthropic.messages.create({
model: 'claude-opus-4-6',
max_tokens: 1024,
tools: [
{
name: 'record_result',
description: 'Record the structured analysis result',
input_schema: {
type: 'object',
properties: {
sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
score: { type: 'number' },
explanation: { type: 'string' },
},
required: ['sentiment', 'score', 'explanation'],
},
},
],
tool_choice: { type: 'tool', name: 'record_result' },
messages: [{ role: 'user', content: text }],
})
const toolUse = response.content.find(block => block.type === 'tool_use')
if (!toolUse || toolUse.type !== 'tool_use') throw new Error('No tool use in response')
const result = toolUse.input as { sentiment: string; score: number; explanation: string }
The pattern is conceptually identical: structured output through forced tool use.
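One caveat in that snippet: the as cast asserts the type without checking anything at runtime. A guard that actually verifies the tool input before using it (a hand-rolled sketch; with Zod you'd run safeParse on toolUse.input instead):

```typescript
type SentimentResult = {
  sentiment: 'positive' | 'negative' | 'neutral'
  score: number
  explanation: string
}

// Replace the bare `as` cast with a runtime check on the tool input.
function parseSentiment(input: unknown): SentimentResult {
  const v = input as Partial<SentimentResult> | null
  if (
    (v?.sentiment === 'positive' || v?.sentiment === 'negative' || v?.sentiment === 'neutral') &&
    typeof v.score === 'number' &&
    typeof v.explanation === 'string'
  ) {
    return { sentiment: v.sentiment, score: v.score, explanation: v.explanation }
  }
  throw new Error('Tool input did not match the expected schema')
}
```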
Moving from prompt-and-parse to schema-enforced structured outputs is one of the highest-leverage improvements you can make to any LLM integration. It removes an entire category of production failures, makes your TypeScript types accurate, and gives you a clear contract between your application and the model.
The Zod approach works today with any model. The OpenAI structured outputs API gives you the strongest guarantees when you're on supported models. Use whichever matches your current stack, but pick one and use it consistently.