WebMCP: How AI Agents Will Interact with Web Pages
Every AI agent interacting with the web today is essentially screen-scraping with extra steps. Whether it's Puppeteer clicking buttons, a browser extension reading DOM elements, or an LLM parsing screenshots - they are all working against the web, not with it. WebMCP changes that by giving web pages a native way to expose structured tools that AI agents can discover and invoke directly.
After building custom MCP servers for backend integrations and watching the agentic development space evolve through 2025-2026, WebMCP feels like the missing piece. It brings the same structured tool protocol to the browser, where most user-facing interactions actually happen.
The Problem with AI Agents on the Web Today
Right now, AI agents interact with websites through three approaches, and all of them have serious limitations.
Screen scraping and DOM parsing means the agent reads raw HTML, guesses which elements are interactive, and tries to extract meaning from markup that was designed for visual rendering, not machine consumption. A button labeled "Submit" could do anything - place an order, delete an account, or send a message.
Browser automation (Puppeteer, Playwright, Selenium) gives agents the ability to click, type, and navigate. But the agent has no understanding of what actions are available or what they do. It navigates by coordinates and CSS selectors that break when the UI changes.
Screenshot analysis with vision models is the newest approach. The agent literally looks at the screen and decides where to click. This works surprisingly well for simple tasks but fails completely for complex workflows, hidden state, or actions that require understanding business logic.
All three share the same fundamental problem: the website has no idea an AI agent is interacting with it, and the agent has no structured understanding of what the website offers.
**Without WebMCP:** Agent reads raw HTML → guesses which elements are interactive → clicks by CSS selectors → parses unstructured responses → breaks on every UI change

- No understanding of available actions or parameters
- 70-90% of LLM context consumed by page structure
- Requires constant maintenance as UI evolves

**With WebMCP:** Page declares capabilities → agent discovers typed tools → invokes with validated inputs → receives structured responses → UI-independent and resilient

- Explicit tool descriptions with typed schemas
- 15-25% context usage, leaving room for reasoning
- Stable interface regardless of visual changes
// How agents interact with the web today - fragile and blind
// browser-automation/checkout.ts
import { chromium } from 'playwright';
async function automatedCheckout(items: string[]) {
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://store.example.com');
// Fragile: breaks if CSS class changes
await page.click('.cart-button');
// Guessing: no idea what fields are required
await page.fill('#address-line-1', '123 Main St');
// Blind: no structured error handling
await page.click('[data-testid="submit-order"]');
// Hope it worked?
const confirmation = await page.textContent('.confirmation-message');
return confirmation;
}

This approach requires constant maintenance. Every UI update, A/B test, or layout change can break the automation. The agent never truly understands the application's capabilities.
What WebMCP Actually Is
WebMCP is a proposed JavaScript API that lets web pages register structured tools - functions with descriptions, typed input schemas, and defined behavior - that AI agents can discover and invoke. Think of it as giving your website a machine-readable menu of everything it can do.
The proposal comes from the W3C Web Machine Learning Community Group and builds directly on Anthropic's Model Context Protocol (MCP). Where standard MCP connects AI models to backend servers over stdio or HTTP, WebMCP connects them to web pages through browser-native APIs.
The core idea: instead of an AI agent guessing what a web page can do by reading its HTML, the page explicitly declares its capabilities.
// How WebMCP changes the interaction model
// The website explicitly declares what it can do
// Before: Agent scrapes DOM, guesses functionality
// <button class="btn-primary" onclick="checkout()">Complete Purchase</button>
// Agent sees: "a button with text 'Complete Purchase'" - knows nothing about parameters
// After: Website registers a structured tool
if (navigator.modelContext) {
navigator.modelContext.registerTool({
name: "complete_purchase",
description: "Process checkout with items in cart. Validates inventory, applies discounts, and charges payment method on file.",
inputSchema: {
type: "object",
properties: {
shippingSpeed: {
type: "string",
enum: ["standard", "express", "overnight"],
description: "Shipping speed selection"
},
giftWrap: {
type: "boolean",
description: "Whether to add gift wrapping"
},
promoCode: {
type: "string",
description: "Optional promotional code"
}
},
required: ["shippingSpeed"]
},
annotations: {
destructiveHint: true,
idempotentHint: false
},
execute: async (input) => {
const result = await processCheckout(input);
return [{ type: "text", text: JSON.stringify(result) }];
}
});
}

The agent now knows exactly what the checkout tool does, what parameters it accepts, and that it is a destructive (non-reversible) action. No DOM scraping, no guessing.
How Standard MCP Works (Quick Recap)
If you've worked with MCP servers before, you know the architecture. An MCP client (the AI model) connects to MCP servers (backend services) that expose tools, resources, and prompts.
AI Model (Client) — Claude, GPT-4, Gemini — communicates over JSON-RPC 2.0 (stdio / HTTP SSE) with MCP Servers (Backend) — GitHub API, Slack API, Database.
The protocol uses three message types: tools/list (discover available tools), tools/call (invoke a tool), and resources/read (fetch data).
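To make those message types concrete, here is what a discovery-and-invocation exchange looks like as plain JSON-RPC 2.0 data. The envelope fields follow the MCP spec; the specific tool (`search_issues`) and its schema are invented for illustration:

```typescript
// Simplified JSON-RPC 2.0 shapes for MCP tool discovery and invocation.
// Field names follow the MCP spec; the concrete tool is made up.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// 1. Client asks the server which tools exist
const listRequest: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list",
};

// 2. Server answers with tool descriptors: name, description, input schema
const listResponse = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    tools: [
      {
        name: "search_issues",
        description: "Search issues in a repository",
        inputSchema: {
          type: "object",
          properties: { query: { type: "string" } },
          required: ["query"],
        },
      },
    ],
  },
};

// 3. Client invokes one of the discovered tools with typed arguments
const callRequest: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: { name: "search_issues", arguments: { query: "label:bug" } },
};
```

The same three-step shape — list, inspect schemas, call — is what WebMCP carries over into the browser.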
This works well for backend integrations - databases, APIs, file systems. But it completely misses the browser. The richest interaction layer, where users actually see and interact with applications, has no MCP story.
WebMCP Architecture
WebMCP adds the browser as a first-class MCP environment. Web pages become MCP servers, and AI agents become MCP clients, but the communication happens entirely in the browser.
AI Agent (Extension / Built-in) communicates with the Web Page (MCP Server) via navigator.modelContext.
- tools/list — Agent discovers registered tools (search, add_to_cart, checkout)
- tools/call — Agent invokes a tool with typed parameters
- content[] — Page returns structured responses
This is the W3C proposal being incubated by the Web Machine Learning Community Group.
Since browsers don't natively support navigator.modelContext yet, the MCP-B project provides a polyfill that shims the API. Alongside the polyfill, the MCP-B bridge:

- Aggregates tools across multiple browser tabs
- Routes requests to the correct page
- Manages sessions and caching

The bridge connects to external AI hosts (Claude Desktop, etc.) via standard MCP (JSON-RPC 2.0).
Communication uses browser-native mechanisms: postMessage for same-tab, extension runtime messaging for cross-context, or HTTP/SSE for external AI hosts.
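A toy version of such a shim is easy to picture. The sketch below is not the MCP-B implementation — it just illustrates the shape: a minimal `navigator.modelContext` with `registerTool`, plus `listTools` and `callTool` helpers (those two helper names are invented here so an agent-side bridge has something to enumerate and invoke):

```typescript
// Minimal illustrative shim for navigator.modelContext.
// NOT the real MCP-B polyfill; listTools()/callTool() are invented helpers.

type ToolResult = { type: "text"; text: string }[];

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  annotations?: Record<string, boolean>;
  execute: (input: any) => Promise<ToolResult>;
}

function installModelContextShim(target: any): void {
  if (target.modelContext) return; // native support wins

  const tools = new Map<string, ToolDefinition>();
  target.modelContext = {
    registerTool(tool: ToolDefinition) {
      tools.set(tool.name, tool);
    },
    // Enumerate registered tools for an agent-side bridge
    listTools(): ToolDefinition[] {
      return [...tools.values()];
    },
    // Dispatch an invocation to the matching tool's execute handler
    async callTool(name: string, input: unknown): Promise<ToolResult> {
      const tool = tools.get(name);
      if (!tool) throw new Error(`Unknown tool: ${name}`);
      return tool.execute(input);
    },
  };
}

// In a browser you would call installModelContextShim(navigator);
```

Everything downstream — cross-tab aggregation, external host bridging — builds on this simple registry.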
Registering Tools: The JavaScript API
The core API centers on navigator.modelContext.registerTool(). Each tool is a JavaScript object with a name, natural language description, JSON Schema for inputs, and an execute handler.
Here's a practical example - an e-commerce product page that exposes shopping tools to AI agents:
// store/product-page.ts
// Registering WebMCP tools on an e-commerce product page
function registerProductTools(product: Product) {
if (!navigator.modelContext) return;
// Tool 1: Search products with filters
navigator.modelContext.registerTool({
name: "search_products",
description: "Search the store catalog with filters for category, price range, size, color, and availability",
inputSchema: {
type: "object",
properties: {
query: { type: "string", description: "Search query text" },
category: {
type: "string",
enum: ["dresses", "tops", "pants", "shoes", "accessories"]
},
priceRange: {
type: "object",
properties: {
min: { type: "number" },
max: { type: "number" }
}
},
size: {
type: "string",
enum: ["XS", "S", "M", "L", "XL"]
},
inStock: { type: "boolean", description: "Only show items in stock" }
},
required: ["query"]
},
annotations: {
readOnlyHint: true
},
execute: async (input) => {
const results = await searchProducts(input);
return [{
type: "text",
text: JSON.stringify({
count: results.length,
products: results.map(p => ({
id: p.id,
name: p.name,
price: p.price,
sizes: p.availableSizes,
rating: p.averageRating
}))
})
}];
}
});
// Tool 2: Get detailed product information
navigator.modelContext.registerTool({
name: "get_product_details",
description: `Get full details for product: ${product.name}. Includes pricing, reviews, size guide, and material composition.`,
inputSchema: {
type: "object",
properties: {
includeReviews: {
type: "boolean",
description: "Include customer reviews in response"
},
reviewCount: {
type: "number",
description: "Number of reviews to include (max 10)"
}
}
},
annotations: {
readOnlyHint: true
},
execute: async (input) => {
const details = await getProductDetails(product.id);
const response: Record<string, unknown> = {
name: details.name,
price: details.price,
description: details.fullDescription,
materials: details.materials,
sizeGuide: details.sizeGuide,
availability: details.stockStatus
};
if (input.includeReviews) {
response.reviews = details.reviews.slice(0, input.reviewCount || 5);
}
return [{ type: "text", text: JSON.stringify(response) }];
}
});
// Tool 3: Add to cart (destructive action)
navigator.modelContext.registerTool({
name: "add_to_cart",
description: "Add the current product to the shopping cart with selected size and quantity",
inputSchema: {
type: "object",
properties: {
size: {
type: "string",
enum: product.availableSizes,
description: "Selected size"
},
quantity: {
type: "number",
minimum: 1,
maximum: 10,
description: "Quantity to add"
}
},
required: ["size", "quantity"]
},
annotations: {
destructiveHint: false,
idempotentHint: false
},
execute: async (input) => {
const cartItem = await addToCart(product.id, input.size, input.quantity);
return [{
type: "text",
text: JSON.stringify({
added: true,
cartTotal: cartItem.cartTotal,
itemCount: cartItem.totalItems
})
}];
}
});
}

★ Insight ─────────────────────────────────────
The annotations field is critical for agent safety. readOnlyHint tells the agent a tool only reads data (safe to call freely). destructiveHint flags irreversible actions (the agent should confirm with the user first). idempotentHint indicates whether calling the tool multiple times produces the same result. These annotations let AI agents make informed decisions about when to ask for user confirmation.
─────────────────────────────────────────────────
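On the agent side, these hints translate naturally into a confirmation policy. A sketch — the specific decision categories are my own, not part of the spec:

```typescript
// Decide whether an agent should ask the user before calling a tool,
// based on WebMCP annotation hints. The policy itself is illustrative.

interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
}

type Decision = "call_freely" | "confirm_with_user" | "call_with_care";

function confirmationPolicy(a: ToolAnnotations): Decision {
  if (a.readOnlyHint) return "call_freely";          // reads are safe to repeat
  if (a.destructiveHint) return "confirm_with_user"; // irreversible: ask first
  // Mutating but not flagged destructive: proceed, but avoid blind
  // retries unless idempotentHint says repeats are harmless.
  return "call_with_care";
}
```

Under this policy, the earlier complete_purchase tool (destructiveHint: true) maps to "confirm_with_user", while search_products (readOnlyHint: true) can be called freely.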
React Integration with useWebMCP
For React applications, the MCP-B library provides a useWebMCP hook that integrates tool registration with the component lifecycle. Tools are registered when the component mounts and cleaned up on unmount.
// components/ProductPage.tsx
import { useWebMCP } from '@mcp-b/react-webmcp';
import { z } from 'zod';
interface ProductPageProps {
productId: string;
product: Product;
}
function ProductPage({ productId, product }: ProductPageProps) {
const [cartCount, setCartCount] = useState(0);
// Register tool tied to this component's lifecycle
useWebMCP({
name: 'add_to_cart',
description: `Add ${product.name} to shopping cart. Price: $${product.price}. Available sizes: ${product.availableSizes.join(', ')}`,
inputSchema: {
quantity: z.number().min(1).max(10).default(1),
size: z.enum(product.availableSizes as [string, ...string[]])
},
handler: async ({ quantity, size }) => {
const result = await addToCart(productId, size, quantity);
setCartCount(prev => prev + quantity);
return {
success: true,
quantity,
size,
newCartTotal: result.total
};
}
});
useWebMCP({
name: 'check_size_availability',
description: 'Check which sizes are currently in stock for this product',
inputSchema: {},
handler: async () => {
const availability = await checkStock(productId);
return {
product: product.name,
sizes: availability
};
}
});
return (
<div>
<h1>{product.name}</h1>
<p>${product.price}</p>
<p>Cart items: {cartCount}</p>
{/* Regular UI rendering */}
</div>
);
}

The beauty of this pattern is that tools are contextual. When a user navigates to a product page, that page's tools become available to the AI agent. Navigate away, and those tools are cleaned up. The agent always has access to exactly the tools relevant to the current page context.
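The mount/unmount pattern the hook automates can be sketched with a plain registry. Note the assumption: registration here returns an unregister handle, which is my own device — the actual cleanup mechanism is whatever the spec and library ultimately provide:

```typescript
// Sketch of the register-on-mount / cleanup-on-unmount lifecycle that
// useWebMCP automates. ASSUMPTION: registration returns an unregister
// handle; the real API's cleanup mechanism may differ.

type Unregister = () => void;

function createToolRegistry() {
  const tools = new Map<string, () => Promise<unknown>>();
  return {
    register(name: string, handler: () => Promise<unknown>): Unregister {
      tools.set(name, handler);
      return () => { tools.delete(name); }; // cleanup handle for unmount
    },
    names: () => [...tools.keys()],
  };
}

const registry = createToolRegistry();

// Component mounts: the page-specific tool becomes available
const unregister = registry.register("add_to_cart", async () => ({ added: true }));

// Component unmounts (user navigates away): the tool disappears
unregister();
```

This keeps the agent's tool list in lockstep with what the user is actually looking at.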
Security: How WebMCP Protects Users and Applications
Security was a primary concern in the WebMCP design. An AI agent executing arbitrary JavaScript on web pages would be a nightmare. WebMCP addresses this through multiple layers.
Internal-URL Shielding
Web pages often call internal APIs that should never be exposed directly. WebMCP uses symbolic names instead of real URLs:
// security/url-shielding.ts
// Tool definition uses symbolic API references
navigator.modelContext.registerTool({
name: "process_payment",
description: "Process payment for current order",
inputSchema: {
type: "object",
properties: {
paymentMethodId: { type: "string" }
},
required: ["paymentMethodId"]
},
// Internal API endpoint is never exposed to the agent
// @PAYMENT_API resolves server-side after token validation
execute: async (input) => {
// The agent never sees the actual URL
// Internally this calls: POST /api/v3/payments/charge
// But the mapping happens server-side with wmcp-token validation
const result = await fetch('/wmcp/resolve/@PAYMENT_API', {
method: 'POST',
headers: {
'X-WMCP-Token': await getWMCPToken(),
'Content-Type': 'application/json'
},
body: JSON.stringify({
action: 'charge',
paymentMethodId: input.paymentMethodId
})
});
return [{ type: "text", text: await result.text() }];
}
});

The @PAYMENT_API symbolic name maps to the real API endpoint only on the server side, after validating the wmcp-token. The AI agent never sees internal API URLs.
Scope Hints and Token Validation
Tools declare what permissions they need, and the agent must have matching scopes:
// security/scopes.ts
navigator.modelContext.registerTool({
name: "update_profile",
description: "Update user profile information",
scopes: ["read:profile", "write:profile"],
inputSchema: {
type: "object",
properties: {
displayName: { type: "string", maxLength: 50 },
bio: { type: "string", maxLength: 500 }
}
},
// Auto-throttle prevents abuse
throttle: {
rate: "5/min",
burst: 2
},
execute: async (input) => {
// Scope validation happens before execute is called
// If agent doesn't have write:profile scope, this never runs
const updated = await updateProfile(input);
return [{ type: "text", text: JSON.stringify(updated) }];
}
});

The wmcp-token (JWT or session-based) carries the agent's granted scopes. If the token doesn't include write:profile, the tool call is rejected with a 403 before the execute function ever runs.
Auto-Throttle Hints
Tools can specify rate limits that the MCP-B polyfill enforces client-side. This prevents a misbehaving agent from hammering an endpoint. The rate: "5/min" hint tells the polyfill to queue or reject calls that exceed 5 invocations per minute.
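One natural way to enforce a "5/min, burst 2" hint client-side is a token bucket: the bucket holds at most `burst` tokens, refills at the sustained rate, and each call spends one. This is a sketch of the idea, not MCP-B's actual algorithm:

```typescript
// Client-side rate limiter of the kind a rate: "5/min", burst: 2 hint
// implies. Token-bucket sketch; the polyfill's real algorithm may differ.

class ToolThrottle {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private ratePerMinute: number,          // sustained rate, e.g. 5
    private burst: number,                  // bucket capacity, e.g. 2
    private now: () => number = () => Date.now(), // injectable clock
  ) {
    this.tokens = burst;
    this.lastRefill = this.now();
  }

  // Returns true if the call may proceed, false if it should be
  // queued or rejected.
  tryCall(): boolean {
    const elapsedMin = (this.now() - this.lastRefill) / 60_000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedMin * this.ratePerMinute);
    this.lastRefill = this.now();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With rate 5/min and burst 2, an agent can fire two calls back to back, then earns roughly one more call every 12 seconds.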
Performance: Why WebMCP Is More Efficient Than Alternatives
The research behind WebMCP (documented in arXiv:2508.09171) shows significant efficiency gains over both traditional MCP and browser automation approaches.
67.6% Token Reduction
Traditional approaches to AI-web interaction burn through tokens describing page structure. When an agent uses Puppeteer or screenshot analysis, the LLM context gets filled with HTML, accessibility trees, or image descriptions. WebMCP cuts this dramatically:
In benchmarks on e-commerce workflows with 10,000+ tool calls, WebMCP reduced average token usage from 2,100 to 680 tokens per task - a 67.6% reduction. Tool descriptions are 40% smaller than DOM representations, and responses are 30% leaner through structured content arrays.
| Token usage | Before (DOM scraping) | After (WebMCP) |
|---|---|---|
| Avg tokens per task | 2,100 | 680 |
O(|E|) Discovery Complexity
Tool discovery scales as O(|E|) where |E| is the number of relevant elements (registered tools), not the total number of DOM elements on the page. A page with 500 DOM nodes but 5 registered tools only requires the agent to process 5 tool descriptions, not parse the entire document tree.
The MCP-B polyfill uses graph traversal for cross-tab tool aggregation, with topological sorting for tools that depend on each other. In benchmarks with 100+ tools registered across multiple tabs, discovery latency stayed under 50ms on Chrome.
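The contrast is easy to state in code: DOM-based discovery has to visit every node to guess which are interactive, while WebMCP discovery only reads the registry. An illustrative comparison (the node counts are made up):

```typescript
// Discovery work: DOM walking scales with total nodes (O(N)); WebMCP
// discovery scales with registered tools only (O(|E|)).

interface DomNode {
  tag: string;
  children: DomNode[];
}

// Traditional: visit every node to find candidate interactive elements
function countVisitedNodes(root: DomNode): number {
  let visited = 1;
  for (const child of root.children) visited += countVisitedNodes(child);
  return visited;
}

// WebMCP: read the tool registry directly — work proportional to |E|
function discoverTools(registry: { name: string }[]): string[] {
  return registry.map((t) => t.name);
}
```

For a page with 500 DOM nodes and 5 registered tools, the first approach does 500 visits where the second reads 5 descriptors.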
Real-World Use Cases
E-Commerce: Intelligent Shopping Assistants
This is the most compelling use case. A user tells their AI assistant: "Find me a red summer dress under $100, size M, with good reviews." Without WebMCP, the agent needs to navigate the site visually, click through filters, and parse results from HTML. With WebMCP:
// The AI agent's interaction flow with WebMCP
// Agent receives structured tool list from the store page:
// Step 1: Search with precise filters
const searchResult = await tools.search_products({
query: "summer dress",
category: "dresses",
priceRange: { min: 0, max: 100 },
size: "M",
inStock: true
});
// Step 2: Get details on top matches
for (const product of searchResult.products.slice(0, 3)) {
const details = await tools.get_product_details({
productId: product.id,
includeReviews: true,
reviewCount: 5
});
// Agent can now compare products with full context
}
// Step 3: Present options to user with rich data
// The user sees results in the store's own UI
// Agent and user share the same visual context

The agent works within the store's interface. The user sees the same products, can click through images, read full descriptions, and the agent can reference specific products without ambiguity.
**User:** Find me a red summer dress under $100, size M, with good reviews

**Agent:** I found 3 dresses matching your criteria using the store's search tools. Here are the top picks: (1) Floral Wrap Dress - $79.99 - 4.8 stars (142 reviews) - Size M in stock. (2) Cotton A-Line Sundress - $64.50 - 4.6 stars (89 reviews) - Size M in stock. (3) Linen Midi Dress - $92.00 - 4.9 stars (67 reviews) - Size M, only 2 left. Want me to add any of these to your cart, or see detailed reviews?

**User:** Add the first one to my cart

**Agent:** Done! Added the Floral Wrap Dress (Size M, qty 1) to your cart. Your cart total is now $79.99. The store page is showing your updated cart - you can see it in the top right.
Form-Heavy Applications: Insurance, Banking, Government Services
Complex multi-step forms are painful for users and even worse for AI agents trying to automate them. WebMCP lets the form expose its structure:
// insurance-application/tools.ts
navigator.modelContext.registerTool({
name: "fill_application_section",
description: "Fill a section of the insurance application. Validates inputs against policy requirements.",
inputSchema: {
type: "object",
properties: {
section: {
type: "string",
enum: ["personal_info", "vehicle_info", "coverage_selection", "payment"],
description: "Which section to fill"
},
data: {
type: "object",
description: "Section-specific data matching the form fields"
}
},
required: ["section", "data"]
},
annotations: {
destructiveHint: false,
idempotentHint: true
},
execute: async (input) => {
const validation = await validateAndFill(input.section, input.data);
return [{
type: "text",
text: JSON.stringify({
filled: validation.success,
errors: validation.errors,
nextSection: validation.nextRequired,
completionPercent: validation.progress
})
}];
}
});
navigator.modelContext.registerTool({
name: "get_form_requirements",
description: "Get required fields and validation rules for a form section",
inputSchema: {
type: "object",
properties: {
section: {
type: "string",
enum: ["personal_info", "vehicle_info", "coverage_selection", "payment"]
}
},
required: ["section"]
},
annotations: { readOnlyHint: true },
execute: async (input) => {
const requirements = getFieldRequirements(input.section);
return [{ type: "text", text: JSON.stringify(requirements) }];
}
});

The agent can query form requirements before filling them, handle validation errors gracefully, and track progress through multi-step workflows - all through structured interactions instead of blind form filling.
Accessibility and Assistive Technology
WebMCP has significant implications for accessibility. Screen readers and assistive technologies currently rely on ARIA attributes and semantic HTML to understand web pages. WebMCP provides an additional structured layer:
An AI assistant helping a visually impaired user can discover exactly what actions are available on a page, understand their parameters, and execute them - without needing to parse complex UI layouts or rely on potentially incomplete ARIA markup.
WebMCP vs Browser Automation: Why Structured Tools Win
| Aspect | Browser Automation (Puppeteer/Playwright) | WebMCP |
|---|---|---|
| Discovery | Parse full DOM, guess interactive elements | Explicit tool list with descriptions |
| Invocation | CSS selectors, XPath, coordinates | Typed function calls with schemas |
| Error handling | Try/catch around DOM operations | Structured error responses |
| Maintenance | Breaks on UI changes | Stable tool interface |
| Security | Full page access, credential exposure risk | Scoped permissions, URL shielding |
| Performance | Full page rendering required | Direct function execution |
| Context sharing | Agent and user see different things | Shared visual context |
| Token efficiency | 70-90% of context for page structure | 15-25% for tool descriptions |
The fundamental difference: browser automation fights the web. WebMCP works with it.
In my experience building automations with Playwright, the most frustrating aspect is fragility. A CSS class rename, a layout shift, or an A/B test can break everything. WebMCP tool interfaces are a contract between the page and the agent. The UI can change completely, and as long as the tool's execute function still works, the agent doesn't care.
Current Status and Ecosystem
WebMCP is still early, but the ecosystem is taking shape across several efforts.
W3C Web Machine Learning Community Group
The webmachinelearning/webmcp repository hosts the core specification proposal for the navigator.modelContext API. This is an incubation effort aimed at eventual standardization, similar to how the Web Serial API or WebGPU progressed through community groups before becoming web standards.
The spec defines the core interfaces: how tools are registered, how agents discover them, and the content response format. It is still in active development with open issues around security boundaries, cross-origin tool sharing, and capability negotiation.
MCP-B: The Polyfill and Bridge
MCP-B is the most complete implementation today. It provides:
- Global polyfill (`@mcp-b/global`) that shims `navigator.modelContext` for current browsers
- React integration (`@mcp-b/react-webmcp`) with the `useWebMCP` hook
- Browser extension for tool discovery and AI agent exposure
- Transport layer for cross-tab tool aggregation and external AI host connections
MCP-B is production-ready enough for experimentation. The architecture documentation at docs.mcp-b.ai is thorough and shows clear thinking about security, performance, and developer experience.
Community Implementations
Several community projects are exploring different angles:
- jasonjmcghee/WebMCP focuses on browser extension-based tool injection
- WebMCP-org/mcp-ui-webmcp explores UI-specific tool patterns
The research paper (arXiv:2508.09171) provides formal analysis of the token efficiency and security models.
Adding WebMCP to Your Website Today
While the spec is still in proposal stage, you can experiment with WebMCP using the MCP-B polyfill. Here's a minimal setup:
// 1. Install the polyfill
// npm install @mcp-b/global
// 2. Initialize in your app entry point
// app/layout.tsx or index.ts
import '@mcp-b/global';
// 3. Register tools anywhere in your application
// lib/webmcp-tools.ts
export function registerSiteTools() {
if (!navigator.modelContext) {
console.log('WebMCP not available');
return;
}
navigator.modelContext.registerTool({
name: "site_search",
description: "Search all content on this website including blog posts, documentation, and projects",
inputSchema: {
type: "object",
properties: {
query: { type: "string", description: "Search query" },
contentType: {
type: "string",
enum: ["all", "blog", "docs", "projects"],
description: "Filter by content type"
},
limit: {
type: "number",
description: "Max results to return",
maximum: 20
}
},
required: ["query"]
},
annotations: { readOnlyHint: true },
execute: async (input) => {
const results = await fetch(
`/api/search?q=${encodeURIComponent(input.query)}&type=${input.contentType || 'all'}&limit=${input.limit || 10}`
);
return [{ type: "text", text: await results.text() }];
}
});
navigator.modelContext.registerTool({
name: "get_page_info",
description: "Get information about the current page including title, description, and navigation context",
inputSchema: {
type: "object",
properties: {}
},
annotations: { readOnlyHint: true },
execute: async () => {
return [{
type: "text",
text: JSON.stringify({
title: document.title,
url: window.location.href,
description: document
.querySelector('meta[name="description"]')
?.getAttribute('content'),
headings: Array.from(document.querySelectorAll('h1, h2'))
.map(h => h.textContent)
})
}];
}
});
}

For React applications, the hook-based approach is cleaner:
// components/BlogPost.tsx
import { useWebMCP } from '@mcp-b/react-webmcp';
function BlogPost({ post }: { post: BlogPostType }) {
useWebMCP({
name: 'get_blog_post_summary',
description: `Get a structured summary of the blog post: "${post.title}"`,
inputSchema: {},
handler: async () => ({
title: post.title,
author: post.author,
publishDate: post.publishDate,
category: post.category,
tags: post.tags,
readingTime: post.readingTime,
excerpt: post.description
})
});
return <article>{/* post rendering */}</article>;
}

What This Means for the Future of AI on the Web
WebMCP represents a shift in how we think about the relationship between AI agents and web applications. Instead of agents being external automation scripts that push buttons, they become first-class consumers of web application functionality.
For developers, this means a new consideration when building web applications: what tools should your page expose to AI agents? Not every feature needs a WebMCP tool, but the high-value interactions - search, transactions, data retrieval - benefit enormously from structured exposure.
For the broader ecosystem, WebMCP is one piece of a larger puzzle. Combined with standard MCP for backend services, A2A (Agent-to-Agent) for inter-agent communication, and improving vision models for visual understanding, we're moving toward a web where AI agents can participate as capable, well-behaved citizens rather than screen-scraping tourists.
The proposal is early. Browser vendors haven't committed to native navigator.modelContext support yet. The security model needs more real-world testing. Cross-origin tool sharing raises questions that don't have easy answers. But the direction is clear: the web needs a structured way to communicate with AI agents, and WebMCP is the most thoughtful proposal I've seen for how to do it.
If you're building web applications today, experimenting with WebMCP tools through MCP-B is a low-risk way to prepare for this future. The tools you define now will likely map directly to whatever standardized API emerges, because the fundamental primitives - named tools, typed schemas, structured responses - are already well-established in the MCP ecosystem.