The assumption: that an AI agent visiting your website should figure out what your website does by reading it.

That sounds reasonable until you think about what "reading it" actually means at scale.

What agents actually do when they visit your site

Here's what happens when an AI agent — say, a shopping assistant helping someone find running shoes — encounters your ecommerce site for the first time:

It fetches your homepage. Gets HTML, parses text, discards markup. Doesn't find a structured product catalogue. Follows a few links. Loads product pages. Loads a category page. Tries a search. Parses the search results page. Makes 40–50 HTTP requests in a few seconds, each one loading your full page stack — HTML, JavaScript bundles, API calls, CDN assets — and using maybe 200 tokens of actual signal from all of it.

Then it does this again tomorrow when a different user asks the same question. And the day after. Every agent. Every time.

This isn't a hypothetical future problem. Cloudflare has published data showing that AI crawlers now drive over 10 billion automated requests per week and that 32% of all web traffic is automated. The really telling number: more than 90% of the URLs AI crawlers request are unique. Human traffic revisits pages — you reload your Twitter feed, you check your bank balance. AI crawlers don't. They hit every URL once and move on, which means they get no benefit from the caching infrastructure the entire web was built to rely on.

Your CDN is being billed. Your origin servers are being hit. Your cache is being churned. And the agent got back a few product names it could have gotten from a five-line JSON file.

The standard the web built for this problem. Twice.

We've been here before. The web has a track record of solving this kind of problem with a tiny file at a well-known path.

1994. robots.txt. Web crawlers started hammering servers in ways their operators didn't intend. The response: a convention. Put a file at /robots.txt. Tell crawlers what not to touch. No authentication. No schema. No machine-readable capability declaration. Just a blocklist. It took 28 years to get ratified as RFC 9309 in 2022, by which point every search engine had already been following it for decades through sheer convention.
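The convention is small enough to reproduce in full. A minimal robots.txt (the paths here are illustrative) is nothing but a blocklist:

```
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: BadBot
Disallow: /
```

Note what's absent: no description of what the site does, no endpoints, no capabilities. Only "don't go here."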

2005. sitemap.xml. Search engines wanted a better signal for what existed on a site, instead of crawling everything and hoping. The response: another file. Put it at /sitemap.xml. Tell crawlers where your pages are. Again, no structured capability declaration. Just a map.
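Again, the whole format fits on a napkin. A minimal sitemap (URLs illustrative) per the sitemaps.org schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/products/trail-runner</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
</urlset>
```

It tells a crawler where pages live and when they changed. Nothing about what any of those pages can do.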

Both of these files answered a negative question. robots.txt says "don't go here." sitemap.xml says "these pages exist." Neither one answers the question an agent actually needs answered:

"What can this site do?"

The approaches that got close but didn't solve it

Before proposing W2A, it's worth being honest about what already exists and why each piece falls short.

Schema.org JSON-LD is genuinely useful and W2A's generator reads it. If you embed {"@type": "Product"} JSON-LD on your pages, an agent can infer you sell things. But Schema.org is a vocabulary for describing what things are, not what a site can do. It has no concept of API endpoints, authentication requirements, rate limits, or machine-callable actions. A site with beautiful Schema.org markup is still opaque to an agent that wants to add something to a cart.
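To make the limitation concrete, here is a typical Product JSON-LD block (product and prices illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Trail Runner 2",
  "offers": {
    "@type": "Offer",
    "price": "89.00",
    "priceCurrency": "USD"
  }
}
```

An agent reading this learns that a product exists and what it costs. It learns nothing about how to search the catalogue or add the product to a cart, because the vocabulary has no place to put that.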

OpenAPI/Swagger specs are excellent for developers integrating with an API. If you publish one, W2A will find it automatically and import your endpoints directly. But OpenAPI solves only half the problem: it covers API-first companies that explicitly publish specs for developers. The 150 million WordPress sites, Shopify stores, blogs, and SaaS products in the world aren't publishing OpenAPI specs. They're not even thinking about it.

llms.txt is a recent proposal that asks site owners to put a human-readable summary at /llms.txt for LLMs to read. Well-intentioned. But "human-readable summary for an LLM" is still just text. An agent reading it knows what your site is about but still can't call anything. It's documentation, not an interface.
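For comparison, an llms.txt file following the proposal's markdown layout looks like this (the Acme content is illustrative):

```
# Acme Store

> Online shop for running gear: shoes, apparel, accessories.

## Docs

- [Product catalogue](https://acme.com/products.md): full product list
- [Shipping policy](https://acme.com/shipping.md): delivery options
```

Useful context, but there is nothing here an agent can call. Every link resolves to more prose.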

A2A's agent-card.json (Google's Agent2Agent protocol, now under the Linux Foundation) is the closest thing in the ecosystem to what W2A does, and it's important to be precise about the distinction. A2A's /.well-known/agent-card.json describes a running agent service — an AI with a JSON-RPC endpoint that other agents can send tasks to. It solves agent-to-agent communication. Your Stripe account doesn't have an A2A agent. Your Shopify store doesn't have one either. A2A is infrastructure for companies that have already built custom agents. W2A is for every website that hasn't and won't.

MCP (Anthropic's Model Context Protocol) solves the tool layer. It lets agents connect to tools in a standardised way. It doesn't solve discovery of what a public website can do without prior setup from both sides.

The gap

None of these answer the question from the perspective of a website owner with zero engineering resources who just wants agents to understand their site.

The actual gap in the stack

Draw the emerging agent-web protocol stack honestly:

- Cloudflare edge: request identity, rate limiting, payment (HTTP 402). RFC 9421 · Pay Per Crawl · content negotiation
- W2A: discovery, "what can this site do?" /.well-known/agents.json · open standard · Apache 2.0
- A2A: agent-to-agent communication. /.well-known/agent-card.json · Linux Foundation
- MCP: tool protocol, agent ↔ tools. Anthropic
- Rover / CUA: execution, acting on pages. DOM-native · screenshot-based · rtrvr.ai
- HTTP / TLS: transport, unchanged since 1991

W2A sits at the discovery layer. Before any execution happens, before any tool is called, before any agent-to-agent handoff — an agent checks what a site declares it can do. That's the layer that was missing.

W2A: a single file at /.well-known/agents.json

The design philosophy is deliberately boring. No new transport protocol. No authentication scheme. No agent runtime. One JSON file, served at a well-known path, following the same RFC 8615 convention that A2A and every other .well-known/ standard uses.

Here's what it looks like for a real site:

{
  "w2a": "1.0",
  "site": {
    "name": "Acme Store",
    "type": "ecommerce",
    "language": "en"
  },
  "skills": [
    {
      "id": "search_products",
      "intent": "Find products by keyword or category",
      "action": "GET /api/search",
      "input": { "q": "string", "category": "string?" },
      "output": { "items": "Product[]", "total": "int" },
      "auth": "none"
    },
    {
      "id": "add_to_cart",
      "intent": "Add a product to the shopping cart",
      "action": "POST /api/cart/items",
      "input": { "sku": "string", "qty": "int" },
      "output": { "cart_id": "string", "subtotal": "float" },
      "auth": "session"
    }
  ],
  "policies": {
    "rate_limit": "60/min",
    "allowed_agents": ["*"]
  }
}

Before visiting anything else, an agent checks /.well-known/agents.json. It finds out what the site can do, which endpoints exist, what they expect, what they return, and whether they need authentication. One targeted call instead of fifty exploratory ones.

The intent field is the most important design decision in the spec. It's not documentation. It's the signal an agent uses to decide which skill to invoke. Write it as a plain English description of what the action does, not what the endpoint is called. "Find products by keyword or category" is correct. "Calls the search endpoint" is useless.

A2A compatibility without extra work

The agents.json format includes an optional a2a_profile block that maps to a valid A2A AgentCard. Any A2A client — LangChain, or any other agent framework that supports A2A discovery — can read your agents.json and treat your site as a node in the A2A ecosystem without modification.

You don't build a custom agent. You don't run a JSON-RPC server. You serve one file and get A2A compatibility automatically.

"a2a_profile": {
  "name": "Acme Store Agent",
  "url": "https://acme.com/.well-known/agents.json",
  "version": "1.0",
  "provider": { "organization": "Acme Inc" },
  "capabilities": {
    "streaming": false,
    "pushNotifications": false
  },
  "defaultInputModes": ["application/json"],
  "defaultOutputModes": ["application/json"]
}

The state of adoption right now

Here's the honest picture. We checked the major platforms on April 14, 2026:

# check any site for W2A support
curl "https://w2a-protocol.org/api/check?url=stripe.com"
stripe.com        ✗ not enabled
shopify.com       ✗ not enabled
langchain.com     ✗ not enabled
vercel.com        ✗ not enabled
openai.com        ✗ not enabled
anthropic.com     ✗ not enabled
huggingface.co    ✗ not enabled
github.com        ✗ not enabled
w2a-protocol.org  ✓ valid · 6 skills · A2A compatible

The only site on the internet with a valid agents.json right now is the one that defined the spec. This is not embarrassing — it's the natural state of any protocol in its first days. robots.txt was also only on one server in 1994. The question is whether the problem is real enough that adoption follows.

The signals say yes. An AWS engineer asked to circulate it internally the same day it launched. An independent analysis from Grok described it as "absolutely needed." @b_kalisetty from rtrvr.ai published a research paper this month saying "nobody built the layer that gives websites a say" — the day before W2A went live.

Generating one in 30 seconds

The barrier to adoption has to be zero. That's why we built a generator at w2a-protocol.org/tools.

Enter your URL. The generator fetches your homepage, reads Schema.org JSON-LD blocks, reads Open Graph tags, probes 24 common OpenAPI/Swagger paths, detects embedded Swagger UI spec URLs, parses your sitemap for URL patterns, and reads HTML forms. If your site has an OpenAPI spec — even one you didn't actively publish — the generator finds it.

We tested it on petstore.swagger.io: it extracted 8 real endpoints from the Swagger spec, with correct HTTP methods, typed inputs, and proper auth declarations, at 92% confidence. No manual authoring.

For sites with no machine-readable signals, the generator returns a skeleton with a clear confidence score and specific instructions. It doesn't silently hallucinate endpoints.

What we're not solving

Honesty here matters more than hype.

Authentication flows. The auth field declares what kind of auth a skill needs. It does not handle the credential exchange. The agent still needs to obtain credentials through whatever mechanism the site provides.

Dynamic capabilities. Right now, agents.json is a static file. A site whose available actions change based on the logged-in user can't fully express that. The spec's federation block is a path toward it.

Enforcement. An agent can ignore agents.json the same way a bot can ignore robots.txt. W2A is a declaration protocol for well-behaved agents, not a firewall.


Get involved

The spec is at github.com/Nijjwol23/w2a. Apache 2.0, same license as A2A — intentionally. Open standard, no single company owns it.

The live proposal is in the A2A GitHub discussions. The question for the A2A community: should W2A register /.well-known/agents.json as a companion discovery mechanism in the A2A ecosystem?

The web got its crawler layer in 1994. Its search layer in 2005. The agent layer is being built right now. This is your chance to be in the first commit.

Add agents.json to your site

Enter your URL, generate your manifest, deploy one file. Under two minutes.

Open the generator →
Read the spec →