What Is llms.txt and How to Implement It for AI Bots (2026 Guide)

llms.txt is a small text file at the root of your website that points language models to your most useful pages. A junior developer can ship a clean one in about an hour. OpenAI, Anthropic, Stripe, Cloudflare, and Mastercard have all done it. Shipping the file alone will not move your ChatGPT citations next week, but for an hour of work, it earns its place in a serious plan for being visible to AI assistants and agents.
This guide covers what llms.txt is, what the latest industry studies show, why every major developer platform has shipped one anyway, how to implement it well, and where it fits in a real Answer Engine Optimization (AEO) program.
Key takeaways
- llms.txt is a short Markdown file at your site root that gives AI assistants a curated list of your most useful pages. It uses an H1, a blockquote summary, and H2 sections of links. A junior developer can ship a good one in an hour.
- It is not the same as robots.txt or sitemap.xml. robots.txt controls crawler access. sitemap.xml lists every URL you want indexed. llms.txt curates a high-signal reading list for AI tools.
- Studies across 300,000 domains (SE Ranking), 62,100 AI bot visits (Otterly.AI), 37,894 AI-cited domains (Trakkr), and 500 million-plus bot events (Limy.AI) show no measurable lift in AI search citations from llms.txt by itself.
- Major developer platforms ship it anyway. OpenAI, Anthropic, Stripe, Cloudflare, Mastercard, Vercel, and the Microsoft Teams SDK use llms.txt as a routing layer for AI coding agents. Chrome Lighthouse 13.3 (released May 7, 2026) now audits for the file.
- Real AI visibility comes from content quality, schema, accurate listings, semantic HTML, and earned media. The Princeton and Georgia Tech GEO paper found citations, direct quotes, and statistics lift source visibility by up to 40%. Yext found 86% of 6.8 million AI citations come from brand-controlled sources.
- AI-referred traffic now converts 42% better than non-AI traffic in US retail (Adobe, March 2026) and grew 393% year over year in Q1 2026. The audience is small but worth the work.
- Ship llms.txt as cheap infrastructure inside a serious AEO program. Do not treat it as a citation lever.
What is llms.txt?
llms.txt is a proposed open standard for helping large language models read your website. Jeremy Howard, co-founder of Answer.AI and fast.ai, published the proposal on September 3, 2024. The file lives at the root of your domain, for example https://example.com/llms.txt.
The problem it tries to solve is simple. Many AI assistants visit a site at the exact moment a user asks a question. They have limited memory (called a context window) and limited time to read pages full of navigation, scripts, and ads. The official spec puts it this way:
Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.
Source: llmstxt.org
A clean, curated Markdown index gives an AI tool a faster path to your best material than scraping a full HTML page. The spec calls for a short, predictable structure:
- An H1 with your site or product name.
- A blockquote with a one-line summary.
- Optional notes that explain how to use the file.
- H2 sections, each with a Markdown link list. One bullet per page, with a short description.
- An "Optional" H2 section for lower-priority links that agents can skip when context is tight.
It is plain text in Markdown, UTF-8, served with text/plain or text/markdown and an HTTP 200 response. URLs inside the file should be absolute HTTPS.
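The structure above is regular enough to check mechanically. The following is a minimal sketch of such a check, not an official validator: the function name `check_llms_txt`, the regular expression, and the wording of the messages are assumptions made for illustration, and the checks cover only the elements listed above (H1, blockquote, H2 sections, absolute HTTPS links).

```python
import re

# Matches "- [Title](URL)" with an optional ": description" suffix.
LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_empty = [line for line in lines if line.strip()]

    # The first non-empty line should be the H1 with the site or product name.
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first line is not an H1")
    # The next non-empty line should start the blockquote summary.
    if len(non_empty) < 2 or not non_empty[1].startswith(">"):
        problems.append("no blockquote summary after the H1")
    # At least one H2 section should hold the link lists.
    if not any(line.startswith("## ") for line in lines):
        problems.append("no H2 sections")

    # Every bullet should be a Markdown link with an absolute HTTPS URL.
    for line in lines:
        if not line.startswith("- "):
            continue
        match = LINK_RE.match(line)
        if match is None:
            problems.append(f"bullet is not a Markdown link: {line}")
        elif not match.group(2).startswith("https://"):
            problems.append(f"URL is not absolute HTTPS: {match.group(2)}")
    return problems
```

Running it against a draft file before publishing gives a quick pass or a short list of things to fix. It does not judge whether the links are the right ones, which is the harder part.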
How is llms.txt different from robots.txt and sitemap.xml?
This is the most common point of confusion. The three files solve different problems. robots.txt controls access. sitemap.xml lists everything you want indexed. llms.txt curates a short, high-signal reading list for AI tools.
A quick way to remember it: robots.txt tells bots what they can read. sitemap.xml tells them everything that exists. llms.txt tells them what is most worth their time.
What does the latest data say about llms.txt?
If the question is whether the file alone will boost your citations in ChatGPT or Google AI Overviews this quarter, the public data points to no. Direct citation lift from llms.txt by itself does not show up in the studies that have been published.
SE Ranking analyzed about 300,000 domains in late 2025. Adoption was 10.13%, balanced across traffic tiers. Of the 50 most AI-cited domains, only 1 had the file. Their headline finding:
Both statistical analysis and machine learning showed no effect of LLMs.txt on how often a domain is cited by LLMs. Removing this variable from our XGBoost model actually improved its accuracy.
Source: SE Ranking, 300,000-domain study, 2025
Otterly.AI ran a 90-day server-log experiment. Out of more than 62,100 AI bot visits, only 84 (about 0.1%) targeted the llms.txt file. The file received roughly a third of the AI traffic of a typical content page.
Trakkr's analysis of 37,894 AI-cited domains reported a citation advantage of zero for sites with llms.txt versus those without. Limy.AI monitored over 500 million AI bot events across 90 days and found only 408 targeted llms.txt directly.
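To put the two log-based figures on the same scale, the shares can be worked out directly. This is a small sketch using only the numbers quoted above. Because both totals are reported as lower bounds ("more than 62,100" and "over 500 million"), the computed shares are upper bounds.

```python
def share_percent(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return part / total * 100


# llms.txt fetches as a share of all AI bot activity in each data set.
otterly_share = share_percent(84, 62_100)
limy_share = share_percent(408, 500_000_000)

print(f"Otterly.AI: {otterly_share:.3f}%")  # prints "Otterly.AI: 0.135%"
print(f"Limy.AI:    {limy_share:.6f}%")     # prints "Limy.AI:    0.000082%"
```

Both shares are tiny, but they differ by several orders of magnitude, which suggests the two data sets were measuring quite different mixes of sites and bots.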
Search Engine Land's 10-site before-and-after test tracked sites for 90 days on each side of implementation. Eight saw no change. Two grew, but the publication attributed the gains to other simultaneous work (PR pushes, FAQ launches, technical SEO fixes), not the file. One declined by 19.7%.
Google's own Search team published new generative-AI guidance on May 15, 2026. Under "Mythbusting generative AI search," it groups llms.txt with other markup that is not required for AI features, and notes that AI features run on the same ranking systems as traditional Search.
That covers the direct-citation question. The separate question worth asking is why every major developer platform keeps shipping the file anyway. The answer there is more interesting.
Why are major developer platforms shipping llms.txt anyway?
The pattern across companies that build AI themselves is hard to brush off.
OpenAI publishes llms.txt for its developer docs. Anthropic publishes one for Claude Code documentation. The Claude Code docs repeat a near-identical sentence at the top of many pages:
Fetch the complete documentation index at: https://code.claude.com/docs/llms.txt. Use this file to discover all available pages before exploring further.
Source: Anthropic, Claude Code documentation
Mastercard Developers ships an agentic llms.txt alongside a paired llms-full.txt that contains full API references, parameters, and working code samples. Mastercard's own line on it:
The llms.txt file is automatically updated whenever the documentation changes.
Source: Mastercard Developers documentation
Mintlify rolled out llms.txt across all hosted docs sites on November 20, 2024, which gave thousands of platforms (Anthropic, Cursor, Coinbase, Pinecone, and Windsurf among them) the file overnight. Stripe, Vercel, Cloudflare, Supabase, LangGraph, and the Microsoft Teams SDK all ship one on their developer docs.
A single case study worth knowing comes from dev5310, a German digital agency. In February 2026, they shipped a static llms.txt via Cloudflare Workers alongside JSON-LD structured data, then submitted the URL through Google Search Console. Three days later, Google AI Mode cited the file as the primary source for a brand query about their services. The author concedes the direct SEO impact remains zero, and notes that the file became what they call the “authoritative identity layer” Google used to anchor other sources for that brand query. That said, this is one site that already had strong JSON-LD structured data in place. Treat the result as suggestive, not as proof.
The shared pattern: llms.txt is doing real work today as a routing layer for AI coding agents (Cursor, Claude Code, GitHub Copilot, Cline, Windsurf, Aider) and for any agent that needs a clean map of a site. People sometimes call this a Business-to-Agent (B2A) use case. It is small today and growing.
Why does Google now check for it in Chrome Lighthouse?
Two Google product teams published guidance pointing in different directions, five days apart, in May 2026.
On May 15, Google Search said you do not need llms.txt for AI Overviews or AI Mode. On May 20, Search Engine Land reported that Chrome Lighthouse had added an “Agentic Browsing” category that audits for the file's presence. The official Chrome developer documentation describes the file as "a machine-readable summary of a website's content, specifically designed for LLMs and AI agents" and adds:
Without llms.txt, agents may spend more time crawling the site to understand its high-level structure and primary content.
Source: Chrome Lighthouse, Agentic Browsing audits, 2026
Lighthouse 13.3 (released May 7, 2026) moved Agentic Browsing from experimental into the default config, so the category now appears in regular Lighthouse runs and is rolling out to PageSpeed Insights and Chrome DevTools. The audit returns pass/fail signals per check, not a single 0-to-100 score.
Both positions can be true at once. Search is talking about ranking signals for AI Overviews. Chrome is talking about discoverability for the agentic browsing layer it is building. Two products, two jobs.
Where does llms.txt fit in a real AEO program?
This is the part most guides skip past. llms.txt is one layer of infrastructure inside a broader program. The drivers of AI visibility are well-documented and reward serious work on content, data, and listings.
The Princeton and Georgia Tech "GEO: Generative Engine Optimization" paper (KDD 2024) tested nine content tactics across 10,000 queries. Five worked. Four did not. The findings:
We propose several ways to optimize content for generative engines and demonstrate that these methods can boost source visibility by up to 40% in generative engine responses. Among other findings, we show that including citations, quotations from relevant sources, and statistics can significantly boost source visibility.
Source: Aggarwal et al., GEO: Generative Engine Optimization, 2024
In other words, the content patterns that win in AI answers are credible content patterns: cite real sources, include direct quotes, use numbers. Keyword stuffing and thin AI-only tricks did not move the needle in the paper.
Yext analyzed 6.8 million AI citations across ChatGPT, Gemini, and Perplexity in 2025. About 86% came from brand-controlled or brand-influenced sources: 44% from first-party websites, 42% from listings, 8% from reviews and social, 6% from uncontrolled news or forums. Yext CEO Mike Walrath:
When brands control their data, they control their visibility.
Source: Mike Walrath, CEO, Yext
The traffic side matters too. Adobe's Q2 2026 AI Traffic Report found AI-referred traffic to US retailers grew 393% year over year in Q1 2026, and peaked at 1,151% year over year in December 2025. Adobe also reported AI traffic converted 42% better than non-AI traffic in March 2026, a record high and an 80-point swing from March 2025, when AI traffic was actually converting 38% worse.
Consumer behavior is moving with it. Bain & Company's 2025 survey found:
About 80% of consumers now rely on 'zero-click' results in at least 40% of their searches, reducing organic web traffic by an estimated 15% to 25%.
Source: Bain & Company, 2025
BrightEdge's data keeps the priorities grounded: AI still represents less than 1% of referral traffic, while organic Search continues to drive the bulk of visits and stronger conversions. Both numbers can be true. AI traffic is small but growing fast and converting better. Organic still pays the bills, and good SEO work is the foundation that AEO sits on.
Putting it together: llms.txt is a low-cost layer in a stack that also needs strong first-party pages, accurate listings, structured data, semantic HTML, accessibility, and credible earned media. Yoast's Principal SEO Carolyn Shelby framed the broader idea well when Yoast SEO 25.3 shipped its llms.txt generator in June 2025:
Ranking is no longer the prize. Inclusion is.
Source: Carolyn Shelby, Principal SEO, Yoast
How do you implement llms.txt the right way?
The technical bar is low. Quality is what separates a useful file from a noisy one.
Step 1. Plan a curated link list, not a sitemap dump.
The proposal is explicit that the file is for "expert-level information gathered in a single, accessible location." An audit of 30 production llms.txt files in May 2026 found the most common failure was treating the file as a second sitemap with 800 to 1,200 unsorted links. Aim for roughly 20 to 50 high-signal links.
Useful candidates:
- Core product or service pages.
- Pricing and plan pages.
- Buyer-risk policies (privacy, security, returns, shipping, warranty, SLAs).
- Technical documentation, API references, and getting-started guides.
- Comparison and selector pages.
- Top FAQs and help-center articles.
- "About," contact, and location pages.
- Selected case studies (often in the "Optional" section).
Step 2. Write the file.
Match the spec structure. Keep descriptions short and factual. Drop marketing language and superlatives. AI systems filter them out, and they crowd out the signal.
# Acme Industrial Pumps

> Acme Industrial Pumps designs, sells, and services pumps for food
> processing, municipal water, and chemical plants across the US and Canada.

Use this file to find authoritative pages for product selection,
technical specs, pricing, service coverage, and policies.

## Products

- [Pump selection guide](https://acmepumps.com/pump-selection-guide): Compare pump types by use case, flow rate, and environment.
- [Food-grade pumps](https://acmepumps.com/products/food-grade): Product range, certifications, and cleaning requirements.
- [Chemical transfer pumps](https://acmepumps.com/products/chemical-transfer): Product range, material options, and safety notes.

## Technical resources

- [Specification library](https://acmepumps.com/specs): Datasheets, dimensions, motor options, and performance curves.
- [Installation guide](https://acmepumps.com/installation): Setup instructions and maintenance requirements.

## Commercial

- [Pricing](https://acmepumps.com/pricing): Pricing model and quote request process.
- [Service areas](https://acmepumps.com/service-areas): Regions covered and local response times.
- [Warranty](https://acmepumps.com/warranty): Coverage terms and claim process.

## Optional

- [Case studies](https://acmepumps.com/case-studies): Selected customer stories and implementation examples.
- [Blog](https://acmepumps.com/blog): Educational articles and industry commentary.
Step 3. Serve it correctly.
- Place it at the root: https://yourdomain.com/llms.txt.
- Return HTTP 200 over HTTPS.
- Send Content-Type: text/plain; charset=utf-8.
- Use UTF-8 encoding.
- Use absolute HTTPS URLs inside the file.
- Keep the file size reasonable. Around 10 KB for the index file is a good ceiling, so it does not burn agent context budget.
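The serving checklist above can be expressed as a small function that inspects a response you have already fetched. This is a sketch under stated assumptions: `check_serving` and its message strings are illustrative names, the 10 KB limit is the ceiling suggested above and not a spec requirement, and the function takes the status, header value, and body as plain arguments so it works with whatever HTTP client you use.

```python
from urllib.parse import urlparse

MAX_BYTES = 10 * 1024  # the roughly 10 KB ceiling suggested above
ALLOWED_TYPES = {"text/plain", "text/markdown"}


def check_serving(url: str, status: int, content_type: str, body: bytes) -> list[str]:
    """Return serving problems for an llms.txt response; empty means none found."""
    problems = []
    parts = urlparse(url)
    if parts.scheme != "https":
        problems.append("not served over HTTPS")
    if parts.path != "/llms.txt":
        problems.append("not at the site root")
    if status != 200:
        problems.append(f"status is {status}, expected 200")

    # Compare only the media type, ignoring parameters such as charset.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        problems.append(f"unexpected media type: {media_type}")
    if "charset=utf-8" not in content_type.lower().replace(" ", ""):
        problems.append("Content-Type does not declare charset=utf-8")

    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("body is not valid UTF-8")
    if len(body) > MAX_BYTES:
        problems.append(f"file is {len(body)} bytes, over the {MAX_BYTES}-byte ceiling")
    return problems
```

Keeping the check separate from the fetch makes it easy to run in a deploy pipeline against a staging URL and again against production.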
Step 4. Consider llms-full.txt and Markdown twin pages.
The spec also suggests publishing a concatenated llms-full.txt for sites whose content fits in a context window, and Markdown versions of important pages (often at the same URL with .md appended). This pattern works best for documentation, API references, and knowledge bases. Per a 14-day Otterly.AI test, Markdown twin pages received zero citations from consumer AI search engines while HTML pages earned citations. Markdown twins are a play for AI coding assistants today, not for AI Overviews.
Step 5. Keep it current.
A stale file is worse than no file. Yoast SEO 25.3 auto-refreshes the file weekly and prioritizes cornerstone content. Shopify, GitBook, Mintlify, and Fern provide similar automation. Quarterly manual review is a reasonable minimum if you build it by hand.
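If you build the file by hand, one way to keep it current is to generate it from a small curated list kept in version control, so every content change goes through review. The sketch below assumes that approach. The `Link` class and `render_llms_txt` function are hypothetical names for illustration and do not reflect how Yoast, Mintlify, or the other tools mentioned above work internally.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    description: str


def render_llms_txt(name: str, summary: str,
                    sections: dict[str, list[Link]], notes: str = "") -> str:
    """Render an llms.txt body: H1, blockquote summary, optional notes, H2 link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    if notes:
        lines += [notes, ""]
    # Sections are emitted in insertion order, so put "Optional" last.
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for link in links:
            if not link.url.startswith("https://"):
                raise ValueError(f"URL must be absolute HTTPS: {link.url}")
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Rejecting non-HTTPS URLs at render time means a bad link fails the build instead of reaching production. A link checker for dead pages would be a natural next addition.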
Anti-patterns to avoid
- Dumping your full sitemap into the file.
- Using marketing copy in descriptions instead of factual one-liners.
- Letting links go stale or pointing to dead pages.
- Steering agents to pages that robots.txt or auth blocks.
- Skipping the H1 or blockquote summary the spec calls for.
How should you handle AI bots in robots.txt?
llms.txt is not a permission system. Crawler control still belongs in robots.txt. The major providers now run separate bots for different jobs, and you can manage them independently.
OpenAI documents three independent crawlers:
- GPTBot is used for foundation-model training. OpenAI: "Disallowing GPTBot indicates your site's content should not be used in training generative AI foundation models."
- OAI-SearchBot indexes pages for ChatGPT search. OpenAI: "Sites that are opted out of OAI-SearchBot will not be shown in ChatGPT search answers."
- ChatGPT-User fetches pages when a user asks ChatGPT to read a specific URL.
Anthropic documents three primary crawlers (updated February 20, 2026):
- ClaudeBot is used for training. Anthropic: "When a site restricts ClaudeBot access, it signals that the site's future materials should be excluded from our AI model training datasets."
- Claude-SearchBot indexes for Claude's search results. Anthropic: "Disabling Claude-SearchBot on your site prevents our system from indexing your content for search optimization, which may reduce your site's visibility and accuracy in user search results."
- Claude-User fetches pages on direct user request inside Claude.
Anthropic confirms it honors robots.txt and supports the non-standard Crawl-delay directive.
A common policy pattern allows search visibility while blocking training collection. Adapt to your own preferences:
# OpenAI
User-agent: OAI-SearchBot
Allow: /
User-agent: GPTBot
Disallow: /
# Anthropic
User-agent: Claude-SearchBot
Allow: /
User-agent: ClaudeBot
Disallow: /
Two nuances to watch. First, blocking a crawler is not the same as keeping your page out of an AI answer. OpenAI has noted that if a disallowed URL is returned by a third-party search provider, ChatGPT search may still surface the title and link. To keep content out of results entirely, use noindex or auth controls. Second, do not contradict yourself. If you disallow a bot in robots.txt but link to the same paths from llms.txt, you are sending a mixed signal.
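The mixed-signal problem can be caught automatically by testing every URL in llms.txt against your robots.txt rules. Here is a minimal sketch using Python's standard `urllib.robotparser`. The function name `find_conflicts` and the link-matching regular expression are assumptions for illustration. Which user agents to test is a policy choice: if you block a training bot on purpose, check only the search and user-fetch bots you expect to read the file.

```python
import re
from urllib.robotparser import RobotFileParser

# Captures the URL part of Markdown links such as [Title](https://...).
LINK_URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def find_conflicts(robots_txt: str, llms_txt: str,
                   agents: list[str]) -> list[tuple[str, str]]:
    """Return (agent, url) pairs where llms.txt links to a URL robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = LINK_URL_RE.findall(llms_txt)
    return [
        (agent, url)
        for agent in agents
        for url in urls
        if not parser.can_fetch(agent, url)
    ]
```

An empty result means none of the tested agents is being pointed at a path it is not allowed to fetch. The check covers robots.txt only and does not detect pages behind authentication.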
What about WebMCP, AGENTS.md, and the broader agent stack?
llms.txt is one file in a wider machine-readable layer forming around the agentic web. A few pieces worth knowing.
WebMCP (Web Model Context Protocol) is a W3C draft from the Web Machine Learning Community Group, co-developed by Google and Microsoft engineers, published on February 10, 2026. It adds a navigator.modelContext browser API that lets a site register callable tools, so an in-browser AI assistant can complete actions like search, book, configure, or check out without screenshot-and-click guessing. It is available in Chrome 146, which shipped to stable on March 10, 2026, still behind a feature flag. The relationship to llms.txt is complementary. llms.txt is for content. WebMCP is for actions. Lighthouse 13.3 audits both.
AGENTS.md is a separate open format launched in August 2025 by OpenAI (Codex), Amp, Google (Jules), Cursor, and Factory. It lives inside code repositories and gives coding agents project context: build commands, lint and test rules, style conventions, and “do not modify” boundaries. Per Addy Osmani, more than 60,000 open-source repos shipped an AGENTS.md by early 2026. AGENTS.md is execution context inside a repo. llms.txt is content discovery for a public site.
Addy Osmani, Director of Engineering at Google Cloud AI, defines the broader practice this way:
Agentic Engine Optimization (AEO) is the practice of structuring, formatting, and serving technical content so that AI coding agents can actually use it.
Source: Addy Osmani, Director of Engineering, Google Cloud AI
His framework calls out five practical levers: discoverability, parsability, token efficiency, capability signaling, and access control. He recommends front-loading the core answer in roughly the first 500 tokens of a page, with token budgets of around 15,000 for quick-start guides, 20,000 for conceptual guides, and 25,000 per API endpoint reference. The same advice applies to business sites: put your bottom line up front, keep pages focused, and link out for depth.
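Those budgets can be turned into a rough pre-publish check. The sketch below is an approximation, not a tokenizer: it assumes about four characters per token, a common rule of thumb for English text that will drift for code-heavy or non-English pages. The names `BUDGETS`, `estimate_tokens`, `over_budget`, and `answer_in_lead` are hypothetical.

```python
import math

# Token budgets quoted above, keyed by page type.
BUDGETS = {"quick-start": 15_000, "conceptual": 20_000, "api-reference": 25_000}
CHARS_PER_TOKEN = 4  # assumption: rough heuristic for English, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def over_budget(text: str, page_type: str) -> bool:
    """Return True if the estimated token count exceeds the budget for page_type."""
    return estimate_tokens(text) > BUDGETS[page_type]


def answer_in_lead(text: str, key_phrase: str, lead_tokens: int = 500) -> bool:
    """Return True if key_phrase appears within roughly the first lead_tokens tokens."""
    lead = text[: lead_tokens * CHARS_PER_TOKEN]
    return key_phrase.lower() in lead.lower()
```

For pages close to a limit, swap the heuristic for the tokenizer of the model you care about, since real counts can differ noticeably from the estimate.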
What should you measure after you ship llms.txt?
Set realistic expectations. Measure with the same discipline you would apply to any technical change.
- Server logs: track fetches of /llms.txt and the pages it links to, by user agent. Expect direct llms.txt fetches to be a small share of AI bot traffic today.
- AI referrals in analytics: watch for utm_source=chatgpt.com and similar referrers from OpenAI, Perplexity, and others.
- Citation tracking: run recurring prompts on your top commercial questions across ChatGPT, Claude, Gemini, and Perplexity. See which sources appear in answers.
- Lighthouse Agentic Browsing: confirm the llms.txt audit passes. The category returns pass/fail per check, not a 0-to-100 score.
- Conversion data: AI-referred sessions tend to be higher intent. Adobe's 42% conversion lift figure is a benchmark to compare your own data against.
If you see large movements in AI citations right after shipping llms.txt, look hard at confounding work (new content, schema additions, earned media, PR pushes). Most of the gains other teams have reported track back to those, not the file.
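For the server-log item above, a short script can count llms.txt fetches per AI bot. This sketch assumes logs in the common "combined" access-log format and matches bots by the user-agent tokens named earlier in this guide. The regular expression and the function name `count_ai_fetches` are illustrative, and real user-agent strings should be confirmed against each provider's documentation.

```python
import re
from collections import Counter

# User-agent tokens for the OpenAI and Anthropic bots described above.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
           "ClaudeBot", "Claude-SearchBot", "Claude-User"]

# Pulls the request path, status, and user agent out of a combined-format line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_ai_fetches(log_lines, path: str = "/llms.txt") -> Counter:
    """Count requests for path per AI bot, ignoring query strings and other agents."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if match is None:
            continue
        if match.group("path").split("?")[0] != path:
            continue
        agent = match.group("agent").lower()
        for bot in AI_BOTS:
            if bot.lower() in agent:
                counts[bot] += 1
                break
    return counts
```

Passing an open log file to `count_ai_fetches` returns a `Counter` keyed by bot name. Calling it again with a different `path` gives the comparison against a typical content page.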
So is llms.txt worth implementing or not?
For most businesses, yes. The cost is hours, not weeks. The downside is near zero when the file is well-curated and consistent with robots.txt. The upside is real for any company whose users include developers using AI coding assistants, any site that wants to pass Chrome's Lighthouse Agentic Browsing audit, and any business preparing for a web where autonomous agents do more of the browsing.
What llms.txt does not do today is single-handedly drive AI Overviews citations. The data is consistent on that point. Real AEO value gets built across several layers:
- High-quality first-party pages with original data, statistics, and direct quotes. The Princeton GEO paper measured up to 40% visibility gains from this pattern alone.
- Accurate, consistent listings and entity data. Yext shows about 86% of AI citations come from brand-controlled sources.
- Structured data (schema.org JSON-LD) across the site.
- Semantic HTML and a clean accessibility tree, which are the same signals the new Lighthouse Agentic Browsing audit checks for.
- Earned media coverage. Muck Rack's analysis put 85.5% of AI citations on earned-media sources.
- Crawl access that matches your intent, set in robots.txt.
- llms.txt and WebMCP as the agent-ready infrastructure on top.
Ship llms.txt. Treat it as one piece of an AEO program, not the program itself. The teams pulling ahead in AI visibility do the work across all of those layers, and that work is what turns AI traffic (which now converts 42% better than non-AI in US retail, per Adobe) into real pipeline.
If you want help building that full stack, this is the work we do every day for clients across 3PL, retail, SaaS, B2B services, and local multi-location brands. Ship the file. Then build the program that makes it pay off.
