Free Tool

Free AI Crawler robots.txt Generator

Generate a robots.txt that blocks or allows AI crawlers like GPTBot, ClaudeBot, and Google-Extended. Control AI training while keeping answer-engine citation traffic. Free, no signup.

Training crawlers scrape content for model training. Search crawlers perform live retrieval that can send citation traffic - many sites allow these.

# robots.txt - AI crawler rules
# Generated with U2L AI (u2l.ai/tools/ai-robots-txt-generator)

# --- Blocked AI crawlers ---
# OpenAI (training)
User-agent: GPTBot
Disallow: /

# Anthropic (training)
User-agent: ClaudeBot
Disallow: /

# Google Gemini (training)
User-agent: Google-Extended
Disallow: /

# Common Crawl (training)
User-agent: CCBot
Disallow: /

# ByteDance / TikTok (training)
User-agent: Bytespider
Disallow: /

# Meta AI (training)
User-agent: Meta-ExternalAgent
Disallow: /

# Amazon (training)
User-agent: Amazonbot
Disallow: /

# Apple Intelligence (training)
User-agent: Applebot-Extended
Disallow: /

# Cohere (training)
User-agent: cohere-ai
Disallow: /

# Diffbot (training)
User-agent: Diffbot
Disallow: /

# --- Explicitly allowed AI crawlers ---
# OpenAI (search)
User-agent: OAI-SearchBot
Allow: /

# OpenAI (user fetch)
User-agent: ChatGPT-User
Allow: /

# Anthropic (search)
User-agent: Claude-SearchBot
Allow: /

# Anthropic (user fetch)
User-agent: Claude-User
Allow: /

# Perplexity (index)
User-agent: PerplexityBot
Allow: /

# Perplexity (user fetch)
User-agent: Perplexity-User
Allow: /

# Standard crawlers (Googlebot, Bingbot) are unaffected by the rules above.

Place this at the root of your site (yourdomain.com/robots.txt). robots.txt is advisory - compliant crawlers honor it, but it does not technically block access.
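Before deploying, the rules can be sanity-checked locally. The following is a minimal sketch using Python's standard-library urllib.robotparser; the abbreviated ROBOTS_TXT string, the check_access helper, and the example.com URL are illustrative assumptions, not part of the generator. It only shows how one compliant parser reads the file, not how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the generated rules; paste the full file in practice.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-User
Allow: /
"""


def check_access(robots_txt: str, agents: list[str],
                 url: str = "https://example.com/article") -> dict[str, bool]:
    """Return {agent: may_fetch} according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = check_access(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "OAI-SearchBot", "Claude-User", "Googlebot"],
)
for agent, allowed in result.items():
    print(f"{agent:15} {'allowed' if allowed else 'blocked'}")
```

Googlebot has no group in this file, so the parser treats it as allowed, which matches the closing comment in the generated output.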

No signup required
Free forever
GDPR compliant
Powered by U2L

Quick Answer

An AI robots.txt generator builds the User-agent and Disallow rules that tell AI crawlers whether they may access your site. Toggle each crawler - block the training scrapers (GPTBot, ClaudeBot, CCBot, Google-Extended) while allowing live-retrieval search agents that drive citation traffic - and copy the resulting robots.txt. The U2L AI Robots.txt Generator runs in your browser. Free, no signup.

Quick Facts

  • robots.txt lives at the root of your site (yourdomain.com/robots.txt) and lists which user-agents may crawl which paths.
  • Block a crawler with two lines: User-agent: GPTBot then Disallow: /. Allow with Disallow: (empty) or Allow: /.
  • Critical 2026 distinction: training crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended) feed model training; search crawlers (OAI-SearchBot, Claude-SearchBot, PerplexityBot) drive citation traffic.
  • Many sites block training but allow search, keeping their content out of training data while staying citable in AI answers.
  • Each operator runs separate bots that need separate rules - blocking ClaudeBot does not block Claude-User or Claude-SearchBot.
  • robots.txt is advisory: compliant crawlers honor it, but it is not a technical access block.
  • Browser-only and instant - the rules are built locally and never sent to U2L servers.

How to make an AI robots.txt

Choose what to block, copy the file, deploy it.

  1. Pick a strategy or toggle crawlers

    Use a preset like 'Block training, allow search', or toggle each AI crawler individually between Blocked and Allowed based on your goals.

  2. Review the generated rules

    The robots.txt updates live, grouping blocked and allowed crawlers with comments showing each operator. Confirm the rules match your intent.

  3. Deploy robots.txt to your site root

    Copy the file and place it at yourdomain.com/robots.txt. If you already have a robots.txt, merge these AI rules into it rather than replacing it.
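The merge in the last step can be sketched in a few lines of Python. This is one possible approach under stated assumptions, not the generator's own logic: merge_ai_rules is a hypothetical helper, groups are assumed to be separated by blank lines (as in the generated file), and a group is skipped when its user-agent already appears in the existing file.

```python
def merge_ai_rules(existing: str, ai_rules: str) -> str:
    """Append AI crawler groups to an existing robots.txt, skipping agents already present."""

    def agents_in(text: str) -> set[str]:
        # Collect the lowercased tokens from every "User-agent:" line.
        return {
            line.split(":", 1)[1].strip().lower()
            for line in text.splitlines()
            if line.strip().lower().startswith("user-agent:")
        }

    already = agents_in(existing)
    kept_groups = []
    # Groups in the generated file are separated by blank lines.
    for group in ai_rules.strip().split("\n\n"):
        agents = agents_in(group)
        if agents and not (agents & already):
            kept_groups.append(group.strip())
    if not kept_groups:
        return existing
    return existing.rstrip("\n") + "\n\n" + "\n\n".join(kept_groups) + "\n"


existing = (
    "User-agent: *\nDisallow: /admin/\n\n"
    "Sitemap: https://yourdomain.com/sitemap.xml\n"
)
generated = (
    "# OpenAI (training)\nUser-agent: GPTBot\nDisallow: /\n\n"
    "# OpenAI (search)\nUser-agent: OAI-SearchBot\nAllow: /\n"
)
merged = merge_ai_rules(existing, generated)
print(merged)
```

The existing sitemap line and wildcard group survive untouched. One thing to review after merging: a crawler that gets its own named group follows that group instead of the User-agent: * group, so path rules such as Disallow: /admin/ need repeating inside any named Allow group if they should still apply to that bot.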

What is an AI Robots.txt Generator?

AI Robots.txt Generator is a tool that builds the robots.txt rules controlling AI crawlers - the bots that scrape your site for model training and for AI answer engines. You decide, per crawler, whether to block or allow it, and the generator outputs a ready-to-deploy robots.txt with the correct User-agent and Disallow directives.

robots.txt is a decades-old standard: a plain-text file at your site root that tells well-behaved crawlers which parts of your site they may access. The Robots Exclusion Protocol matches rules by user-agent, so you can set different policies for Googlebot, GPTBot, and any other named crawler.

AI changed the stakes. Companies like OpenAI, Anthropic, Google, and Perplexity run crawlers that ingest the open web - some to train models, others to fetch live pages for answer engines. Publishers increasingly want to opt out of training (to protect their content) while staying visible in AI answers (which can send referral traffic). Expressing that nuance requires knowing which bot does what.

Site owners, SEO and GEO specialists, and developers use this to draft an AI policy quickly and correctly - blocking training scrapers, allowing citation crawlers, or blocking everything - without memorizing the ever-growing list of AI user-agent strings.

How does an AI Robots.txt Generator work?

The generator keeps a curated list of the major AI crawlers and, for each, whether it is a training scraper or a live-retrieval search agent. You toggle each one to Blocked or Allowed. For every blocked crawler it emits a User-agent line plus Disallow: /, which tells that bot not to crawl any path. For allowed crawlers it can emit an explicit Allow: / so your intent is documented.
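As a rough illustration of that flow, here is a minimal Python sketch. It is not the tool's actual source: the Crawler class, the CRAWLERS subset, and the generate function are hypothetical names, and only a handful of the crawlers named on this page are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crawler:
    token: str     # user-agent token written into robots.txt
    operator: str  # company that runs the crawler
    kind: str      # "training", "search", or "user fetch"


# Small subset of the crawlers named on this page.
CRAWLERS = [
    Crawler("GPTBot", "OpenAI", "training"),
    Crawler("ClaudeBot", "Anthropic", "training"),
    Crawler("OAI-SearchBot", "OpenAI", "search"),
    Crawler("Claude-User", "Anthropic", "user fetch"),
]


def generate(crawlers, blocked_tokens):
    """Emit one commented group per crawler: Disallow: / if blocked, else Allow: /."""
    lines = ["# robots.txt - AI crawler rules", ""]
    for c in crawlers:
        rule = "Disallow: /" if c.token in blocked_tokens else "Allow: /"
        lines += [f"# {c.operator} ({c.kind})", f"User-agent: {c.token}", rule, ""]
    return "\n".join(lines)


# "Block training, allow search" preset: block every crawler labelled as training.
preset = {c.token for c in CRAWLERS if c.kind == "training"}
output = generate(CRAWLERS, preset)
print(output)
```

Keeping the training or search label on each entry is what makes a preset a one-line filter rather than a hand-maintained list.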

The training-versus-search distinction is the heart of a good AI policy. Training crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider, and others) gather content used to train models. Search crawlers (OAI-SearchBot, Claude-SearchBot, PerplexityBot, and user-fetch agents like ChatGPT-User and Claude-User) retrieve pages to answer live queries and can cite you, sending traffic. The presets let you block one group while allowing the other.

Because each operator runs multiple independent bots, the rules are per user-agent. Blocking ClaudeBot does not block Claude-SearchBot or Claude-User - each token needs its own directive. The generator lists them separately so you do not accidentally block citation traffic while trying to opt out of training, or vice versa.
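A quick way to spot an incomplete policy is to list which of an operator's tokens have no group of their own. The sketch below assumes the operator-to-token mapping shown on this page; OPERATOR_BOTS and missing_rules are illustrative names, and the check only looks for User-agent lines, not at what the rules say.

```python
# Tokens per operator, taken from the crawlers named on this page.
OPERATOR_BOTS = {
    "OpenAI": ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "Anthropic": ["ClaudeBot", "Claude-SearchBot", "Claude-User"],
    "Perplexity": ["PerplexityBot", "Perplexity-User"],
}


def missing_rules(robots_txt: str) -> dict[str, list[str]]:
    """For each operator, list the tokens with no User-agent line of their own."""
    named = {
        line.split(":", 1)[1].strip().lower()
        for line in robots_txt.splitlines()
        if line.strip().lower().startswith("user-agent:")
    }
    gaps = {}
    for operator, tokens in OPERATOR_BOTS.items():
        absent = [t for t in tokens if t.lower() not in named]
        if absent:
            gaps[operator] = absent
    return gaps


sample = "User-agent: ClaudeBot\nDisallow: /\n\nUser-agent: GPTBot\nDisallow: /\n"
gaps = missing_rules(sample)
for operator, tokens in gaps.items():
    print(f"{operator}: no explicit rule for {', '.join(tokens)}")
```

For the sample file, which blocks only the two training bots, every search and user-fetch token is reported as uncovered.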

Everything runs in your browser; nothing is sent to U2L. The output is standard robots.txt syntax you deploy at your site root. Remember that robots.txt is advisory - the major, reputable AI crawlers document that they honor it, but it expresses a preference rather than enforcing a technical block, so combine it with server-side controls if you need hard enforcement.

Use Cases

How marketers, businesses, and developers use the AI Robots.txt Generator.

Opt out of AI training but stay citable

Block GPTBot, ClaudeBot, CCBot, and Google-Extended while allowing the search agents, so your content stays out of training data but still appears in AI answers.

Block all AI crawlers

Publishers who want zero AI access can block every listed crawler in one click, producing a robots.txt that opts out of training and answer-engine retrieval alike.

Protect premium or paywalled content

News sites and subscription publishers block training scrapers to keep paid journalism out of free model training while preserving their search visibility.

GEO strategy for citation traffic

Generative Engine Optimization teams deliberately allow answer-engine crawlers so their brand gets cited in ChatGPT, Perplexity, and Gemini answers.

Reduce server load from aggressive bots

Block high-volume scrapers like Bytespider that hammer servers, cutting crawl-budget waste and bandwidth costs without touching legitimate search bots.

Set an AI policy for a new site

Spin up a clear, documented AI crawler policy at launch instead of leaving the default 'anything goes' that lets every model ingest your content.

Agencies standardizing client policies

Agencies generate a consistent AI robots.txt template across client sites, applying the same block/allow stance everywhere with per-client tweaks.

Comply with content-licensing decisions

When legal or licensing teams decide content cannot be used for AI training, encode that decision as robots.txt rules that compliant crawlers honor by convention.

Audit and update an existing robots.txt

Review which AI bots you currently allow, then regenerate an up-to-date policy that includes newer crawlers you may not have known existed.

AI Robots.txt Generator vs Alternatives

Side-by-side feature and pricing comparison with the top alternatives.

Feature | U2L | Manual robots.txt | Generic generator | Default (no rules)
Free, no signup
Training vs search crawler distinction | Manual
Up-to-date 2026 AI user-agent list | Manual | Varies
One-click block/allow presets | Limited
Per-bot allow to keep citation traffic | Manual
Browser-only (no data sent) | Varies | N/A

AI Robots.txt Generator vs Writing robots.txt by hand

You can write AI crawler rules by hand if you know every current user-agent and which ones train versus retrieve. It is free and fully under your control.

The challenge is the moving target: operators add and rename bots regularly, and the training-versus-search split is easy to get wrong. The generator keeps a curated list and labels each bot, so you do not accidentally block the citation traffic you wanted to keep.

AI Robots.txt Generator vs A generic robots.txt generator

General robots.txt generators handle sitemap directives and standard crawler rules, which is useful for classic SEO configuration.

Most do not model the AI-specific nuance - the separate training and search agents per operator, or the strategy of blocking one while allowing the other. This tool is purpose-built for that AI policy decision, so the output reflects a real 2026 stance rather than a generic block list.

Best Practices

Decide training vs citation first

Pick your goal before toggling: keep content out of training, stay visible in AI answers, or both. The 'Block training, allow search' preset encodes the common middle ground.

Set rules for every bot an operator runs

Blocking GPTBot does not block OAI-SearchBot or ChatGPT-User. List each user-agent explicitly so your policy is complete and unambiguous.

Merge into your existing robots.txt

If you already have a robots.txt with sitemap and crawler rules, add these AI sections to it rather than overwriting and losing your current directives.

Keep the file at the site root

robots.txt only works at yourdomain.com/robots.txt. A file in a subdirectory is ignored by crawlers.
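To find where a crawler will look for the file, strip everything after the host from any page URL. A minimal sketch using Python's urllib.parse follows; robots_txt_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yourdomain.com/blog/post?id=7"))
# -> https://yourdomain.com/robots.txt
print(robots_txt_url("https://shop.yourdomain.com/cart"))
# -> https://shop.yourdomain.com/robots.txt
```

The second call shows that a subdomain resolves to its own robots.txt, so each host you serve needs its own copy of the rules.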

Revisit the list periodically

AI operators launch new crawlers often. Regenerate every few months so newly introduced bots are covered by your policy.

Pair with server-side enforcement if needed

robots.txt is advisory. If you need hard blocking, combine it with firewall, WAF, or user-agent rules at the server or CDN layer.
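As one illustration of a user-agent rule at the server layer, here is a minimal WSGI middleware sketch in Python. The names block_ai_crawlers, hello_app, and status_for are hypothetical, the sample User-Agent strings are made up for the demo, and the token list is only an example. Matching on the User-Agent header catches crawlers that identify themselves honestly; a scraper that spoofs its header needs IP-based or WAF rules instead.

```python
from wsgiref.util import setup_testing_defaults  # only used by the demo helper

BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")


def block_ai_crawlers(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked token."""
    lowered = tuple(token.lower() for token in blocked)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in lowered):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(user_agent: str) -> str:
    """Demo helper: call the wrapped app in-process and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []
    block_ai_crawlers(hello_app)(environ, lambda status, headers: seen.append(status))
    return seen[0]


print(status_for("Mozilla/5.0 (compatible; GPTBot/1.2)"))           # 403 Forbidden
print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))  # 200 OK
```

The same substring check translates directly to most web server, CDN, or WAF rule languages.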

Document your intent with comments

Keep the operator comments the generator adds. Future maintainers (and you) will know why each bot is blocked or allowed.

Allow citation crawlers if you want AI traffic

If appearing in ChatGPT, Perplexity, or Gemini answers matters to you, leave the search and user-fetch agents allowed - blocking them removes you from those answers.

Common Mistakes to Avoid

Assuming one rule covers an operator

OpenAI, Anthropic, and others run several bots. Blocking only the training bot still leaves search and user-fetch agents free - and vice versa.

Accidentally blocking citation traffic

Blocking every AI bot also removes you from AI answer engines. If referral traffic from AI matters, keep the search agents allowed.

Placing robots.txt in the wrong location

It must be at the domain root. A robots.txt inside a folder or subpath is never read, so the rules silently do nothing.

Overwriting an existing robots.txt

Replacing your whole file with only AI rules can drop your sitemap line and existing crawler directives. Merge instead of replace.

Treating robots.txt as a security control

It is a preference, not a lock. Non-compliant scrapers can ignore it, and the file itself is public. Use server-side controls for true enforcement.

Using a wildcard expecting it to catch AI bots

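User-agent: * is only the fallback group: a compliant crawler follows it when no group names that crawler, and a blanket Disallow: / under it also shuts out Googlebot and Bingbot. Name each AI crawler explicitly so the rules hit exactly the bots you intend.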
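How a wildcard group interacts with a named group can be checked with Python's standard-library urllib.robotparser. This is a small sketch with made-up rules and an example.com URL; it shows how one compliant parser resolves the groups, which is a reasonable proxy for, but not a guarantee of, what a given crawler does.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://example.com/page"
verdicts = {
    agent: parser.can_fetch(agent, url)
    for agent in ["OAI-SearchBot", "GPTBot", "Googlebot"]
}
for agent, allowed in verdicts.items():
    print(f"{agent:15} {'allowed' if allowed else 'blocked'}")
```

The named bot follows its own group and is allowed, while GPTBot and Googlebot both fall back to the wildcard and are blocked, which is why a blanket wildcard is rarely what a site wants.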

Technical Specifications

File location: root of the domain (yourdomain.com/robots.txt)
Block syntax: User-agent: NAME then Disallow: /
Allow syntax: User-agent: NAME then Allow: / (or empty Disallow)
Training crawlers: GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider, and more
Search crawlers: OAI-SearchBot, Claude-SearchBot, PerplexityBot, ChatGPT-User, Claude-User
Standard: Robots Exclusion Protocol (matched by user-agent)
Enforcement: advisory - honored by compliant crawlers only
Privacy: built in your browser; no data sent to U2L

Frequently Asked Questions

How do I block AI crawlers in robots.txt?

Add a block per crawler: User-agent: GPTBot on one line, Disallow: / on the next. Repeat for each AI bot you want to block, such as ClaudeBot, CCBot, and Google-Extended. The generator builds all of these for you.

What is the difference between training and search AI crawlers?

Training crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended) gather content to train AI models. Search crawlers (OAI-SearchBot, Claude-SearchBot, PerplexityBot) retrieve pages live to answer queries and can cite you, sending referral traffic.

Should I block GPTBot?

Block GPTBot if you do not want your content used to train OpenAI models. To still appear in ChatGPT's answers, leave OAI-SearchBot and ChatGPT-User allowed - they handle live retrieval rather than training.

Does blocking ClaudeBot block all of Anthropic's bots?

No. ClaudeBot is the training scraper. Claude-SearchBot (search indexing) and Claude-User (user-initiated fetches) are separate user-agents that need their own rules. Block each one you want to exclude.

Will robots.txt actually stop AI companies?

It is advisory. The major, reputable AI crawlers publicly state they honor robots.txt, so blocking them is effective for those. It does not technically prevent access, so non-compliant scrapers can ignore it - use server-side controls for hard enforcement.

Where do I put the robots.txt file?

At the root of your domain: yourdomain.com/robots.txt. Crawlers only check that exact location. A file placed in a subfolder or subpath is ignored.

Can I block AI training but still appear in AI search results?

Yes - that is the most common strategy. Block the training crawlers and allow the search and user-fetch agents. The 'Block training, allow search' preset sets this up automatically.

What is Google-Extended?

Google-Extended is the control for whether Google may use your content to train its Gemini models and related AI features. It is separate from Googlebot, so blocking Google-Extended does not affect your normal Google Search ranking.

Does blocking AI crawlers hurt my SEO?

Blocking AI-specific crawlers (GPTBot, Google-Extended, etc.) does not affect traditional search ranking, because Googlebot and Bingbot are separate. Only block Googlebot or Bingbot if you truly want out of those search engines.

What if I already have a robots.txt?

Merge these AI rules into your existing file rather than replacing it, so you keep your sitemap directive and current crawler rules. Add the AI User-agent blocks alongside what you already have.

How often should I update my AI robots.txt?

Every few months. AI operators introduce and rename crawlers regularly, so periodically regenerating ensures newer bots are covered by your block or allow policy.

What is GEO and how does this relate?

GEO (Generative Engine Optimization) is optimizing to appear in AI-generated answers. If GEO is a goal, you allow the answer-engine crawlers so AI systems can retrieve and cite your pages - this tool lets you allow exactly those.

Can I block AI bots but allow Googlebot?

Yes. robots.txt rules are per user-agent. Leave Googlebot and Bingbot unrestricted while adding Disallow rules for the AI crawlers you want to exclude. They operate independently.

Is Common Crawl (CCBot) an AI crawler?

CCBot builds the Common Crawl dataset, which many AI models train on. Blocking CCBot is a common way to keep your content out of a widely used training corpus, even though Common Crawl itself is not an AI company.

Does this generator send my choices anywhere?

No. The robots.txt is built entirely in your browser from your toggles. Your selections and the resulting file never leave your device.

Can a wildcard rule block all AI bots at once?

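Only as a blunt instrument. User-agent: * applies to every compliant crawler that has no group of its own, so Disallow: / under it blocks AI bots and standard search crawlers like Googlebot and Bingbot alike, while any crawler you name separately follows its own group instead. List each AI bot explicitly for targeted coverage, which is what this generator does.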

Key Terms

robots.txt
A plain-text file at a site's root that tells crawlers, by user-agent, which paths they may access. Part of the Robots Exclusion Protocol.
User-agent
The identifier a crawler sends (GPTBot, ClaudeBot, Googlebot). robots.txt rules are matched against it to apply per-bot policies.
Training crawler
An AI bot that collects web content to train models, such as GPTBot, ClaudeBot, CCBot, and Google-Extended.
Search / retrieval crawler
An AI bot that fetches pages live to answer user queries and can cite sources, such as OAI-SearchBot, PerplexityBot, and Claude-User.
GEO
Generative Engine Optimization - optimizing content to be retrieved and cited by AI answer engines like ChatGPT, Perplexity, and Gemini.

Measure your AI referral traffic

Allowing answer-engine crawlers means AI citations can send you visitors. Use u2l.ai short links on the pages you promote to see exactly which clicks come from AI answers. Sign up free for link analytics.
