Build a Perfect Robots.txt File in Seconds
Generate, validate, and test your robots.txt with one free tool. No coding or experience needed.
Paste your robots.txt content below (or fetch it from a live URL) and click Validate to get a full diagnostic report.
The Complete Guide to robots.txt
A robots.txt file is one of the most powerful and most misunderstood files on your website. Sitting quietly at yourdomain.com/robots.txt, it acts as a set of instructions for search engine bots, telling them which pages they can crawl, which to skip, and how quickly to do it. Get it right and you will have a leaner, more efficiently indexed site. Get it wrong and you could accidentally hide your entire website from Google.
How robots.txt Actually Works
When a search engine bot like Googlebot visits your site, the very first thing it checks is your robots.txt file. It reads the rules top to bottom, matching itself against User-agent directives. The * wildcard applies to all bots, while named agents like Googlebot or Bingbot get their own specific rules. Critically, robots.txt is a request, not a lock. A well-behaved bot respects it, but malicious scrapers may not.
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /private/
Disallow: /?s=

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
Warning: this blocks ALL crawlers!

User-agent: *
Disallow: /

No Sitemap directive, no Allow rules, no per-bot rules. This will de-index your entire website!
Key Directives Explained
- User-agent: use * for all bots, or name specific ones like Googlebot.
- Disallow: / blocks everything. Disallow: with no value allows all. Use sparingly.
- Wildcards: use * to match any string and $ to match the end of a URL. Example: Disallow: /*.pdf$

What You Should Always Block
A well-configured robots.txt focuses Google's crawl budget on your most valuable pages. Blocking these common paths prevents wasted crawl budget on low-value or duplicate content: admin areas (/wp-admin/), search result pages (/?s=), duplicate parameters, staging subfolders, and any private or login-gated content.
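The common paths listed above translate directly into Disallow rules. The sketch below is illustrative, not a drop-in file; the exact paths depend on your platform, and the admin-ajax.php exception shown applies to WordPress sites, which use that endpoint for front-end requests:

```
User-agent: *
Disallow: /wp-admin/              # admin area
Allow: /wp-admin/admin-ajax.php   # WordPress front-end AJAX still needs this
Disallow: /?s=                    # internal search result pages
Disallow: /staging/               # staging subfolder (hypothetical path)

Sitemap: https://example.com/sitemap.xml
```

Note that a more specific Allow rule can carve an exception out of a broader Disallow, as with the admin-ajax.php line here.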
As of 2024, many site owners are also choosing to block AI training crawlers like GPTBot, Claude-Web, and CCBot to prevent their content from being used as training data. The generator above includes a one-click toggle to block all major AI bots instantly.
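Blocking AI crawlers uses the same per-bot pattern as any other rule. A minimal sketch covering the bots named above (user-agent tokens as published by the respective vendors; verify them before deploying, since tokens can change):

```
# Block major AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: CCBot
Disallow: /
```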
robots.txt vs noindex: What Is the Difference?
robots.txt prevents bots from visiting a page entirely. noindex (a meta tag) allows the bot to crawl the page but tells it not to include it in search results. If you block a page with robots.txt and also add a noindex tag, Google cannot see the noindex tag, so the page might still appear in search results based on external links. For most cases, noindex is the safer, more precise choice. Use robots.txt only when you genuinely do not want a page crawled at all, such as admin panels or internal APIs.
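A noindex rule is applied per page, most commonly as a meta tag in the page's head. A minimal sketch:

```html
<!-- Bots may crawl this page, but it is excluded from search results -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the same rule can be sent as an X-Robots-Tag HTTP response header instead. Either way, the page must remain crawlable in robots.txt for the rule to be seen.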
- Always include your Sitemap: directive; it speeds up crawling dramatically
- Never use robots.txt to hide sensitive data. Use password protection instead
- Test every change in Google Search Console's robots.txt Tester before going live
- Avoid blocking CSS and JS files; Google needs them to render your pages correctly
- Use noindex meta tags for pages you want crawled but not indexed
- Keep one robots.txt per domain and place it exactly at the root level
Guides That Grow Your Site
Practical, no-fluff walkthroughs for anyone building an online presence from scratch.
The Complete AdSense Setup Guide to Start Earning From Your Content
Everything from application approval to first payout, without the rookie mistakes that get beginners rejected.
Turn Your Search Console Data Into Clicks You Are Currently Leaving on the Table
Find your best ranking pages, spot keyword gaps, and double your organic traffic using data you already own.
Launch a YouTube Channel Today With Zero Budget and Zero Prior Experience
From niche selection to uploading your first video. A complete walkthrough for absolute beginners.
What Is Schema Markup and Why Every Site Owner Should Care About It
Rich results demystified. What schema is, how Google uses it, and how to add it to your site today.
Blogger vs WordPress: Which Platform Should You Actually Build On?
A side-by-side breakdown that cuts the noise and helps you pick the right home for your content.
How to Build a Website and Generate Real Income Online, Step by Step
From choosing your domain name to earning your first dollar. The complete beginner roadmap.
Robots.txt Explained: Take Full Control of What Google Crawls on Your Site
Understand crawl budgets, protect sensitive areas, and prevent duplicate content issues for good.
Submit Your Sitemap to Google Search Console and Get Indexed Faster
A two-minute walkthrough that ensures Google knows every page on your site and can crawl them efficiently.