Robots.txt

A plain text file at the root of a website (e.g., example.com/robots.txt) that instructs search engine crawlers which pages or sections they are and are not allowed to crawl.

Updated June 8, 2026

TL;DR

Robots.txt is a text file that tells crawlers what to access and what to skip. Misconfigured robots.txt files are one of the most common causes of pages not appearing in search results.

Key Points

Robots.txt controls crawl access, not indexing — blocking a URL in robots.txt doesn't guarantee it won't be indexed if it has external links pointing to it

Placed under 'User-agent: *', the 'Disallow: /' directive blocks all crawlers from all pages. Left in place by mistake, it is a common misconfiguration that can cause an entire site to drop out of search results

Different crawlers can be targeted with different rules using the User-agent directive (e.g., Googlebot, Bingbot)

Robots.txt can specify the location of your XML sitemap to help crawlers discover your content structure
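As a rough illustration of the key points above, the sketch below uses Python's standard urllib.robotparser module to check two hypothetical robots.txt files: one that blocks the whole site, and one with a separate group for a named crawler. The file contents, the example.com URLs, and the helper names are assumptions made for the example. Python's parser is an independent implementation, so its answers may differ from how Googlebot or Bingbot read the same file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: blocks every crawler from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical file: one group for a named crawler, one for everyone else.
PER_AGENT = """\
User-agent: Bingbot
Disallow: /drafts/

User-agent: *
Disallow: /admin/
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def blocks_entire_site(text: str, agent: str = "*") -> bool:
    """Return True when the given agent may not even fetch the home page."""
    return not parse_robots(text).can_fetch(agent, "https://example.com/")


print(blocks_entire_site(BLOCK_ALL))   # True
print(blocks_entire_site(PER_AGENT))   # False

rules = parse_robots(PER_AGENT)
# A crawler with its own group follows that group only, not the * group.
print(rules.can_fetch("Bingbot", "https://example.com/drafts/post"))   # False
print(rules.can_fetch("Bingbot", "https://example.com/admin/"))        # True
print(rules.can_fetch("Googlebot", "https://example.com/admin/"))      # False
```

A check like blocks_entire_site could run before a deployment to catch a stray 'Disallow: /' before it reaches production.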

Robots.txt Syntax and Structure

A robots.txt file consists of groups of directives, each introduced by one or more User-agent lines[1]. User-agent specifies which bot the rules apply to (* means all bots). Disallow specifies paths the bot should not crawl. Allow overrides Disallow for specific sub-paths. Sitemap specifies the location of your XML sitemap and applies to the whole file rather than to a single group. Example:

User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml

Google Search Console's robots.txt report (which replaced the older robots.txt tester) shows which robots.txt files Google found for your site and flags any warnings or errors in them. Always test changes before deploying them to avoid accidentally blocking pages from being crawled.
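The four directives can be tried out locally with a minimal sketch like the one below, which again relies on Python's standard urllib.robotparser. The /admin/help/ path is a hypothetical addition to the example file, included only to show Allow at work. One assumption worth stating: Python's parser applies the first matching rule in a group, while Google applies the most specific (longest) matching rule, so the Allow line is placed before the Disallow line to get the same answer under both readings.

```python
from urllib.robotparser import RobotFileParser

# Example file from the text, plus a hypothetical Allow rule.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "*") -> bool:
    """Check whether the agent may crawl a path on the example host."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("/admin/settings"))   # False: matched by Disallow: /admin/
print(allowed("/admin/help/faq"))   # True: Allow covers this sub-path
print(allowed("/blog/post"))        # True: no rule matches
print(parser.site_maps())           # ['https://example.com/sitemap.xml']
```

The site_maps() method requires Python 3.8 or later.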

What to Block in Robots.txt

Good candidates for blocking include internal search result pages (which can generate a practically unlimited number of URL variations and waste crawl budget), admin and login pages, duplicate content generated by filter and sorting parameters, and confirmation pages[1]. Blocking these preserves crawl budget for your most important pages and keeps crawlers from spending time on low-value URLs. Avoid blocking CSS, JavaScript, or image files: Googlebot needs to render pages fully to understand their content and assess their quality.
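One way to apply this advice is to generate the Disallow rules from a list of low-value path prefixes and then confirm that rendering assets are still crawlable. The sketch below does that under stated assumptions: every path, the example.com host, and the function names are hypothetical. It uses plain path prefixes only. Filter and sorting parameters usually need wildcard patterns, which Google supports but Python's urllib.robotparser does not evaluate, so they are left out here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical path prefixes; adjust them to your own URL structure.
LOW_VALUE_PREFIXES = ["/search", "/admin/", "/login", "/order/confirmation"]

# Hypothetical assets a crawler needs in order to render pages.
RENDER_ASSETS = ["/static/css/site.css", "/static/js/app.js", "/images/logo.png"]


def build_robots_txt(prefixes):
    """Build a single-group robots.txt that disallows each prefix."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}" for prefix in prefixes]
    return "\n".join(lines) + "\n"


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the subset of paths the agent would not be allowed to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path for path in paths
        if not parser.can_fetch(agent, "https://example.com" + path)
    ]


robots_txt = build_robots_txt(LOW_VALUE_PREFIXES)
print(robots_txt)

# An empty list means no rendering asset is caught by the rules.
print(blocked_paths(robots_txt, RENDER_ASSETS))
```

Because the rules are prefix matches, a short prefix such as /login also blocks paths like /login-help, so it is worth running the check against a sample of real URLs as well.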

The Robots.txt vs. Noindex Distinction

A critical distinction: robots.txt blocks crawling, while a noindex meta tag controls indexing. If you block a page in robots.txt, Googlebot can't read the noindex tag on that page, so the page may still appear in search results if other sites link to it (Google will just know less about its content)[1]. For pages you want excluded from search results, use a noindex meta tag and leave those pages crawlable in robots.txt. Use robots.txt to manage crawl budget, not to hide content from search results.
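The conflict described here can be detected mechanically. The following sketch, using only Python's standard library, flags a page whose noindex tag can never be read because robots.txt blocks the crawler from fetching it. The robots.txt content, the page HTML, the URLs, and all names are assumptions for illustration, and the check covers only the meta tag form of noindex.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def has_noindex(html):
    """Return True when the page HTML carries a noindex robots meta tag."""
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in meta.directives


def noindex_is_hidden(robots_txt, url, html, agent="Googlebot"):
    """True when the page has noindex but robots.txt stops the agent reading it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return has_noindex(html) and not rules.can_fetch(agent, url)


# Hypothetical inputs.
ROBOTS = "User-agent: *\nDisallow: /thank-you/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(noindex_is_hidden(ROBOTS, "https://example.com/thank-you/", PAGE))  # True
print(noindex_is_hidden(ROBOTS, "https://example.com/pricing/", PAGE))    # False
```

A True result suggests removing the Disallow rule for that URL so the crawler can see the noindex tag.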
