TL;DR
Robots.txt is a text file that tells crawlers what to access and what to skip. Misconfigured robots.txt files are one of the most common causes of pages not appearing in search results.
Key Points
✓ Robots.txt controls crawl access, not indexing: a URL blocked in robots.txt can still be indexed if external links point to it
✓ 'Disallow: /' blocks every page for the crawlers its User-agent line names, so under 'User-agent: *' it blocks all crawlers from all pages. This is a common misconfiguration that can accidentally drop an entire site from search results
✓ Different crawlers can be targeted with different rules using the User-agent directive (e.g., Googlebot, Bingbot)
✓ Robots.txt can specify the location of your XML sitemap to help crawlers discover your content structure
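The key points above can be sketched with Python's standard urllib.robotparser module. This is a minimal illustration, not a model of any particular search engine: the rules, the /drafts/ and /admin/ paths, and the load_rules helper are all hypothetical, and Python's parser uses simple prefix matching that may differ from how Googlebot or Bingbot interpret the same file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot is kept out of /drafts/, every other
# crawler is kept out of /admin/, and a sitemap location is declared.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

# The misconfiguration described above: one rule that shuts out every crawler.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text (hypothetical helper, no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch("Googlebot", "https://example.com/drafts/post"))  # False
print(rules.can_fetch("Bingbot", "https://example.com/drafts/post"))    # True
print(rules.can_fetch("Bingbot", "https://example.com/admin/"))         # False
print(rules.site_maps())  # ['https://example.com/sitemap.xml'] (Python 3.8+)

blocked = load_rules(BLOCK_EVERYTHING)
print(blocked.can_fetch("Googlebot", "https://example.com/"))  # False
```

Note that a crawler with its own User-agent group follows only that group, which is why Googlebot in this sketch is not affected by the /admin/ rule.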
Robots.txt Syntax and Structure
A minimal robots.txt file puts each directive on its own line:
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Google Search Console's robots.txt report (which replaced the older robots.txt tester) shows the file Google fetched for your site and flags any lines it could not parse. Always test before deploying changes to avoid accidentally blocking crawlers.
What to Block in Robots.txt
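To confirm that a draft file blocks only what was intended before it is deployed, a local check along these lines can help. It is a rough sketch using Python's standard urllib.robotparser: the check_draft helper and the MUST_ALLOW and MUST_BLOCK lists are hypothetical, and because Python's parser does not reproduce Googlebot's matching exactly (it does not handle wildcard patterns, for example), it is a first check rather than a substitute for Search Console.

```python
from urllib.robotparser import RobotFileParser

# The example file from this section, treated as a draft to check.
DRAFT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

# Hypothetical expectations for the site.
MUST_ALLOW = ["https://example.com/", "https://example.com/blog/post"]
MUST_BLOCK = ["https://example.com/admin/", "https://example.com/admin/users"]


def check_draft(robots_txt, must_allow, must_block, user_agent="Googlebot"):
    """Return a list of problems; an empty list means the draft passed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for url in must_allow:
        if not parser.can_fetch(user_agent, url):
            problems.append(f"blocked but should be crawlable: {url}")
    for url in must_block:
        if parser.can_fetch(user_agent, url):
            problems.append(f"crawlable but should be blocked: {url}")
    return problems


problems = check_draft(DRAFT, MUST_ALLOW, MUST_BLOCK)
print(problems or "draft behaves as expected")
```

Running the same check against a draft containing only 'User-agent: *' and 'Disallow: /' would report both MUST_ALLOW URLs as blocked, which is the accidental site-wide block worth catching before deployment.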
The Robots.txt vs. Noindex Distinction
Last updated: June 8, 2026
Related Terms
Crawlability
The ability of search engine bots to access, navigate, and read the pages on your website without encountering technical barriers.
Indexing
The process by which a search engine stores and organizes crawled web pages in its database so they can be retrieved and displayed in search results.
Canonical URL
The preferred, authoritative URL for a page when multiple URLs serve the same or very similar content, typically declared to search engines with a rel="canonical" link tag.
XML Sitemap
A file (typically in XML format) that lists all the important URLs on a website, helping search engines discover and crawl content more efficiently.