Duplicate Content

Substantively identical or very similar content that appears on multiple URLs, either within the same site or across different sites, which can confuse search engines about which version to index and rank.

Updated June 9, 2026

TL;DR

Duplicate content means the same (or near-identical) content exists at multiple URLs. Google doesn't penalize it directly, but it forces Google to choose which version to rank — and it may pick the wrong one. Fix it with canonicals or 301 redirects.

Key Points

Duplicate content is more often a technical accident (URL parameters, www vs non-www, HTTP vs HTTPS) than deliberate manipulation

Google selects one 'canonical' version of duplicate content to index and rank — it may not choose the version you prefer

Cross-site duplication (your content scraped by, or syndicated to, other sites) can cause Google to attribute the content to the wrong source

Near-duplicate content (the same article on 50 location pages with only the city name swapped) is treated similarly to exact duplicates
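Swapping only the city name does not make pages distinct. The minimal sketch below compares two templated location pages using word 3-shingles and Jaccard similarity, which is one common way to measure near-duplication. The template text, the shingle size k=3, and the helper names are illustrative assumptions. Search engines and crawl tools do not publish their exact similarity methods.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical location-page template; only the city name changes per page.
TEMPLATE = (
    "Looking for a reliable plumber in {city}? Our licensed team offers "
    "same-day repairs, drain cleaning, water heater installation and "
    "emergency call-outs. Customers across {city} trust us for upfront "
    "pricing, clean work and a satisfaction guarantee on every job. "
    "Call today to book a free estimate."
)

austin = TEMPLATE.format(city="Austin")
denver = TEMPLATE.format(city="Denver")
score = jaccard(shingles(austin), shingles(denver))
print(f"Austin vs Denver similarity: {score:.2f}")  # prints 0.76
```

On this short template the two pages share 39 of 51 distinct shingles. Only the shingles that touch the city name differ. With more shared boilerplate, the score moves closer to 1.0. Where to draw the "near-duplicate" line is a judgment call.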

Common Causes of Duplicate Content

Most duplicate content is unintentional[1]. Common technical causes include URL parameter variations (page.com/product?color=red and page.com/product?color=blue showing the same content), www and non-www versions both being accessible, HTTP and HTTPS versions both resolving, printer-friendly page versions, and session ID parameters that create unique URLs for the same page. E-commerce sites are particularly vulnerable: a product listed in multiple categories can produce several pages with different URL paths but identical content. CMS platforms that auto-generate tag, category, and archive pages can also create dozens of near-duplicate pages. The solution for most technical duplicates is a canonical tag or a 301 redirect.
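You can collapse many of these technical variants by normalizing URLs before you compare them or link to them. The sketch below makes two assumptions. First, HTTPS plus the www host is the preferred version. Second, a hypothetical set of session and tracking parameters leaves page content unchanged. Which parameters are actually content-neutral depends on the site (a color parameter may or may not change the page).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"  # assumption: HTTPS + www is the chosen version
# Hypothetical parameters assumed never to change page content on this site.
CONTENT_NEUTRAL_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str) -> str:
    """Map protocol, host, and parameter variants onto one canonical URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Fold www and non-www onto the preferred host.
    if host.removeprefix("www.") == PREFERRED_HOST.removeprefix("www."):
        host = PREFERRED_HOST
    # Drop content-neutral parameters and sort the rest so order is irrelevant.
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in CONTENT_NEUTRAL_PARAMS
    )
    # Force HTTPS, default an empty path to "/", and drop any #fragment.
    return urlunsplit(("https", host, parts.path or "/", urlencode(params), ""))


variants = [
    "http://example.com/product?color=red",
    "https://www.example.com/product?sessionid=abc123&color=red",
    "https://WWW.Example.com/product?color=red&utm_source=newsletter",
]
print({normalize_url(u) for u in variants})
# {'https://www.example.com/product?color=red'}
```

All three variants map to a single URL. That URL is the one to target with canonical tags, redirects, and internal links.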

How Google Handles Duplicates

Google groups duplicate pages into clusters and selects one as the canonical, the version it shows in search results[1][2]. This selection uses signals including which version has more backlinks, which uses HTTPS, which is listed in an XML sitemap, and which version Google has crawled more. You can guide this choice with the `rel=canonical` tag, but Google treats it as a hint, not a directive. When canonical selection goes wrong, an internal duplicate may outrank your preferred landing page, or an old AMP version may be served instead of your main page.
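A toy model can illustrate the cluster-then-select idea. It groups pages with identical body text, then scores each cluster member on the signals listed above. The Page fields, the weights in score, and the example URLs are all hypothetical. Google does not publish how it weighs these signals, so this shows only the shape of the process, not its actual formula.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    url: str
    body: str
    backlinks: int = 0
    in_sitemap: bool = False
    crawl_count: int = 0
    declared_canonical: Optional[str] = None  # rel=canonical target, if any


def cluster_exact_duplicates(pages: List[Page]) -> List[List[Page]]:
    """Group pages whose lowercased, whitespace-collapsed body text is identical."""
    clusters = defaultdict(list)
    for page in pages:
        text = " ".join(page.body.lower().split())
        clusters[hashlib.sha256(text.encode()).hexdigest()].append(page)
    return list(clusters.values())


def score(page: Page, cluster: List[Page]) -> float:
    """Hypothetical weighting of the signals named above; not Google's formula."""
    canonical_votes = sum(1 for p in cluster if p.declared_canonical == page.url)
    return (
        3 * canonical_votes  # rel=canonical: a strong hint, not a directive
        + 2 * page.url.startswith("https://")
        + 2 * page.in_sitemap
        + page.backlinks
        + 0.1 * page.crawl_count
    )


def pick_canonical(cluster: List[Page]) -> Page:
    """Return the highest-scoring page in a duplicate cluster."""
    return max(cluster, key=lambda p: score(p, cluster))


pages = [
    Page("http://example.com/shoes", "Red running shoes.", backlinks=3, crawl_count=40),
    Page("https://example.com/shoes", "Red  running shoes.", backlinks=1,
         in_sitemap=True, crawl_count=10),
    Page("https://example.com/sale/shoes", "Red running shoes.",
         declared_canonical="https://example.com/shoes"),
]
for cluster in cluster_exact_duplicates(pages):
    print(pick_canonical(cluster).url)  # prints https://example.com/shoes
```

The HTTP URL has the most backlinks and crawls, yet the HTTPS URL wins. It is in the sitemap and receives a `rel=canonical` vote from the sale page. The chosen canonical can therefore differ from what any single signal suggests.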

Fixing and Preventing Duplicate Content

The primary fixes for duplicate content are: (1) canonical tags: add `<link rel="canonical" href="https://example.com/page">` to the `<head>` of every duplicate version, pointing to the main page; (2) 301 redirects: permanently redirect all variants to the canonical URL; (3) consistent internal linking: always link to the canonical URL throughout your site, never to parameter variants; (4) `noindex` for URL parameter pages that should never be indexed, or `robots.txt` rules for pages that should not be crawled at all (Google cannot read a `noindex` or canonical tag on a page it is blocked from crawling)[2]. For cross-site syndication, ask syndication partners to add a canonical tag pointing back to your original. Run a content audit to find existing duplicates: crawl tools such as Screaming Frog can flag near-duplicate content by comparing content similarity scores across all your URLs.
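Fixes (1) and (2) are often combined at the server. Protocol and host variants get a 301 redirect. Parameter variants are served normally, with a canonical tag pointing at the clean URL. The sketch below assumes that split, a preferred origin of https://www.example.com, and a hypothetical set of tracking parameters. In practice this logic lives in the web server or CMS configuration.

```python
import html
from typing import NamedTuple, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CANONICAL_SCHEME, CANONICAL_HOST = "https", "www.example.com"  # assumption
TRACKING_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}  # hypothetical


class Plan(NamedTuple):
    status: int                   # HTTP status code to send
    location: Optional[str]       # Location header, for redirects only
    canonical_tag: Optional[str]  # <link> to emit in <head>, for 200 responses only


def plan_response(url: str) -> Plan:
    parts = urlsplit(url)
    path = parts.path or "/"
    # Protocol or host variant: permanently redirect to the preferred origin.
    if (parts.scheme, parts.hostname) != (CANONICAL_SCHEME, CANONICAL_HOST):
        target = urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))
        return Plan(301, target, None)
    # Parameter variant (or the clean URL itself): serve the page with a
    # rel=canonical tag pointing at the URL minus tracking parameters.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    clean = urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, urlencode(kept), ""))
    return Plan(200, None, f'<link rel="canonical" href="{html.escape(clean)}">')


print(plan_response("http://example.com/product?color=red"))
print(plan_response("https://www.example.com/product?color=red&sessionid=abc"))
```

The first request gets a 301 to https://www.example.com/product?color=red. The second is served with a canonical tag pointing at that same URL. Redirecting parameter variants instead of tagging them would also work. This split simply mirrors the two fixes listed above.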
