As with a lot of information on the Internet, there are too many myths about duplicate content in SEO. This blog post aims to dispel, squash, and dismiss any myths about duplicate content and SEO that you may have seen in other blog posts in 2021. More importantly, we’ll present the truths about duplicate content, so that you’ll know how to tackle duplicate content issues on your own site. Without further ado, let’s get right into it.
What is duplicate content?
Duplicate Duplicate Content. Spotted it?
To the human eye, spotting an identical piece of content can be easy. For search engines, it’s even easier: they crawl content within and across domains and flag any pieces that match exactly or are extremely similar.
If a piece of content is in more than one location on the internet, Google will spot it. Google and other search engines tie each piece of content to the URL on which it sits. Duplicate content issues arise when there is more than one version of the content in more than one location, that is, on a second URL. You’re probably here because there are a lot of myths regarding a duplicate content penalty. Let’s address this.
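To make "extremely similar" a bit more concrete, here is a minimal sketch of one common way to compare two texts: split each into overlapping three-word chunks ("shingles") and measure how many chunks they share (Jaccard similarity). This is an illustration only, not a description of how Google works; the function names and sample texts are made up for the example.

```python
import re

def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    # Lowercase and keep only word characters, so punctuation is ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets (1.0 = same wording)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical page texts, for illustration only.
page_a = "Our green summer dress is made from organic cotton."
page_b = "Our green summer dress is made from organic cotton!"
page_c = "Read our guide to choosing running shoes for winter."

print(similarity(page_a, page_b))  # same wording, different punctuation
print(similarity(page_a, page_c))  # unrelated wording
```

A score close to 1.0 suggests two pages are near-duplicates; where you draw the line is a judgment call for your own site audit.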
Duplicate Content Truths
Google’s official guidelines state that there are 2 types of duplicate content: malicious and non-malicious. Let’s start with non-malicious duplicate content. This refers to text that is similar to, or an exact match of, another block of content within or across domains. If this is the case for you, don’t panic. A lot of identical content can come from:
- Contact forms
- Discussion forums
- Blog posts
- Products (including product descriptions) that are being linked to by more than 1 URL
- Printer-only webpages
Now, if your text doesn’t fit into the above criteria, Google may consider it malicious, and you may run into some duplicate content issues. Again, don’t panic! We’ll explain how to get around these tricky duplicates further down. Let’s first look at the myths about duplicate content, and at how Google and the other search engines may see it.
Duplicate Content Myths
3 statements, 3 myths. If you believe the below, think again.
“Google Has Duplicate Content Penalties”
In some rare cases, multiple exact versions of a text are used to purposely deceive Google et al., in an audacious bid to gain search engine rankings and increase traffic. Naughty, naughty.
To combat this, Google crawls through all the URLs looking for quality content that is fresh, original, and relevant to the page. However, every so often, Google will find similar versions of content across more than one URL and perceive this as an attempt to manipulate its ranking algorithms, particularly if the websites in question seem to be owned by the same company.
Google may then adjust how it indexes and ranks these websites and choose only one of them to rank. This could lead to your site losing organic rankings or not being indexed at all (ergo, no one will find it: it’ll be removed from all search engine results pages, without a trace).
To answer the question: “Duplicate content penalty: does it exist?” Yes, but only for wrongdoers. If you try to deceive Google by using multiple sites to boost visibility for the same key phrases, you may get penalised.
A more common problem for most site owners is Google indexing the wrong page for a search phrase. Where you have multiple pages with similar content, Google will usually try to choose the most relevant page to rank. Mistakes are not uncommon, however, and your rankings could drop if Google chooses the wrong page.
For example, if you have two pages such as:
If each of these has similar content, Google may find it hard to know which one you’d like to rank, so it will pick the one it thinks is most relevant. This can be problematic if, for instance, page-b is a newer version of the page that you want to rank and Google thinks that page-a is more relevant.
“My website has to include completely unique content”
The majority of search engines look for variety. Users don’t want to see several domains listed on the SERPs with exactly the same content. When Google crawls through a domain and finds a match in content to another domain, it will assemble the URLs into a single group, then select which URL it considers to be the most relevant match to the group.
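The grouping step described above can be sketched in a few lines. This is a simplified illustration and not Google’s actual method: it assumes pages count as duplicates only when their text is identical after lowercasing and collapsing whitespace, and the "pick the shortest URL" rule is an arbitrary stand-in for whatever relevance signals a search engine really uses. All URLs and names here are hypothetical.

```python
import hashlib
from collections import defaultdict

def fingerprint(text: str) -> str:
    """Hash the text after collapsing case and whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def group_duplicates(pages: dict) -> list:
    """Group URLs whose page text has the same fingerprint."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return list(groups.values())

def pick_representative(urls: list) -> str:
    """Illustrative rule only: prefer the shortest URL, then alphabetical order."""
    return min(urls, key=lambda u: (len(u), u))

# Hypothetical pages on hypothetical domains.
pages = {
    "https://shop-one.example/dress": "Green cotton dress, machine washable.",
    "https://shop-two.example/products/dress": "Green  cotton dress, machine washable.",
    "https://shop-one.example/about": "We are a small family business.",
}

for group in group_duplicates(pages):
    print(pick_representative(group), "represents", group)
```

Running something like this over your own pages is a quick way to see which URLs would compete with each other for the same spot.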
If you have multiple sites with identical content, Google will only rank one of them, applying filters to the other domains. If this is one of the duplicate content issues you are experiencing, you may want to revise your marketing strategy, perhaps by reconsidering how your content is structured across your domains.
In reality, having exclusively unique content across a domain or several domains is not always possible. For example, e-commerce sites can end up with duplicate versions of products and product descriptions. It’s sometimes difficult to write unique product descriptions because they’re describing the same thing! Using product descriptions from manufacturers will not get you penalised in the way you might be thinking. You won’t get a telling off from Google, but the duplicate content will not bring you any advantage in rankings, because it’s not original!
If you want to use the content to your advantage, create a unique product description and product URL. Any page is an opportunity for appearing in an organic search, so use this to your advantage.
“Scrapers can harm your rankings”
What are SEO content scrapers? They’re programs that crawl through a site and return all the information they can find on a certain URL. Google itself scrapes your URLs in order to incorporate your content into its index.
Search engine indexing is undertaken through scraping, but, as always, some people want to watch the world burn, scraping other people’s content to post on their site. Why? To outrank you!
This may seem cut-throat, but thankfully, search engines aren’t fond of it either: Google will penalise the scrapers for it, and not the site that had the content first. If you have reason to believe that your content has been scraped, you should report it to Google, as you don’t want any duplicate content issues that aren’t your fault.
Duplicate Content & Inbound Links
Let’s also take into account the potentially harmful effect content duplication has on the user. Imagine finding more than one version of the same content across multiple sites — annoying, right? This may affect a user’s site experience and lead them to go to a website where the content is unique and helpful.
And if that wasn’t enough: if other sites want to link to some quality copy but spot duplicate pieces of content, they have to choose which one to link to, which splits the link juice that each inbound link passes on. And here’s why you should care about inbound links as much as Google does: they directly affect your organic ranking on search engine results pages.
URL Parameters for Duplicate Content
Another key type of duplication comes from URL parameters. URL parameters can create duplicate content by indicating to Google that there is a carbon copy of the page, and we don’t want Google to think that at all. This can happen on e-commerce sites that use content filtering on a product URL. The filtered URLs can cause content duplication, which may stop your product URL from appearing in organic search or lead search engines to choose the wrong URL. For more information, use this great article on how to handle duplicate content.
What can you do to combat duplicate content?
Google’s official guidelines state that duplicate content itself is not immediate grounds for action. But if you want to ensure your site isn’t mistakenly perceived as trying to manipulate search engine rankings, it would be wise to investigate.
There are 3 ways to clearly indicate to Google which URL you want it to pick up:
- Using the rel="canonical" tag
- Using the URL Parameter handling tool
- Implementing 301 redirects
How to fix duplicate content issues? Let’s start with number 1:
1. Canonical tag: perfect for telling search engines which version of a page is the original. Great for paginated results on an ecommerce store, where you have more than one URL for a page. For example:
All of these are the same page, but to search engines, each of these is a unique URL. The canonical tag allows you to show which is the original and preferred version e.g., http://www.website.com/
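As a rough sketch of the idea, the snippet below maps a few variants of a homepage URL onto the preferred version and builds the `<link rel="canonical">` element that would go in the page’s `<head>`. The preferred URL is the one named above; the list of variants and the function names are assumptions made up for this example, so swap in whatever variants your own site actually produces.

```python
from html import escape
from urllib.parse import urlsplit

CANONICAL_HOME = "http://www.website.com/"  # preferred version named in the text

# Hypothetical (host, path) variants of the homepage; a real site lists its own.
HOME_VARIANTS = {
    ("website.com", "/"),
    ("www.website.com", "/"),
    ("website.com", "/index.html"),
    ("www.website.com", "/index.html"),
}

def is_home_variant(url: str) -> bool:
    """True if the URL is one of the known variants of the homepage."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"  # treat an empty path as the root
    return (host, path) in HOME_VARIANTS

def canonical_link(url: str) -> str:
    """Return the <link> element to place in the <head> of the page at `url`."""
    target = CANONICAL_HOME if is_home_variant(url) else url
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'

print(canonical_link("http://website.com/index.html"))
print(canonical_link("http://www.website.com/blog"))
```

Every variant of the homepage ends up pointing at the same preferred URL, while other pages simply point at themselves.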
2. URL Parameter handling tool: this tells Google how to treat URL parameters, so that it does not crawl parameterised duplicate content. Such duplicates can be a big problem for e-commerce stores that use URL parameters to filter products on pages that are otherwise the same.
So, for example:
In this case, the URL parameters are being used to filter the products on this page to only show women’s dresses which are green. However, any text content on this page would be the same as the page without these filters in place.
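If you want to work out which unfiltered page a filtered URL belongs to (for instance, to use it as the canonical target), a small helper like the one below may be useful. It is a sketch under assumptions: the parameter names `category`, `colour`, and `sort` and the shop URL are hypothetical, and it treats every other parameter (such as `page`) as meaningful and keeps it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only filter or sort the same content.
FILTER_PARAMS = {"category", "colour", "sort"}

def strip_filter_params(url: str) -> str:
    """Return the URL without filter parameters (the fragment is dropped too)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in FILTER_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(strip_filter_params("https://shop.example/dresses?category=womens&colour=green"))
print(strip_filter_params("https://shop.example/dresses?colour=green&page=2"))
```

The first call returns the plain listing page, and the second keeps `page=2` because it is not in the filter list.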
3. 301 Redirects: a 301 redirect is a way to tell the search engines that a page has permanently moved. If you have old versions of a webpage or lots of different versions of the same page, then it can be beneficial to use a 301 redirect to the most relevant page on your site. This can stop search engines from picking up the wrong version and confirm which page is the most important.
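In practice, redirects are usually configured in your web server or content management system, but the logic behind them is simple. The sketch below shows it with Python’s built-in `http.server` module: a lookup table of old paths, and a handler that answers with status 301 and a `Location` header. The paths in the table are hypothetical and the handler is deliberately bare, so treat it as an illustration of the mechanism and not as production code.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler
from typing import Optional, Tuple

# Hypothetical old paths mapped to the page that should rank instead.
REDIRECTS = {
    "/old-page": "/new-page",
    "/new-page/index.html": "/new-page",
}

def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status code, redirect target) for a requested path."""
    target = REDIRECTS.get(path)
    if target is not None:
        return int(HTTPStatus.MOVED_PERMANENTLY), target  # 301
    return int(HTTPStatus.OK), None

class RedirectHandler(BaseHTTPRequestHandler):
    """Minimal handler: sends a 301 for known old paths, 200 otherwise."""

    def do_GET(self):
        status, target = resolve(self.path)
        self.send_response(status)
        if target is not None:
            # The Location header tells browsers and crawlers where the page moved.
            self.send_header("Location", target)
        self.end_headers()
        # A real handler would go on to serve the page body for non-redirects.

print(resolve("/old-page"))
print(resolve("/new-page"))
```

To try it locally, you could pass `RedirectHandler` to `http.server.HTTPServer(("localhost", 8000), RedirectHandler)` and call `serve_forever()`.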
As in the example below, you can use 301 redirects from the following pages.
to the preferred version, e.g.
Conclusion: Google knows best.
To make sure the correct version of each URL is crawled and linked to, follow the 3 solutions above, which indicate to Google which URL to pick up: canonical tags, 301 redirects, and the URL Parameter handling tool. Google Search Console, as always, can help you check which of your pages are being picked up.
To prevent any potential penalty arising:
- Maintain consistency in your internal linking. You want your internal links to follow the same structure, making it easier for you to organise your URL and site structure. This can be done through the next piece of advice:
- Familiarise yourself with your content management system. Learn how to use canonical tags and 301 redirects, but also make sure that you understand how content appears on your site. Blogs and boilerplate text (e.g., copyright information) may be displayed in more than one format, which can look like duplicate content to Google.
Not as difficult as it seems, is it?