What is Duplicate Content?
Duplicate content is where identical content appears on two or more different web pages. The content does not have to be an exact match: as Google and other search engines become more sophisticated, they are getting better at spotting spun content and treating it as duplicate. For example, if you have simply reworded an article, this can still count as duplicate content even though it is not an exact replica of another page. Either way, duplicate content can reduce the authority of an individual page and, if widespread, the credibility of an entire website. It can then result in a loss of rankings for individual pages or the entire site. Google defines duplicate content as “substantive blocks of content within or across domains that either completely match other content or are appreciably similar”.
Why is Duplicate Content Bad for SEO?
Google’s webmaster guidelines state that for sites whose duplicate content appears to have manipulative intent, “the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results”. Sounds like a penalty to us! So who is right? In the vast majority of duplicate content cases there is very little malicious intent, and Google simply tries to figure out which version of the content is the original, so no penalty is incurred. However, if the duplicate content is highly spammy then it can incur a penalty. Penalty or no penalty, solving your website’s duplicate content issues is a necessary part of SEO. Here are the potential ramifications of duplicate content:
- Google has to decide which page to rank for a given search term. If you have multiple instances of duplicate content on your site, Google may choose a less preferred page to return to searchers as a search result.
- Search Engines want to offer a variety of results for a given search term. Duplicate content can therefore reduce the ranking power of a given webpage or website.
- In spammy, manipulative cases Google can penalise the website by reducing rankings or de-indexing the site.
- Duplicate content (especially duplicate URLs) can cause ambiguity for third party websites. As a result, they may link to a duplicate version which will result in a loss of link juice for your website.
- Excessive duplicate content on a website can be an indication of poor quality and therefore reduce the authority of your website. In turn this will affect the ability of your website to rank for given search terms.
- The concept of a crawl limit (i.e. the number of pages that Google will crawl on your website, partly dependent on your authority) and a limit to the number of pages that Google is willing to index for your site raise interesting questions about whether duplicate content will significantly impair how regularly Google crawls and/or indexes your entire site. Regardless of whether these limits are fixed and true, it really isn’t worth the risk.
What are common duplicate content issues?
Pages with similar content
This is also referred to as keyword cannibalisation. It occurs when two or more webpages on the same site have very similar content – or, in this world of latent semantic indexing and the knowledge graph, content that tackles the same issue or point! This type of duplicate content can be very confusing for Google, so it is important that your content creation strategy is well structured and that there is a clear hierarchy for the existing pages on your site. For pages with similar content it is worth merging them, using rel=prev/next where the pages are genuinely a paginated series, or simply rethinking the structure of your site. Pages with similar content not only cause duplicate content issues for search engines but can also create a more difficult user flow through the website and/or dilute link juice throughout the site, as external websites are unsure which page to link to.
Printer Friendly Versions
Printer-friendly versions of webpages can sometimes be indexed by Google and therefore appear as a direct replica of a webpage. Whilst it is usually obvious to Google that a page is just a printer version, it is not worth the risk of leaving it available to robots.
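As a minimal sketch (the URL in the comment is made up), the simplest approach is to add a robots meta tag to the printer-friendly template so that search engines do not index it:

```html
<!-- In the <head> of the printer-friendly version, e.g. example.com/article/print -->
<meta name="robots" content="noindex">
```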
Session IDs, Search Filters and URL Parameters
URLs will often be generated for different purposes, whether that be to track a particular session for a user or as a result of the search or filter function being used on the website. These generated URLs can sometimes count as duplicate content and should therefore be addressed.
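For illustration, a session-tagged or filtered URL can declare the clean version of the page as its canonical; the URLs below are hypothetical:

```html
<!-- Served at example.com/trainers?sessionid=123&sort=price -->
<link rel="canonical" href="https://www.example.com/trainers/">
```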
Duplicate URLs
Duplicate URLs are a very common type of duplicate content. If you have not chosen a canonical URL (or preferred domain in Google’s Search Console), it effectively means that there will be one or more duplicate versions of every page on your website. Luckily, Google is reasonably good at choosing a canonical URL itself. However, duplicate URLs can result in a loss of link juice as other websites link to the duplicate versions rather than the canonical URLs. Duplicate URLs should be 301 redirected to the canonical version, with a canonical tag applied to the URL; Google does not recommend noindexing the duplicate content, but instead simply permanently redirecting the page to the canonical (original) version. We would also recommend indicating the canonical URL by choosing a preferred domain in Google’s Search Console.
Default Server Settings & Duplicate URLs
Servers can often create multiple versions of a webpage through their default settings. Most are easy to simply 301 redirect to the canonical, although be careful not to create an infinite loop with /index.html.
Multiple Categories and Products
Duplicate URLs can also be created through complex information architecture (site structure) or on websites with hundreds of products. E-commerce sites, for example, often suffer from duplicate content issues when products appear in multiple categories: a pair of red Nike trainers could be returned under men’s trainers, running trainers, Nike trainers, red trainers, size searches, etc. Each of these could have its own unique URL and therefore create duplicate content.
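A hedged sketch of how this is commonly handled: whichever category path a product is reached through, each variant URL points at a single canonical product URL (all URLs below are invented for illustration):

```html
<!-- Served at example.com/mens/running/red-nike-trainers
     and at example.com/sale/red-nike-trainers -->
<link rel="canonical" href="https://www.example.com/products/red-nike-trainers/">
```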
Boilerplate content and widgets
Excessive amounts of boilerplate or widgetised content can count as duplication. After all, each web page should have its own unique content, and if too much of a page’s content appears on multiple other pages via widgets or boilerplate it makes sense that it would count as duplicate content. Google advises reducing boilerplate by providing a synopsis and linking to a page with the more detailed content, such as terms and conditions or copyright information. Boilerplate can also be marked up with the nocontent class attribute (see below).
Different TLDs (usually with international sites)
International companies can often have separate domains for each country in which they operate, whether that be .co.uk for the United Kingdom, .es for Spain, .de for Germany, .fr for France, etc. Duplicate content issues occur when each website uses the same or similar content. As with any content we advise that each country has as much unique content as possible.
Languages
Websites with content in multiple languages do not automatically count as duplicate content. It is completely fine if the website’s content has been professionally translated by a human being, because there will be certain linguistic nuances that make the difference between the languages clear. However, if you have used an automated system to translate the content (such as Google Translate), this can not only count as duplicate content but will also be of poorer quality than professionally translated text, doubly impacting your website’s ability to rank.
Duplicate Meta Data
This is very common indeed, especially with larger sites. Best practice dictates that each page has its own unique title tag and meta description. It can be a laborious job, but if you want your SEO to be perfect it is a necessary one. We would advise starting by ensuring that there are no duplicate title tags, given that Google claim to no longer take meta descriptions into account as a ranking signal.
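As a quick illustration (the page and store names are invented), unique meta data simply means each page carries its own title tag and meta description in the head of the document:

```html
<head>
  <title>Red Nike Running Trainers | Example Store</title>
  <meta name="description" content="Lightweight red Nike running trainers, available in all sizes with free UK delivery.">
</head>
```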
CMS Issues
Some content management systems automatically generate multiple versions of a webpage and therefore automatically generate duplicate content. It is worth identifying whether this is happening on your site and making the necessary changes to prevent it.
Paginated Content
This usually occurs with search results or articles split across multiple pages. Google can have difficulty telling that the pages are connected, and in the case of an article this can count as duplicate content because there are multiple pages covering the same topic.
How to Identify Duplicate Content
Know the rules
Unfortunately for some of the duplicate content issues you simply need to know the rules. Lucky for you this article should give you all the information you need to make sure that you are duplicate content free!
Use Google!
There are loads of sites that will scrape your content and repost it. This is not usually something to worry too much about, because such sites tend to have low authority and often repost with links back to the original, so it is fairly obvious to Google that yours is the original. However, if you are worried about a higher-authority website reposting an article that you wrote and outranking you for it, use Google to keep an eye out for the content – searching for an exact sentence from your article in quotation marks is a quick way to find copies.
Search Console
Google’s Search Console is a quick way to identify duplicate meta data. Simply sign in, go to ‘Search Appearance’ and click on ‘HTML Improvements’.
Screaming Frog
Screaming Frog’s SEO Spider can filter its results by duplicate Title Tags, Meta Descriptions and URLs.
Solving Duplicate Content Issues
Don’t Scrape or Duplicate Content
The most obvious way to prevent duplicate content issues is not to copy content from another website and publish it on your site in the first place! This not only creates duplicate content issues but will also damage the credibility of your website in both users’ and Google’s eyes – high-quality unique content should always be the aim! Also make sure that you are not creating content similar to that which already exists on your own site and thereby falling foul of keyword cannibalisation.
Canonical Tag
If you do come across an article which is simply too good not to include on your own blog, you should give credit where credit is due. A canonical tag on a piece of content effectively tells Google that you know it is a duplicate and indicates where the original piece is. For more information on canonical tags see here.
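A minimal example of the tag itself, assuming the original article lives at the URL shown (both URLs are purely illustrative); it sits in the head of the republished copy:

```html
<!-- In the <head> of the duplicate/republished page -->
<link rel="canonical" href="https://www.originalsite.com/the-original-article/">
```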
301 redirect
The 301 redirect is the most effective tool for solving duplicate content issues. A 301 redirect not only indicates to Google that a page has permanently moved (essentially merging the two pages) but also automatically sends the user to the correct page and passes the majority of any misdirected link juice to the correct page. Moving forward, users are highly unlikely to link to the duplicated page, and Google will identify the canonical page as the one that each duplicate has been redirected to.
Merge Similar Content
For pages with keyword cannibalisation issues it is advisable to merge the two pages by 301 redirecting one to the other. This may require some thought about the rest of the user flow, but in the long run it will be far more beneficial for the site, as it should streamline the user journey and improve UX.
Parameter Handling Tool
Note that the Parameter Handling Tool is much like the Data Highlighter in that it is a Google tool and therefore only works for Google: rules set out in the Parameter Handling Tool will have no effect on Bing or Yahoo!. The tool allows webmasters to identify which URL parameters they want indexed and which they do not – for example, for filtered searches. For more information on the Parameter Handling Tool and duplicate content see Google’s guide (https://support.google.com/webmasters/answer/6080548?hl=en).
No content Class Attribute
The nocontent class attribute can be used on boilerplate content. It is very similar to the noindex directive placed on whole pages, but instead indicates that a particular piece of content within a page is not to be indexed. Note that it was introduced by Yahoo! and is not officially supported by Google, so treat it as a supplementary signal rather than a guarantee. Pretty neat stuff.
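For illustration, the attribute is applied as a class on the element wrapping the boilerplate (the markup below is hypothetical):

```html
<div class="robots-nocontent">
  <!-- Boilerplate copyright / terms summary repeated on every page -->
  <p>© Example Ltd. See our full terms and conditions.</p>
</div>
```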
Preferred Domain
Again, much like the Parameter Handling Tool and Data Highlighter, setting your preferred domain through Google’s Search Console will have no effect on your search rankings with Yahoo! or Bing. Regardless, even if you have 301 redirected and placed canonical tags on all of your duplicate URLs to identify your canonical domain, there is no harm in setting your preferred domain in the Search Console.
Complete Removal / 404
Completely removing a duplicate page is an option, especially for content which users rarely, if ever, visit. However, we would still advise 301 redirecting the removed duplicate page to the original, because you never know who might stumble across a link in an email or may have previously linked to that page. Also, 404s are just annoying to come across, so let’s keep the UX as good as possible!
Search Console URL Removal
The URL removal request has changed somewhat since the days of Webmaster Tools, when you had to 404 and noindex the page prior to requesting the removal of a URL. Nowadays you simply log into the Search Console, click on the ‘Google Index’ option, go to ‘Remove URLs’ and enter the URL you want to remove. However, the removal only lasts for 90 days, and we would always advise 301 redirecting the page to the canonical anyway.
Rel=Prev/Next
This can be used to link paginated pages and tell Google that they are part of the same article or set of search results.
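For example, page two of a three-page article would reference its neighbours in its head (the URLs are illustrative):

```html
<!-- On example.com/long-article/page-2 -->
<link rel="prev" href="https://www.example.com/long-article/">
<link rel="next" href="https://www.example.com/long-article/page-3/">
```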
Read more: SEO glossary