If you’ve ever worked in SEO, it’s likely there have been many instances where you’ve felt at the mercy of Google. You’re sure that you’ve done everything right, you’ve gone through Google’s guidelines with a fine-tooth comb, and yet it’s almost as if Google takes some kind of sadistic pleasure in confining your website to the darkest corners of its search results.
Let’s start by clarifying something: Google wants you to succeed, provided that you put in the work. This can be seen in how it makes the inner workings of its processes more transparent, particularly when it comes to how it crawls and indexes websites. In 2018, Google introduced the URL Inspection Tool, which in essence breaks down Google’s indexing criteria and helps you pinpoint any issues that may be affecting your website’s search engine visibility.
What is Google indexing?
What distinguished Google from other search engines pretty early on is that, rather than periodically scanning the world wide web, its bots are constantly out there crawling the furthest corners of the internet. Once a webpage is crawled it is filed into an ever-expanding library known as Google’s ‘index’. This index is constantly refreshing and refining itself so that it can consistently deliver the most relevant and highest quality web content as simply and quickly as possible. This process of Google crawling webpages, retrieving all the salient information and filing it away in its library is known as ‘indexing’.
Sounds pretty straightforward, right? Well, not exactly. A crucial part of SEO is making it as easy as possible for Google to crawl and index websites effectively. Part of this involves identifying a range of common technical issues that act as roadblocks for all of those little bots scuttling around the finer details of your website.
What are some examples of common indexing issues?
Broken Links
These are links that are, in essence, dead ends. A broken link could be caused by an incorrectly entered URL, an error in a webpage’s HTML code, or the fact that the page simply no longer exists.
Canonical URLs
A canonical URL is the assigned URL for a page and acts as a signal to Google that it should be prioritised over other variants (e.g. HTTP vs. HTTPS). This minimises the risks associated with duplicate content, which is when two pages are so alike that Google doesn’t know which one to prioritise in search results.
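To make this concrete, a canonical URL is usually declared with a `<link rel="canonical">` element in the page’s `<head>`. The sketch below (hypothetical helper names, Python standard library only) shows one way to pull a page’s declared canonical out of raw HTML:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Record the href of any <link rel="canonical"> tag encountered."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel", "").lower() == "canonical":
            self.canonical = attrs.get("href")

def declared_canonical(html: str):
    """Return the user-declared canonical URL, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = '<head><link rel="canonical" href="https://example.com/page"></head>'
print(declared_canonical(page))  # https://example.com/page
```

If this returns `None` for a page with duplicates elsewhere on the site, Google is left to pick a canonical on its own.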
Noindex Tags
These are tags that signal to Google that a page shouldn’t be indexed, whether to limit webspam or duplicate content. They are usually added deliberately, but it’s still something to check for if Google isn’t indexing your website.
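A noindex directive typically lives in a `<meta name="robots">` tag. As a rough illustration (invented sample HTML, standard library only), you could scan a page for it like this:

```python
from html.parser import HTMLParser

class RobotsMetaChecker(HTMLParser):
    """Flag pages whose <meta name="robots"> directive contains 'noindex'."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "meta"
                and attrs.get("name", "").lower() == "robots"
                and "noindex" in attrs.get("content", "").lower()):
            self.noindex = True

def has_noindex(html: str) -> bool:
    checker = RobotsMetaChecker()
    checker.feed(html)
    return checker.noindex

blocked = '<meta name="robots" content="noindex, nofollow">'
allowed = '<meta name="robots" content="index, follow">'
print(has_noindex(blocked), has_noindex(allowed))  # True False
```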
Redirect Loops
A 301 redirect is used to fix broken links, but there may be instances where one page redirects to another, which then redirects back to the original URL. This creates a closed loop in which the pages redirect endlessly between one another.
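Detecting a loop like this is essentially cycle detection. The sketch below (using a made-up map of redirects rather than live HTTP requests) follows a chain of 301s and reports the cycle if one exists:

```python
def find_redirect_loop(redirects, start):
    """Follow a URL through a map of 301 redirects; return the loop if one exists."""
    visited = []
    url = start
    while url in redirects:
        if url in visited:
            return visited[visited.index(url):]  # the URLs that form the cycle
        visited.append(url)
        url = redirects[url]
    return None  # chain terminates at a real page

# A healthy chain and a loop, using hypothetical paths:
print(find_redirect_loop({"/old": "/new"}, "/old"))        # None
print(find_redirect_loop({"/a": "/b", "/b": "/a"}, "/a"))  # ['/a', '/b']
```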
Robots.txt Files
This file is added to a website to act as a point of reference for Google’s bots, ensuring that only user-facing pages on your website (and none of that back-end stuff) are crawled and indexed. So be sure to check that no important pages have inadvertently found their way into your robots.txt file.
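Python’s standard library can parse robots.txt rules, which makes it easy to spot-check whether a given URL is blocked. The example below uses an invented set of rules rather than a live file:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: back-end paths disallowed, everything else open.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /cart/",
]

parser = RobotFileParser()
parser.parse(rules)

# An important public page should come back True; back-end paths False.
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
```

If a key landing page returns `False` here, you’ve likely found your indexing problem.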
Mobile Usability Issues
As you probably know, Google uses mobile-first indexing to address the drastic shift from desktop to mobile as the dominant way people use the internet. While this has been the case for some time, some websites are yet to catch up. If a website loads at a glacial pace or doesn’t have a responsive design for mobile users, Google isn’t exactly incentivised to index it or rank it well.
How Does Google’s URL Inspection Tool Work?
If any of the above issues are affecting your website, you should be able to identify these using Google’s URL Inspection Tool. Launched in 2018, this tool is a fantastic example of Google seeking to make its processes more transparent. In this section, we’ll be breaking down each of the features in the URL Inspection Tool and how they work together to ensure your pages are indexed properly.
URL Presence on Google
If your URL is marked as being ‘on Google’ then huzzah, you’re indexed! This is one of five indexability statuses offered by Google; the remaining four can be found below:
- URL is on Google, but has issues: While the URL has technically been indexed, there are problems with the enhancements that are considered best practice for search engine visibility. For example, the structured data (the markup that describes a page’s content to search engines) may be incorrect, or perhaps the page isn’t considered friendly to mobile users.
- URL is not on Google: Google has honoured a signal that the page shouldn’t be crawled, such as the URL being listed in the website’s robots.txt file. There may also be less direct signifiers that a page shouldn’t be indexed. For example, it may be an orphan page, meaning that no internal links on the website point to it.
- URL is not on Google (Indexing): Google has encountered one of the indexability issues mentioned earlier in this article, such as a broken link or a no-index tag.
- URL is an alternate version: Google flags that the URL you have entered is an alternate version of a canonical page, such as the AMP version, or the desktop version of a website that Google treats as mobile-first.
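Since the structured data mentioned above is a common culprit behind the ‘has issues’ status, it helps to know it is usually just a JSON-LD block embedded in the page. The snippet below (an invented example organisation) shows how Python’s `json` module can catch a syntax error before Google does:

```python
import json

# A minimal, hypothetical JSON-LD block of the kind Google reads for rich results.
snippet = """
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "logo": "https://example.com/logo.png"
}
"""

data = json.loads(snippet)  # malformed JSON would raise json.JSONDecodeError here
print(data["@type"])  # Organization
```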
View Crawled Page
This part of Google Search Console allows you to understand the three main ways Google crawls your website and evaluates its overall quality and user experience. Firstly, the ‘HTML’ tab presents the rendered page code, which can allow you to diagnose issues such as incorrect canonical URLs or misplaced structured data. Secondly, a screenshot of the crawled webpage (as it would be seen on a smartphone) can help you visualise any potential problems with mobile usability. Finally, a handy ‘More Info’ section allows you to dive into the technical nitty-gritty of a webpage, such as the specific type of content that has been crawled (usually text and HTML) and HTTP status codes.
Request Indexing
Made changes to a URL that you’re eager for Google to crawl? If you’ve added some new content to a page, perhaps optimised for a high-priority keyword or fixed some pressing technical issues, the Request Indexing tool prompts Google to crawl and re-index your page. The act of submitting URLs for crawling is simple enough, but there are a few key points to consider:
- Before you get too trigger-happy, you should know that requesting indexing for the same page multiple times does not make Google re-index your URL any faster.
- You can submit a maximum of 10-12 URLs from the same website to be indexed in a day.
- Regardless of how many times a page is submitted, Google won’t turn a blind eye to issues such as noindex tags or misused canonical tags.
Coverage
If you’re interested, this section gives you more insight into how Google was able to crawl and index a webpage. Firstly, the Discovery tab indicates how the URL was first encountered by Google’s trusty bots, for instance, whether it was via a referral from another page or by crawling a sitemap. Secondly, the Crawl section offers details about the last time a page was successfully crawled, such as the date of the crawl and the ‘user-agent’ used (e.g. a smartphone or desktop). Finally, the Indexing section makes a distinction between the ‘user-declared canonical’, which is usually specified using a canonical tag, and the canonical URL that Google ultimately settled on.
Enhancements
This is where, as SEO experts, we get to validate the success of all of those little flourishes that help a website achieve peak crawlability and search engine visibility. For example, this part of the URL Inspection Tool pulls through any structured data you have added, highlights whether your website is mobile-friendly or not, and shows whether or not Google has crawled key parts of your website’s brand identity such as its logo or any user reviews.
Test Live URL
By prompting Google to retrieve the latest version of a URL, this tool allows you to get a real-time update on a URL’s indexability status. It is often used to validate technical fixes so users can see if their work has had a direct impact on whether or not a page is indexable.
How Long Does Google Indexing Take?
As I’m sure you can imagine, Google is often very busy! Therefore, it can take anywhere between a few days and a few weeks for a page to be indexed, and that’s if there are no issues affecting the website’s indexability. While we recommend checking periodically, it’s not worth your time submitting URLs to be recrawled unless you’ve fixed a problem with indexability.
Yellowball is a leading London SEO agency, with a team of technical specialists, content creators and account managers who are all dedicated to helping clients achieve long-term success in organic search.
Your website is the foundation of your online presence, so we begin all of our campaigns with a thorough technical review. This ensures that any technical or UX-related issues are swiftly rectified in adherence to Google’s quality standards.
Need help growing your traffic, rankings and conversions in organic search? Get in touch with our team of experts to find out more.