Getting search engines to index your website’s pages is an important part of search engine optimization (SEO). When search engines index a page, they’ll add it to their search results as an organic listing. Indexed pages serve as a channel through which users can discover and visit your website. You should be conscious, however, of which pages search engines index. If they index the wrong pages, your website may suffer from index bloat.
What Is Index Bloat?
Index bloat is a phenomenon in which search engines index undesirable pages on a website. Despite what many webmasters believe, not all pages on a typical website need to be indexed. If a page has duplicate content, low-quality content or no content at all, you should keep it out of the search results. With index bloat, search engines index undesirable pages such as these.
How Index Bloat Impacts Search Rankings
Index bloat can cause your website’s desirable pages to rank lower in the search results. Search engines, for instance, may rank a page of duplicate content higher than the original page. With the duplicate content page outranking it, the original page’s performance will suffer.
Even if it doesn’t harm the search rankings of your website’s desirable pages, index bloat will cannibalize your site’s organic search traffic. Once indexed, the undesirable pages will essentially steal the organic search traffic of your website’s desirable pages.
When indexed alone, a high-quality page may attract 5,000 visitors from Google per month. If Google indexes three other, undesirable pages for the same query, all four pages will compete for the same organic search traffic. Assuming that traffic splits evenly among them, the high-quality page may only attract about 1,250 visitors per month instead of 5,000. Index bloat cannibalizes your website’s organic search traffic by diverting visitors away from your site’s desirable and high-quality pages.
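The arithmetic behind this example can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that traffic for a query splits evenly across every indexed page competing for it; in practice, the split depends on where each page ranks. The function name is hypothetical.

```python
def visitors_per_page(total_visitors: int, competing_pages: int) -> float:
    """Split monthly visitors evenly across all indexed pages for a query.

    Assumption: an even split. Real traffic depends on ranking position.
    """
    if competing_pages < 1:
        raise ValueError("competing_pages must be at least 1")
    return total_visitors / competing_pages


# One high-quality page plus three undesirable pages = four competitors.
share = visitors_per_page(5000, 1 + 3)
print(share)  # 1250.0
```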
Another problem posed by index bloat is the depletion of your website’s crawl budget. There are now over 1.7 billion websites on the internet. While some of these websites consist of just a few pages, others feature hundreds or thousands of pages. To ensure all websites have an opportunity to rank, search engines will crawl a limited number of pages on any given site. Crawl budget is the number of pages on a website that search engines are willing to crawl within a given period.
If search engines crawl undesirable pages on your website, they’ll spend less time and fewer resources on other pages. Each undesirable page that search engines crawl will deplete your website’s crawl budget. And once they’ve hit this limit, search engines may stop crawling pages on your website for an undisclosed period.
Identify Undesirable Pages
To prevent index bloat from negatively impacting your website’s rankings and traffic, you’ll need to either delete your site’s undesirable pages or block search engines from indexing them. So, start by identifying your website’s undesirable pages and making a note of their exact URLs.
You can use Google Search Console to see which pages Google has indexed. Alternatively, you can search for your website’s domain prefixed with “site:” on Google. While reviewing your website’s indexed pages, take note of any duplicate content or low-quality pages that you want to be removed from the search results.
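As a small illustration of the “site:” search, the following Python sketch builds the corresponding Google search URL. The domain example.com is a placeholder and the function name is hypothetical; the sketch only constructs the URL for you to open in a web browser and doesn’t query Google itself.

```python
from urllib.parse import urlencode


def site_search_url(domain: str) -> str:
    """Build a Google search URL for the query "site:<domain>"."""
    query = f"site:{domain}"
    return "https://www.google.com/search?" + urlencode({"q": query})


# example.com is a placeholder; substitute your own domain.
print(site_search_url("example.com"))
# https://www.google.com/search?q=site%3Aexample.com
```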
Delete Pages or Block Search Engines From Indexing Them
Once you’ve identified your website’s undesirable pages, you can either delete them or block search engines from indexing them. Both methods will remove a page from the search results. If you delete a page, though, visitors won’t be able to access it. The deleted page will be removed from your website, and after search engines have refreshed their listings, it will be removed from the search results as well.
If you block search engines from indexing a page, search engines will remove it from the search results but visitors will still be able to access it directly from their web browser or by clicking links to the blocked page.
Only delete pages that provide little or no value to your website’s visitors. The default “Hello World” page created by WordPress, for example, provides no value, so you should delete it. A category archive page, on the other hand, may prove valuable because it helps visitors find relevant posts. Category archive pages still contain duplicate content, though, so you should typically block search engines from indexing them rather than delete them.
Deleting an undesirable page on your website only takes a few clicks of your mouse. In WordPress and most other content management systems (CMSs), you can delete pages straight from your web browser. Just locate the page in the CMS’s admin interface and choose the “delete” option.
For websites that serve static HTML files, including those that don’t use a CMS, you can delete a page manually by locating the page’s HTML file on your site’s server. Just connect to your website’s server with a File Transfer Protocol (FTP) app, and after identifying the page’s HTML file, right-click it and choose “delete.” The page will then be deleted from your website’s server, and it will drop out of the search results once search engines have refreshed their listings.
To block search engines from indexing an undesirable page on your website, you’ll need to add a robots meta tag with the “noindex” value to the head section of the page’s HTML. This meta tag tells search engines not to index the page to which it’s added. Visitors can still reach the page directly or through links, but it shouldn’t appear in the search results.
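To make this concrete, here is a minimal Python sketch that checks whether a page’s HTML contains a robots meta tag with the “noindex” value. The sample page, the class name and the function name are illustrative assumptions; the sketch only inspects the HTML you pass in and uses Python’s standard html.parser module.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # The content attribute may list several comma-separated values.
            for part in attributes.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page: the meta tag sits inside the head section.
page = """<html><head>
<title>Category archive</title>
<meta name="robots" content="noindex">
</head><body>...</body></html>"""

print(has_noindex(page))  # True
```

Running a check like this against the pages you flagged earlier is one way to confirm that the tag was actually added where you intended.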
Keep in mind, you can no longer use the “noindex” directive in a robots.txt file. In 2019, Google stopped supporting the “noindex” directive in robots.txt files. Instead, Google encourages webmasters to use the robots meta tag described above, which serves the same purpose. Search engines must be able to crawl a page to see this tag, so don’t also block the page in your robots.txt file.
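If you’re unsure whether an older robots.txt file still relies on this unsupported directive, a short Python sketch like the following can flag such lines. The sample file contents and the function name are hypothetical, and the parsing is deliberately simple: it only looks at the field name before the first colon on each line.

```python
def find_unsupported_noindex(robots_txt: str) -> list:
    """Return 1-based line numbers of "noindex" rules in robots.txt text."""
    flagged = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        rule = line.split("#", 1)[0].strip()  # drop trailing comments
        field = rule.split(":", 1)[0].strip().lower()
        if field == "noindex":
            flagged.append(number)
    return flagged


# Hypothetical robots.txt contents for illustration.
sample = """User-agent: *
Disallow: /admin/
Noindex: /category/
"""

print(find_unsupported_noindex(sample))  # [3]
```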
It’s not the number of indexed pages that matters most when optimizing your website for organic search traffic; it’s the quality of those indexed pages. If search engines liberally index your website, undesirable pages may emerge in the search results. The end result is known as index bloat. Don’t let this lower the rankings of your website’s high-quality pages and jeopardize your organic search traffic in the process.