📘 This article explains the most common reasons why too many website pages are too far from the site's home page.
Overview
When websites have too many deep pages, that is, pages that are too many clicks away from the home page, one or several of the following elements are usually the cause: pagination, navigation filters, tracking parameters, malformed URLs, or perpetual links.
Once you know what to look for, see how to find out which pages create excessive depth on your website.
Pagination
Any long, paginated list generates depth, and the problem is made worse by any combination of the following:
Very long lists.
A low number of items per page.
A pagination scheme that only moves a few pages forward at a time. If the pagination only lets you click a few pages further down the list, the deepest pages will be very deep: advancing three or five pages at a time through a 50-page list leaves the last pages roughly 10 to 17 clicks deeper than the first (see the sketch below).
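For a rough idea of how quickly depth accumulates, here is a small sketch (plain arithmetic, hypothetical numbers) that computes how many clicks separate the first and last pages of a list for different jump sizes:

```python
import math

def clicks_to_last_page(total_pages: int, max_jump: int) -> int:
    """Clicks needed to go from page 1 to the last page when each
    page only links to the next `max_jump` pages."""
    return math.ceil((total_pages - 1) / max_jump)

# Hypothetical 50-page list, as in the example above.
for jump in (3, 5, 10):
    print(f"jump {jump:>2} pages at a time -> {clicks_to_last_page(50, jump)} clicks deep")
# jump  3 pages at a time -> 17 clicks deep
# jump  5 pages at a time -> 10 clicks deep
# jump 10 pages at a time -> 5 clicks deep
```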
❓ Where is this likely to happen?
In lists that present the website’s primary content (e.g., products, articles) or other lists that do not immediately come to mind, such as user-generated content (e.g., product ratings, article comments).
🎚 Severity
In the worst-case scenario, some key content is too hard to reach, so robots will never see it, especially if these deep lists are the only path to some primary content (e.g., products not listed elsewhere). Internal search boxes can help users find content, but not robots. You can check whether this is the case by exploring the deepest pages in the URL explorer: click on the max depth in the depth graph (in the Distribution section of the report) and check whether the deepest pages include some of your website's core content pages (add a filter on the URL to match your product pages, article pages, or any other key content).
Alternatively, the deep pages may be low-value pages that dilute your valuable content and are useless for SEO. You should prevent robots from seeing deep pages such as:
Pagination for long, barely segmented lists (e.g., the “list all” type) that present products already found elsewhere in more qualified, shorter lists.
Pagination that leads to content that is not key for organic traffic (e.g., user comments on articles).
Since a small part of this content may have some traffic potential, you must determine how to detect this higher-quality content and place it higher in the website structure. For example, you may consider the following elements for rating user-generated content: other users rating a comment as useful, the number of replies to each comment, a minimum length, etc.
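As an illustration, here is a minimal sketch (hypothetical field names and thresholds) of a score that could decide which user comments deserve to be surfaced higher in the structure, based on the criteria listed above:

```python
def comment_score(comment: dict) -> float:
    """Rough quality score for a user comment.

    Hypothetical criteria, mirroring the ones listed above:
    helpful votes from other users, number of replies, and length.
    """
    score = 0.0
    score += 2.0 * comment.get("helpful_votes", 0)  # other users rated it useful
    score += 1.0 * comment.get("reply_count", 0)    # it generated a discussion
    if len(comment.get("text", "")) >= 200:         # minimum length, in characters
        score += 3.0
    return score

# Keep only comments worth exposing higher in the site structure.
comments = [
    {"text": "Great, thanks!", "helpful_votes": 0, "reply_count": 0},
    {"text": "Detailed review... " * 20, "helpful_votes": 12, "reply_count": 4},
]
promoted = [c for c in comments if comment_score(c) >= 10]
```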
✅ Recommendation
If deep lists are the only path to some important content:
You will need to work on the website navigation structure. Add finer categories or new types of filters to create additional, shorter lists that will also create new SEO target pages for middle-tail traffic.
List more items per page, and on each page, link to more pages (e.g., not just the next three or five pages, but the next 10, plus every multiple of 10); a minimal sketch follows this list.
Ensure that your pagination links include the appropriate markup (e.g., rel="next" and rel="prev" link tags). While this will not affect depth, it helps robots get the sequence of pages right.
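Here is the sketch mentioned above, assuming pages are simply numbered from 1 to total_pages; the exact numbers of linked pages are a suggestion, not a requirement:

```python
def pages_to_link(current: int, total_pages: int) -> list[int]:
    """Pages that page `current` should link to: the next 10 pages
    plus every multiple of 10, instead of only the next few pages."""
    links = set(range(current + 1, min(current + 10, total_pages) + 1))
    links.update(range(10, total_pages + 1, 10))
    links.discard(current)
    return sorted(links)

# From page 3 of a 50-page list:
print(pages_to_link(3, 50))
# [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 20, 30, 40, 50]
# Any page of the 50 is now at most 2 clicks away from page 1.
```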
If deep pages are mainly low-quality/useless pages:
The recommended action depends on how many pages there are compared to your core, valuable content:
Many useless pages: Ensure robots do not visit these pages, since they are bad for your website’s image, and even if robots only crawl a small portion of deep content, these pages could end up wasting significant crawl resources that would be better spent on valuable content (see the sketch after this list).
Few useless pages: There is no need to take action on these since the small number of deepest pages will not be crawled much, so it is unlikely to warrant any effort.
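If you keep robots away from these pages with robots.txt disallow rules (one common approach), a quick way to sanity-check the rules is a sketch like this one, using Python's standard robots.txt parser with hypothetical URLs:

```python
from urllib.robotparser import RobotFileParser

# Check that robots.txt actually blocks the deep pages you want to
# keep robots away from. Domain and URLs are hypothetical.
robots = RobotFileParser("https://www.mywebsite.com/robots.txt")
robots.read()

deep_urls = [
    "https://www.mywebsite.com/list-all?page=315",
    "https://www.mywebsite.com/article-123/comments?page=42",
]
for url in deep_urls:
    status = "blocked" if not robots.can_fetch("*", url) else "still crawlable"
    print(f"{status}: {url}")
```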
Too Many Navigation Filters
If your website’s navigation includes filters that are accessible to robots, and if those filters can be combined at will, then robots keep finding new filter combinations as they find new pages, which creates depth and a high volume of pages.
❓ Where is this likely to happen?
Any navigation scheme with multiple filters that create new pages (as opposed to a refresh on the same page).
🎚 Severity
This can get serious quickly because it creates exponential volume. Some filter combinations have organic traffic potential, but most do not, especially combinations of many filters, which are also the ones that generate most of the volume.
✅ Recommendation
The best practice is to limit systematic links to robot-crawlable filtered lists to a single filter, or at most two filters combined. For example, if you have a clothing store with filters by type/brand/color, you can allow the systematic combination of two filters: type + brand (jeans + diesel), color + type (red + dress), or brand + color (UGG + yellow). If you know that a combination of more filters has traffic potential (‘black’ + ‘GStar’ + ‘t-shirt’), a link can be added manually, but you do not want all combinations of three filters: that would create something like Armani + boots + yellow, which probably has no traffic potential and could well be an empty list.
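A minimal sketch of that rule with hypothetical filter values: systematically generate crawlable pages for single filters and pairs only, and whitelist deeper combinations by hand.

```python
from itertools import combinations, product

# Hypothetical filter values for a clothing store.
FILTERS = {
    "type":  ["jeans", "dress", "t-shirt"],
    "brand": ["diesel", "UGG", "GStar"],
    "color": ["red", "yellow", "black"],
}

def crawlable_filter_pages(max_filters: int = 2) -> list[dict]:
    """All filter combinations using at most `max_filters` facets."""
    pages = []
    for n in range(1, max_filters + 1):
        for facets in combinations(FILTERS, n):
            for values in product(*(FILTERS[f] for f in facets)):
                pages.append(dict(zip(facets, values)))
    return pages

# Deeper combinations with known traffic potential are added by hand.
MANUAL_PAGES = [{"type": "t-shirt", "brand": "GStar", "color": "black"}]

print(len(crawlable_filter_pages(2)))  # 9 single-filter + 27 two-filter = 36 pages
print(len(crawlable_filter_pages(3)))  # 63 pages: every extra facet multiplies the volume
```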
Tracking Parameters
A tracking parameter (e.g., ‘?source=thispage’) is added to a URL to reflect the user’s navigation. This can create a huge number of URLs (all combinations: all pages the user can come from, for each tracked page) or even an infinite number if several parameters can be combined and the full path is tracked (which is considered a spider trap).
❓ Where is this likely to happen?
Common occurrences of this are in a “similar products” or a “related stories” block, where links to other products or articles include a parameter to track which page the user was coming from (not to be confused with parameters that track ads or email campaigns).
🎚 Severity
This can be very serious if many pages are tracked. The tracking parameters create many duplicates of important pages.
✅ Recommendation
Transmit the tracking information in a fragment behind a ‘#’ at the end of the URL: unlike a parameter behind a ‘?’, which is part of the URL, the fragment does not create a new URL for robots. Also redirect existing URLs with tracking parameters to the version of the URL without tracking.
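A minimal sketch of the clean-up redirect, assuming a hypothetical tracking parameter named `source`: the helper returns the canonical URL (without tracking) that tracked URLs should be 301-redirected to.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"source"}  # hypothetical tracking parameter name(s)

def canonical_url(url: str) -> str:
    """URL without tracking parameters; 301-redirect to it if it differs."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(canonical_url("https://www.mywebsite.com/product_id123.html?source=thispage"))
# https://www.mywebsite.com/product_id123.html
# The fragment alternative keeps a single URL: /product_id123.html#source=thispage
```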
Malformed URLs
Some pages include malformed links that create new pages when they should link to existing pages. Malformed links often return a 404 HTTP status code (Not Found), but it is also common for a malformed URL to return a page that appears to be normal to the user. In this case, malformed links typically still include an identifier used to populate the page content.
The most common problems include:
A missing human-readable element in the URL. For example:
http://www.mywebsite.com/products/_id123456.html instead of http://www.mywebsite.com/products/product-description_id123456.html
Repeated elements in the URL. For example:
http://www.mywebsite.com/products/products/product-description_id123456.html
❓ Where is this likely to happen?
Potentially anywhere on the website.
🎚 Severity
The impact of malformed URLs depends on the volume. It is worse when malformed URLs return an HTTP 200 (OK) status code, because there is then a potentially large number of duplicate pages, most probably duplicates of key content.
✅ Recommendation
Replace malformed links with correct links. Also redirect existing malformed URLs to the correct URL (HTTP 301 permanent redirect) or return a 404 HTTP status code (Not Found), since robots will continue crawling them for a while.
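As an illustration, here is a sketch of the kind of redirect rule that handles the two malformed patterns shown above; the URL patterns and the slug lookup are hypothetical.

```python
import re

# Hypothetical patterns; adapt them to your own URL scheme.
REPEATED_SEGMENT = re.compile(r"^/products/(?:products/)+")
MISSING_SLUG = re.compile(r"^/products/_id(\d+)\.html$")

def lookup_slug(product_id: str) -> str | None:
    """Hypothetical lookup of the human-readable slug for a product id."""
    return {"123456": "product-description"}.get(product_id)

def redirect_target(path: str) -> str | None:
    """Correct path to 301-redirect to, or None if no rule applies
    (an unresolvable malformed URL should then return a 404)."""
    if REPEATED_SEGMENT.match(path):
        return REPEATED_SEGMENT.sub("/products/", path)
    m = MISSING_SLUG.match(path)
    if m:
        slug = lookup_slug(product_id=m.group(1))
        return f"/products/{slug}_id{m.group(1)}.html" if slug else None
    return None

print(redirect_target("/products/products/product-description_id123456.html"))
# /products/product-description_id123456.html
print(redirect_target("/products/_id123456.html"))
# /products/product-description_id123456.html
```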
Perpetual Links
Perpetual links create an infinite number of pages: every page of a given template includes a link that always leads to a new URL.
❓ Where is this likely to happen?
The textbook example is a ‘next day’ or ‘next month’ link in a calendar. Some malformed URLs have the same effect (a repeated element that is appended to the URL on each new page). The impact on the website structure is similar to that of pagination, but worse, because you can only go forward one page at a time and there is no end.
🎚 Severity
The impact of perpetual links depends on the volume (do these pages create new content pages as well?) and whether the template of these pages corresponds to important pages.
✅ Recommendation
For a perpetual ‘next’ button, implement an ‘end’ value that makes sense for your website. In the calendar example, that would be the last day that has events, and it would be updated with each calendar content update.
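A minimal sketch of that rule for the calendar example, assuming a hypothetical `events_by_date` mapping; the ‘next day’ link simply stops at the last day that has events.

```python
from datetime import date, timedelta

# Hypothetical event data: the dates that actually have content.
events_by_date = {
    date(2024, 5, 2): ["concert"],
    date(2024, 5, 10): ["conference"],
}
LAST_DAY_WITH_EVENTS = max(events_by_date)  # the 'end' value, refreshed on each content update

def next_day_url(current: date) -> str | None:
    """URL of the 'next day' link, or None once the calendar is past
    the last day that has events (no perpetual link)."""
    nxt = current + timedelta(days=1)
    if nxt > LAST_DAY_WITH_EVENTS:
        return None  # render no 'next' link at all
    return f"/calendar/{nxt.isoformat()}"

print(next_day_url(date(2024, 5, 9)))   # /calendar/2024-05-10
print(next_day_url(date(2024, 5, 10)))  # None: last day with events reached
```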