This article explains where to find the non-HTML pages that Botify crawled on your site so you can identify whether they can be optimized.
Overview
Optimizing your website's non-HTML pages that include important content ensures they can be discovered and indexed by search engines, potentially driving traffic through different channels. This includes better promoting your content for an improved user experience and removing duplicates to improve your ranking.
Examples of non-HTML pages include:
Plain text: Some of these may be content-rich pages with information not found elsewhere on your site, but they are not user-friendly and do not include any links. Consider migrating the pages with interesting content to HTML pages.
PDF files: If these are duplicates of HTML content, they should not be accessible to search engine robots since the duplication can hurt your site's ranking. If their content is unique, consider converting them to HTML pages to make them more accessible to search engines. While search engines can crawl and rank PDF files, these files often lack the metadata search engines look for, which may affect your ranking.
RSS feeds: There is no action needed for these.
Creating a Non-HTML Page Report
Non-HTML pages are identified in SiteCrawler reports via their HTML content-type tag (<meta http-equiv>) or HTTP header. To find the non-HTML pages crawled by Botify, navigate to SiteCrawler's Distribution report:
Click the "Not Set" segment of the chart to view a URL Explorer report of the non-HTML pages.
To check and view these pages' content on your website, click the arrow by the URL.
In this example, these are plain text pages containing a lot of information that could bring organic traffic for very specific long-tail queries. While each page is likely to generate very little traffic on its own, optimizing many of these pages can add up to a non-negligible traffic volume.
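To spot-check a few URLs outside Botify, the content-type rule described earlier can be approximated with a small script. This is a minimal sketch, not Botify's actual logic: the set of media types treated as HTML, the function names, and the sample URLs are all assumptions made for illustration.

```python
# Assumption: only these media types count as HTML. Botify's exact rules
# are not documented in this article.
HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type):
    """Return the bare, lowercased media type from a Content-Type value."""
    if not content_type:
        return ""
    # Drop parameters such as "; charset=utf-8".
    return content_type.split(";", 1)[0].strip().lower()


def is_html(content_type):
    """True when the declared content type is one of the assumed HTML types."""
    return media_type(content_type) in HTML_MEDIA_TYPES


# Hypothetical Content-Type headers collected for three URLs.
samples = {
    "https://www.example.com/": "text/html; charset=UTF-8",
    "https://www.example.com/notes.txt": "text/plain",
    "https://www.example.com/guide.pdf": "application/pdf",
}

non_html = sorted(url for url, ct in samples.items() if not is_html(ct))
print(non_html)
```

With these sample values, the script lists the `.pdf` and `.txt` URLs as non-HTML. A URL with no declared content type is also reported as non-HTML.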
If you migrate some of this content to user-friendly HTML pages, the URLs will change, if only in their file extension (e.g., .txt to .html). You should therefore implement redirects from the old URLs and update the links that point to them on your website.
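When only the file extension changes, the list of redirects can be generated from the old URLs. The following sketch assumes a simple .txt to .html rename with otherwise identical paths. The function name `html_target` and the example URLs are hypothetical, and how you deploy the resulting map depends on your server.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


def html_target(url, old_ext=".txt", new_ext=".html"):
    """Return the new URL for a renamed page, or None if the extension differs."""
    parts = urlsplit(url)
    path = PurePosixPath(parts.path)
    if path.suffix.lower() != old_ext:
        return None
    # Swap the extension and keep the scheme, host, and query string as they are.
    return urlunsplit(parts._replace(path=str(path.with_suffix(new_ext))))


# Hypothetical URLs taken from a non-HTML page report.
old_urls = [
    "https://www.example.com/docs/specs.txt",
    "https://www.example.com/docs/faq.txt?lang=en",
    "https://www.example.com/feed.xml",
]

# Map each old URL to its new location, skipping URLs that are not renamed.
redirect_map = {}
for url in old_urls:
    target = html_target(url)
    if target is not None:
        redirect_map[url] = target

for old, new in redirect_map.items():
    print(f"{old} -> {new}")
```

Each pair in `redirect_map` corresponds to one permanent redirect to configure. URLs with other extensions, such as the feed above, are left out.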
To find out where these plain text pages are linked from, add the following columns to the URL Explorer report:
No. of Inlinks (Total): The number of links to this page.
No. of Inlinks (Unique): The number of pages that link to this page (links found several times on the same page are counted just once).
Sample of Inlinks: Pages that link to this page.
You can export this report to get the list of all pages where you need to update links to the optimized pages.
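Once exported, the report can be inverted so that each linking page is listed with the URLs whose links need updating. The sketch below assumes a CSV export with columns named "URL" and "Sample of Inlinks", and assumes the inlinks in a cell are separated by a "|" character. Check both against your actual export and adjust the parameters accordingly. Because the column holds a sample of inlinks, the result may not cover every linking page.

```python
import csv
import io
from collections import defaultdict


def pages_to_update(csv_text, url_col="URL", inlinks_col="Sample of Inlinks", sep="|"):
    """Map each linking page to the sorted list of migrated URLs it links to.

    The column names and separator are assumptions about the export format.
    """
    sources = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = row[url_col].strip()
        for source in row[inlinks_col].split(sep):
            source = source.strip()
            if source:
                sources[source].add(target)
    return {source: sorted(targets) for source, targets in sorted(sources.items())}


# Hypothetical export content, for illustration only.
export = """URL,Sample of Inlinks
https://www.example.com/a.txt,https://www.example.com/|https://www.example.com/docs
https://www.example.com/b.txt,https://www.example.com/docs
"""

todo = pages_to_update(export)
for page, targets in todo.items():
    print(page, "links to", ", ".join(targets))
```

Grouping by linking page rather than by target lets you fix every outdated link in one pass per page.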