
Comparing Botify Crawled URLs to Sitemaps

Updated over a year ago

📘 This article explains how Botify can compare the pages crawled by the SiteCrawler bot to those identified in your sitemaps.

Overview

Identify your sitemaps in the Botify crawl settings to enable SiteCrawler reports that compare the URLs Botify crawled to those in your sitemaps. These reports show whether Botify’s crawler found all URLs from the sitemaps, whether the sitemaps include URLs that redirect or return errors, and technical details such as depth distribution, speed, and linking.

Access Sitemap reports by navigating to SiteCrawler > Sitemaps:

sc_sitemaps.jpg

Enabling Sitemap Analysis

To enable a comparison of the URLs in your sitemaps to Botify crawls, identify your sitemap locations in the main crawler settings. Do this for all sitemaps, even if they are already declared in your robots.txt file:

  1. Navigate to Settings > Crawler and scroll to the Sitemaps section.

    sc_crawlsettings1.jpg

  2. Identify the location of all sitemap files to be included in the comparison, one per line.

    sc_settings_sitemap.jpg


    Alternatively, click Automatically find sitemap(s) to have Botify detect your sitemaps.

    sc_findsitemap.jpg

  3. Click Save to save and stay on the settings page, or Save & Back to Project to go to the Crawl Manager.
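
Because sitemaps declared in robots.txt still need to be entered in the crawler settings, it can help to list them first. The following is a minimal sketch using Python's standard `urllib.robotparser` module. The function name `sitemaps_from_robots` and the example URLs are illustrative assumptions, and this is not a description of how Botify's automatic sitemap detection works.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared by `Sitemap:` lines in robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present (Python 3.8+).
    return parser.site_maps() or []


# Hypothetical robots.txt content, for illustration only.
example = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap_index.xml
Sitemap: https://www.example.com/news.xml
"""

# Print one URL per line, the layout the Sitemaps setting expects.
print("\n".join(sitemaps_from_robots(example)))
```

The output can be pasted into the Sitemaps section, one location per line.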

Enabling sitemap analysis does not mean Botify will crawl the URLs listed in your sitemaps. The goal is to discover which of the URLs Botify’s crawler found through links or redirects are also present in your sitemaps.
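
Conceptually, this comparison is a set operation on two URL lists. The sketch below illustrates the idea only. The function name `compare_crawl_to_sitemaps` and the category names are hypothetical and do not correspond to Botify report labels or to Botify's implementation.

```python
def compare_crawl_to_sitemaps(
    crawled: set[str], in_sitemaps: set[str]
) -> dict[str, set[str]]:
    """Split URLs into three groups based on where they were seen."""
    return {
        # Found by the crawler through links or redirects AND listed in a sitemap.
        "crawled_and_in_sitemaps": crawled & in_sitemaps,
        # Listed in a sitemap but never reached by the crawler.
        "only_in_sitemaps": in_sitemaps - crawled,
        # Reached by the crawler but missing from every sitemap.
        "only_in_crawl": crawled - in_sitemaps,
    }


# Hypothetical URL sets, for illustration only.
crawled_urls = {
    "https://www.example.com/",
    "https://www.example.com/products",
    "https://www.example.com/old-page",
}
sitemap_urls = {
    "https://www.example.com/",
    "https://www.example.com/products",
    "https://www.example.com/orphan",
}

for group, urls in compare_crawl_to_sitemaps(crawled_urls, sitemap_urls).items():
    print(group, sorted(urls))
```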

Supported Sitemaps

Botify can refer to up to 5,000 sitemaps in the following formats:

  • XML: An individual sitemap or sitemap index.

  • RSS and Atom: These formats are typically generated by content management systems.

  • Text: This format can only include URLs to HTML and other indexable pages. Each line must start with "http://" or "https://" and lines must not exceed 4,096 characters.
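
The two rules for text sitemaps can be checked before a file is submitted. This is a minimal sketch: the function name `invalid_text_sitemap_lines` is hypothetical, and skipping blank lines is an assumption, not documented Botify behavior.

```python
MAX_LINE_LENGTH = 4096  # character limit stated for text sitemaps


def invalid_text_sitemap_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) for each line that breaks the text-sitemap rules."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if not line.startswith(("http://", "https://")):
            problems.append((number, "does not start with http:// or https://"))
        elif len(line) > MAX_LINE_LENGTH:
            problems.append((number, f"exceeds {MAX_LINE_LENGTH} characters"))
    return problems


# Hypothetical sitemap content: line 2 lacks a scheme, line 3 is too long.
sample = "\n".join(
    [
        "https://www.example.com/",
        "www.example.com/about",
        "https://www.example.com/" + "a" * 5000,
    ]
)

for number, reason in invalid_text_sitemap_lines(sample):
    print(f"line {number}: {reason}")
```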

Sitemaps can be compressed; Botify decompresses them as needed.
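
To inspect a sitemap locally the same way, the sketch below reads an XML sitemap that may or may not be compressed and reports whether it is an individual sitemap or a sitemap index. It assumes gzip compression (the article does not name a compression format) and the standard sitemaps.org namespace. The function name `read_xml_sitemap` is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def read_xml_sitemap(raw: bytes) -> tuple[str, list[str]]:
    """Return (kind, locations), where kind is 'urlset' or 'sitemapindex'."""
    if raw.startswith(GZIP_MAGIC):
        raw = gzip.decompress(raw)  # assumption: compressed sitemaps use gzip
    root = ET.fromstring(raw)
    # Drop the namespace prefix from the root tag to get the document kind.
    kind = root.tag.rsplit("}", 1)[-1]
    # Collect every <loc> in the sitemap namespace (page URLs or child sitemaps).
    locations = [
        element.text.strip()
        for element in root.iter(SITEMAP_NS + "loc")
        if element.text
    ]
    return kind, locations


# Hypothetical sitemap content, for illustration only.
example = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/products</loc></url>
</urlset>"""

print(read_xml_sitemap(example))
print(read_xml_sitemap(gzip.compress(example)))
```

Both calls return the same result, since the compressed input is decompressed before parsing.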

Determining the Sitemap Version

Botify analyzes the most recently available sitemaps during a scheduled or manual crawl. To find the version of your sitemap that was referenced in a crawl, add the "Sitemaps Retrieval Date" field as a report filter or column:

crawls_sitemapdate.png

This is the date and time when Botify retrieved the sitemap file referenced in your project settings.

Finding a Source Sitemap

To determine the name of the sitemap file in which a page exists, use the Sitemaps metric as a URL Explorer report column:

sitemap_column.jpg
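
The idea behind this lookup can be sketched as an index from each URL to the sitemap files that list it. This is an illustration only: the function name `index_urls_by_sitemap`, the file names, and the URLs are hypothetical, and the code does not reflect how Botify computes the Sitemaps metric.

```python
from collections import defaultdict


def index_urls_by_sitemap(sitemaps: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each URL to the sorted list of sitemap files that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for sitemap_name, urls in sitemaps.items():
        for url in urls:
            index[url].add(sitemap_name)
    return {url: sorted(names) for url, names in index.items()}


# Hypothetical sitemap contents, for illustration only.
sitemap_contents = {
    "sitemap_products.xml": [
        "https://www.example.com/products",
        "https://www.example.com/products/shoes",
    ],
    "sitemap_main.xml": [
        "https://www.example.com/",
        "https://www.example.com/products",
    ],
}

url_index = index_urls_by_sitemap(sitemap_contents)
print(url_index["https://www.example.com/products"])
```

A URL can appear in more than one sitemap, which is why each entry holds a list.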

Filtering Reports by Sitemaps

Because sitemaps should list all your strategic pages, it is often useful to filter Botify reports by whether a page exists in a sitemap, regardless of whether Botify crawled it. To filter reports to only display pages that exist in sitemaps, use the "In Sitemap" metric:

sitemap_exists.jpg

You can also filter reports by a specific sitemap to view performance data for all pages it contains:

sitemap_filter.jpg
