
Managing Advanced Crawl Settings

Updated over a year ago

📘 This article explains the advanced settings available to customize your Botify crawls.

Overview

In addition to the basic crawl settings, Botify provides advanced options to crawl your site according to your unique needs. Access advanced crawl settings by navigating to Settings > Crawler > Advanced Settings:

crawl_advancedset.jpg

The Advanced Settings page contains the following sections:

Report Features

The following options are available in the Report Features section:

crawl_reportfeatures.jpg
  • Content Quality: When enabled, this option reveals the Content section in SiteCrawler reports, which identifies low-quality content on your site and compares content quality with the previous crawl. Enabling this option increases the analysis processing time.

  • Indexability: By default, Botify considers a page with a canonical tag pointing to another URL to be non-indexable. This includes pages identified as the desktop version in the Allowed Domains section of your project settings. On pages identified as the mobile version, canonical tags are expected to point to the desktop version. When this option is enabled, Botify does not evaluate canonical tags in your analysis.

  • Exports to Amazon S3: To enable a complete export of your crawl data to AWS, refer to these instructions, then contact Support to enable the export.

Access

Use the following fields to identify how the Botify crawler will access your site:

crawl_access.jpg
  • Desktop/Mobile User Agent: Use these fields to enable the crawler to present itself to the server as a custom user agent instead of the default Botify user agent.

  • HTTP Basic Authentication: Identify credentials for the Botify crawler to crawl a website with access control.
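
To make the two Access options concrete, the following Python sketch shows what a request looks like when a client presents a custom user agent and HTTP Basic Authentication credentials. It only illustrates the standard HTTP headers involved; it is not Botify's implementation, and the URL, user agent string, and credentials are made-up placeholders.

```python
import base64
import urllib.request


def build_request(url, user_agent, username, password):
    """Build (but do not send) a request with a custom User-Agent and Basic auth."""
    # HTTP Basic Authentication sends "username:password" encoded as Base64.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "User-Agent": user_agent,            # replaces the client's default user agent
        "Authorization": f"Basic {token}",   # credentials for the protected site
    }
    return urllib.request.Request(url, headers=headers)


# Hypothetical values, for illustration only.
req = build_request(
    "https://staging.example.com/",
    "ExampleCrawler/1.0",
    "crawler",
    "s3cret",
)

# urllib stores header names capitalized as "User-agent" and "Authorization".
print(req.get_header("User-agent"))
print(req.get_header("Authorization"))
```

Sending a request like this to a staging site before launching a crawl is one way to confirm that the server accepts the user agent and credentials you plan to enter in these fields.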

Behavior

Use the following fields to identify how the Botify crawler will interact with your site:

crawls_behavior.jpg
  • Virtual Robots.txt: Identify robots.txt rules to override the robots.txt file discovered during the crawl. This allows you to add or remove restrictions to analyze only a section of your website or test planned changes to the robots.txt.

  • Follow rules: The Botify crawler will obey nofollow directives and follow canonicals, hreflang, alt, and amp tags by default. To override these defaults, slide the toggle to the left to disable the individual option.

  • Gzip: Gzip compression is enabled for Botify crawls by default. If your website uses Gzip compression, you can disable this option to see the difference it makes.

  • Custom HTTP Headers: Identify the names and values of any custom headers to use during the crawl, including cookies.

  • Remove Parameters from URLs: Identify parameters that Botify should strip from crawled URLs so that pages are evaluated without those parameters.

  • Advanced URL Rewriting: Similar to "Remove Parameters from URLs", advanced URL rewriting enables Botify to analyze a modified version of your website by manipulating the URLs. This option enables you to force a specific value for a parameter using regular expressions.
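
Before changing the Virtual Robots.txt, Remove Parameters from URLs, or Advanced URL Rewriting settings, it can help to preview their effect on a few sample URLs. The Python sketch below approximates each behavior with the standard library. It is an offline illustration under stated assumptions, not Botify's rule syntax or implementation: the function names, sample rules, and URLs are hypothetical, and Python's robots.txt parser may treat edge cases (such as wildcards) differently from the Botify crawler.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def allowed_by_rules(rules_text, user_agent, url):
    """Check a URL against robots.txt rules supplied as text (a 'virtual' file)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


def remove_params(url, ignored):
    """Drop the query parameters named in `ignored`, keeping the others in order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def force_param(url, name, value):
    """Use a regular expression to force one query parameter to a fixed value."""
    pattern = re.compile(r"([?&]" + re.escape(name) + r"=)[^&#]*")
    # A function replacement avoids treating `value` as a regex template.
    return pattern.sub(lambda match: match.group(1) + value, url)


# Hypothetical rules that restrict the crawl to everything except /checkout/.
virtual_rules = """\
User-agent: *
Disallow: /checkout/
"""

print(allowed_by_rules(virtual_rules, "botify", "https://www.example.com/checkout/cart"))
print(allowed_by_rules(virtual_rules, "botify", "https://www.example.com/products/1"))
print(remove_params("https://www.example.com/shoes?color=red&utm_source=mail&page=2", {"utm_source"}))
print(force_param("https://www.example.com/shoes?color=red&page=7", "page", "1"))
```

With these sample inputs, the checkout URL is reported as blocked and the product URL as allowed, the `utm_source` parameter is dropped while `color` and `page` are kept, and `page=7` is rewritten to `page=1`.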
