This article provides an overview of the settings available for Botify crawls.
Overview
Botify identifies basic crawl settings when creating projects to enable the crawler to discover your site's structure. You can modify these settings anytime and configure your project further with advanced settings.
Accessing Crawl Settings
You can access crawl settings from the Crawl Manager or from project settings. In the Crawl Manager, click the Settings link to jump directly to crawl settings.
Alternatively, click the cog wheel in the global project navigation bar, then navigate to Project Settings > Crawler.
Crawl settings are organized into several tabs, described in the sections below.
Modifying Main Crawl Settings
The main crawl settings tell the Botify crawler where to start exploring your site, how far to explore, and what limits to apply to the crawl. The main crawl settings include the following sections:
Scope
The Scope section includes the following fields:
Project Name: The unique name that identifies the project on the Welcome page and throughout Botify. You can update this name in project settings at any time. We recommend you create a naming scheme to help manage many projects. For example, consider including the website name, any integrations, the number of URLs in the crawl, and the crawl cadence (e.g., weekly).
Start URLs: Where the crawl begins, typically the website's home page, though you can identify several start URLs. Refer to the Identifying Crawl Start Location section below for more information. Botify continues to crawl your site until it reaches either the maximum number of URLs or the maximum depth set in the Crawl Parameters section. If you want Botify to crawl only the URLs identified in one of the external file options, and not other discovered URLs, set Max Depth in the Crawl Parameters section to 0.
Crawl Configuration: The type of crawl to run: Mobile/Responsive, Desktop, or Advanced. We recommend Mobile/Responsive since Google favors this configuration. The Advanced option lets you crawl with the user agent you have specified in Advanced Settings.
Allowed Domains: The domain and protocol combinations to be allowed in the crawl. To include any subdomain, select the "and Subdomains" checkbox. Click the + icon to add a domain or the - icon to remove an existing domain.
Blacklisted Domains: The domain and protocol combinations you do not want Botify to crawl.
Timezone: The timezone to be used for scheduled crawls. Note that crawls always display the time in Central European Time, not your selected timezone.
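To make the Allowed Domains and Blacklisted Domains fields concrete, here is a minimal sketch of how such a scope check could work. This is an illustration only, not Botify's actual crawler logic; the function name and rules are assumptions based on the field descriptions above (protocol and domain must match an allowed entry, subdomains count only when "and Subdomains" is selected, and blacklisted hosts are always excluded).

```python
from urllib.parse import urlparse

def in_scope(url, allowed, blacklisted, allow_subdomains=False):
    """Hypothetical scope check, not Botify's implementation: a URL is in
    scope when its protocol and host match an allowed domain entry (or it
    is a subdomain of one when allow_subdomains is True) and its host does
    not appear in the blacklist."""
    p = urlparse(url)
    # Blacklisted hosts are excluded regardless of the allow list.
    if any(urlparse(b).netloc == p.netloc for b in blacklisted):
        return False
    for entry in allowed:
        a = urlparse(entry)
        if a.scheme != p.scheme:
            continue  # protocol must match the allowed entry
        if a.netloc == p.netloc:
            return True
        # "and Subdomains" checkbox: blog.example.com matches example.com
        if allow_subdomains and p.netloc.endswith("." + a.netloc):
            return True
    return False
```

For example, with `["https://botify.com"]` allowed and subdomains enabled, `https://blog.botify.com/post` would be in scope, while an `http://` URL on the same host would not, because the protocol differs.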
Identifying Crawl Start Location
You can identify an unlimited number of start URLs; however, we recommend using one since all start URLs will be shown as depth 0 in the analysis.
Choose one of the following options to identify where the Botify crawler should start:
1 - To start the crawl from one to three start URLs, identify the full URLs, including the protocol (e.g., https://www.botify.com) in the first text field, one per line.
Note: Delete any text in the Start URLs box to enable the following external file options. Using an external file to identify start URLs is useful for analyzing a sample of pages, verifying new redirects, or analyzing a list of orphan pages.
2 - To start the crawl from URLs in an external file, click the Choose File button to upload a plain text file with one full URL per line, including the protocol. The file must use UTF-8 encoding and be 300 MB or smaller. Compressed files (GZIP format) must contain only one text file.
3 - To start the crawl from a sitemap or sitemap index, identify up to three sitemap URLs in the second text field in the Start URLs section. Sitemaps must follow the sitemaps protocol, contain one full URL per line, and use UTF-8 encoding. There is no limit on sitemap file size or on the number of sitemaps Botify can download from your sitemap index files.
Using sitemaps for start URLs does not enable the sitemap comparison feature.
Crawl Parameters
The Crawl Parameters section identifies crawl limits in the following fields:
Max # of Analysed URLs: The limit on the number of URLs to crawl. Botify crawls your site until this limit, the maximum depth, or any subscription-based limit is reached, whichever comes first. This field is required, and the limit must exceed the number of start URLs you have identified.
Max Speed (URLs / s): The number of pages you want Botify to crawl per second. Botify will get as close to this speed as possible, so keep it within what your server's load can tolerate. If your domain has not been validated, the speed is limited to three pages per second.
Max Depth: The highest depth, or number of clicks away from the start page, you want Botify to crawl. The crawl will stop when the maximum number of crawled URLs is reached, the maximum depth has been fully crawled, or a subscription-based limit is reached, whichever comes first.
💡 When Max Depth is undefined, Botify stops crawling at a depth of 100, since search engines will not crawl that deep. Excessive depth indicates a problematic site structure that should be addressed.
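The URL limit and speed limit together bound how long a crawl can take: the crawler cannot finish faster than the maximum number of URLs divided by the maximum speed. A quick back-of-the-envelope helper (a hypothetical illustration, not a Botify feature) makes this concrete:

```python
def estimated_crawl_hours(max_urls, max_speed_urls_per_sec):
    """Lower bound on crawl duration in hours: the crawl cannot finish
    faster than max_urls / max_speed, even before server slowdowns,
    retries, or robots.txt delays are taken into account."""
    return max_urls / max_speed_urls_per_sec / 3600

# For example, 1,000,000 URLs at 10 URLs/s takes at least ~27.8 hours.
```

This is useful when scheduling weekly crawls: if the estimate approaches your crawl cadence, either raise the speed (within your server's capacity) or lower the URL limit.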