
Crawling with a Custom User Agent


📘 This article explains how to use a custom user agent in Botify crawls.

Overview

Botify's crawler uses the Mozilla/5.0 (compatible; botify; http://botify.com) user agent by default. When your website is verified in Botify, you can crawl with a custom user agent. If your project's website is not verified, the crawler adds a link to the user agent string so the website owner can stop the crawl.

To analyze your website as the desktop version of Googlebot sees it, Botify's default user agent is usually the best choice. However, you may need to customize the user agent so that your web servers treat the Botify crawler the way you expect, especially in the following situations (a quick way to check how your server responds to different user agents is sketched after this list):

  • Your website uses dynamic serving to deliver mobile-friendly pages to mobile devices.

  • Your website applies special treatment to Googlebot.

  • You have a pre-production website that only allows an approved user agent for internal use.
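
If you suspect dynamic serving or Googlebot-specific treatment, one way to confirm it is to compare how your server answers requests sent with different user agents. The sketch below is only an illustration and is not part of Botify: the URL is a placeholder, and the user agent strings are Botify's default string from above plus a commonly published desktop Googlebot string.

```python
# Illustrative check: compare server responses for two user agents to spot
# dynamic serving or user-agent-specific treatment. The URL is a placeholder.
import requests

URL = "https://www.example.com/"  # placeholder: replace with a page on your site

USER_AGENTS = {
    "desktop Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "default Botify": "Mozilla/5.0 (compatible; botify; http://botify.com)",
}

for label, ua in USER_AGENTS.items():
    response = requests.get(URL, headers={"User-Agent": ua}, timeout=10)
    # Differences in status code, final URL, or body size suggest the server
    # treats these user agents differently.
    print(f"{label}: status={response.status_code}, "
          f"final_url={response.url}, bytes={len(response.content)}")
```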

A custom user agent only changes the User-Agent string sent in the crawler's HTTP requests, which lets you manage access rights on the server side via the user agent. It does not change the crawler's behavior: as with the default Botify user agent, the crawler follows the rules in robots.txt (or a virtual robots.txt), applying the rules for Botify first, then the rules for Googlebot if no Botify rules exist, and finally the rules for any robot.
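
As an illustration of that order of precedence, the sketch below picks which robots.txt group a crawler following these rules would obey. This is not Botify's implementation; the group names and rules in the example are made up.

```python
# Minimal sketch of the group-selection order described above (Botify first,
# then Googlebot, then any robot). Illustrative only; not Botify's code.

def select_robots_group(groups: dict) -> list:
    """Return the rule lines of the robots.txt group that applies.

    `groups` maps a lowercased User-agent token (e.g. "botify", "googlebot",
    "*") to that group's Allow/Disallow lines.
    """
    for token in ("botify", "googlebot", "*"):
        if token in groups:
            return groups[token]
    return []  # no matching group: nothing is disallowed


# Example: a robots.txt with Googlebot-specific rules but no Botify group.
groups = {
    "googlebot": ["Disallow: /private/"],
    "*": ["Disallow: /"],
}
print(select_robots_group(groups))  # -> ['Disallow: /private/'] (Googlebot rules apply)
```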

A separate blog post explains several situations where crawling with a custom user agent is useful.

Allowed User Agents

If you have verified your website, there is no restriction on the user agent you can specify. For non-verified websites, the custom user agent cannot include the names of leading search engines (e.g., Google, Bing, Baidu, Yandex); this prevents Botify crawls from being attributed to those search engines in log file analysis.
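
To see why this restriction matters, consider how crawler traffic is typically attributed in log file analysis: by matching tokens in the user agent string. The sketch below is a simplified, made-up illustration; the tokens and sample user agents are not taken from any specific log analysis tool.

```python
# Simplified illustration of user-agent-based attribution in log analysis.
# The tokens and sample user agents are made up for this example.

def attribute(user_agent: str) -> str:
    """Classify a hit by looking for known tokens in its user agent string."""
    ua = user_agent.lower()
    if "botify" in ua:
        return "Botify crawl"
    if "googlebot" in ua:
        return "Googlebot"
    return "other crawler or browser"

samples = [
    "Mozilla/5.0 (compatible; botify; http://botify.com)",                       # default Botify UA
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",  # Googlebot
    "Mozilla/5.0 (compatible; ExampleAuditBot/1.0; +https://example.com/bot)",   # hypothetical custom UA
]

for ua in samples:
    print(f"{attribute(ua):25} <- {ua}")
```

If a custom user agent contained a search engine's name, this kind of token matching could attribute Botify's crawl hits to that search engine, which is exactly what the restriction avoids.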

Specifying a Custom User Agent

To specify a custom user agent for your crawls:

  1. Navigate to the Access section of Advanced Crawl Settings:

    [Screenshot: crawl_advancedset_useragent.jpg]

  2. Select Custom from the dropdown list, then enter the user agent's HTTP header in the text field. Include the information any standard user agent contains, such as the client version it represents. If you do not own the website, include a link to contact information so that the site owner can reach whoever controls the crawler (see the example after these steps).

    [Screenshot: customuseragent.jpg]
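
As an illustration, a well-formed custom user agent entered in this field might look like the string below. The product name, version, and contact URL are hypothetical placeholders, not values prescribed by Botify.

```python
# Hypothetical example of a well-formed custom user agent string: it names the
# client and its version and links to contact information for the crawl owner.
# Every value here is a placeholder, not one prescribed by Botify.
custom_user_agent = (
    "Mozilla/5.0 (compatible; ExampleAuditBot/1.0; +https://example.com/bot-contact)"
)
print(custom_user_agent)
```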

You can save a new user agent in your settings even if your website has not been verified yet; however, the custom user agent will only be used once the website is verified.

Determining the User Agent Used in a Crawl

All crawl settings are visible during the crawl, and a summary is shown in the Analysis Info section of all SiteCrawler reports once the crawl and analysis are complete. To access it, navigate to Analytics > SiteCrawler > Analysis Info:

[Screenshot: analysisinfo.jpg]

Note that the user agent shown here does not include the emergency stop link added for unverified websites.
