
Rewriting URLs in Botify Crawls

Updated over a year ago

📘 This article explains how to get a different view from Botify crawls than what search robots see by removing parameters and rewriting URLs for Botify crawls.

Overview

URL rewriting is useful for analyzing a modified version of your website when the modification only involves URLs. You can analyze a virtual version of your website by telling Botify to behave as if it had found a modified version of the URL whenever it finds a URL that matches specific patterns.
You can achieve simple URL rewriting to remove URL parameters with the Remove Parameters from URLs option. The Advanced URL Rewriting option is for more complex rewriting.

Access URL rewriting options by navigating to the Behavior section of Advanced Crawl Settings:

crawl_advancedsettings_urlrewrite.jpg

Removing URL Parameters

Removing URL parameters allows you to analyze your website as if some parameters had been removed without actually removing them from the website. This enables you to analyze a virtual version of your website without involving your technical team to make any changes.

The following are the most common use cases for parameter stripping:

  • Get "clean URLs" when a website automatically adds session ID parameters to URLs for users who do not accept cookies (the Botify crawler, like most robots, does not accept cookies): you can instruct Botify to remove these session parameters.

  • Remove URL parameters that create duplicates or near duplicates to see a leaner version of your website. While this is not what Google will see, it may be useful as it will help you focus on other issues:

    • Remove tracking parameters that indicate which page the user came from or which element they clicked.

    • Remove sorting or display parameters that create URLs for different versions of the same list.

  • On a website with faceted navigation, you can ignore a facet that is currently crawlable to see how the website structure would change if this facet were not crawlable.

To remove URL parameters:

  1. In the Parameter type field, select one of the following:

    • "From Querystring" for query parameters, which occur after a ? in the URL (e.g., http://www.mywebsite.com/page1.html?filter=articles&tracking=id123). If the parameter to remove is *tracking*, then Botify will behave as if it had found http://www.mywebsite.com/page1.html?filter=articles instead.

    • "From Semicolon" for path parameters, which occur after a semicolon in the URL path (e.g., http://www.mywebsite.com/page1;jsessionid=5D1D8F2EA5F4EB6B7058B65). If parameter stripping is set to remove *jsessionid*, then Botify will behave as if it had found http://www.mywebsite.com/page1 instead.

  2. Enter the parameter name to indicate which URL parameters are to be removed. Wildcards are not allowed in this field. When the Botify crawler finds a URL with a parameter that should be removed, it behaves as if it had found the URL without this parameter, and crawls the URL without the parameter. For example, if you instruct Botify to remove the "tracking" parameter when it finds a link to: http://www.mywebsite.com/page1.html?tracking=id123
    it will crawl:
    http://www.mywebsite.com/page1.html

  3. Select the Case Insensitive checkbox if you want the parameter to be removed regardless of how its name is capitalized. Parameter names are case-sensitive by default. Leave this option unselected to remove the parameter only when its name exactly matches what you entered.

    crawl_removeparams.jpg

Removing Individual Parameters

If the URL has several parameters, you can specify individual parameters to be removed. Botify will keep the other parameters. For example, if the settings indicate the "tracking" parameter should be stripped, and Botify finds a link to:

http://www.mywebsite.com/page1.html?filter=articles&tracking=id123

It will crawl:
http://www.mywebsite.com/page1.html?filter=articles

Removing Multiple Parameters

To remove several parameters from the same URL, click the + icon to enter a line for each parameter.

All parameters listed here will be removed from all URLs, including when they exist in the same URL.
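The parameter removal described above can be approximated outside Botify to preview which URLs would result. The following is a minimal Python sketch, not Botify's implementation: the function name remove_parameters and its arguments are hypothetical, and it assumes parameters are written as name=value pairs separated by & (query string) or ; (path).

```python
from urllib.parse import urlsplit, urlunsplit


def remove_parameters(url, names, kind="querystring", case_insensitive=False):
    """Return url without the named parameters (illustrative helper only).

    kind is "querystring" for parameters after "?" or "semicolon" for
    path parameters after ";". Matching is case-sensitive by default.
    """
    def norm(name):
        return name.lower() if case_insensitive else name

    targets = {norm(n) for n in names}

    def keep(pair):
        # A pair looks like "name=value"; only the name is compared.
        return norm(pair.split("=", 1)[0]) not in targets

    parts = urlsplit(url)
    if kind == "querystring":
        pairs = [p for p in parts.query.split("&") if p and keep(p)]
        return urlunsplit(parts._replace(query="&".join(pairs)))

    # Semicolon parameters can follow any path segment, so handle each one.
    segments = []
    for segment in parts.path.split("/"):
        head, *params = segment.split(";")
        segments.append(";".join([head, *filter(keep, params)]))
    return urlunsplit(parts._replace(path="/".join(segments)))


print(remove_parameters(
    "http://www.mywebsite.com/page1.html?filter=articles&tracking=id123",
    ["tracking"],
))  # http://www.mywebsite.com/page1.html?filter=articles

print(remove_parameters(
    "http://www.mywebsite.com/page1;jsessionid=5D1D8F2EA5F4EB6B7058B65",
    ["jsessionid"],
    kind="semicolon",
))  # http://www.mywebsite.com/page1
```

Passing several names removes all of them, including when they occur in the same URL, which mirrors adding one line per parameter in the settings.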

Advanced URL Rewriting

Here are common use cases where Advanced URL Rewriting is needed:

  • To force a value for a specific parameter. For example, if a sorting parameter is always present in listing page URLs, you can force it to the default sorting value. (The parameter stripping example assumed instead that the default sorting was applied to listing URLs when they did not contain any sorting parameter.)

  • To analyze a version of your website when you have implemented A/B testing: for example, adding a parameter to all URLs to ensure Botify finds version A.

The Advanced URL Rewriting option lets you instruct Botify to change URLs that were found on your website when the URL corresponds to a specific pattern (i.e., "when you see this, behave as if you had seen that instead"). For example, you can change a URL parameter value, change the domain name, remove something from the URL, or add something to the URL.
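As an illustration of the first use case, forcing a parameter value comes down to a regex substitution. The sketch below uses Python's re module rather than Botify itself. The parameter name sort, the value default, and the rule are assumptions made for the example, not settings from a real project. Python writes the captured group as \1 where the "Replace with" field uses $1.

```python
import re

# Hypothetical rule: force the "sort" parameter to the value "default".
# ([?&]) captures the separator in front of the parameter so it can be reused.
REGEX = r"([?&])sort=[^&#]*"
REPLACE_WITH = r"\1sort=default"  # Python spelling of "$1sort=default"


def force_sort(url):
    """Rewrite the first sort=... parameter found in url, if any."""
    return re.sub(REGEX, REPLACE_WITH, url, count=1)


print(force_sort("http://www.mywebsite.com/list?page=2&sort=price_desc"))
# http://www.mywebsite.com/list?page=2&sort=default
```

URLs that do not contain the parameter are left unchanged, so this rule alone does not add the parameter where it is missing.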

To define advanced URL rewriting rules:

  1. Construct a regular expression (regex) to define the part of the URL to be changed in the Regex field.

  2. Define what to replace it with in the Replace with field.

    crawl_advurlrewrite.jpg

  3. Click Test these rules.

  4. Enter sample URLs and see if the result is as expected, then click Test:

    crawl_testrewrite.jpg

Defining Part of the URL in the Rewritten URL

To reuse a portion of the original URL in the rewritten URL, enclose it in parentheses in the regex, then refer to it as $1 in the "Replace with" field.

For example, to replace a folder with another:

Regex: www.website.com/folder-a/(.*)$

where (.*) captures the rest of the URL (all characters until the end of the URL, identified by the $).

Replace with: www.website.com/folder-b/$1

You can reuse several portions of the original URL in the rewritten URL by enclosing each one in parentheses. They are represented by $1, $2, $3, etc., in the order they appear in the regex.
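To try out captured portions before entering a rule, the behavior can be imitated in Python. This is a rough sketch under stated assumptions: the helper rewrite is hypothetical, it assumes only the first match in the URL is replaced, and it expands $1, $2, ... by hand because Python's own replacement syntax differs. The second rule, which swaps two folders, is an invented example.

```python
import re


def rewrite(url, regex, replace_with):
    """Replace the first match of regex in url, expanding $1, $2, ... references."""
    def expand(match):
        # Substitute each $N with the text captured by group N (empty if unused).
        return re.sub(
            r"\$(\d+)",
            lambda ref: match.group(int(ref.group(1))) or "",
            replace_with,
        )
    return re.sub(regex, expand, url, count=1)


# The folder replacement from the example above.
print(rewrite(
    "http://www.website.com/folder-a/page.html",
    r"www.website.com/folder-a/(.*)$",
    "www.website.com/folder-b/$1",
))  # http://www.website.com/folder-b/page.html

# Two captured portions, reused in swapped order (hypothetical rule).
print(rewrite(
    "http://www.website.com/shoes/red/",
    r"www.website.com/([^/]+)/([^/]+)/$",
    "www.website.com/$2/$1/",
))  # http://www.website.com/red/shoes/
```

The Test these rules option in the crawl settings remains the reference for how Botify itself applies a rule.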

Rewriting Examples

Consider the following use cases to help you get started with rewriting URLs:

  • Replace HTTP with HTTPS
    Regex: ^http://
    Replace with: https://

  • Rename a parameter
    Regex: oldparametername=
    Replace with: newparametername=

  • Change a parameter value
    Regex: parametername=oldvalue
    Replace with: parametername=newvalue

  • Change the domain name
    Regex: ^http://www.domain1.com
    Replace with: http://www.domain2.com

  • Replace a subdomain with a folder of the same name
    Regex: http://([^\.]+).domain.com/
    Replace with: http://www.domain.com/$1/

  • Force a trailing slash on all URLs that do not have one
    Regex: ([^/])$
    Replace with: $1/

  • Add a parameter to all URLs. Two rewriting rules must be defined in this order: one for URLs that already have one or more parameters, then one for URLs that do not have any parameters.
    Rule 1 regex: domain.com/([^\?]*)\?(.+)$
    Rule 1 replace with: domain.com/$1?$2&newparam=value
    Rule 2 regex: domain.com/([^\?]*)$
    Rule 2 replace with: domain.com/$1?newparam=value
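The reason the two "add a parameter" rules must be defined in that order can be checked with a short Python simulation. It is a sketch that assumes rules are applied one after another, each to the output of the previous one, with only the first match replaced. Botify's actual engine may differ, and the helper apply_rules is hypothetical.

```python
import re

# The two "add a parameter" rules from the examples, in the required order.
ADD_PARAM_RULES = [
    (r"domain.com/([^\?]*)\?(.+)$", "domain.com/$1?$2&newparam=value"),
    (r"domain.com/([^\?]*)$", "domain.com/$1?newparam=value"),
]


def apply_rules(url, rules):
    """Apply each (regex, replace_with) rule in order to the running URL."""
    for regex, replace_with in rules:
        def expand(match, template=replace_with):
            # Expand $1, $2, ... with the captured groups of this match.
            return re.sub(
                r"\$(\d+)",
                lambda ref: match.group(int(ref.group(1))) or "",
                template,
            )
        url = re.sub(regex, expand, url, count=1)
    return url


print(apply_rules("http://www.domain.com/list?sort=price", ADD_PARAM_RULES))
# http://www.domain.com/list?sort=price&newparam=value
print(apply_rules("http://www.domain.com/list", ADD_PARAM_RULES))
# http://www.domain.com/list?newparam=value

# With the order reversed, a URL without parameters gets the parameter twice.
print(apply_rules("http://www.domain.com/list", ADD_PARAM_RULES[::-1]))
# http://www.domain.com/list?newparam=value&newparam=value
```

In the documented order, a URL rewritten by the first rule contains a ? and so no longer matches the second rule, which is why each URL receives the parameter only once.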

If you need help setting up your URL rewriting rules, contact us at support@botify.com.
