This article explains how to manage the settings that define the SpeedWorkers page inventory. SpeedWorkers is part of Botify's Activation Suite, available as an option with a Botify Pro or Enterprise plan.
Overview
The SpeedWorkers Inventory settings define the scope of the URLs from which SpeedWorkers can cache and deliver pre-rendered pages to bots. To access the SpeedWorkers page inventory settings, navigate to Activation > SpeedWorkers > Settings > Inventory.
The Inventory Settings include the following sections:
List all URLs to cache: The sources and rules that determine which pages are included in the page inventory.
Inventory Optimization: Rules that filter the inventory based on your criteria.
Managing the URLs to Cache
The List all URLs to Cache section defines the sources for pages to be included in the inventory. In this section, you can do the following:
Viewing Configured Sources
The list of configured inventory sources in SpeedWorkers settings shows the following:
Source type
Summary of the source's defined rule
Fetch and cache status
Number of URLs coming from the source
A link to preview some of the URLs from the source
Deactivating Fetch and Cache
You can deactivate fetch or cache for all URL inventory sources except for Bots Discovery, where you can deactivate fetch but not cache.
To deactivate fetch or cache:
If settings are not in Edit mode, refer to the Defining SpeedWorkers Settings article to enable editing.
Slide the desired toggle to the left to deactivate, or to the right to activate.
Setting the Inventory Source Domain
To identify the domain from which pages can be added to the inventory:
In the Set Domain field, identify the domain for the page inventory (e.g., www.example.com). To include subdomains, omit the "www" (e.g., example.com).
Optionally, click the Add a Domain link to add another domain from which pages can be added to the inventory. When identifying multiple domains, SpeedWorkers will only evaluate one domain at a time.
Click Save.
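As a rough illustration of how the Set Domain value scopes the inventory, the following Python sketch models the rule described above. It is a hypothetical model, not Botify's implementation: it assumes that a domain entered with "www" matches only that exact host, while a domain entered without it matches the domain itself and any of its subdomains. The function name `in_domain_scope` is made up for this example.

```python
from urllib.parse import urlsplit


def in_domain_scope(url: str, configured_domain: str) -> bool:
    """Return True if the URL's host falls within the configured domain.

    Hypothetical model of the Set Domain rule, not SpeedWorkers code.
    """
    host = (urlsplit(url).hostname or "").lower()
    domain = configured_domain.lower().strip(".")
    if domain.startswith("www."):
        # Assumption: a "www" domain matches that exact host only.
        return host == domain
    # Assumption: a bare domain covers itself plus any subdomain.
    return host == domain or host.endswith("." + domain)


print(in_domain_scope("https://www.example.com/shoes", "www.example.com"))  # True
print(in_domain_scope("https://blog.example.com/post", "www.example.com"))  # False
print(in_domain_scope("https://blog.example.com/post", "example.com"))      # True
print(in_domain_scope("https://notexample.com/", "example.com"))            # False
```

Under this assumption, entering example.com also admits blog.example.com, while entering www.example.com does not. The suffix check uses a leading dot so that a lookalike host such as notexample.com is not matched.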
Adding/Editing URLs to Cache
When you define URLs to cache, you are adding pages to the SpeedWorkers page inventory.
To add or edit URLs to cache:
If settings are not in Edit mode, refer to the Defining SpeedWorkers Settings article to enable editing.
On the Inventory tab, click the Add a source of URLs link. Refer to the URL Source Types section for details on how to add URLs through various sources.
Optionally, define Inventory Optimization rules.
URL Source Types
You can add URLs to the page inventory by any of the following source types:
Manual list of URLs
You can manually identify URLs to add to the page inventory as a list of individual URLs or by importing the list from a .TXT or .CSV file.
To manually identify URLs by list:
Click the Manual source button.
Click the List the URLs button.
In the Title field, provide a descriptive title to help identify the list.
In the URLs text box, type each URL to be included, one per line. The No. of URLs field at the bottom of the page increments every time you add a line.
Click Create.
To manually identify URLs by upload:
Click the Manual source button.
Click the Upload a .txt/.csv file button.
In the Title field, provide a descriptive title to help identify the list.
Click the Upload a file link, and then use your computer's file browser to locate and select the file.
Click Create.
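If you build the list programmatically before uploading it, a small script can produce a clean .txt file with one URL per line. The sketch below is illustrative only: the helper `write_url_list`, the output filename, and the choices to drop blank lines, duplicates, and relative URLs are assumptions rather than documented SpeedWorkers requirements.

```python
from pathlib import Path
from urllib.parse import urlsplit


def write_url_list(urls, path):
    """Write unique absolute http(s) URLs to a text file, one per line.

    Hypothetical helper: the filtering rules are illustrative assumptions,
    not documented SpeedWorkers requirements. Returns the number of URLs written.
    """
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicates
        if urlsplit(url).scheme not in ("http", "https"):
            continue  # skip relative or non-web URLs (assumption)
        seen.add(url)
        cleaned.append(url)
    Path(path).write_text("".join(u + "\n" for u in cleaned), encoding="utf-8")
    return len(cleaned)


count = write_url_list(
    [
        "https://www.example.com/a",
        "",
        "https://www.example.com/a",
        "/relative/path",
        "https://www.example.com/b",
    ],
    "speedworkers_urls.txt",
)
print(count)  # 2
```

The returned count should line up with the No. of URLs value you see after uploading, since each remaining line holds exactly one URL.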
Fetch from a distant file
This method fetches URLs from a sitemap or text file stored in Amazon Web Services (AWS) S3 or Google Cloud Platform (GCP). The file must be in .CSV or .XML format and contain one URL per line. For your initial setup, we recommend using this method to import the URLs most recently crawled according to your logs, and then switching to the Bot request source.
To upload a file from AWS or GCP:
Click the Fetch from distant file button.
In the Title field, provide a descriptive title to help identify the list.
From the Type dropdown, select the source file format: .CSV or .XML.
From the Download Strategy dropdown, select the method by which SpeedWorkers should fetch the file. Refer to the Download Strategy Types section below for descriptions of each option.
If your file includes a header that you do not want to be uploaded to SpeedWorkers, slide the Exclude Header toggle, and then choose the number of lines to skip.
In the "How Frequently should we download this file?" section, select the number and time increment that define how often SpeedWorkers should download the file.
In the "What should we do with the URLs found?" section, select an option to determine how SpeedWorkers handles the URLs in the file. After you save this setting, it cannot be changed:
Append to existing list: Adds URLs from the file that are not yet in the page inventory. For URLs already in the page inventory, the "last seen" date is reset.
Overwrite existing list: Replaces the current inventory list with the uploaded list.
Force refresh: Forces a refresh of the pages in the list. This option is helpful if you have updated content on the pages and want SpeedWorkers to deliver the fresh pages to search bots.
Purge from cache: Purges the URLs from the cache and stops SpeedWorkers from delivering the pages to search bots. The pages are not delivered again until they are refreshed.
Purge & refresh: Purges the URLs from the cache and then refreshes the pages.
In the "Remove URLs from the inventory" line, identify the number of days the URLs should stay in the inventory after they are last seen. The maximum is 90 days.
Click Create.
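To make the Append and Overwrite options and the retention setting more concrete, the following sketch models the inventory as a mapping from each URL to its "last seen" date. This is a hypothetical model based only on the descriptions above. The function names, the `mode` values, the 1-day minimum retention, and the expiry boundary (a URL last seen exactly N days ago is kept) are assumptions. The cache-oriented options (Force refresh, Purge from cache, Purge & refresh) are not modeled.

```python
from datetime import date, timedelta

# Maximum retention stated in the settings; the 1-day minimum is an assumption.
MAX_RETENTION_DAYS = 90


def apply_file(inventory, file_urls, mode, today):
    """Return a new {url: last_seen} mapping after importing file_urls (hypothetical model)."""
    if mode == "overwrite":
        # The uploaded list replaces the current inventory list.
        return {url: today for url in file_urls}
    if mode == "append":
        updated = dict(inventory)
        for url in file_urls:
            # New URLs are added; URLs already present get "last seen" reset to today.
            updated[url] = today
        return updated
    raise ValueError(f"unsupported mode: {mode!r}")


def expire(inventory, retention_days, today):
    """Drop URLs whose last-seen date is older than retention_days."""
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 90")
    cutoff = today - timedelta(days=retention_days)
    # A URL last seen exactly retention_days ago is kept (assumed boundary).
    return {url: seen for url, seen in inventory.items() if seen >= cutoff}


inv = {
    "https://www.example.com/old": date(2024, 1, 1),
    "https://www.example.com/a": date(2024, 3, 1),
}
today = date(2024, 4, 15)
inv = apply_file(inv, ["https://www.example.com/a", "https://www.example.com/new"], "append", today)
inv = expire(inv, 30, today)
print(sorted(inv))  # ['https://www.example.com/a', 'https://www.example.com/new']
```

In this example, /a survives because appending the file reset its "last seen" date, while /old was not in the file and is dropped once it falls outside the 30-day window.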
Download Strategy Types
File at URL: This option will upload the file located at the URL you identify in the URL field. The URL must include the protocol (e.g., http://www.example.com, https://www.example.com).
Last modified file matching prefix: SpeedWorkers will upload the file with the most recent modification date whose path starts with the prefix you designate in the Prefix field.
For example, the prefix s3://your-bucket/folder/2021 captures files such as s3://your-bucket/folder/20210910.csv.
All new files matching prefix: SpeedWorkers will upload every file matching the prefix you designate in the Prefix field that has not already been imported into the inventory. In the "Download from date" field, set the earliest date from which SpeedWorkers should download files.
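The two prefix-based strategies can be sketched against a listing of bucket objects. The following is a hypothetical model, not SpeedWorkers code: the `FILES` listing is sample data, the function names are illustrative, and comparing the "Download from date" against each file's last-modified time is an assumption. In practice, a listing like this would come from your storage provider, since both S3 and GCP can list objects by prefix.

```python
from datetime import datetime

# Hypothetical sample listing of bucket objects: (path, last_modified).
FILES = [
    ("s3://your-bucket/folder/20210901.csv", datetime(2021, 9, 1, 6, 0)),
    ("s3://your-bucket/folder/20210910.csv", datetime(2021, 9, 10, 6, 0)),
    ("s3://your-bucket/folder/20201231.csv", datetime(2020, 12, 31, 6, 0)),
    ("s3://your-bucket/other/20210915.csv", datetime(2021, 9, 15, 6, 0)),
]


def last_modified_matching(files, prefix):
    """Model of "Last modified file matching prefix": newest matching path, or None."""
    matches = [f for f in files if f[0].startswith(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda f: f[1])[0]


def all_new_matching(files, prefix, download_from, already_downloaded):
    """Model of "All new files matching prefix" (date comparison is an assumption)."""
    return sorted(
        path
        for path, modified in files
        if path.startswith(prefix)
        and modified >= download_from
        and path not in already_downloaded
    )


prefix = "s3://your-bucket/folder/2021"
print(last_modified_matching(FILES, prefix))
# s3://your-bucket/folder/20210910.csv
print(all_new_matching(FILES, prefix, datetime(2021, 1, 1), {"s3://your-bucket/folder/20210901.csv"}))
# ['s3://your-bucket/folder/20210910.csv']
```

Note that s3://your-bucket/folder/20201231.csv never matches, because a prefix match compares the start of the path character by character and "2020" differs from "2021".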
Bot request
This method adds URLs that search bots request from SpeedWorkers, whether or not the pages are already in the page inventory. A URL is added to the inventory when it is requested the specified number of times within the specified number of days, either by a specific search engine bot or by all bots, depending on the rule.
To add URLs by bot request:
Click the Bot request button.
Set the fields to determine when to add a URL to the inventory by selecting how often a URL is requested by a specified search engine bot over a selected period. We recommend setting this to a minimum of two requests.
Select the cache behaviors on which you want to activate bot discovery.
Click Create.
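The bot request rule can be read as "at least N requests from the selected bot within the last D days". The sketch below models that check over a sample request log. The log format, the bot names, and the "all" value meaning requests from any bot are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def qualifies(request_log, url, bot, min_requests, days, now):
    """Return True if `url` was requested at least `min_requests` times in the last `days` days.

    Hypothetical model: `bot` is a bot name, or "all" to count requests from any bot.
    request_log is a list of (url, bot, timestamp) tuples.
    """
    window_start = now - timedelta(days=days)
    count = sum(
        1
        for req_url, req_bot, ts in request_log
        if req_url == url
        and (bot == "all" or req_bot == bot)
        and window_start <= ts <= now
    )
    return count >= min_requests


now = datetime(2024, 5, 31)
log = [
    ("https://www.example.com/p1", "googlebot", datetime(2024, 5, 20)),
    ("https://www.example.com/p1", "googlebot", datetime(2024, 5, 28)),
    ("https://www.example.com/p2", "googlebot", datetime(2024, 5, 25)),
    ("https://www.example.com/p2", "bingbot", datetime(2024, 5, 26)),
    ("https://www.example.com/p3", "googlebot", datetime(2024, 3, 1)),
]

print(qualifies(log, "https://www.example.com/p1", "googlebot", 2, 30, now))  # True
print(qualifies(log, "https://www.example.com/p2", "googlebot", 2, 30, now))  # False
print(qualifies(log, "https://www.example.com/p2", "all", 2, 30, now))        # True
print(qualifies(log, "https://www.example.com/p3", "googlebot", 1, 30, now))  # False
```

The last case shows why the period matters: p3 was requested, but outside the 30-day window, so it does not qualify.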
The number of bot requests set here is a minimum, so any number of requests at or above it satisfies the rule (e.g., one request over 30 days means one or more requests over 30 days). If multiple configured bot request sources overlap, the configuration with the fewest requests and the longest period takes precedence. For example, consider the following bot request configurations:
One request from Google over 30 days
Two requests from Google over 15 days
The "one request from Google over 30 days" configuration is kept because any URL that Google requests at least twice within 15 days has also been requested at least once within 30 days, which makes the second configuration redundant.
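The precedence rule above amounts to a dominance check: for the same bot, rule A makes rule B redundant when A requires no more requests than B over a period at least as long as B's. The following hypothetical sketch applies that check. The `BotRule` type and the function names are illustrative, and restricting the comparison to rules for the same bot is an assumption.

```python
from typing import NamedTuple


class BotRule(NamedTuple):
    """Hypothetical representation of one bot request source."""
    bot: str
    min_requests: int
    days: int


def makes_redundant(a: BotRule, b: BotRule) -> bool:
    """True if rule `a` already captures every URL that rule `b` would capture."""
    return a.bot == b.bot and a.min_requests <= b.min_requests and a.days >= b.days


def effective_rules(rules):
    """Drop rules that some other distinct rule makes redundant."""
    unique = list(dict.fromkeys(rules))  # remove exact duplicates, keep order
    return [
        rule
        for rule in unique
        if not any(other != rule and makes_redundant(other, rule) for other in unique)
    ]


rules = [BotRule("google", 1, 30), BotRule("google", 2, 15)]
print(effective_rules(rules))  # [BotRule(bot='google', min_requests=1, days=30)]
```

Rules that do not dominate each other, such as one request over 7 days and three requests over 30 days, are both kept because each can capture URLs the other misses.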
Links discovery
This method retrieves the URLs from the current web property by extracting them from the pages added to the SpeedWorkers page inventory. SpeedWorkers does not follow links that include rel="nofollow".
Use the Links discovery source type with caution: it can add many URLs at once, including pages that your robots.txt file blocks.
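The link extraction described above, which skips links marked rel="nofollow", can be approximated with Python's standard `html.parser` module. This is a simplified, hypothetical sketch: it only looks at `<a href>` tags, resolves relative links against the page URL, and, consistent with the warning above, does not consult robots.txt. The class and function names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FollowableLinkParser(HTMLParser):
    """Collect absolute href values from <a> tags, skipping rel="nofollow" (hypothetical sketch)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # rel can hold several space-separated values, e.g. "noopener nofollow".
        rel_values = (attr_map.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_values:
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = FollowableLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


html = (
    '<a href="/shoes">Shoes</a>'
    '<a href="/login" rel="nofollow">Log in</a>'
    '<a href="https://www.example.com/bags" rel="noopener NoFollow">Bags</a>'
)
print(extract_links(html, "https://www.example.com/"))  # ['https://www.example.com/shoes']
```

Splitting the rel attribute on whitespace, rather than comparing it to the exact string "nofollow", also catches combined values such as "noopener nofollow".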
To add URLs by links discovery:
Click the Links discovery button.
In the Title field, provide a descriptive title to help identify the list.
In the "How long should we keep these URLs in the inventory?" section, identify the number of days that defines how long the URLs should remain in the page inventory. The maximum is 90 days.
Select the cache behaviors on which you want to activate link discovery.
Click Create.
Deleting an Inventory Source
When you delete an inventory source, Botify removes the URLs from the page inventory and does not refresh them unless they appear in another source. If the URLs appear in another source, they will continue to be served to bots until the cache expires, unless a file upload has purged them.
To delete an inventory source, click the trash can icon in the row where the source name is displayed in Inventory Settings.
The source is deleted from the settings immediately, and the change takes effect when the selected version is pushed to production.
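Deleting a source removes its URLs from the inventory unless another source still contributes them. The following hypothetical sketch models each source as a set of URLs. The source titles and the function name are sample values, and cache expiry and purges are not modeled.

```python
def inventory_after_delete(sources, deleted_title):
    """Return (remaining, removed) URL sets after deleting one source.

    Hypothetical model: `sources` maps a source title to the set of URLs it contributes.
    """
    all_urls = set().union(*sources.values())
    remaining = set()
    for title, urls in sources.items():
        if title != deleted_title:
            remaining |= urls  # URLs still contributed by another source stay
    return remaining, all_urls - remaining


sources = {
    "Sitemap import": {"https://www.example.com/a", "https://www.example.com/b"},
    "Bot request": {"https://www.example.com/b", "https://www.example.com/c"},
}
remaining, removed = inventory_after_delete(sources, "Sitemap import")
print(sorted(remaining))  # ['https://www.example.com/b', 'https://www.example.com/c']
print(sorted(removed))    # ['https://www.example.com/a']
```

Here /b stays in the inventory because the Bot request source still contributes it, while /a, which only the deleted source provided, is removed.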