📘 This article explains how to address a low refresh efficiency reported in SpeedWorkers. SpeedWorkers is part of Botify's Activation Suite, available as an option with a Botify Pro or Enterprise plan.
Overview
The ideal state is to have 100% of the pages served to bots in indexed status, as defined in the cache behavior settings. Refresh efficiency drops when pages are served to bots in outdated (i.e., stale) status. A sudden drop is an early warning: outdated pages will soon be purged from the cache, which in turn lowers Delivery Efficiency.
Description of the Problem
The Inventory Monitoring page shows why SpeedWorkers cannot serve all pages from the inventory, regardless of page status.
This chart shows a steadily growing inventory. While the original crawl speed was sufficient to refresh all pages according to the cache behavior settings, the speed became insufficient as the inventory size grew over the month.
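The relationship between inventory size and crawl speed can be sketched with simple arithmetic. The snippet below is only an illustration and not a SpeedWorkers API: the function name, the inventory sizes, and the seven-day refresh interval are hypothetical values chosen for the example.

```python
def required_fetch_rate(inventory_size: int, refresh_interval_days: float) -> float:
    """Return the pages per second needed to refresh every page once per interval."""
    seconds_in_interval = refresh_interval_days * 24 * 60 * 60
    return inventory_size / seconds_in_interval


# Hypothetical figures: the inventory grows from 2M to 5M pages over a month,
# while the cache behavior asks for a refresh every 7 days.
rate_at_start = required_fetch_rate(2_000_000, 7)
rate_at_end = required_fetch_rate(5_000_000, 7)

print(f"start of month: {rate_at_start:.2f} pages/s")
print(f"end of month:   {rate_at_end:.2f} pages/s")
```

With a fixed refresh interval, the required rate grows in proportion to the inventory, so a crawl speed sized for the start of the month falls short by the end of it.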
Possible Solutions
There are a few modifications you can make to address this type of problem:
Decrease the Inventory Size
While the number of pages can grow over time, a steady increase, as shown in the example above, is a sign that something is wrong. Consider how many pages in the inventory are never requested by search bots. If too many pages are cached uselessly, examine the input sources other than “Bots Discovery”. The Indexed URLs tab of the Inventory Monitoring page provides details on which source is bringing in the most URLs and which cache behavior is impacting their refresh.
Add other input sources to your inventory: The “Bots Discovery” input source will cache all pages requested by bots; however, it will always be a little too late for new pages because we need the first request before we can start caching the page. If this affects your delivery efficiency, add other input sources in your inventory settings one at a time, evaluating after adding each source. An ideal input source is a feed that lists new and updated pages.
Decrease the time in inventory: Reduce how long each new URL is kept in the page inventory from 90 days to a much shorter period. The retention time is calculated for each inventory source in which the URL is found, based on the defined inventory settings.
Evaluate query string parameters: Query string parameters can inflate the number of unique URLs indefinitely, so you may benefit from filtering them out with an inventory optimization rule. Use the SpeedWorkers URL Explorer to locate the most requested or cached query parameters, and filter out only those that do not affect your page content.
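To see how query parameters inflate the count of unique URLs, consider the following minimal sketch. It does not use any SpeedWorkers feature; the parameter names in `IGNORED_PARAMS` and the sample URLs are hypothetical, and the sketch assumes those parameters have no effect on page content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "sessionid"}


def normalize(url: str) -> str:
    """Return the URL with ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


urls = [
    "https://www.example.com/shoes?color=red&utm_source=news",
    "https://www.example.com/shoes?utm_medium=email&color=red",
    "https://www.example.com/shoes?color=red&sessionid=abc123",
    "https://www.example.com/shoes?color=blue",
]

unique_before = len(set(urls))
unique_after = len({normalize(u) for u in urls})
print(f"unique URLs: {unique_before} before filtering, {unique_after} after")
```

Here three of the four sample URLs collapse into one once the tracking and session parameters are dropped, which is the kind of reduction an inventory optimization rule aims for.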
Decrease Refresh Rate in Cache Behaviors
The Cache Refreshes tab of the Inventory Monitoring page shows which cache behavior is responsible for each share of the refreshes and, potentially, why SpeedWorkers has not been able to refresh some pages. Use the following strategies to help decrease your refresh rate:
Adjust your cache behavior settings to keep pages in the cache longer between each refresh.
Identify advanced rules to cache some pages longer (e.g., error pages, redirects), since they rarely change but still consume crawl time. You can be very granular in the setup to avoid caching short-term errors while preventing permanent errors and redirects from being checked too often.
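The effect of these strategies on refresh load can be estimated with a rough calculation. The sketch below is an assumption-laden illustration, not a description of how SpeedWorkers schedules refreshes: the `Behavior` class, the page counts, and the refresh intervals are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Behavior:
    name: str
    pages: int
    refresh_days: float  # time a page stays in the cache between refreshes


def refreshes_per_day(behaviors: list[Behavior]) -> float:
    """Approximate the daily refresh load as pages divided by refresh interval."""
    return sum(b.pages / b.refresh_days for b in behaviors)


# Hypothetical setup: every page is refreshed weekly.
current = [
    Behavior("content pages", 1_000_000, 7),
    Behavior("redirects", 300_000, 7),
    Behavior("permanent errors", 200_000, 7),
]

# Same inventory, but redirects and permanent errors are kept 30 days.
adjusted = [
    Behavior("content pages", 1_000_000, 7),
    Behavior("redirects", 300_000, 30),
    Behavior("permanent errors", 200_000, 30),
]

before = refreshes_per_day(current)
after = refreshes_per_day(adjusted)
print(f"refreshes per day: {before:,.0f} -> {after:,.0f}")
print(f"reduction: {100 * (1 - after / before):.1f}%")
```

Under these made-up numbers, lengthening the interval only for rarely changing pages cuts the daily refresh load by roughly a quarter without touching how often content pages are refreshed.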
Increase the Crawl Speed
Examine the actual crawl speed in the “Fetch Speed Over Time” chart on the SpeedWorkers Overview page. If this chart suggests your crawl speed may be too slow, please contact the Support team to request an increase.