Deploying SpeedWorkers with CloudFront (Only GET/HEAD)
Updated over 6 months ago

🛠 This document explains the configuration requirements for running SpeedWorkers with CloudFront.

❗️Warning about extra costs:
With the CloudFront architecture, the viewer request lambda is called for all incoming requests, and the origin request lambda is called for all bot requests that don’t hit the cache (page requests). This may introduce additional costs if you don’t already use Lambda@Edge (for malicious bot detection, for instance). See how the Lambda@Edge functions are used below.

👉 SpeedWorkers can’t be implemented in behaviors that accept POST/PUT methods because CloudFront doesn’t support origin failover for these methods. If your origin accepts POST/PUT, you must configure separate behaviors that intercept those requests before they reach the behavior that integrates SpeedWorkers.

How SpeedWorkers Works

SpeedWorkers is a service designed to deliver web pages to search engine crawler bots as fast as possible. It increases your crawl budget (SEO), serving more pages to search engines to index for the same amount of time spent on your website.

SpeedWorkers can prerender your JavaScript pages in advance, at scale, and deliver them to search engines in a few hundred milliseconds. Prerendering pages enables search engines to index your pages faster; in other words, it increases your crawl/render budget. By storing all pages in its cache, SpeedWorkers delivers long-tail pages as fast as any other page, whereas usual CDNs cannot keep long-tail pages in the cache.

Our service has advanced quality controls to ensure the pages are rendered with all their components.

The CloudFront Situation

The SpeedWorkers (SW) workflow is:

  • Intercept incoming requests

  • Call SW when the request is from a bot

  • Wait for the SW reply

  • Return the SW response in case of success or fall back to the origin server
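
As a conceptual illustration only, the four steps above can be sketched as a small function. This is not how CloudFront executes the workflow (CloudFront relies on Lambda@Edge and an origin group, as described later); the helpers isBot, fetchFromSpeedWorkers, and fetchFromOrigin are hypothetical and injected by the caller, and "success" is assumed to mean an HTTP 200 reply.

```javascript
// Conceptual sketch of the SpeedWorkers workflow.
// isBot, fetchFromSpeedWorkers and fetchFromOrigin are hypothetical helpers
// supplied by the caller; they are not part of any Botify or AWS API.
async function handleRequest(request, deps) {
  const { isBot, fetchFromSpeedWorkers, fetchFromOrigin } = deps;

  // User traffic never goes through SpeedWorkers.
  if (!isBot(request)) {
    return fetchFromOrigin(request);
  }

  try {
    // Call SW and wait for its reply.
    const response = await fetchFromSpeedWorkers(request);
    // Assumption: only a 200 counts as a successful SW reply.
    if (response.status === 200) {
      return response;
    }
  } catch (err) {
    // Timeout or network error: fall through to the origin server.
  }

  // Any failure falls back to the origin server.
  return fetchFromOrigin(request);
}
```

The point of the sketch is that every failure path ends at the origin server, so a SpeedWorkers problem should never prevent a page from being served.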

CloudFront does not provide a comprehensive rule system to route traffic. You can only set different origins with CloudFront according to a URL path pattern. You cannot route requests based on headers or query parameters. The only way to perform advanced routing is through Lambda@Edge.

Using Lambda@Edge to Deploy SW

Lambda@Edge is a feature of Amazon CloudFront that runs code closer to web application users, which improves performance and reduces latency.

How does it work?

  • We create a viewer request Lambda@Edge that adds headers, like the header identifying the request type (bot or user).

  • We configure CloudFront to partition the cache according to this header (one cache for the users and one for the bots) and forward the required headers.

  • We create an origin request Lambda@Edge that decides whether to send the request to SpeedWorkers’ servers or to the website’s servers.

Installing two Lambda@Edge

Two Lambda@Edge functions must be deployed to enable SpeedWorkers (SW).

  • Viewer request/response lambda: Called to read and edit metadata of incoming requests (headers), enabling the partitioning of CloudFront cache between bots and users.

  • Origin request/response lambda: To determine which origin server should handle the request (SpeedWorkers’ or the website’s servers).


A conceptual graphic showing how the CloudFront events can trigger a Lambda function

Why we cannot use a single Lambda@Edge

Intercepting viewer requests (before CloudFront cache) is limited in time (5s max) and response size (40KB max); therefore, we cannot return the SW cached page version. Intercepting origin server requests (after CloudFront cache) is still limited in size (1MB). 1MB is too small, and we cannot configure CloudFront to prevent the SW page from being cached and returned to users. Learn more: Lambda Requirements Limits (AWS).

Implementation

Scripts:

Two solutions are available.

Solution 1

When the lambda for origin A (SpeedWorkers) is called, it changes the origin to the True Origin (origin B) if the request shouldn’t be intercepted by SpeedWorkers.

When CloudFront receives an incoming request:

  • It calls the Viewer Request Lambda, which adds three headers to the request:

    • One header indicates if the request comes from a bot or a user. This header has to be whitelisted in the CloudFront behavior’s configuration to partition the cache between bots and users (to avoid returning the bot version of a page to a user because it has been put in the cache).

    • One header containing the original request hostname (domain). This header has to be forwarded to the Origin Request Lambda to rebuild the complete original request URL, as CloudFront doesn’t provide it in the request object sent to the lambda.

    • One header containing the original request user-agent. This header also has to be forwarded to the Origin Request Lambda to forward it to SpeedWorkers, as CloudFront filters out the User-Agent header by default.

  • It calls the Origin Request Lambda, which, based on the request type (bot or user), decides whether to reroute the request directly to the True Origin or let it be handled by the Origin Group set for the CloudFront behavior matching the current request:

    • If the request comes from a bot, add the headers required by SpeedWorkers to process the request and let CloudFront call the Origin Group configured for this behavior (with SpeedWorkers as the primary origin).

    • If the request comes from a user, then reroute the request and make CloudFront call the True Origin directly instead of calling the primary origin (SpeedWorkers) of the Origin Group.
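
A minimal sketch of what a Solution 1 Origin Request Lambda could look like is shown below. It is not the code provided by Botify. The header names X-SW-Original-Host and X-SW-Original-User-Agent, the values "bot" and "user", the https scheme used to rebuild the URL, and every value in CONFIG are assumptions made for illustration; only X-SW-Request-Type and the three x-sw-* header names appear elsewhere in this document.

```javascript
// Sketch of a Solution 1 Origin Request Lambda (assumptions labeled below).
const CONFIG = {
  websiteId: 'WEBSITE_ID',                // placeholder: provided by Botify
  token: 'TOKEN',                         // placeholder: provided by Botify
  trueOriginDomain: 'origin.example.com', // hypothetical True Origin domain
};

// Read the first value of a CloudFront request header, if present.
function getHeader(request, name) {
  const values = request.headers[name.toLowerCase()];
  return values && values.length > 0 ? values[0].value : undefined;
}

// Set a single-valued header using the CloudFront event format.
function setHeader(request, name, value) {
  request.headers[name.toLowerCase()] = [{ key: name, value }];
}

function routeRequest(request) {
  const type = getHeader(request, 'X-SW-Request-Type');
  // Hypothetical names for the two headers added by the Viewer Request Lambda.
  const host = getHeader(request, 'X-SW-Original-Host');
  const userAgent = getHeader(request, 'X-SW-Original-User-Agent');

  if (type === 'bot' && host && userAgent) {
    // Bot: add the SpeedWorkers headers and leave the origin untouched, so
    // CloudFront calls the Origin Group (SpeedWorkers as primary origin).
    const query = request.querystring ? `?${request.querystring}` : '';
    setHeader(request, 'x-sw-website-id', CONFIG.websiteId);
    setHeader(request, 'x-sw-token', CONFIG.token);
    // Assumption: x-sw-uri carries the rebuilt original URL over https.
    setHeader(request, 'x-sw-uri', `https://${host}${request.uri}${query}`);
    setHeader(request, 'User-Agent', userAgent);
    return request;
  }

  // User, or headers missing (Viewer Lambda removed): reroute to True Origin.
  request.origin = {
    custom: {
      domainName: CONFIG.trueOriginDomain,
      port: 443,
      protocol: 'https',
      path: '',
      sslProtocols: ['TLSv1.2'],
      readTimeout: 30,
      keepaliveTimeout: 5,
      customHeaders: {},
    },
  };
  // Assumption: the True Origin expects its own domain as Host header.
  setHeader(request, 'Host', CONFIG.trueOriginDomain);
  return request;
}

exports.handler = async (event) => routeRequest(event.Records[0].cf.request);
```

Hard-coding the True Origin settings in CONFIG is what ties this approach to a single True Origin per lambda.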

Benefits

  • If the SpeedWorkers origin fails (i.e., doesn’t have the page in the cache or is out of service), CloudFront will fall back to the True Origin.

  • If the Viewer Lambda is accidentally removed, the Origin Lambda won’t receive the headers it’s waiting for and will reroute the request to the True Origin.

  • If the Origin Lambda is accidentally removed, SpeedWorkers will return an authentication error (403) because the placeholder headers set on the origin are invalid, and CloudFront will fall back to the True Origin.

This implementation can vary depending on whether or not the client uses the CloudFront cache and whether or not the True Origin has to handle requests for several domains (hostnames):

  • If the client doesn’t use the CloudFront cache, then the Viewer Request Lambda doesn’t need to add a header to partition the cache, and this header doesn’t have to be whitelisted. The Origin Request Lambda will need access to the User-Agent header (the User-Agent header has to be whitelisted).

  • If the client’s True Origin only handles requests from one domain (hostname), like https://www.botify.com, and doesn’t have any subdomain, then the Viewer Request Lambda doesn’t need to add a header containing the host necessary to rebuild the request URL.

  • If these two conditions are met, the viewer lambda is unnecessary (but the origin lambda must be configured with an origin rewrite).

Solution 2

When the lambda for origin A (SpeedWorkers) is called, it fails and forces CloudFront to fall back to origin B (the True Origin) if the request shouldn’t be intercepted by SpeedWorkers.

When CloudFront receives an incoming request:

  • It calls the Viewer Request Lambda, which adds three headers to the request:

    • One header indicates if the request comes from a bot or a user. This header has to be whitelisted in the CloudFront behavior’s configuration to partition the cache between bots and users (to avoid returning the bot version of a page to a user because it has been put in the cache).

    • One header containing the original request hostname (domain). This header has to be forwarded to the Origin Request Lambda to rebuild the complete original request URL, as CloudFront doesn’t provide it in the request object sent to the lambda.

    • One header containing the original request user-agent. This header also has to be forwarded to the Origin Request Lambda to forward it to SpeedWorkers.

  • It calls the Origin Request Lambda, which, based on the request type (bot or user), decides whether to force CloudFront to fall back and let the True Origin handle the request, or let the request be handled by the Origin Group set for the CloudFront behavior matching the current request:

    • If the request comes from a bot, then add the headers required by SpeedWorkers to process the request and let CloudFront call the Origin Group configured for this behavior.

    • If the request comes from a user, return an error and make CloudFront fall back to the True Origin instead of calling the SpeedWorkers origin.

Benefits:

  • If the SpeedWorkers origin fails (doesn’t have the page in the cache or is out of service), CloudFront will fall back to the True Origin.

  • If the Viewer Lambda is accidentally removed, the Origin Lambda won’t receive the headers it’s waiting for and will return an error, making CloudFront fall back to the True Origin.

  • If the Origin Lambda is accidentally removed, the SpeedWorkers origin will return an authentication error (403) because the placeholder headers set on the origin are invalid, and CloudFront will fall back to the True Origin.

This implementation can vary depending on whether or not the client uses the CloudFront cache and whether or not the True Origin has to handle requests for several domains (hostnames):

  • If the client doesn’t use the CloudFront cache, then the Viewer Request Lambda doesn’t need to add a header to partition the cache, and this header doesn’t have to be whitelisted. The Origin Request will need access to the User-Agent header (the User-Agent header has to be whitelisted).

  • If the client’s True Origin only handles requests from one domain (hostname), like https://www.botify.com, and doesn’t have any subdomain, then the Viewer Request Lambda doesn’t need to add a header containing the host necessary to rebuild the request URL.

If these two conditions are met, the viewer lambda is unnecessary (but the origin lambda must be configured with an origin rewrite).

Compare Solutions

Solution 1

  • Pros: Fewer lambda calls, as origin B will only be called if origin A failed.

  • Cons: Need to set the True Origin settings (domain, headers…) in the lambda. If there are several True Origins, then several lambdas will have to be created, one for each True Origin, or the lambda will have to be modified to support several True Origins.

Solution 2

  • Pros: A single lambda can handle any True Origin.

  • Cons:

    • More lambda calls, as all requests not intercepted by SpeedWorkers will force CloudFront to fall back to origin B and call the lambda again (even if the lambda does nothing).

    • Doesn’t work if all headers are forwarded to the origin (cache disabled).

    • Harder to configure.

Note:

  • Both implementations require an origin group with two origins: Origin A (SpeedWorkers) as the primary origin, and Origin B (True Origin) as the secondary origin.

  • Both implementations require an origin request lambda and a viewer request lambda (except in specific cases).

Deployment Guidelines

To deploy SpeedWorkers in CloudFront, you must:

  1. Configure CloudFront to partition cache between request types (user or bot).

  2. Forward headers to origin.

  3. Create an Origin Group for each origin that receives requests for pages cached by SpeedWorkers.

  4. Configure the cache behaviors.

  5. Create two Lambda@Edge: a Viewer Request Lambda and an Origin Request Lambda.

Please make sure your CloudFront distribution behaviors are configured so that the SpeedWorkers lambda is only called for page requests that SpeedWorkers may have cached; otherwise, it will introduce a significant cost overhead (don’t call SpeedWorkers for page resources like images, CSS, JavaScript, etc.).

Process

  • Create a SpeedWorkers origin in CloudFront.

  • Create a CloudFront SpeedWorkers Origin Group with SpeedWorkers as the primary origin and the True Origin as the secondary origin.

  • Create a staging CloudFront behavior with a test route.

  • Assign the SpeedWorkers Origin Group to the staging behavior.

  • Create the SpeedWorkers Viewer Request and Origin Request lambdas and assign them to the staging CloudFront behavior.

  • Test the configuration.

  • Once validated, assign the lambdas to the production CloudFront behavior and replace the production behavior origin with the SpeedWorkers Origin Group.

Create the Origin Group(s)

Create the SpeedWorkers origin

  1. Connect to the AWS console and go to the CloudFront service: https://console.aws.amazon.com/cloudfront.

  2. Click on the Distribution you want to install SpeedWorkers on.

  3. Go to the “Origins and Origin Groups” tab.

  4. Click on the “Create Origin” button.

  5. Set the “Origin Domain Name” to (the SpeedWorkers domain provided by us).

  6. Set the “Origin Path” to /page.

  7. Set the “Origin ID” to SpeedWorkers.

  8. Set the “Minimum Origin SSL Protocol” to TLSv1.2.

  9. Set the “Origin Protocol Policy” to “HTTPS Only”.

  10. Set the “Origin Connection Attempts” to 1.

  11. Set the “Origin Connection Timeout” to 2.

  12. Set the “Origin Response Timeout” to 2.

  13. Set the “Origin Keep-alive Timeout” to 30.

  14. Set the “HTTPS Port” to 443.

  15. Set the following headers (these placeholder values are needed so that SpeedWorkers returns a 403 error instead of a 400 error if the origin lambda is not set, which makes CloudFront fall back to the secondary origin of the origin group):

    • name: “x-sw-website-id”, value: “nolambda”

    • name: “x-sw-token”, value: “nolambda”

    • name: “x-sw-uri”, value: “nolambda”

  16. Click the Create button.
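
For teams that manage the distribution through the CloudFront API rather than the console, the console fields above roughly correspond to one Origin entry of the distribution configuration. The sketch below builds such an entry; the function name buildSpeedWorkersOrigin is hypothetical, the HTTP port 80 is an assumed default that the steps above do not mention, and the SpeedWorkers domain stays a parameter because it is provided by Botify.

```javascript
// Sketch: the SpeedWorkers origin expressed as a CloudFront API "Origin" entry.
// speedWorkersDomain is the domain provided by Botify (unknown here).
function buildSpeedWorkersOrigin(speedWorkersDomain) {
  // Placeholder headers that make SpeedWorkers answer 403 when no lambda runs.
  const placeholder = (name) => ({ HeaderName: name, HeaderValue: 'nolambda' });
  const headers = ['x-sw-website-id', 'x-sw-token', 'x-sw-uri'].map(placeholder);

  return {
    Id: 'SpeedWorkers',
    DomainName: speedWorkersDomain,
    OriginPath: '/page',
    ConnectionAttempts: 1,
    ConnectionTimeout: 2,
    CustomHeaders: { Quantity: headers.length, Items: headers },
    CustomOriginConfig: {
      HTTPPort: 80, // assumption: default value, unused with "https-only"
      HTTPSPort: 443,
      OriginProtocolPolicy: 'https-only',
      OriginSslProtocols: { Quantity: 1, Items: ['TLSv1.2'] },
      OriginReadTimeout: 2,       // "Origin Response Timeout" in the console
      OriginKeepaliveTimeout: 30,
    },
  };
}
```

The returned object would be added to the Origins list of the distribution configuration before submitting an update.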

Create the Origin Group(s)

For each True Origin of your CloudFront distribution that needs SpeedWorkers, create an origin group:

  1. Go to your CloudFront distribution’s “Origins and Origin Groups” page.

  2. Click on the “Create Origin Group” button.

  3. Select the SpeedWorkers origin and click on the “Add” button.

  4. Then select your True Origin and click on the “Add” button.

  5. Set the Failover criteria: check almost all boxes (500, 502, 503, 504, 403). Do not check the 404.

  6. Set the Origin Group ID to “OriginGroup-SW-True-Origin” (replace True-Origin with your True Origin name).

  7. Click the Create button.
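
The same origin group can be described as an entry of the OriginGroups list in the CloudFront API. The sketch below assumes the SpeedWorkers origin ID is “SpeedWorkers” (as set earlier) and takes the True Origin ID as a parameter; the function name buildOriginGroup is hypothetical.

```javascript
// Sketch: an origin group with SpeedWorkers as primary and the True Origin
// as secondary, in the shape used by the CloudFront API.
function buildOriginGroup(trueOriginId) {
  // Failover status codes from the steps above; 404 is deliberately left out.
  const failoverCodes = [403, 500, 502, 503, 504];

  return {
    Id: `OriginGroup-SW-${trueOriginId}`,
    FailoverCriteria: {
      StatusCodes: { Quantity: failoverCodes.length, Items: failoverCodes },
    },
    Members: {
      Quantity: 2,
      // Order matters: the first member is the primary origin.
      Items: [{ OriginId: 'SpeedWorkers' }, { OriginId: trueOriginId }],
    },
  };
}
```

Calling the function once per True Origin gives one group per origin, which matches the instruction to create an origin group for each True Origin that needs SpeedWorkers.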

Configure the Cache Behaviors

After configuring the behavior, the bot requests will be sent to SpeedWorkers first, before reaching the True Origin. If the Lambdas are not configured yet, SpeedWorkers will reject the requests because they won’t contain the required headers, and CloudFront will fall back to the True Origin. You may want to configure the lambdas first, but it’s not mandatory.

For each cache behavior where you need SpeedWorkers:

  1. Go to the “Behaviors” tab of your CloudFront distribution.

  2. Check the behavior you want to modify.

  3. Click the Edit button.

  4. If you are using the legacy cache settings (not recommended):

    • Set the “Cache Based on Selected Request Headers” to:

      • Whitelist if you use the CloudFront cache. In the “Whitelist Headers” section, enter the custom header “X-SW-Request-Type” and click the Add Custom >> button.

      • All if you don’t use the CloudFront cache. Nothing to set; CloudFront will forward all headers.

      • With the legacy settings, the “If-Modified-Since” header can’t be forwarded to the origin, reducing efficiency.

      • Ensure the “Query String Forwarding and Caching” is configured correctly and that query parameters influencing the returned page content are forwarded.

    • If you are using a “cache policy and origin request policy”:

      • If you use the CloudFront cache:

        • Click Create a new policy on the “Cache Policy”.

        • Set a name for the policy.

        • In the Headers drop-down, select “Whitelist”.

        • Enter the custom header “X-SW-Request-Type”. It will enable SpeedWorkers to partition the cache between users and bots.

        • Click Add header.

        • Click the Create cache policy button.

      • Click Create a new policy on the “Origin Request Policy” option.

      • Set a name for the policy, like “SpeedWorkers Origin Request Policy”.

      • In the Headers drop-down, either select “All viewer headers” or “Whitelist” and add the “X-SW-Request-Type” header (plus your own whitelisted headers and the “If-Modified-Since” header).

      • In the “Cookies” and “Query Strings” drop-downs, set the same settings you normally use. Ensure the “Query Strings” are configured correctly and that query parameters influencing the returned page content are forwarded.

      • Click the Create origin request policy button.

    • In the “Origin or Origin Group” drop-down, select the Origin Group you created earlier, containing the SpeedWorkers origin as the primary origin.

    • Click the Yes, Edit button.
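
When the cache policy is created through the CloudFront API instead of the console, the whitelist described above could look like the sketch below. The TTL values, compression flags, and cookie setting are assumptions and should be replaced with the settings you normally use; contentQueryParams is a hypothetical parameter listing the query parameters that influence the page content.

```javascript
// Sketch: a cache policy that partitions the cache on X-SW-Request-Type.
function buildCachePolicyConfig(name, contentQueryParams = []) {
  // Forward (and cache on) only the query parameters that change the page.
  const queryStrings = contentQueryParams.length > 0
    ? {
        QueryStringBehavior: 'whitelist',
        QueryStrings: { Quantity: contentQueryParams.length, Items: contentQueryParams },
      }
    : { QueryStringBehavior: 'none' };

  return {
    Name: name,
    MinTTL: 0,          // assumption: use your usual TTLs
    DefaultTTL: 86400,  // assumption
    MaxTTL: 31536000,   // assumption
    ParametersInCacheKeyAndForwardedToOrigin: {
      EnableAcceptEncodingGzip: true,   // assumption
      EnableAcceptEncodingBrotli: true, // assumption
      HeadersConfig: {
        HeaderBehavior: 'whitelist',
        // One cache partition for bots and one for users.
        Headers: { Quantity: 1, Items: ['X-SW-Request-Type'] },
      },
      CookiesConfig: { CookieBehavior: 'none' }, // assumption
      QueryStringsConfig: queryStrings,
    },
  };
}
```

For example, buildCachePolicyConfig('SpeedWorkers-Cache-Policy', ['page']) would keep the hypothetical page query parameter in the cache key alongside the request type.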


Create the Lambdas

Create the Viewer Request lambda

It’s unnecessary to create this lambda if you don’t use the CloudFront cache and if your CloudFront behavior handles requests for only one domain with no subdomains.

  1. Connect to the AWS console and go to the Lambda service: https://console.aws.amazon.com/lambda

  2. Make sure you are in the us-east-1 region (N. Virginia).

  3. Click the Create function button.

  4. Check the Author from scratch option.

  5. Fill in the Basic Information section:

    • Enter a name for your lambda, like “SpeedWorkers-Interceptor-ViewerRequest”.

    • Select “Node.js 12.x” as Runtime.

  6. Click the Create function button.

  7. On the “Configuration” tab:

    • In the “Function code” section:

      • Paste the code provided by Botify.

      • Set the configuration:
        swBotPattern: during the validation process of SpeedWorkers, set it to “(botify-bot-sw-)”. This way only requests containing “botify-bot-sw-” in the user-agent will be submitted to SpeedWorkers and the legitimate bot traffic won’t be affected.
        Don’t leave the original swBotPattern if you are not ready to reroute bot traffic to SpeedWorkers!

      • Click the Save button.

    • In the “Basic Settings” section:

      • Click Edit.

      • Set the timeout to 1 second.

      • Set the memory to 128MB.

      • Click the Save button.

  8. On the “Permissions” tab:

    • Click on the role name (it will open a new browser tab).

    • On the “Trust relationships” tab, click the Edit trust relationship button.

    • Set the “Policy Document” to:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Service": [
                "lambda.amazonaws.com",
                "edgelambda.amazonaws.com"
              ]
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }

  9. Click on “Update Trust Policy”.

  10. On the lambda browser tab, click Action at the top of the page.

  11. Select “Publish new version”.

  12. Click the Publish button.

  13. On the “Configuration” tab:

    • Click Add trigger.

    • Select CloudFront.

    • Click the Deploy to Lambda@Edge button.

    • Select the distribution you want to install SpeedWorkers on.

    • Select the behavior where the lambda should be used.

    • Select Viewer request as CloudFront event.

    • Make sure “Include body” is not checked.

    • Check the acknowledgment, then click the Deploy button.

  14. Your lambda will be active once the CloudFront distribution finishes its update.
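
To make the role of swBotPattern more concrete, here is a minimal sketch of a Viewer Request Lambda. It is not the code provided by Botify. The header names X-SW-Original-Host and X-SW-Original-User-Agent and the values "bot" and "user" are assumptions for illustration; X-SW-Request-Type and the validation pattern come from this document.

```javascript
// Sketch of a Viewer Request Lambda (not the Botify-provided code).
const CONFIG = {
  // Validation-phase pattern: only Botify test user-agents count as bots.
  swBotPattern: '(botify-bot-sw-)',
};

const botRegex = new RegExp(CONFIG.swBotPattern);

// Read the first value of a CloudFront request header, or '' if absent.
function firstValue(request, name) {
  const values = request.headers[name];
  return values && values.length > 0 ? values[0].value : '';
}

function tagRequest(request) {
  const userAgent = firstValue(request, 'user-agent');
  const host = firstValue(request, 'host');

  // Header used by CloudFront to partition the cache between bots and users.
  request.headers['x-sw-request-type'] = [{
    key: 'X-SW-Request-Type',
    value: botRegex.test(userAgent) ? 'bot' : 'user',
  }];
  // Hypothetical header names carrying the original host and user-agent
  // to the Origin Request Lambda.
  request.headers['x-sw-original-host'] = [{ key: 'X-SW-Original-Host', value: host }];
  request.headers['x-sw-original-user-agent'] = [{
    key: 'X-SW-Original-User-Agent',
    value: userAgent,
  }];
  return request;
}

exports.handler = async (event) => tagRequest(event.Records[0].cf.request);
```

With the validation pattern in place, a user-agent such as “botify-bot-sw-test” is tagged as a bot while real search engine crawlers are still tagged as users.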

Create the Origin Request Lambda

  1. Connect to the AWS console and go to the Lambda service: https://console.aws.amazon.com/lambda

  2. Ensure you are in the us-east-1 region (N. Virginia).

  3. Click the Create function button.

  4. Check the “Author from scratch” option.

  5. Fill in the Basic Information section:

    • Enter a name for your lambda, like “SpeedWorkers-Interceptor-OriginRequest”.

    • Select “Node.js 12.x” as Runtime.

  6. Click the Create function button.

  7. On the “Configuration” tab:

    • In the “Function code” section:

      • Paste the code provided by Botify.

      • Set the configuration:

      • Click the Save button.

    • In the “Basic Settings” section:

      • Click the Edit button.

      • Set the timeout to 1 second.

      • Set the memory to 128MB.

      • Click the Save button.

    • On the “Permissions” tab:

      • Click on the role name (it will open a new browser tab).

      • On the “Trust relationships” tab, click the Edit trust relationship button.

      • Set the “Policy Document” to:
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "lambda.amazonaws.com",
                  "edgelambda.amazonaws.com"
                ]
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }

  8. Click on “Update Trust Policy”.

  9. On the lambda browser tab, click Action at the top of the page.

  10. Select “Publish new version”.

  11. Click the Publish button.

  12. On the “Configuration” tab:

    • Click Add trigger.

    • Select CloudFront.

    • Click on the Deploy to Lambda@Edge button.

    • Select the distribution you want to install SpeedWorkers on

    • Select the cache behavior where the lambda should be used (the one that uses the Origin Group created for SpeedWorkers).
      Select a cache behavior that handles only page requests that SpeedWorkers may cache, and avoid cache behaviors that serve resources or API requests, to lower the lambda cost.

    • Select Origin request as the CloudFront event.

    • Ensure “Include body” is not checked.

    • Check the acknowledgment, then click the Deploy button.

  13. Your lambda will be active once the CloudFront distribution finishes its update.

Disable Error Code Caching

By default, CloudFront caches some error responses (400, 404…) for 10 seconds. This caching has to be disabled so that a cached error response is not served in place of a fresh one.

In the behavior:

  1. On the Error Pages tab, click the Create Custom Error Response button.

  2. Select an HTTP error code.

  3. Set the Error Caching Minimum TTL to 0.

  4. Click Create.

  5. Repeat the operation for all HTTP error codes.
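
Repeating the operation by hand for each code is error-prone, so the sketch below generates the equivalent CustomErrorResponses block for the CloudFront API. The list of error codes is an assumption about which codes CloudFront lets you customize; check it against what the console offers. The function name buildCustomErrorResponses is hypothetical.

```javascript
// Assumption: the HTTP error codes CloudFront accepts in custom error responses.
const ERROR_CODES = [400, 403, 404, 405, 414, 416, 500, 501, 502, 503, 504];

// Sketch: disable error caching by setting the minimum TTL to 0 for each code.
function buildCustomErrorResponses(codes = ERROR_CODES) {
  return {
    Quantity: codes.length,
    Items: codes.map((code) => ({
      ErrorCode: code,
      ErrorCachingMinTTL: 0,
    })),
  };
}
```

The resulting object would replace the CustomErrorResponses section of the distribution configuration.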

Test the Configuration

SpeedWorkers implements a test request where it responds with a specific value to a specific URL. If you send a request to https://www.yourdomain.com/speed-workers-check-WEBSITEID.html (set the right domain and replace WEBSITEID with the website ID provided by Botify), you will get an HTTP status code 200 (OK) and a response body simply containing “Success”.

Botify will perform a batch of automated tests once the lambdas are deployed in a staging environment (accessible from the outside). Otherwise, these tests will have to be performed manually.

Validating the SpeedWorkers Integration

To validate the integration of SpeedWorkers in your environment, you can send the following requests.

"Always Success" Test

The "always success" test will force SpeedWorkers to return a cache hit even if the page is not in the cache. This test ensures that SpeedWorkers is called and its reply returned to the bot:

Always Success (force a cache hit in SW)
--------------
URL: Your homepage (https://www.mywebsite.com)

Headers:
User-Agent: botify-bot-sw-test
X-Sw-Options: passed-through,request-time,always-success,echo-67674
X-Sw-Options-Auth: XXXXXX <= the website ID provided by Botify

Expected response:
Status: 200
Body: "Success"
Headers:
X-Ftlcdn-Status: false
X-Sw-Echo: 67674
X-Sw-Passed-Through: true
X-Sw-Status: success

"Cache Miss" Test

The "cache miss" test forces SpeedWorkers to return a cache miss even if it has the page in the cache. This test ensures that when SpeedWorkers can't deliver the page, the request falls back properly:

URL: Your homepage (https://www.mywebsite.com)

Headers:
User-Agent: botify-bot-sw-test
X-Sw-Options: passed-through,request-time,always-notfound,echo-41521
X-Sw-Options-Auth: XXXXXX <= the website ID provided by Botify

Expected response:
Status: 200
Body: your homepage
Headers:
NO X-Sw-... headers

"Timeout" Test

The "timeout" test forces SpeedWorkers to delay its response enough to trigger the timeout in your environment. This test ensures that when SpeedWorkers doesn't reply, the request falls back properly:

URL: Your homepage (https://www.mywebsite.com)

Headers:
User-Agent: botify-bot-sw-test
X-Sw-Options: passed-through,request-time,always-timeout,echo-42300
X-Sw-Options-Auth: XXXXXX <= the website ID provided by Botify

Expected response after several seconds:
Status: 200
Body: your homepage
Headers:
NO X-Sw-... headers
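
The three tests above can also be scripted. The sketch below is one possible way to do it, assuming Node.js 18 or later (for the built-in fetch); the function names are hypothetical, and the response body check is left out for brevity. The header values and echo numbers are the ones listed in the tests above.

```javascript
// Expected outcome per X-Sw-Options test flag, as listed in the tests above.
const SW_TESTS = {
  'always-success': { echo: '67674', expectSwHeaders: true },
  'always-notfound': { echo: '41521', expectSwHeaders: false },
  'always-timeout': { echo: '42300', expectSwHeaders: false },
};

// Build the request headers for one test; websiteId is provided by Botify.
function buildTestHeaders(option, websiteId) {
  const { echo } = SW_TESTS[option];
  return {
    'User-Agent': 'botify-bot-sw-test',
    'X-Sw-Options': `passed-through,request-time,${option},echo-${echo}`,
    'X-Sw-Options-Auth': websiteId,
  };
}

// Check a response; headers is a plain object with lowercase header names.
function checkResponse(option, status, headers) {
  const { echo, expectSwHeaders } = SW_TESTS[option];
  if (status !== 200) return false;

  const swHeaders = Object.keys(headers).filter((name) => name.startsWith('x-sw-'));
  if (!expectSwHeaders) {
    // Fallback cases: the True Origin answered, so no X-Sw-... headers.
    return swHeaders.length === 0;
  }
  return headers['x-sw-echo'] === echo
    && headers['x-sw-status'] === 'success'
    && headers['x-sw-passed-through'] === 'true';
}

// Send one test request (requires network access and Node.js 18+).
async function runTest(url, option, websiteId) {
  const response = await fetch(url, { headers: buildTestHeaders(option, websiteId) });
  return checkResponse(option, response.status, Object.fromEntries(response.headers));
}
```

For example, runTest('https://www.mywebsite.com', 'always-success', 'XXXXXX') would resolve to true when SpeedWorkers is called and its reply is returned, with XXXXXX replaced by your website ID.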

Revisit Log Files

Please revisit the log files being passed to Botify for ingestion and confirm you are sending us the following:

  • origin.speedworker.com

  • cluster.speedworker.com

Troubleshooting

If sending a request to SpeedWorkers doesn’t return the expected response when testing the integration, try the following:

❗️Before testing with a third-party service, change the website ID and token in the recv snippet to avoid leaking them.

