Deploying SpeedWorkers with CloudFront
Updated over 7 months ago

🛠 This document explains the configuration requirements for running SpeedWorkers with CloudFront.

❗️Warning about extra costs:

With the CloudFront architecture, the viewer request lambda is called for every incoming request on the behaviors it is attached to, and the origin request lambda is called for every request that misses the CloudFront cache (page requests). This may introduce additional costs if you don’t already use Lambda@Edge (for malicious bot detection, for instance). See below for how the Lambda@Edge functions are used.

How SpeedWorkers Works

SpeedWorkers is a service designed to deliver web pages to search engine crawler bots as fast as possible. It increases your crawl budget (SEO) by serving more pages for search engines to index in the same amount of time spent crawling your website.

SpeedWorkers can prerender your JavaScript pages in advance, at scale, and deliver them to search engines in a few hundred milliseconds. Prerendering pages enables search engines to index your pages faster; in other words, it increases your crawl/render budget. Because it stores all pages in its cache, SpeedWorkers delivers long-tail pages as fast as any other page, whereas typical CDNs cannot keep long-tail pages in their cache.

Our service has advanced quality controls to ensure the pages are rendered with all their components.

The CloudFront Situation

A SpeedWorkers (SW) integration:

  • Intercepts incoming requests

  • Calls SW when the request is from a bot

  • Waits for the SW reply

  • Returns the SW response in case of success or falls back to the origin server
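The four steps above can be sketched as a single request handler. This is a hypothetical, platform-neutral illustration, not Botify's implementation: isBot, callSpeedWorkers and callOrigin are placeholders you would supply, and a successful SW reply is assumed to be an HTTP 200.

```javascript
// Hypothetical sketch of the generic workflow; not Botify's implementation.
// isBot, callSpeedWorkers and callOrigin are caller-supplied placeholders,
// and "success" is assumed to mean an HTTP 200 reply from SpeedWorkers.
async function serveWithSpeedWorkers(request, { isBot, callSpeedWorkers, callOrigin, timeoutMs }) {
  // 1. Intercept: only bot requests are candidates for SpeedWorkers.
  if (!isBot(request)) {
    return callOrigin(request);
  }

  // 2-3. Call SpeedWorkers and wait for its reply, but never longer than timeoutMs.
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('SpeedWorkers timed out')), timeoutMs);
  });
  try {
    const swResponse = await Promise.race([callSpeedWorkers(request), timeout]);
    if (swResponse && swResponse.status === 200) {
      return swResponse; // 4a. Success: return the SpeedWorkers response.
    }
  } catch (err) {
    // Timeout or network error: fall through to the origin.
  } finally {
    clearTimeout(timer);
  }

  // 4b. Fallback: let the origin server answer.
  return callOrigin(request);
}
```

On CloudFront, these steps are split across two Lambda@Edge functions, and the fallback to the origin is performed by SpeedWorkers itself, as explained in the sections that follow.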

CloudFront does not provide a comprehensive rule system to route traffic. You can only set different origins with CloudFront according to a URL path pattern. You cannot route requests based on headers or query parameters. The only way to perform advanced routing is through Lambda@Edge.

Using Lambda@Edge to Deploy SW

Lambda@Edge is a feature of Amazon CloudFront that runs code closer to web application users, which improves performance and reduces latency.

How does it work?

  • We create a viewer request Lambda@Edge function that adds headers, such as the header identifying the request type (bot or user).

  • We configure CloudFront to partition the cache according to this header (one cache for users and one for bots) and to forward the required headers.

  • We create an origin request Lambda@Edge function that decides whether to send the request to SpeedWorkers or to the website’s servers.

Installing Two Lambda@Edge Functions

Two Lambda@Edge functions must be deployed to enable SpeedWorkers (SW).

  • Viewer request lambda: reads and edits the metadata (headers) of incoming requests, enabling CloudFront to partition its cache between bots and users.

  • Origin request lambda: determines which origin server should handle the request (SpeedWorkers’ or the website’s servers).

Figure: A conceptual graphic showing how CloudFront events can trigger a Lambda function.

Why We Cannot Use a Single Lambda@Edge Function

A response generated by a viewer request function (which runs before the CloudFront cache) is limited in execution time (5 seconds max) and size (40 KB max), so that function cannot return the page version cached by SW. A response generated by an origin request function (which runs after the CloudFront cache) is still limited in size (1 MB max). 1 MB is too small for some rendered pages, and CloudFront cannot be configured to prevent the SW page from being cached and returned to users. Learn more: Lambda Requirements Limits (AWS).
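The size constraint can be checked with simple arithmetic. The sketch below uses the limits stated above (40 KB for a viewer request function, 1 MB for an origin request function), assumes binary units, and only measures the HTML body, although AWS counts headers and body together; the function name is hypothetical.

```javascript
// Limits as stated in this article for responses generated by Lambda@Edge;
// check the AWS Lambda@Edge quotas page for current values.
// Assumption: binary units (1 KB = 1024 bytes). AWS counts headers + body;
// this sketch only measures the body.
const GENERATED_RESPONSE_LIMITS = {
  'viewer-request': 40 * 1024,
  'origin-request': 1024 * 1024,
};

// Returns true if an HTML page of this size could be returned directly
// by a function attached to the given trigger.
function fitsGeneratedResponse(trigger, html) {
  const limit = GENERATED_RESPONSE_LIMITS[trigger];
  if (limit === undefined) {
    throw new Error(`Unknown trigger: ${trigger}`);
  }
  return Buffer.byteLength(html, 'utf8') <= limit;
}

const page100k = 'a'.repeat(100 * 1024);
console.log(fitsGeneratedResponse('viewer-request', page100k)); // false
console.log(fitsGeneratedResponse('origin-request', page100k)); // true
```

A 100 KB rendered page already exceeds the viewer request limit, which is why the cached SW page cannot be returned from that stage.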

Implementation

Scripts:

How does it work?

When the origin request lambda is called, it switches the request to the SpeedWorkers origin if SpeedWorkers should handle it. When CloudFront receives an incoming request:

  • It calls the Viewer Request Lambda (or CloudFront Function), which adds three headers to the request:

    • One header indicates whether the request comes from a bot or a user. This header must be added to the cache key in the CloudFront behavior’s configuration so that the cache is partitioned between bots and users; otherwise, a cached bot version of a page could be returned to a user.

    • One header containing the original request hostname (domain). This header has to be forwarded to the Origin Request Lambda to rebuild the complete original request URL, as CloudFront doesn’t provide it in the request object sent to the lambda.

    • One header containing the original request user agent. This header must also be forwarded to the Origin Request Lambda so that it can be passed on to SpeedWorkers, because CloudFront does not forward the viewer’s User-Agent header by default.

  • It calls the Origin Request Lambda, which, based on the request type (bot or user), decides whether to reroute the request directly to the SpeedWorkers origin or let it be handled by the origin set for the behavior matching the current request:

    • If the request comes from a bot, then the lambda changes the origin to point to SpeedWorkers.

    • If the request comes from a user, the lambda lets CloudFront call the behavior’s origin.

  • If the request comes from a bot and SpeedWorkers doesn’t have the page in its cache, SpeedWorkers performs a fallback request: it requests the page with a specific User-Agent that the lambdas do not recognize as a bot, so the request is served by the behavior’s origin.

  • If the Viewer Request Lambda (or CloudFront Function) is accidentally removed, the Origin Request Lambda won’t receive the headers it expects and will ignore the request, letting the behavior’s origin handle it.

  • If the Origin Lambda is accidentally removed, the behavior’s origin will be called.
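The two functions described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code provided by Botify: the header values bot and user, the placeholder SW_ORIGIN_DOMAIN, and the helper names are assumptions, and the bot pattern is the validation value of swBotPattern given later in this guide. It uses the Lambda@Edge event format, where headers are keyed by lowercase name and hold arrays of { key, value } objects.

```javascript
// Minimal, hypothetical sketch of the two Lambda@Edge functions; NOT the Botify code.
// Assumptions: 'bot'/'user' header values, SW_ORIGIN_DOMAIN (placeholder host),
// and the helper names. The pattern is the validation-time swBotPattern.
const SW_BOT_PATTERN = /(botify-bot-sw-)/;
const SW_ORIGIN_DOMAIN = 'speedworkers-origin.example.invalid'; // hypothetical

// Lambda@Edge headers look like { 'user-agent': [{ key: 'User-Agent', value: '...' }] }.
function getHeader(request, name) {
  const entries = request.headers[name.toLowerCase()];
  return entries && entries.length > 0 ? entries[0].value : undefined;
}

function setHeader(request, name, value) {
  request.headers[name.toLowerCase()] = [{ key: name, value }];
}

// Viewer request: tag the request type, original host and original user agent.
function tagViewerRequest(request) {
  const userAgent = getHeader(request, 'User-Agent') || '';
  setHeader(request, 'X-Sw-Request-Type', SW_BOT_PATTERN.test(userAgent) ? 'bot' : 'user');
  setHeader(request, 'X-Sw-Host', getHeader(request, 'Host') || '');
  setHeader(request, 'X-Sw-User-Agent', userAgent);
  return request;
}

// Origin request: send bot requests to SpeedWorkers; leave everything else untouched.
function routeOriginRequest(request) {
  if (getHeader(request, 'X-Sw-Request-Type') !== 'bot') {
    // User traffic, or headers missing because the viewer function was removed:
    // CloudFront calls the behavior's own origin.
    return request;
  }
  request.origin = {
    custom: {
      domainName: SW_ORIGIN_DOMAIN,
      port: 443,
      protocol: 'https',
      path: '',
      sslProtocols: ['TLSv1.2'],
      readTimeout: 30,
      keepaliveTimeout: 5,
      customHeaders: {},
    },
  };
  // Update the Host header to match the new origin.
  setHeader(request, 'Host', SW_ORIGIN_DOMAIN);
  return request;
}

// Entry points: deploy each one in its own Lambda function (e.g. exported as handler).
async function viewerRequestHandler(event) {
  return tagViewerRequest(event.Records[0].cf.request);
}

async function originRequestHandler(event) {
  return routeOriginRequest(event.Records[0].cf.request);
}
```

Because the SpeedWorkers fallback request uses a user agent that the bot pattern does not match, it is tagged as user traffic and reaches the behavior’s origin, as described in the list above.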

Deployment Guidelines

To deploy SpeedWorkers in CloudFront, you must do the following:

Ensure your CloudFront distribution behaviors are configured so the SpeedWorkers lambdas are only called for HTML page requests. Calling the lambdas for resources (e.g., images, JavaScript, CSS) adds unnecessary AWS costs.
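To check which requests would actually reach the SpeedWorkers lambdas, you can evaluate your behaviors’ path patterns against sample paths. The sketch below is a hypothetical audit helper: the behavior layout is an assumed example, and it models CloudFront’s documented matching rules (“*” matches zero or more characters, “?” matches exactly one, matching is case-sensitive, and the first matching behavior in order wins).

```javascript
// Hypothetical audit helper: given cache behaviors in evaluation order, report
// whether a request path would reach a behavior with the SpeedWorkers lambdas.
function pathPatternToRegExp(pattern) {
  // Escape regex metacharacters except the two CloudFront wildcards (* and ?).
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$');
}

function behaviorFor(behaviors, path) {
  // CloudFront uses the first matching behavior; the default ('*') comes last.
  return behaviors.find((b) => pathPatternToRegExp(b.pathPattern).test(path));
}

// Assumed example layout: resources get their own behaviors without lambdas.
const behaviors = [
  { pathPattern: '/static/*', swLambdas: false },
  { pathPattern: '/api/*', swLambdas: false },
  { pathPattern: '*.js', swLambdas: false },
  { pathPattern: '*.css', swLambdas: false },
  { pathPattern: '*', swLambdas: true }, // default behavior: HTML pages
];

for (const path of ['/products/red-shoes', '/static/logo.png', '/app.js', '/logo.png']) {
  const target = behaviorFor(behaviors, path).swLambdas ? 'SpeedWorkers lambdas' : 'no lambdas';
  console.log(path, '->', target);
}
```

In this layout, /logo.png still falls through to the default behavior and would invoke the lambdas, which is the kind of gap such a check is meant to surface.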

Configure CloudFront Behaviors

For each cache behavior where you need SpeedWorkers:

  1. Go to the “Behaviors” tab of your distribution.

  2. Select the behavior you want to modify.

  3. Click the Edit button.

  4. Edit your Cache policy to partition the cache between request types (user or bot):

    • In the Headers drop-down, change the setting to “Include the following headers” if it was set to None (or leave it set to None if all TTL settings are set to 0).

    • Add the custom header “X-Sw-Request-Type”. This lets CloudFront partition the cache between users and bots.

    • Save the changes.

  5. Edit your Origin request policy to forward headers to origin:

    • In the Headers drop-down, either select “All viewer headers” or select “Include the following headers” and add the “X-Sw-Request-Type”, “X-Sw-Host”, “X-Sw-User-Agent”, “X-Sw-If-Modified-Since”, “X-Sw-Options”, and “X-Sw-Options-Auth” headers (plus your own headers).

    • In the “Cookies” and “Query Strings” drop-downs, keep the settings you normally use. Make sure the “Query Strings” setting forwards every query parameter that influences the returned page content.

    • Save the changes.
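The header settings from steps 4 and 5 can also be expressed as data, which helps if you manage cache and origin request policies through the CloudFront API rather than the console. The sketch below assumes the API’s HeadersConfig shape (HeaderBehavior plus a Headers list with Quantity and Items); the helper name and the extra Accept-Language header are illustrative only.

```javascript
// Sketch of the header settings from steps 4 and 5, shaped like the CloudFront API's
// HeadersConfig objects (CachePolicyConfig.ParametersInCacheKeyAndForwardedToOrigin
// .HeadersConfig and OriginRequestPolicyConfig.HeadersConfig). Verify against the API reference.
const SW_CACHE_KEY_HEADERS = ['X-Sw-Request-Type'];
const SW_FORWARDED_HEADERS = [
  'X-Sw-Request-Type',
  'X-Sw-Host',
  'X-Sw-User-Agent',
  'X-Sw-If-Modified-Since',
  'X-Sw-Options',
  'X-Sw-Options-Auth',
];

// Builds a whitelist HeadersConfig from the SpeedWorkers headers plus your own,
// dropping duplicates case-insensitively (HTTP header names are case-insensitive).
function whitelistHeadersConfig(requiredHeaders, ownHeaders = []) {
  const seen = new Set();
  const items = [];
  for (const name of [...requiredHeaders, ...ownHeaders]) {
    const key = name.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      items.push(name);
    }
  }
  return { HeaderBehavior: 'whitelist', Headers: { Quantity: items.length, Items: items } };
}

const cachePolicyHeaders = whitelistHeadersConfig(SW_CACHE_KEY_HEADERS);
// 'Accept-Language' stands in for "your own headers" (illustrative only).
const originRequestPolicyHeaders = whitelistHeadersConfig(SW_FORWARDED_HEADERS, ['Accept-Language']);
console.log(JSON.stringify(originRequestPolicyHeaders, null, 2));
```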

Set Origin Timeouts

To account for the additional delay that can occur when SW has to fall back to the origin server after a cache miss, increase the read timeout in your origin configuration by one or two seconds.
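As a worked example, assuming your origin still uses CloudFront’s default response (read) timeout of 30 seconds, adding the recommended one or two seconds gives 31 or 32 seconds. The hypothetical helper below also caps the result at 60 seconds, the usual maximum unless you have requested a quota increase.

```javascript
// Illustrative arithmetic only (hypothetical helper). CloudFront's origin response
// timeout defaults to 30 s and normally accepts 1-60 s without a quota increase.
function adjustedReadTimeout(currentSeconds, extraSeconds = 2, maxSeconds = 60) {
  return Math.min(currentSeconds + extraSeconds, maxSeconds);
}

console.log(adjustedReadTimeout(30)); // 32
console.log(adjustedReadTimeout(30, 1)); // 31
```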

Create the Lambdas

Create two lambdas:

Create the Viewer Request Lambda

You can replace the Viewer Request lambda with a Viewer Request CloudFront Function (which is less expensive).
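If you choose the CloudFront Function option, the header tagging might look like the hypothetical sketch below. It is not Botify’s code: the header values bot and user are assumptions and must match what your origin request lambda expects. CloudFront Functions receive a simpler event than Lambda@Edge (event.request, with headers keyed by lowercase name holding { value } objects) and return the possibly modified request.

```javascript
// Hypothetical CloudFront Function (viewer request event); not the Botify-provided code.
// The 'bot'/'user' values are assumptions and must match what your origin request
// lambda expects. ES5 style so it also fits the cloudfront-js-1.0 runtime.
var SW_BOT_PATTERN = /(botify-bot-sw-)/; // validation-time swBotPattern

function handler(event) {
  var request = event.request;
  var headers = request.headers; // CloudFront Functions use { name: { value: '...' } }
  var userAgent = headers['user-agent'] ? headers['user-agent'].value : '';
  var host = headers.host ? headers.host.value : '';

  headers['x-sw-request-type'] = { value: SW_BOT_PATTERN.test(userAgent) ? 'bot' : 'user' };
  headers['x-sw-host'] = { value: host };
  headers['x-sw-user-agent'] = { value: userAgent };
  return request;
}
```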

  1. Connect to the AWS console and go to the Lambda service: https://console.aws.amazon.com/lambda.

  2. Ensure you are in the us-east-1 region (N. Virginia).

  3. Click on the Create function button.

  4. Select Author from scratch.

  5. Fill in the Basic Information section:

    • Enter a name for your lambda, like “SpeedWorkers-Interceptor-ViewerRequest”.

    • Select “Node.js 18.x” as Runtime.

  6. Click the Create function button.

  7. On the “Configuration” tab:

    • In the “Function code” section:

      • Paste the code provided by Botify.

      • Set the configuration:
        swBotPattern: During the SpeedWorkers validation process, set it to “(botify-bot-sw-)”. This way, only requests containing “botify-bot-sw-” in the user agent are submitted to SpeedWorkers, and legitimate bot traffic is not affected.
        Don’t leave the original swBotPattern in place if you are not ready to reroute bot traffic to SpeedWorkers!

      • Click the Save button.

    • In the “Basic Settings” section:

      • Click the Edit button.

      • Set the timeout to one second.

      • Set the memory to 128MB.

      • Click the Save button.

  8. On the “Permissions” tab:

    • Click on the role name (it will open a new browser tab).

    • On the “Trust relationships” tab, click the Edit trust relationship button.

    • Set the “Policy Document” to:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Service": [
                "lambda.amazonaws.com",
                "edgelambda.amazonaws.com"
              ]
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }

  9. Click the “Update Trust Policy” button.

  10. On the Lambda browser tab, click “Action” at the top of the page.

  11. Select “Publish new version”.

  12. Click the Publish button.

  13. Back on the “Configuration” tab:

    • Click “Add trigger”.

    • Select “CloudFront”.

    • Click the Deploy to Lambda@Edge button.

    • Select the distribution you want to install SpeedWorkers on.

    • Select the behavior where the lambda should be used.

    • Select Viewer request as the CloudFront event.

    • Make sure “Include body” is not checked.

    • Check the acknowledgment and click the Deploy button.

Your lambda will be active once the CloudFront distribution finishes its update.
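Before saving a trust relationship, you may want to confirm that the policy document lets both lambda.amazonaws.com and edgelambda.amazonaws.com assume the role, since Lambda@Edge requires the edgelambda.amazonaws.com principal. The helper below is a hypothetical sketch (its name is illustrative) that checks a pasted policy document; it accepts Statement, Action, and Service as either a single value or an array, as IAM allows.

```javascript
// Hypothetical check: does an IAM trust policy let both Lambda and Lambda@Edge
// assume the role?
function allowsEdgeLambda(policyJson) {
  const policy = JSON.parse(policyJson);
  const required = ['lambda.amazonaws.com', 'edgelambda.amazonaws.com'];
  const granted = new Set();
  for (const stmt of [].concat(policy.Statement || [])) {
    const actions = [].concat(stmt.Action || []);
    if (stmt.Effect !== 'Allow' || !actions.includes('sts:AssumeRole')) continue;
    const services = stmt.Principal && stmt.Principal.Service;
    for (const service of [].concat(services || [])) granted.add(service);
  }
  return required.every((service) => granted.has(service));
}
```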

Create the Origin Request Lambda

  1. Connect to the AWS console and go to the Lambda service: https://console.aws.amazon.com/lambda

  2. Ensure you are in the us-east-1 region (N. Virginia).

  3. Click the Create function button.

  4. Select Author from scratch.

  5. Fill in the Basic Information section:

    • Enter a name for your lambda, like “SpeedWorkers-Interceptor-OriginRequest”.

    • Select “Node.js 18.x” as Runtime.

  6. Click the Create function button.

  7. On the “Configuration” tab:

    • In the “Function code” section:

      • Paste the code provided by Botify.

      • Click the Save button.

    • In the “Basic Settings” section:

      • Click the Edit button.

      • Set the timeout to 1 second.

      • Set the memory to 128MB.

      • Click on the “Save” button.

    • On the “Permissions” tab:

      • Click on the role name (it will open a new browser tab).

      • Go to the “Trust relationships” tab.

      • Click the Edit trust relationship button.

      • Set the “Policy Document” to:
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "lambda.amazonaws.com",
                  "edgelambda.amazonaws.com"
                ]
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }

  8. Click the “Update Trust Policy” button.

  9. On the Lambda browser tab, click “Action” at the top of the page.

  10. Select “Publish new version”.

  11. Click the Publish button.

  12. Back on the “Configuration” tab:

    • Click “Add trigger”.

    • Select “CloudFront”.

    • Click the Deploy to Lambda@Edge button.

    • Select the distribution you want to install SpeedWorkers on.

    • Select the cache behavior where the lambda should be used (the Origin group created for SpeedWorkers). Choose a cache behavior that handles only page requests SpeedWorkers may cache, and avoid cache behaviors that serve resources or API requests, to keep the lambda cost down.

    • Select Origin request as the CloudFront event.

    • Make sure “Include body” is not checked.

    • Check the acknowledgment and click the Deploy button.

Your lambda will be active once the CloudFront distribution finishes its update.

Disable Error Code Caching

By default, CloudFront caches some error responses (400, 404…) for 10 seconds. This behavior must be disabled to avoid unexpected caching, such as a transient error response being served from the cache to subsequent requests.

In the behavior:

  1. Go to the Error Pages tab.

  2. Click the Create Custom Error Response button.

  3. Select an HTTP Error Code.

  4. Set the Error Caching Minimum TTL to 0.

  5. Click on Create.

  6. Repeat the operation for all HTTP Error Codes.
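If you manage the distribution through the CloudFront API rather than the console, the same setting corresponds to the CustomErrorResponses list of the distribution configuration, where each item has an ErrorCode and an ErrorCachingMinTTL. The sketch below is a hypothetical helper that sets ErrorCachingMinTTL to 0 for every code while keeping any existing custom error pages; the list of error codes is assumed from the codes offered in the CloudFront console, so check it against your distribution.

```javascript
// Hypothetical helper: build DistributionConfig.CustomErrorResponses with error
// caching disabled. Error codes assumed from the CloudFront console list.
const CLOUDFRONT_ERROR_CODES = [400, 403, 404, 405, 414, 416, 500, 501, 502, 503, 504];

function disableErrorCaching(existing = { Quantity: 0, Items: [] }, codes = CLOUDFRONT_ERROR_CODES) {
  // Keep existing items (e.g. ResponsePagePath/ResponseCode) and only override the TTL.
  const byCode = new Map((existing.Items || []).map((item) => [item.ErrorCode, { ...item }]));
  for (const code of codes) {
    const item = byCode.get(code) || { ErrorCode: code };
    item.ErrorCachingMinTTL = 0;
    byCode.set(code, item);
  }
  const items = [...byCode.values()];
  return { Quantity: items.length, Items: items };
}
```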

Test the Configuration

SpeedWorkers implements a test request where it responds with a specific value to a specific URL. If you send a request to https://www.yourdomain.com (set the right domain and replace WEBSITEID with the website ID provided by Botify), you will get an HTTP status code 200 (OK) and a response body simply containing “Success”.

If the lambdas are deployed in a staging environment that is accessible from the outside, Botify will run a batch of automated tests; otherwise, these tests must be performed manually.

Validating the SpeedWorkers Integration

To validate the integration of SpeedWorkers in your environment, you can send the following requests.

"Always Success" Test

The "always success" test will force SpeedWorkers to return a cache hit even if the page is not in the cache. This test ensures that SpeedWorkers is called and its reply returned to the bot:

Always Success (force a cache hit in SW)
--------------
URL: Your homepage (https://www.mywebsite.com)

Headers:
User-Agent: botify-bot-sw-test
X-Sw-Options: passed-through,request-time,always-success,echo-67674
X-Sw-Options-Auth: XXXXXX <= the website ID provided by Botify

Expected response:
Status: 200
Body: "Success"
Headers:
X-Ftlcdn-Status: false
X-Sw-Echo: 67674
X-Sw-Passed-Through: true
X-Sw-Status: success

"Cache Miss" Test

The "cache miss" test forces SpeedWorkers to return a cache miss even if it has the page in the cache. This test ensures that when SpeedWorkers can't deliver the page, the request falls back properly:

URL: Your homepage (https://www.mywebsite.com)

Headers:
User-Agent: botify-bot-sw-test
X-Sw-Options: passed-through,request-time,always-notfound,echo-41521
X-Sw-Options-Auth: XXXXXX <= the website ID provided by Botify

Expected response:
Status: 200
Body: your homepage
Headers:
NO X-Sw-... headers

"Timeout" Test

The "timeout" test forces SpeedWorkers to delay its response enough to trigger the timeout in your environment. This test ensures that when SpeedWorkers doesn't reply, the request falls back properly:

URL: Your homepage (https://www.mywebsite.com)

Headers:
User-Agent: botify-bot-sw-test
X-Sw-Options: passed-through,request-time,always-timeout,echo-42300
X-Sw-Options-Auth: XXXXXX <= the website ID provided by Botify

Expected response after several seconds:
Status: 200
Body: your homepage
Headers:
NO X-Sw-... headers
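The three requests above can be scripted. The sketch below is hypothetical (function names, the 30-second client timeout, and the websiteId argument are assumptions); it builds the headers exactly as listed in each test and checks the documented expectations, except the homepage body, which it cannot know. It relies on Node.js 18, which provides fetch and AbortSignal.timeout.

```javascript
// Hypothetical helpers for the three validation requests (Node.js 18+, global fetch).
const SW_TEST_MODES = ['always-success', 'always-notfound', 'always-timeout'];

function buildSwTestHeaders(mode, websiteId, echo) {
  if (!SW_TEST_MODES.includes(mode)) {
    throw new Error(`Unknown test mode: ${mode}`);
  }
  return {
    'User-Agent': 'botify-bot-sw-test',
    'X-Sw-Options': `passed-through,request-time,${mode},echo-${echo}`,
    'X-Sw-Options-Auth': websiteId,
  };
}

// Returns a list of problems; an empty list means the response looks as expected.
function checkSwTestResponse(mode, echo, { status, body, headers }) {
  const problems = [];
  const lower = Object.fromEntries(
    Object.entries(headers).map(([name, value]) => [name.toLowerCase(), String(value)]));
  if (status !== 200) problems.push(`expected status 200, got ${status}`);
  if (mode === 'always-success') {
    if (body.trim() !== 'Success') problems.push('expected body "Success"');
    const expected = {
      'x-ftlcdn-status': 'false',
      'x-sw-echo': String(echo),
      'x-sw-passed-through': 'true',
      'x-sw-status': 'success',
    };
    for (const [name, value] of Object.entries(expected)) {
      if (lower[name] !== value) problems.push(`expected ${name}: ${value}, got ${lower[name]}`);
    }
  } else {
    // Cache miss and timeout: the origin answers, so no X-Sw-* headers are expected.
    const swHeaders = Object.keys(lower).filter((name) => name.startsWith('x-sw-'));
    if (swHeaders.length > 0) problems.push(`unexpected headers: ${swHeaders.join(', ')}`);
  }
  return problems;
}

async function runSwTest(url, mode, websiteId, echo) {
  const res = await fetch(url, {
    headers: buildSwTestHeaders(mode, websiteId, echo),
    signal: AbortSignal.timeout(30000), // assumed client-side limit
  });
  const response = { status: res.status, body: await res.text(), headers: Object.fromEntries(res.headers) };
  return checkSwTestResponse(mode, echo, response);
}
```

For example, calling runSwTest with your homepage URL, 'always-timeout', your website ID, and 42300 should resolve to an empty array if the fallback works.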

Revisit Log Files

Please revisit the log files being passed to Botify for ingestion and confirm you are sending us the following:

  • origin.speedworker.com

  • cluster.speedworker.com

Troubleshooting

If sending a request to SpeedWorkers doesn’t return the expected response when testing the integration, try the following:

❗️Before testing with a third-party service, change the website ID and token in the recv snippet to avoid leaking them.
