This document explains the configuration requirements for running SpeedWorkers with CloudFront.
⚠️ Warning about extra costs:
With the CloudFront architecture, the viewer request lambda is called for all incoming requests. The origin request lambda is called for all bot requests that don't hit the cache (page requests). This may introduce additional costs if you don't already use Lambda@Edge (for malicious bot detection, for instance). See how the Lambda@Edge functions are used below.
How SpeedWorkers Works
SpeedWorkers is a service designed to deliver web pages to search engine crawler bots as fast as possible. It increases your crawl budget (SEO), serving more pages to search engines to index for the same amount of time spent on your website.
SpeedWorkers can prerender your JavaScript pages in advance, at scale, and deliver them to search engines in a few hundred milliseconds. Prerendering pages enables search engines to index your pages faster; in other words, it increases your crawl/render budget. By storing all pages in its cache, SpeedWorkers delivers long-tail pages as fast as any other page, whereas usual CDNs cannot keep long-tail pages in the cache.
Our service has advanced quality controls to ensure the pages are rendered with all their components.
The CloudFront Situation
The SpeedWorkers (SW) workflow:
Intercepts incoming requests
Calls SW when the request is from a bot
Waits for the SW reply
Returns the SW response in case of success or falls back to the origin server
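The four workflow steps above can be sketched as a single function. This is only an illustration of the control flow, not Botify's implementation: `handleWithSpeedWorkers` and the injected `isBot`, `fetchFromSw`, and `fetchFromOrigin` helpers are hypothetical names, and treating only a 200 status as success is an assumption.

```javascript
// Hypothetical sketch of the generic SW workflow (not the Botify code).
// deps.isBot, deps.fetchFromSw and deps.fetchFromOrigin are injected so the
// flow can be read (and tested) without any real network call.
async function handleWithSpeedWorkers(request, deps) {
  // 1. Intercept the request; user traffic always goes to the origin.
  if (!deps.isBot(request)) {
    return deps.fetchFromOrigin(request);
  }

  try {
    // 2 and 3. Call SW for bot traffic and wait for its reply.
    const swResponse = await deps.fetchFromSw(request);
    // 4a. Return the SW response in case of success (assumed: status 200).
    if (swResponse && swResponse.status === 200) {
      return swResponse;
    }
  } catch (err) {
    // A failed or timed-out SW call is handled like a miss.
  }

  // 4b. Otherwise fall back to the origin server.
  return deps.fetchFromOrigin(request);
}
```

On CloudFront this flow cannot be written as one function, which is why the rest of this document splits it across two Lambda@Edge functions.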
CloudFront does not provide a comprehensive rule system to route traffic. You can only set different origins with CloudFront according to a URL path pattern. You cannot route requests based on headers or query parameters. The only way to perform advanced routing is through Lambda@Edge.
Using Lambda@Edge to Deploy SW
Lambda@Edge is a feature of Amazon CloudFront that runs code closer to web application users, which improves performance and reduces latency.
How does it work?
We create a viewer request Lambda@Edge that adds headers, such as the header identifying the request type (bot or user).
We configure CloudFront to partition the cache according to this header (one cache for the users and one for the bots) and forward the required headers.
We create an origin request Lambda@Edge that decides whether to send the request to SpeedWorkers' servers or the website's servers.
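As an illustration of the first step, here is a minimal sketch of a viewer request function that tags each request. It is not the script provided by Botify: the header values "bot" and "user", the hard-coded pattern, and the function names are assumptions. Only the header names and the CloudFront event shape (`event.Records[0].cf.request`, with headers stored as lower-case keys mapping to `{ key, value }` arrays) are taken from this document and the Lambda@Edge event format.

```javascript
// Assumption: stands in for the swBotPattern setting of the Botify script.
const SW_BOT_PATTERN = /(botify-bot-sw-)/i;

// CloudFront stores headers as { "lower-case-name": [{ key, value }] }.
function firstHeaderValue(headers, name) {
  const entries = headers[name];
  return entries && entries.length > 0 ? entries[0].value : "";
}

// Adds the three X-Sw-* headers to a CloudFront request object.
function tagViewerRequest(request) {
  const userAgent = firstHeaderValue(request.headers, "user-agent");
  const host = firstHeaderValue(request.headers, "host");
  // "bot" and "user" are assumed values for the request-type header.
  const requestType = SW_BOT_PATTERN.test(userAgent) ? "bot" : "user";

  request.headers["x-sw-request-type"] = [
    { key: "X-Sw-Request-Type", value: requestType },
  ];
  request.headers["x-sw-host"] = [{ key: "X-Sw-Host", value: host }];
  request.headers["x-sw-user-agent"] = [
    { key: "X-Sw-User-Agent", value: userAgent },
  ];
  return request;
}

// Lambda@Edge entry point; in a deployed function this would be exported
// as the handler. Returning the request lets CloudFront continue with it.
async function viewerRequestHandler(event) {
  return tagViewerRequest(event.Records[0].cf.request);
}
```

Keeping the tagging logic in a plain function makes it easy to exercise locally with a hand-written request object before publishing a version.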
Installing Two Lambda@Edge
Two Lambda@Edge functions must be deployed to enable SpeedWorkers (SW).
Viewer request/response lambda: Called to read and edit the metadata (headers) of incoming requests, enabling the partitioning of the CloudFront cache between bots and users.
Origin request/response lambda: Called to determine which origin server should handle the request (SpeedWorkers' or the website's servers).
[Figure: a conceptual graphic showing how the CloudFront events can trigger a Lambda function]
Why We Cannot Use a Single Lambda@Edge
Intercepting viewer requests (before the CloudFront cache) is limited in time (5 s max) and response size (40 KB max); therefore, the viewer request lambda cannot return the SW cached page version itself. Intercepting origin requests (after the CloudFront cache) is still limited in response size (1 MB). 1 MB is too small, and we cannot configure CloudFront to prevent the SW page from being cached and returned to users. Learn more: Lambda Requirements Limits (AWS).
Implementation
Scripts:
Viewer request lambda: CloudFront viewer request lambda
Viewer request cloud function: CloudFront viewer request Cloud Function
Origin request lambda: CloudFront Origin Request Lambda
How does it work?
When the origin request lambda is called, it changes the origin to the SpeedWorkers origin if SpeedWorkers should handle the request. When CloudFront receives an incoming request:
It calls the Viewer Request Lambda (or CloudFront Function), which adds three headers to the request:
One header indicates whether the request comes from a bot or a user. This header has to be whitelisted in the CloudFront behavior's configuration to partition the cache between bots and users (to avoid returning the bot version of a page to a user because it has been put in the cache).
One header contains the original request hostname (domain). This header has to be forwarded to the Origin Request Lambda to rebuild the complete original request URL, as CloudFront doesn't provide it in the request object sent to the lambda.
One header contains the original request user-agent. This header must also be forwarded to the Origin Request Lambda, which passes it on to SpeedWorkers, as CloudFront filters out the User-Agent header by default.
It calls the Origin Request Lambda, which, based on the request type (bot or user), decides whether to reroute the request directly to the SpeedWorkers origin or let it be handled by the origin set for the behavior matching the current request:
If the request comes from a bot, then the lambda changes the origin to point to SpeedWorkers.
If the request comes from a user, the lambda lets CloudFront call the behavior's origin.
If the request comes from a bot and SpeedWorkers doesn't have the page in its cache, SpeedWorkers performs a fallback request: it requests the page with a specific User-Agent that the lambdas do not recognize as a bot.
If the Viewer Lambda (or CloudFront Function) is accidentally removed, then the Origin Request Lambda won't receive the headers it's waiting for and will ignore the request, letting the behavior's origin handle it.
If the Origin Lambda is accidentally removed, the behavior's origin will be called.
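The routing decision described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Botify script: the "bot" header value, the timeouts, and the function names are hypothetical, and the way the real lambda builds the SpeedWorkers request (token, URL) is deliberately left out. The `request.origin.custom` object follows the Lambda@Edge origin request event format.

```javascript
// Placeholder taken from the swDomain setting described in this document.
const SW_DOMAIN = "XXXXX.sw.adn.cloud";

// Decides whether a CloudFront origin request goes to SW or stays untouched.
function routeOriginRequest(request) {
  const typeHeader = request.headers["x-sw-request-type"];
  const hostHeader = request.headers["x-sw-host"];

  // Headers missing (e.g. viewer function removed) or user traffic:
  // leave the request as-is so the behavior's origin handles it.
  if (!typeHeader || !hostHeader || typeHeader[0].value !== "bot") {
    return request;
  }

  // Bot traffic: point the request at the SpeedWorkers origin.
  // The timeout values below are illustrative assumptions.
  request.origin = {
    custom: {
      domainName: SW_DOMAIN,
      port: 443,
      protocol: "https",
      path: "",
      sslProtocols: ["TLSv1.2"],
      readTimeout: 5,
      keepaliveTimeout: 5,
      customHeaders: {},
    },
  };
  // The Host header has to match the new origin's domain name.
  request.headers["host"] = [{ key: "Host", value: SW_DOMAIN }];
  return request;
}

// Lambda@Edge entry point; would be exported as the handler when deployed.
async function originRequestHandler(event) {
  return routeOriginRequest(event.Records[0].cf.request);
}
```

Because a request without the expected headers is returned unchanged, removing the viewer function degrades to normal origin traffic rather than to errors, which matches the failure modes listed above.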
Deployment Guidelines
To deploy SpeedWorkers in CloudFront, you must do the following:
Ensure your CloudFront distribution behaviors are configured so the SpeedWorkers lambdas are only called for HTML page requests. Calling the lambdas for resources (e.g., images, JS, CSS) introduces a cost overhead in AWS.
Configure CloudFront Behaviors
For each cache behavior where you need SpeedWorkers:
Go to the "Behaviors" tab of your distribution.
Check the behavior you want to modify.
Click on the Edit button.
Edit your Cache policy to partition cache between request types (user or bot):
In the Headers drop-down, change to "Include the following headers" if it was set to None (or leave it set to None if all TTL settings are set to 0).
Add the custom header "X-Sw-Request-Type". It will enable SpeedWorkers to partition the cache between users and bots.
Save the changes.
Edit your Origin request policy to forward headers to origin:
In the Headers drop-down, select either "All viewer headers" or "Include the following headers" and add the "X-Sw-Request-Type", "X-Sw-Host", "X-Sw-User-Agent", "X-Sw-If-Modified-Since", "X-Sw-Options", and "X-Sw-Options-Auth" headers (plus your own headers).
In the "Cookies" and "Query Strings" drop-downs, set the settings you normally use. Ensure the "Query Strings" are configured correctly so that query parameters influencing the returned page content are forwarded.
Save the changes.
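For teams that manage CloudFront through the API rather than the console, the two policies above correspond roughly to the following configuration objects. This is a sketch, assuming the CloudFront `CachePolicyConfig` and `OriginRequestPolicyConfig` shapes; the policy names, TTL values, and the cookie and query string settings are placeholders to replace with the settings you normally use.

```javascript
// Headers listed in this document for the origin request policy.
const SW_FORWARDED_HEADERS = [
  "X-Sw-Request-Type",
  "X-Sw-Host",
  "X-Sw-User-Agent",
  "X-Sw-If-Modified-Since",
  "X-Sw-Options",
  "X-Sw-Options-Auth",
];

// Builds a CloudFront "whitelist" headers block for the given names.
function headerWhitelist(names) {
  return {
    HeaderBehavior: "whitelist",
    Headers: { Quantity: names.length, Items: names },
  };
}

// Cache policy: only X-Sw-Request-Type is part of the cache key, which
// partitions the cache between bots and users.
function buildSwCachePolicyConfig(name) {
  return {
    Name: name,
    MinTTL: 0, // placeholder TTLs, keep your own values
    DefaultTTL: 86400,
    MaxTTL: 31536000,
    ParametersInCacheKeyAndForwardedToOrigin: {
      EnableAcceptEncodingGzip: true,
      EnableAcceptEncodingBrotli: true,
      HeadersConfig: headerWhitelist(["X-Sw-Request-Type"]),
      CookiesConfig: { CookieBehavior: "none" }, // placeholder
      QueryStringsConfig: { QueryStringBehavior: "all" }, // placeholder
    },
  };
}

// Origin request policy: forwards every X-Sw-* header to the origin side.
function buildSwOriginRequestPolicyConfig(name, extraHeaders = []) {
  return {
    Name: name,
    HeadersConfig: headerWhitelist([...SW_FORWARDED_HEADERS, ...extraHeaders]),
    CookiesConfig: { CookieBehavior: "none" }, // placeholder
    QueryStringsConfig: { QueryStringBehavior: "all" }, // placeholder
  };
}
```

The returned objects are plain data, so they can be reviewed or diffed before being passed to whichever deployment tool you use.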
Set Origin Timeouts
To account for the additional delay that may occur when SW has to fall back to the origin server on a cache miss, increase the read timeout in your origin configuration by one or two seconds.
Create the Lambdas
Create two lambdas:
Create the Viewer Request Lambda
You can replace the Viewer Request lambda with a Viewer Request CloudFront Function (less expensive).
Connect to the AWS console and go to the Lambda service: https://console.aws.amazon.com/lambda.
Ensure you are in the us-east-1 region (N. Virginia).
Click on the Create function button.
Select the Author from scratch option.
Fill in the Basic Information section:
Enter a name for your lambda, like "SpeedWorkers-Interceptor-ViewerRequest".
Select "Node.js 18.x" as Runtime.
Click the Create function button.
On the "Configuration" tab:
In the "Function code" section:
Paste the code provided by Botify.
Set the configuration:
swBotPattern: During the validation process of SpeedWorkers, set it to "(botify-bot-sw-)". This way only requests containing "botify-bot-sw-" in the user-agent will be submitted to SpeedWorkers and the legitimate bot traffic won't be affected.
Don't leave the original swBotPattern if you are not ready to reroute bot traffic to SpeedWorkers!
Click the Save button.
In the "Basic Settings" section:
Click the Edit button.
Set the timeout to one second.
Set the memory to 128MB.
Click the Save button.
On the "Permissions" tab:
Click on the role name (it will open a new browser tab).
On the "Trust relationships" tab, click the Edit trust relationship button.
Set the "Policy Document" to:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "edgelambda.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Click the "Update Trust Policy" button.
On the Lambda browser tab, click "Action" at the top of the page.
Select "Publish new version".
Click the Publish button.
Back on the "Configuration" tab:
Click on "Add trigger".
Select "CloudFront".
Click the Deploy to Lambda@Edge button.
Select the distribution you want to install SpeedWorkers on.
Select the behavior where the lambda should be used.
Select Viewer request as CloudFront event.
Make sure "Include body" is not checked.
Check the acknowledgment and click the Deploy button.
Your lambda will be active once the CloudFront distribution finishes its update.
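To see why the validation value of swBotPattern keeps legitimate bot traffic away from SpeedWorkers, the matching can be sketched like this. It assumes the setting is applied to the User-Agent as a case-insensitive regular expression; the `isSwBot` helper is hypothetical and the actual matching in the Botify script may differ.

```javascript
// Validation value given in this document for swBotPattern.
const VALIDATION_PATTERN = "(botify-bot-sw-)";

// Assumption: the pattern is applied to the User-Agent as a
// case-insensitive regular expression.
function isSwBot(userAgent, swBotPattern) {
  return new RegExp(swBotPattern, "i").test(userAgent || "");
}

// Only the dedicated test user-agent matches during validation.
const testAgentMatches = isSwBot("botify-bot-sw-test", VALIDATION_PATTERN);

// A real search engine crawler does not match, so it keeps hitting the origin.
const googlebotMatches = isSwBot(
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  VALIDATION_PATTERN
);
```

Under these assumptions `testAgentMatches` is true and `googlebotMatches` is false, which is the behavior wanted during validation.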
Create the Origin Request Lambda
Connect to the AWS console and go to the Lambda service: https://console.aws.amazon.com/lambda
Ensure you are in the us-east-1 region (N. Virginia).
Click the Create function button.
Select the Author from scratch option.
Fill in the Basic Information section:
Enter a name for your lambda, like "SpeedWorkers-Interceptor-OriginRequest".
Select "Node.js 18.x" as Runtime.
Click the Create function button.
On the "Configuration" tab:
In the "Function code" section:
Paste the code provided by Botify.
Set the configuration:
swAdnToken: The ADN token provided by Botify.
swAllowedUrls: Limits SpeedWorkers to specific URLs. If you set something here, like "https://www.mydomain.com/products/", then only URLs starting with "https://www.mydomain.com/products/" will be handled by SpeedWorkers, like "https://www.mydomain.com/products/phone.html".
swRewriteOrigin: If you set something here, it replaces the host of the original request, whatever that host is. If you set https://www.example.com/test and the incoming request URL is https://www.otherdomain.com/index.html, then the URL requested from SpeedWorkers will be https://www.example.com/test/index.html. Useful for staging.
swBotPattern: During the validation process of SpeedWorkers, set it to "(botify-bot-sw-)". This way only requests containing "botify-bot-sw-" in the user-agent will be submitted to SpeedWorkers and the legitimate bot traffic won't be affected.
swDomain: Set it to the domain provided by Botify (something like XXXXX.sw.adn.cloud).
Click the Save button.
In the "Basic Settings" section:
Click the Edit button.
Set the timeout to 1 second.
Set the memory to 128MB.
Click the Save button.
On the "Permissions" tab:
Click on the role name (it will open a new browser tab).
Go to the "Trust relationships" tab.
Click the Edit trust relationship button.
Set the "Policy Document" to:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "edgelambda.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Click the "Update Trust Policy" button.
On the Lambda browser tab, click "Action" at the top of the page.
Select "Publish new version".
Click the Publish button.
Back on the "Configuration" tab:
Click on "Add trigger".
Select "CloudFront".
Click the Deploy to Lambda@Edge button.
Select the distribution you want to install SpeedWorkers on.
Select the cache behavior where the lambda should be used (the Origin group created for SpeedWorkers). You should select a cache behavior that handles only page requests that SpeedWorkers may cache and avoid cache behaviors responding to resources or API requests to lower the lambda cost.
Select Origin request as CloudFront event.
Make sure "Include body" is not checked.
Check the acknowledgment and click the Deploy button.
Your lambda will be active once the CloudFront distribution finishes its update.
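The swAllowedUrls and swRewriteOrigin settings configured in the steps above can be illustrated with a small URL-building sketch. It is not the Botify code: the `buildSwUrl` helper is hypothetical, swAllowedUrls is assumed to be a list of URL prefixes, the original URL is assumed to use https, and applying the prefix check before the rewrite is also an assumption.

```javascript
// Returns the URL that would be requested from SpeedWorkers, or null when
// the original URL is outside swAllowedUrls (hypothetical helper).
function buildSwUrl(host, uri, querystring, options = {}) {
  const { swAllowedUrls = [], swRewriteOrigin = "" } = options;
  const query = querystring ? "?" + querystring : "";
  // The original URL is rebuilt from the forwarded host (X-Sw-Host).
  const originalUrl = "https://" + host + uri + query;

  // Assumed: an empty list means "no restriction".
  const allowed =
    swAllowedUrls.length === 0 ||
    swAllowedUrls.some((prefix) => originalUrl.startsWith(prefix));
  if (!allowed) {
    return null;
  }

  if (!swRewriteOrigin) {
    return originalUrl;
  }
  // Replace scheme and host with swRewriteOrigin, keeping path and query.
  return swRewriteOrigin.replace(/\/+$/, "") + uri + query;
}

// Example from this document: the host is replaced, the path is kept.
const rewritten = buildSwUrl("www.otherdomain.com", "/index.html", "", {
  swRewriteOrigin: "https://www.example.com/test",
});
// rewritten === "https://www.example.com/test/index.html"
```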
Disable Error Code Caching
By default, CloudFront caches some error codes (400, 404, etc.) for 10 seconds. This behavior has to be deactivated to avoid unexpected caching behavior.
In the distribution:
Go to the Error Pages tab.
Click the Create Custom Error Response button.
Select an HTTP Error Code.
Set the Error Caching Minimum TTL to 0.
Click on Create.
Repeat the operation for all HTTP Error Codes.
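If you manage the distribution through the API instead of repeating the console steps for each code, the same result corresponds to a `CustomErrorResponses` block with `ErrorCachingMinTTL` set to 0. The sketch below assumes that API shape; the list of error codes is an assumption to check against the current AWS documentation, and the helper name is hypothetical.

```javascript
// Assumed list of error codes for which CloudFront accepts a custom error
// response; verify it against the AWS documentation before relying on it.
const ERROR_CODES = [400, 403, 404, 405, 414, 416, 500, 501, 502, 503, 504];

// Builds a CustomErrorResponses block that disables error caching
// (ErrorCachingMinTTL = 0) for every given code.
function buildNoCacheErrorResponses(errorCodes) {
  const items = errorCodes.map((code) => ({
    ErrorCode: code,
    ErrorCachingMinTTL: 0,
  }));
  return { Quantity: items.length, Items: items };
}

const customErrorResponses = buildNoCacheErrorResponses(ERROR_CODES);
```

No custom response page is set here, so the origin's own error page is still returned; only the caching duration changes.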
Test the Configuration
SpeedWorkers implements a test request where it responds with a specific value to a specific URL. If you send a request to https://www.yourdomain.com (set the right domain and replace WEBSITEID with the website ID provided by Botify), you will get an HTTP status code 200 (OK) and a response body simply containing "Success".
Botify will perform a batch of automated tests once the lambdas are deployed in a staging environment (accessible from the outside). Otherwise, these tests will have to be performed manually.
Validating the SpeedWorkers Integration
To validate the integration of SpeedWorkers in your environment, you can send the following requests.
"Always Success" Test
The "always success" test will force SpeedWorkers to return a cache hit even if the page is not in the cache. This test ensures that SpeedWorkers is called and its reply returned to the bot:
Always Success (force a cache hit in SW)
--------------
URL: Your homepage (https://www.mywebsite.com)
Headers:
    User-Agent: botify-bot-sw-test
    X-Sw-Options: passed-through,request-time,always-success,echo-67674
    X-Sw-Options-Auth: XXXXXX <= the website ID provided by Botify
Expected response:
    Status: 200
    Body: "Success"
    Headers:
        X-Ftlcdn-Status: false
        X-Sw-Echo: 67674
        X-Sw-Passed-Through: true
        X-Sw-Status: success
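The request and expected response above can be turned into a small script. The sketch below only builds the request and checks a response that has already been received; `buildSwTestRequest` and `checkAlwaysSuccess` are hypothetical helper names, and the checker assumes header names have been lower-cased.

```javascript
// Builds the URL and headers for one of the SW validation requests.
// websiteId is the value provided by Botify (left as a placeholder here).
function buildSwTestRequest(url, option, echo, websiteId) {
  return {
    url,
    headers: {
      "User-Agent": "botify-bot-sw-test",
      "X-Sw-Options": `passed-through,request-time,${option},echo-${echo}`,
      "X-Sw-Options-Auth": websiteId,
    },
  };
}

// Compares a response with the "always success" expectations and returns
// a list of problems (an empty list means the test passed).
function checkAlwaysSuccess(status, headers, body, echo) {
  const expectedHeaders = {
    "x-ftlcdn-status": "false",
    "x-sw-echo": String(echo),
    "x-sw-passed-through": "true",
    "x-sw-status": "success",
  };
  const problems = [];
  if (status !== 200) problems.push(`status is ${status}, expected 200`);
  if (!body.includes("Success")) problems.push("body does not contain Success");
  for (const [name, expected] of Object.entries(expectedHeaders)) {
    if (headers[name] !== expected) {
      problems.push(`${name} is ${headers[name]}, expected ${expected}`);
    }
  }
  return problems;
}

const alwaysSuccessRequest = buildSwTestRequest(
  "https://www.mywebsite.com",
  "always-success",
  67674,
  "XXXXXX"
);
```

With the global `fetch` of Node.js 18, the request could be sent with `fetch(alwaysSuccessRequest.url, { headers: alwaysSuccessRequest.headers })` and the result passed to the checker as `checkAlwaysSuccess(res.status, Object.fromEntries(res.headers), await res.text(), 67674)`.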
"Cache Miss" Test
The "cache miss" test forces SpeedWorkers to return a cache miss even if it has the page in the cache. This test ensures that when SpeedWorkers can't deliver the page, the request falls back properly:
URL: Your homepage (https://www.mywebsite.com)
Headers:
    User-Agent: botify-bot-sw-test
    X-Sw-Options: passed-through,request-time,always-notfound,echo-41521
    X-Sw-Options-Auth: XXXXXX <= the website ID provided by Botify
Expected response:
    Status: 200
    Body: your homepage
    Headers:
        NO X-Sw-... headers
"Timeout" Test
The "timeout" test forces SpeedWorkers to delay its response enough to trigger the timeout in your environment. This test ensures that when SpeedWorkers doesn't reply, the request falls back properly:
URL: Your homepage (https://www.mywebsite.com)
Headers:
    User-Agent: botify-bot-sw-test
    X-Sw-Options: passed-through,request-time,always-timeout,echo-42300
    X-Sw-Options-Auth: XXXXXX <= the website ID provided by Botify
Expected response after several seconds:
    Status: 200
    Body: your homepage
    Headers:
        NO X-Sw-... headers
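The "cache miss" and "timeout" tests share the same expectation: a 200 response served by your own origin, with no X-Sw-... headers. A minimal checker for that fallback case could look like the sketch below; `checkFallback` is a hypothetical helper, and verifying that the body really is your homepage is left to you since it depends on your site.

```javascript
// X-Sw-Options values used by the two fallback tests in this document.
const FALLBACK_TEST_OPTIONS = {
  cacheMiss: "passed-through,request-time,always-notfound,echo-41521",
  timeout: "passed-through,request-time,always-timeout,echo-42300",
};

// Checks that a response came from the origin fallback: status 200 and no
// header whose name starts with "x-sw-" (compared case-insensitively).
// Returns a list of problems; an empty list means the test passed.
function checkFallback(status, headers) {
  const problems = [];
  if (status !== 200) {
    problems.push(`status is ${status}, expected 200`);
  }
  const swHeaders = Object.keys(headers).filter((name) =>
    name.toLowerCase().startsWith("x-sw-")
  );
  if (swHeaders.length > 0) {
    problems.push(`unexpected SW headers: ${swHeaders.join(", ")}`);
  }
  return problems;
}
```

For the timeout test, remember that the response is only expected after several seconds, so any client-side timeout used when sending the request has to be longer than the origin read timeout configured earlier.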
Revisit Log Files
Please revisit the log files being passed to Botify for ingestion and confirm you are sending us the following:
origin.speedworker.com
cluster.speedworker.com
Troubleshooting
If sending a request to SpeedWorkers doesn't return the expected response when testing the integration, try the following:
Replace the SpeedWorkers host (origin) with a third-party request-inspection service like PutsReq, Request Catcher, or Beeceptor. It will help you verify that the call to SW is correct.
Remove the failover rule to get details in the edge diagnostic tools about why the request to SpeedWorkers fails, such as the Akamai reference error.
⚠️ Before testing with a third-party service, change the website ID and token in the recv snippet to avoid leaking them.