This page describes filtering your log files to remove unnecessary information before sending them to Botify.
Overview
LogAnalyzer requires every log line that records a crawl by a search engine bot or a visit from a search results page, from all of your front-end servers, including cache servers and CDNs. Use the following filtering methods to ensure you send the data required to provide the most robust reporting in Botify.
Filtering Methods
A simple filtering method is to keep all log lines that contain any of the following strings:
AdsBot|Googlebot|Mediapartners-Google|bingbot|bing|google
To send data for all supported search engines:
AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google|Yandex|bingbot|naver|baidu|bing|google|yandex|GPTBot
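As an illustration, the simple method can be sketched in Python as a plain substring filter. This is a minimal sketch, not Botify tooling: the function name filter_lines is hypothetical, and it assumes the strings are matched case-sensitively, exactly as written in the pattern above.

```python
import re

# Pattern copied from the list for all supported search engines.
# Assumption: matching is case-sensitive, as the pattern is written.
KEEP = re.compile(
    r"AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google|Yandex"
    r"|bingbot|naver|baidu|bing|google|yandex|GPTBot"
)


def filter_lines(lines):
    """Yield only the log lines that contain at least one of the strings."""
    for line in lines:
        if KEEP.search(line):
            yield line
```

For example, passing an open log file to filter_lines and writing the yielded lines to a new file produces the filtered log. The match is made against the whole line, so no knowledge of the log format is needed.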
Filtering on these strings ensures that you send every useful line to Botify. You will also send some lines that are not needed, typically lines that contain these strings in fields other than User Agent or Referer. If your log lines use a key-value mechanism (JSON format, Splunk format, etc.), you can filter specifically on the User-Agent and Referer fields.
Pattern for detecting bots via their User Agents:
AdsBot|Googlebot|Mediapartners-Google
To send user agent data for all our supported search engines:
AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google|Yandex|bingbot|naver|GPTBot
Pattern for visits:
bing|google
To send visit data for all our supported search engines:
baidu|bing|google|naver|yandex
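For key-value logs, the two patterns above can be applied to their own fields. The following is a minimal sketch for logs with one JSON object per line. The field names user_agent and referer and the function names are assumptions for illustration, so adjust them to match your log schema. Skipping lines that are not valid JSON is also a choice made for this sketch.

```python
import json
import re

# Patterns copied from the lists for all supported search engines.
BOT_USER_AGENT = re.compile(
    r"AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google"
    r"|Yandex|bingbot|naver|GPTBot"
)
VISIT_REFERER = re.compile(r"baidu|bing|google|naver|yandex")


def keep_record(record, ua_field="user_agent", referer_field="referer"):
    """Return True for a bot crawl or a visit from a search results page.

    The default field names are hypothetical; pass your own.
    """
    user_agent = record.get(ua_field) or ""
    referer = record.get(referer_field) or ""
    return bool(BOT_USER_AGENT.search(user_agent) or VISIT_REFERER.search(referer))


def filter_json_lines(lines):
    """Yield the JSON log lines to send, unchanged."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Sketch choice: skip lines that are not valid JSON.
        if isinstance(record, dict) and keep_record(record):
            yield line
```

Unlike the simple method, a line whose URL happens to contain "google" is dropped here unless its User-Agent or Referer field also matches.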
Stripping Private Information from Log Files
While Botify does not process any Personally Identifiable Information (PII) to build its analytics or store any PII in its databases, you can remove PII from your log files before sending them to Botify for ingestion. Most CDNs let you select or exclude specific log fields, and remove the IP address, in their log settings. If you choose to filter logs on your own, please contact Support for assistance.
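If you do filter logs yourself, masking the client IP address might look like the sketch below. It assumes that the client IP is the first space-delimited field of each line, as in the common and combined access log formats. The function name mask_client_ip and the 0.0.0.0 placeholder are hypothetical, so confirm the approach with Support before relying on it.

```python
def mask_client_ip(line, placeholder="0.0.0.0"):
    """Replace the first space-delimited field of a log line.

    Assumption: the client IP is the first field, as in the common
    and combined access log formats.
    """
    first, sep, rest = line.partition(" ")
    if not sep:
        return line  # No space found: leave the line untouched.
    return placeholder + sep + rest
```

Only the first field changes, so the rest of the line, including the User-Agent and Referer, stays exactly as logged.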
Contact Support
If you need any assistance, please contact Support using the email address for your region: