Filtering Log Data

🛠 This page describes filtering your log files to remove unnecessary information before sending them to Botify.

Overview

LogAnalyzer requires all the log lines from your front-end servers, including cache servers or CDNs, related to crawls by search engine bots and visits from search result pages. Use the following filtering methods to ensure you send the data required to provide the most robust reporting in Botify.

Filtering Methods

A simple filtering method is to keep all log lines that contain any of the following strings:

AdsBot|Googlebot|Mediapartners-Google|bingbot|bing|google

To send data for all supported search engines:

AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google|Yandex|bingbot|naver|baidu|bing|google|yandex|GPTBot
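
As a rough illustration of this whole-line method, the sketch below keeps every line that matches one of the two patterns above. It assumes plain-text logs with one request per line and case-sensitive matching exactly as the strings are written; the function name `filter_lines` and the sample lines are hypothetical and only serve the example.

```python
import re

# Patterns copied from the lists above; matching is case-sensitive as written.
BASIC_PATTERN = re.compile(
    r"AdsBot|Googlebot|Mediapartners-Google|bingbot|bing|google"
)
ALL_ENGINES_PATTERN = re.compile(
    r"AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google|Yandex"
    r"|bingbot|naver|baidu|bing|google|yandex|GPTBot"
)


def filter_lines(lines, pattern=ALL_ENGINES_PATTERN):
    """Return only the log lines that contain at least one of the strings."""
    return [line for line in lines if pattern.search(line)]


# Hypothetical sample lines, for illustration only.
sample = [
    '203.0.113.5 - - "GET /a HTTP/1.1" 200 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - "GET /b HTTP/1.1" 200 '
    '"https://www.bing.com/search?q=shoes" "Mozilla/5.0"',
    '203.0.113.7 - - "GET /c HTTP/1.1" 200 "-" "curl/8.0"',
]

kept = filter_lines(sample)  # keeps the Googlebot crawl and the Bing visit
```

Because the pattern is applied to the whole line, a request whose URL happens to contain one of the strings (for example `/google-shoes`) is also kept, even when neither the user agent nor the referrer relates to a search engine.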

Filtering on whole lines this way ensures you provide all useful lines to Botify. It also sends some extra lines, typically those that contain these strings in fields other than User-Agent or Referer. If your log lines use a key-value format (JSON, Splunk, etc.), you can instead filter specifically on the User-Agent and Referer fields.

Pattern for detecting bots via their User Agents:

AdsBot|Googlebot|Mediapartners-Google

To send user agent data for all our supported search engines:

AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google|Yandex|bingbot|naver|GPTBot

Pattern for visits:

bing|google

To send visit data for all our supported search engines:

baidu|bing|google|naver|yandex
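
For key-value logs, the field-specific patterns above could be applied roughly as follows. This is a minimal sketch that assumes one JSON object per line; the key names `user_agent` and `referer` and the function name `keep_record` are hypothetical, so substitute whatever your log format actually uses. Skipping lines that fail to parse is also an assumption of this example, not a Botify requirement.

```python
import json
import re

# Field-specific patterns copied from the lists above (all supported engines).
BOT_USER_AGENT = re.compile(
    r"AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google"
    r"|Yandex|bingbot|naver|GPTBot"
)
VISIT_REFERER = re.compile(r"baidu|bing|google|naver|yandex")


def keep_record(raw_line, ua_key="user_agent", referer_key="referer"):
    """Return True if the JSON log line is a bot crawl or a search visit.

    The key names are hypothetical defaults, not a known log schema.
    """
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return False  # assumption: unparseable lines are skipped
    if not isinstance(record, dict):
        return False
    user_agent = str(record.get(ua_key) or "")
    referer = str(record.get(referer_key) or "")
    # Bots are detected via the user agent, visits via the referrer.
    return bool(BOT_USER_AGENT.search(user_agent) or VISIT_REFERER.search(referer))
```

Unlike whole-line matching, a request for a URL such as `/google-page` from an unrelated client is dropped here, because only the two relevant fields are inspected.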

Stripping Private Information from Log Files

While Botify does not process any Personally Identifiable Information (PII) to build its analytics or store any PII in its databases, you can remove PII from your log files before sending them to Botify for ingestion. Most CDNs let you select or exclude specific log fields, including the IP address, in the CDN's log settings. If you choose to filter logs on your own, please contact Support for assistance.
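
If your CDN cannot exclude fields for you and your logs are JSON, dropping fields yourself might look like the sketch below. The field names `client_ip` and `cookie` and the function name `strip_fields` are hypothetical; which fields count as PII, and which ones Botify still needs, depends on your log format, so confirm the list with Support before relying on anything like this.

```python
import json

# Hypothetical field names; replace them with the PII fields in your own logs.
PII_FIELDS = ("client_ip", "cookie")


def strip_fields(raw_line, fields=PII_FIELDS):
    """Return the JSON log line with the named fields removed."""
    record = json.loads(raw_line)
    for name in fields:
        record.pop(name, None)  # ignore fields that are not present
    return json.dumps(record)


# Hypothetical sample line, for illustration only.
cleaned = strip_fields(
    '{"client_ip": "203.0.113.5", "user_agent": "Googlebot/2.1", "path": "/"}'
)
```

Removing whole fields by name is used here instead of pattern-matching IP addresses in raw text, since a naive IPv4 pattern would also match version strings such as `Chrome/120.0.0.0` inside user agents.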


Contact Support

If you need any assistance, please contact Support using the email address for your region:
