Filtering Log Data
Updated over 3 months ago

🛠 This page describes filtering your log files to remove unnecessary information before sending them to Botify.

Overview

LogAnalyzer requires all the log lines from your front-end servers, including cache servers or CDNs, related to crawls by search engine bots and visits from search result pages. Use the following filtering methods to ensure you send the data required to provide the most robust reporting in Botify.

Filtering Methods

Use the following methods to filter your log data. Your Botify subscription plan defines the search engines to include.

A simple filtering method is to keep all log lines that contain any of the following strings:

AdsBot|Googlebot|Mediapartners-Google|bingbot|bing|google

Alternatively, keep only the strings for the search engines covered by your subscription plan.
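As a rough illustration of the simple method above, the following Python sketch keeps only the lines that contain one of the listed strings. The function names and the sample log lines are made up for this example, and case-sensitive matching is an assumption; adapt it to your own log pipeline.

```python
import re

# Strings copied from the simple filtering list above.
# Assumption: matching is case-sensitive, as written in the list.
SIMPLE_PATTERN = re.compile(
    r"AdsBot|Googlebot|Mediapartners-Google|bingbot|bing|google"
)


def keep_line(line: str) -> bool:
    """Return True if the log line contains any of the listed strings."""
    return SIMPLE_PATTERN.search(line) is not None


def filter_lines(lines):
    """Yield only the log lines that should be sent."""
    return (line for line in lines if keep_line(line))


# Made-up sample lines in a combined-log-like layout (illustrative only).
sample = [
    '66.249.66.1 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [01/Jan/2024:00:00:02 +0000] "GET /about HTTP/1.1" 200 1024 '
    '"https://example.com/" "Mozilla/5.0 (Windows NT 10.0)"',
    '203.0.113.8 - - [01/Jan/2024:00:00:03 +0000] "GET /shoes HTTP/1.1" 200 2048 '
    '"https://www.bing.com/search?q=shoes" "Mozilla/5.0 (Macintosh)"',
]

kept = list(filter_lines(sample))
```

Here the first line is kept because of its user agent and the third because of its referer; the second line matches nothing and is dropped.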

All Search Engines and Bots

To send data for all supported search engines and bots:

AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google|Yandex|bingbot|naver|baidu|bing|google|yandex|GPTBot|Amazonbot|Anthropic|Bytespider|CCBot|ChatGPT|Claudebot|Claude|Facebook|Meta-External|OAI-SearchBot|Perplexity|YouBot

This ensures you provide all useful lines to Botify. It also sends some extra lines, typically those that contain these strings in fields other than the user agent or referer. If your log lines use a key-value mechanism (JSON format, Splunk format, etc.), you can filter specifically on the user agent and referer fields.
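For key-value logs, a field-level filter might look like the sketch below. It assumes one JSON object per line and uses the hypothetical field names `user_agent` and `referer`; your log schema will likely name them differently. Lines that cannot be parsed as JSON fall back to whole-line matching, which is a design choice of this sketch, made so that no useful line is lost.

```python
import json
import re

# Strings copied from the "All Search Engines and Bots" list above.
ALL_BOTS_PATTERN = re.compile(
    r"AdsBot|Applebot|Baiduspider|Googlebot|Mediapartners-Google|Yandex|bingbot"
    r"|naver|baidu|bing|google|yandex|GPTBot|Amazonbot|Anthropic|Bytespider"
    r"|CCBot|ChatGPT|Claudebot|Claude|Facebook|Meta-External|OAI-SearchBot"
    r"|Perplexity|YouBot"
)

# Hypothetical field names; replace them with the keys your logs use.
USER_AGENT_KEY = "user_agent"
REFERER_KEY = "referer"


def keep_record(raw_line: str) -> bool:
    """Return True if the user agent or referer field matches the pattern."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        # Not JSON: fall back to matching the whole line.
        return ALL_BOTS_PATTERN.search(raw_line) is not None
    if not isinstance(record, dict):
        return ALL_BOTS_PATTERN.search(raw_line) is not None
    fields = (record.get(USER_AGENT_KEY), record.get(REFERER_KEY))
    return any(
        isinstance(value, str) and ALL_BOTS_PATTERN.search(value)
        for value in fields
    )
```

With this approach, a request whose URL happens to contain "google" is dropped unless its user agent or referer also matches, which avoids most of the extra lines that whole-line matching lets through.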

All AI Bots

To send data for all supported AI bots:

GPTBot|Amazonbot|Anthropic|Bytespider|CCBot|ChatGPT|Claudebot|Claude|Facebook|Meta-External|OAI-SearchBot|Perplexity|YouBot

Stripping Private Information from Log Files

While Botify does not process any Personally Identifiable Information (PII) to build its analytics or store any PII in its databases, you can remove PII from your log files before sending them to Botify for ingestion. Most CDNs enable you to select or exclude specific fields from the logs and to remove the IP in your CDN's log settings. If you choose to filter logs on your own, please contact Support for assistance.
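If your CDN cannot exclude fields for you, a small preprocessing step can drop them before upload. The sketch below assumes one JSON object per line and uses the hypothetical field names `client_ip` and `x_forwarded_for`; it is an illustration only, not a Botify-provided tool, and Support can confirm which fields are safe to remove for your setup.

```python
import json

# Hypothetical names of fields that carry private information.
# Replace them with the keys your logs actually use.
PII_KEYS = {"client_ip", "x_forwarded_for"}


def strip_pii(raw_line: str) -> str:
    """Return the JSON log line with the PII fields removed."""
    record = json.loads(raw_line)
    cleaned = {key: value for key, value in record.items() if key not in PII_KEYS}
    return json.dumps(cleaned)
```

All other fields, including the user agent and referer, pass through unchanged.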
