Botify PII Management

🛠 This article explains how Botify protects Personally Identifiable Information (PII) in the platform.

Overview

Botify is committed to building a secure platform and takes the utmost care with user privacy. Our platform monitoring includes security monitoring, and we carry out regular security audits of the platform so that we can take appropriate action at all times.

👍 Botify does not process personal information to build its analytics and does not store any PII in its databases.

PII in Botify LogAnalyzer

Botify does not request any PII from customers in the raw log files and does not process or store PII if any is provided. In most cases, the only data classified as PII in the raw log files is the client's IP address. In some cases, additional PII may be present, such as first or last names in URLs, cookies, or geolocation data.

Why Botify Uses the Bots' IP Addresses

We use the IP Address for one purpose only: When reading a log line for the first time, we use the IP Address present in crawl lines to verify the source of the crawler and ensure it is authentic. For example, if we read a crawl line stating that Googlebot crawled a page, we check the ownership of the corresponding IP address, which is public data. If the IP address belongs to Google, we keep the line for further processing. We discard the entire line if the IP address does not belong to Google.
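The check described above can be sketched roughly as follows. This is a minimal illustration, not Botify's actual implementation: the function name, the `KNOWN_CRAWLER_NETWORKS` table, and the address range in it (a reserved documentation block) are all assumptions made for the example. A real check would rely on the ownership data published for each crawler.

```python
import ipaddress

# Hypothetical table for illustration only. 192.0.2.0/24 is a reserved
# documentation range, NOT a real crawler range.
KNOWN_CRAWLER_NETWORKS = {
    "Googlebot": [ipaddress.ip_network("192.0.2.0/24")],
}


def is_authentic_crawler(claimed_bot: str, ip: str) -> bool:
    """Return True if the IP falls inside a network listed for the claimed bot."""
    networks = KNOWN_CRAWLER_NETWORKS.get(claimed_bot, [])
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # An unparsable IP cannot be verified, so the line is treated as fake.
        return False
    return any(address in network for network in networks)
```

A line claiming to be Googlebot would be kept only when `is_authentic_crawler("Googlebot", ip)` returns `True`, and discarded otherwise.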

Where the IP Address is Stored

After the first check, we never use the IP Address for any other purpose. The IP address is stripped out in the first step of the logs computation process, does not enter the processing chain, and is not stored in our databases. IP Addresses are only present in the raw log files in the airlock (the FTP/FTPS/SFTP endpoint used for logs delivery) and their backup.
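As a rough sketch of this "verify, then strip" first step, the function below drops unauthenticated records and removes the IP field from the ones it keeps, so nothing downstream ever sees the address. The record layout (a dict with an `ip` key) and the `is_authentic` callback are assumptions made for the example, not a description of Botify's internal pipeline.

```python
from typing import Callable, Iterable, Iterator


def first_step(
    records: Iterable[dict],
    is_authentic: Callable[[dict], bool],
) -> Iterator[dict]:
    """Yield authenticated records with the (hypothetical) 'ip' field removed."""
    for record in records:
        if not is_authentic(record):
            # Fake bot: the entire line is discarded.
            continue
        # Copy every field except the IP address before passing the record on.
        yield {key: value for key, value in record.items() if key != "ip"}
```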

We usually keep the raw log files for a few months in case we need to recompute data or change the configuration, but this is not mandatory: if requested by the customer, Botify can commit to not backing up the raw logs and to deleting them after processing.

Additional PII in Log Files

Log analysis for SEO purposes only relates to data the customer wants to make public and have indexed by search engines. Therefore, Botify LogAnalyzer only needs public data. In some cases, log files may contain additional PII in some URLs, such as:

  • Cookies

  • User names or first/last names in URLs

  • Geolocation

If these URLs are not intended to be indexed by search engines or are not useful for SEO purposes, you do not need to monitor the crawl ratio or visits on these URLs, and Botify does not need these URLs to be uploaded to Botify LogAnalyzer. In this case, we recommend the following process:

  • Crawl these URLs with Botify Analytics and verify they are either not accessible from our crawler or listed as non-indexable in the Botify Analytics report

  • Filter out those URLs on the customer side in the log files before uploading the logs. These URLs will not be in the crawl or visit lines, so filtering to keep only crawls and visits will remove the PII.
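A customer-side filter for the second step could look something like the sketch below. The path prefixes and the record layout (a dict with a `url` key) are hypothetical. Adapt them to wherever the non-indexable, PII-bearing URLs live on your site.

```python
# Hypothetical prefixes of URLs that are not meant to be indexed.
PRIVATE_PREFIXES = ("/account/", "/profile/")


def drop_private_urls(records: list[dict]) -> list[dict]:
    """Keep only records whose URL does not start with a private prefix."""
    return [
        record
        for record in records
        if not record["url"].startswith(PRIVATE_PREFIXES)
    ]
```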

If these URLs are intended to be indexed by search engines, we recommend you remove the personal information (geolocation, cookie, etc.) from the URLs in the logs before sending the logs to us. With this method, you can still monitor the crawl ratio on these URLs and match them with those crawled by Botify to work on orphan URLs.
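One possible way to remove personal information from URLs before delivery is to drop the query parameters that carry it, as in the sketch below. The parameter names in `PII_PARAMS` are assumptions chosen for illustration. PII embedded in the path itself would need a site-specific rule instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed to carry personal information.
PII_PARAMS = {"lat", "lon", "user", "session"}


def sanitize_url(url: str) -> str:
    """Return the URL with PII-carrying query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in PII_PARAMS
    ]
    # Rebuild the URL with only the remaining parameters.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

For instance, `sanitize_url("/store?lat=48.85&lon=2.35&page=2")` returns `"/store?page=2"`, so the sanitized URL can still be matched against the one crawled by Botify.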

Customer Responsibility

We recommend you filter your log files before delivery to Botify for the following purposes:

  • To remove all lines that are not crawl or visit lines.

  • To hide the IP address in all visit lines.

  • To hide any other PII in all crawl and visit lines.

Doing this ensures that the only IP addresses that remain are those of crawlers (bots), which do not qualify as PII.
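The filtering recommended above might be sketched as follows. How a line is classified as a crawl or a visit is an assumption here (a bot token in the user agent, or a search engine in the referrer), as are the token lists, the field names, and the `0.0.0.0` placeholder. Use whatever rules match your own log format.

```python
# Hypothetical markers used to classify lines; adjust to your own logs.
BOT_TOKENS = ("Googlebot", "bingbot")
SEARCH_REFERRERS = ("google.", "bing.")
MASKED_IP = "0.0.0.0"


def classify(record: dict) -> str:
    """Label a record as 'crawl', 'visit', or 'other'."""
    if any(token in record.get("user_agent", "") for token in BOT_TOKENS):
        return "crawl"
    if any(domain in record.get("referrer", "") for domain in SEARCH_REFERRERS):
        return "visit"
    return "other"


def filter_for_delivery(records: list[dict]) -> list[dict]:
    """Drop non-crawl/non-visit lines and mask the IP in visit lines."""
    kept = []
    for record in records:
        kind = classify(record)
        if kind == "other":
            continue
        cleaned = dict(record)  # copy so the input is left untouched
        if kind == "visit":
            cleaned["ip"] = MASKED_IP
        kept.append(cleaned)
    return kept
```

Crawl lines keep their IP address so that bot authenticity can still be checked, while visit lines are delivered without a usable address.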

You can also remove the IP address from all log lines before sending the logs to Botify. In that case, Botify cannot check bot authenticity, so the dashboard will count every crawl declared as coming from a bot as real, without removing fake bots. If you prefer to send us raw logs that include IP addresses for all lines, Botify commits to discarding the IP addresses as soon as the bot authenticity check is done. IP addresses are not processed further and are not stored in our databases.

PII in Botify Analytics and RealKeywords

Botify Analytics does not use any PII and only processes public data already available on your website. If Botify Analytics is integrated with Google Analytics, Google Analytics Premium, Adobe Analytics, or Google Search Console, Botify only collects consolidated data already stored by this third-party software outside the customer’s perimeter.


Contact Support

If you need any assistance, please contact Support using the email address for your region:
