HTML Extracts Overview

Updated over a year ago

📘 This article provides an overview of HTML Extraction in Botify, available to customers on any Botify plan.

Overview

As search engines get better at analyzing the full content of websites at scale, it is more important than ever for website owners to have a well-organized inventory of their site content. HTML extractions let you check specific elements of a page that might be important from a technology or SEO standpoint. Use HTML extractions to check for valuable information like inventory, the existence of certain elements, or word counts in specific areas, so you can keep a close watch on potential changes on your site. You can also correlate extracted data with rankings, crawls, or traffic to find indicators for future site changes or to support the business case for making them.

How HTML Extraction Works

HTML Extraction allows you to extract any portion of your page's HTML code and use it as a custom field or filter in your Botify crawl data. For example, the extracted code may correspond to something shown on the page, such as:

  • Product prices or stock volumes on e-commerce websites

  • Result counts in search result pages

  • The number of comments on blog posts

Extracted code may correspond to a technical element not visible to users, such as:

  • Audience tracking tags

  • Data layer variables

  • Structured data

Some structured data types are retrieved automatically and stored in the "Structured Data" folder in Botify filters and Explorers.
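
To make the idea of "extracting a portion of the HTML" concrete, here is a minimal sketch in Python. It does not use Botify and does not reflect how Botify's extraction engine works; the sample markup, class names, and the `extract_first` helper are all invented for illustration. It pulls one visible value (a price), one count (comments), and one non-visible value (a data layer variable) out of a page with regular expressions.

```python
import re
from typing import Optional

# Hypothetical sample page; the markup and class names are invented.
SAMPLE_HTML = """
<html><body>
  <span class="price">$49.99</span>
  <div class="comments-count">12 comments</div>
  <script>dataLayer = [{"pageType": "product"}];</script>
</body></html>
"""

def extract_first(pattern: str, html: str) -> Optional[str]:
    """Return the first capture group matched by `pattern`, or None if absent."""
    match = re.search(pattern, html)
    return match.group(1) if match else None

# Visible elements: a product price and a comment count.
price = extract_first(r'<span class="price">\$([0-9.]+)</span>', SAMPLE_HTML)
comment_count = extract_first(
    r'<div class="comments-count">(\d+) comments</div>', SAMPLE_HTML
)

# Non-visible element: a data layer variable inside a script tag.
page_type = extract_first(r'"pageType":\s*"([^"]+)"', SAMPLE_HTML)

print(price, comment_count, page_type)
```

Each extracted value is plain text at this point; a pattern that does not match simply yields no value for that page.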

All URLs returning an HTTP 200 status code in a Botify crawl are evaluated for extraction. As with all project settings, HTML extracts are saved and will be applied to your future analyses in the same project. Refer to Extracting Custom Data from Your Pages to learn how to create an HTML extract.
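
The eligibility rule above can be pictured with a short sketch. The crawl results and URLs below are invented, and this is not Botify's implementation; it only illustrates that pages with a non-200 response are skipped.

```python
from typing import Dict, List

# Hypothetical crawl results (URL -> HTTP status code); values are invented.
CRAWL_RESULTS: Dict[str, int] = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-page": 301,
    "https://www.example.com/missing": 404,
    "https://www.example.com/blog/": 200,
}

def urls_evaluated_for_extraction(results: Dict[str, int]) -> List[str]:
    """Keep only URLs that returned HTTP 200, mirroring the rule stated above."""
    return [url for url, status in results.items() if status == 200]

eligible = urls_evaluated_for_extraction(CRAWL_RESULTS)
print(eligible)
```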

Extracts are limited to 300 characters. If the extracted data exceeds 300 characters, it will be truncated.
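
As a rough illustration of what the limit means in practice, the sketch below truncates a value to 300 characters. The function name is hypothetical, and it assumes the limit counts characters as Python does; how Botify counts them internally is not stated here.

```python
MAX_EXTRACT_LENGTH = 300  # limit stated in this article

def truncate_extract(value: str, limit: int = MAX_EXTRACT_LENGTH) -> str:
    """Return `value` unchanged if it fits, otherwise only its first `limit` characters."""
    return value[:limit]

long_value = "x" * 450
stored = truncate_extract(long_value)
print(len(stored))
```

If the information you need sits past the 300th character, a narrower extraction pattern that targets only that part of the markup avoids losing it.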

HTML Extracts in Reports

Extracted HTML code is shown in SiteCrawler's Distribution > Top Charts report, which shows the number of pages where the custom data was found in Botify's crawl.

Click on the numbers in the table to get custom extraction data by page in the URL Explorer.

You can use these custom fields as filters and columns in your Botify reports.

Filters

Use your custom fields to filter reports. For example, a filter can select only pages where the number of posts in an archive is greater than 10.
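
To show the logic of such a filter outside of Botify, here is a minimal sketch in Python. The rows, the `posts_in_archive` field name, and the `filter_greater_than` helper are invented for illustration; in Botify itself the filter is built in the report interface rather than in code. The sketch assumes extracted values arrive as text and must be converted to numbers before comparison.

```python
from typing import List

# Hypothetical per-URL rows with an invented custom field "posts_in_archive".
ROWS: List[dict] = [
    {"url": "https://www.example.com/archive/a", "posts_in_archive": "14"},
    {"url": "https://www.example.com/archive/b", "posts_in_archive": "10"},
    {"url": "https://www.example.com/archive/c", "posts_in_archive": None},
    {"url": "https://www.example.com/archive/d", "posts_in_archive": "27"},
    {"url": "https://www.example.com/archive/e", "posts_in_archive": "n/a"},
]

def filter_greater_than(rows: List[dict], field: str, threshold: float) -> List[dict]:
    """Keep rows whose `field` converts to a number strictly greater than `threshold`."""
    selected = []
    for row in rows:
        raw = row.get(field)
        if raw is None:
            continue  # nothing was extracted for this page
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # extracted text is not numeric
        if value > threshold:
            selected.append(row)
    return selected

matching_urls = [row["url"] for row in filter_greater_than(ROWS, "posts_in_archive", 10)]
print(matching_urls)
```

Note that a page with exactly 10 posts is excluded, because the comparison is "greater than" and not "greater than or equal to".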

Columns

Add your custom field as a column in URL Explorer reports.

