This article provides an overview of HTML Extraction in Botify, available to customers on any Botify plan.
Overview
As search engines get better at analyzing the full content of websites at scale, it is more important than ever for website owners to keep a well-organized inventory of their site content. HTML extractions let you check specific page elements that matter from a technical or SEO standpoint. Use them to capture valuable information such as inventory, the presence of certain elements, or word counts in specific areas, so you can keep a close watch on changes to your site. You can also correlate extracted data with rankings, crawls, or traffic, either as an indicator when planning future site changes or to support the business case for making them.
How HTML Extraction Works
HTML Extraction allows you to extract any portion of your page's HTML code and use it as a custom field or filter in your Botify crawl data. For example, the extracted code may correspond to something shown on the page, such as:
Product prices or stock volumes on e-commerce websites
Result counts in search result pages
The number of comments on blog posts
Extracted code may correspond to a technical element not visible to users, such as:
Audience tracking tags
Data layer variables
Structured data
Some structured data types are retrieved automatically and stored in the "Structured Data" folder in Botify filters and Explorers.
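To make the idea concrete, the sketch below pulls a visible value (a price, a comment count) and a technical value (a data layer variable) out of a sample page using regular expressions. It illustrates the concept only and is not Botify's extraction engine or configuration syntax: the sample markup, the rule names, and the `extract_fields` helper are all assumptions made for this example.

```python
import re

# Hypothetical sample page; the markup and values are illustrative only.
SAMPLE_HTML = """
<html>
  <head>
    <script>dataLayer = [{"pageType": "product", "stock": 12}];</script>
  </head>
  <body>
    <span class="price">$49.99</span>
    <h2 class="comments-title">7 comments</h2>
  </body>
</html>
"""

# Hypothetical extraction rules: custom field name -> regex with one capture group.
RULES = {
    "price": r'<span class="price">\$([0-9]+(?:\.[0-9]{2})?)</span>',
    "comment_count": r'<h2 class="comments-title">(\d+) comments?</h2>',
    "page_type": r'"pageType":\s*"([^"]+)"',
}


def extract_fields(html, rules):
    """Return the first captured group for each rule, or None when nothing matches."""
    fields = {}
    for name, pattern in rules.items():
        match = re.search(pattern, html)
        fields[name] = match.group(1) if match else None
    return fields


fields = extract_fields(SAMPLE_HTML, RULES)
print(fields)
```

Each rule yields one custom field per page, which is the shape of data you would then filter or chart on.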
All URLs returning an HTTP 200 status code in a Botify crawl are evaluated for extraction. As with all project settings, HTML extracts are saved and will be applied to your future analyses in the same project. Refer to Extracting Custom Data from Your Pages to learn how to create an HTML extract.
Extracts are limited to 300 characters. If the extracted data exceeds 300 characters, it will be truncated.
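The two rules above (only HTTP 200 pages are evaluated, and values are cut at 300 characters) can be sketched as a small function. This is a rough model for reasoning about your own data, not Botify's implementation: the `store_extract` name is hypothetical, and the sketch assumes the extra characters are dropped from the end of the value.

```python
MAX_EXTRACT_LENGTH = 300  # character limit stated in this article


def store_extract(status_code, extracted):
    """Model the rules described above.

    Pages that do not return HTTP 200 are not evaluated (None), and values
    longer than 300 characters are shortened to the limit. Keeping the first
    300 characters is an assumption of this sketch.
    """
    if status_code != 200 or extracted is None:
        return None
    return extracted[:MAX_EXTRACT_LENGTH]


long_value = "x" * 450
print(len(store_extract(200, long_value)))
print(store_extract(404, "ignored"))
```

If a value you care about may run longer than the limit, consider narrowing the extraction so it captures only the part you need.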
HTML Extracts in Reports
Extracted HTML data appears in SiteCrawler's Distribution > Top Charts report, which lists the number of pages where the custom data was found in Botify's crawl.
Click on the numbers in the table to get custom extraction data by page in the URL Explorer.
You can use these custom fields as filters and columns in your Botify reports.
Filters
Use your custom fields to filter reports. For example, a filter can select only pages where the number of posts in an archive is greater than 10.
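The same "greater than 10" condition can be expressed over rows of crawl data, which may help when checking results outside the interface. This is a minimal sketch under stated assumptions: the `pages` rows, the `post_count` field name, and the idea that extracted values arrive as text are all hypothetical and do not describe a Botify export format.

```python
# Hypothetical rows: one dict per crawled URL, with an extracted custom field.
pages = [
    {"url": "https://example.com/archive/2021", "post_count": "14"},
    {"url": "https://example.com/archive/2022", "post_count": "10"},
    {"url": "https://example.com/archive/2023", "post_count": None},  # nothing extracted
    {"url": "https://example.com/archive/2024", "post_count": "25"},
]


def has_more_posts_than(page, threshold):
    """True when the extracted count exists and is strictly above the threshold."""
    value = page["post_count"]
    return value is not None and int(value) > threshold


filtered = [page["url"] for page in pages if has_more_posts_than(page, 10)]
print(filtered)
```

Note that a page with exactly 10 posts is excluded, as is a page where the extraction found nothing.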
Columns
Add your custom field as a column in URL Explorer reports.