This article explains how Botify evaluates your page content quality.
Overview
Website pages must be crawled and indexed to rank on search engine results pages. After pages are crawled and indexed, search engines assess the quality of their content to determine whether they will rank. Search engines do not assess content quality the same way humans do. Although there is no one-size-fits-all definition of quality (it varies by industry, vertical, and even search query), there are best practices that correlate with how often Google will crawl a page and how likely it is to rank.
Botify Analytics mimics how search engines look at your website and provides more than 25 metrics to evaluate page content quality.
Template vs. Core Content
Most modern pages share common areas of content: top navigation, a footer, a "related links" module with the latest news, etc. This is referred to as the page "template". If you remove the content inside the template from a page, you will be left with the core content: the unique value of a page that search engines care about. Using advanced algorithms similar to those used by search engines, Botify can recognize patterns between groups of pages to automatically identify "template sections" vs. "core content sections".
Botify automatically detects templates based on the sequence of words (n-grams), not HTML code. This means that changing the design of your website will not impact template detection.
A template is more likely to be fully detected when shared by at least 1,000 site pages.
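As a rough illustration of word-sequence-based template detection (not Botify's actual algorithm), the sketch below marks an n-gram as template when it appears on a large share of pages. The n-gram size `n`, the `min_share` threshold, the function names, and the sample pages are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text, n=3):
    """Return the set of unique n-word sequences in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def detect_template_ngrams(pages, n=3, min_share=0.8):
    """Return the n-grams that appear on at least min_share of the pages."""
    page_counts = Counter()
    for text in pages:
        # A set is passed in, so each page counts at most once per n-gram.
        page_counts.update(word_ngrams(text, n))
    threshold = min_share * len(pages)
    return {gram for gram, count in page_counts.items() if count >= threshold}


# Hypothetical pages: a shared navigation followed by page-specific text.
pages = [
    "home products contact red running shoes for trail use",
    "home products contact blue cotton shirt with short sleeves",
    "home products contact leather wallet with coin pocket",
]
template = detect_template_ngrams(pages)
print(template)  # {('home', 'products', 'contact')}
```

Because the sketch only looks at the text, changing the HTML around the same words would not change its result, which mirrors the design-independence described above.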
Content Size
Botify measures content size by the number of words:
No. of Words (Template): The number of words considered part of the page "template" (which is ignored for most content quality analysis).
No. of Words (Content): The number of words on a page, excluding those in the template (core content only).
No. of Words (Total): Total number of words on a page (template and core content).
% Template Words: The percentage of a page's words that are considered part of the template and are therefore ignored.
Content quality analysis is limited to the first 10,000 words on a page, template included.
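The four size metrics are related by simple arithmetic, which the following sketch makes explicit. It assumes each word on a page has already been flagged as template or core content; the function name, dictionary keys, and rounding are hypothetical and chosen only for this example.

```python
def content_size(is_template):
    """Compute size metrics from one boolean per word (True = template word)."""
    total = len(is_template)
    template = sum(is_template)
    return {
        "words_total": total,
        "words_template": template,
        "words_content": total - template,
        "pct_template_words": round(100 * template / total, 1) if total else 0.0,
    }


# Hypothetical page: 3 template words followed by 6 core content words.
flags = [True] * 3 + [False] * 6
sizes = content_size(flags)
print(sizes)
# {'words_total': 9, 'words_template': 3, 'words_content': 6, 'pct_template_words': 33.3}
```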
Content Similarities and Duplication
Comparing individual words is insufficient to analyze similarities between pages. Instead, Botify looks for sequences of words, or "n-grams".
No. of Unique Word Sequences (Total): Total number of n-grams found in a page.
No. of Unique Word Sequences (Content): Number of n-grams on a page that are considered "core content"; this excludes those in the template.
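To make the two counts concrete, here is a minimal sketch that extracts unique word sequences from a page and removes those belonging to a template. The n-gram size of 3, the sample text, and the pre-detected `template_grams` set are assumptions; the n-gram size Botify uses is not stated here.

```python
def unique_word_sequences(words, n=3):
    """Return the set of unique n-word sequences in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


page_words = "home products contact red running shoes for trail use".split()
template_grams = {("home", "products", "contact")}  # hypothetical detected template

all_grams = unique_word_sequences(page_words)
content_grams = all_grams - template_grams

print(len(all_grams))      # 7 unique word sequences in total
print(len(content_grams))  # 6 once the template sequence is excluded
```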
Similarity Scores to Find Duplicate Content
Each new page should provide unique value that makes it worthwhile for search engines to index and rank. Botify offers metrics to find similar pages on a website. To avoid false positives, Botify only compares pages that share the same language (based on the pages' meta http-equiv="content-language"), are on the same domain or subdomain, and share the same HTTP or HTTPS protocol.
The metrics below exclude content from the "template" of each page. The simplest way to find pages with many similarities is to use Botify's pre-calculated score, which ranks all pages on a website by this criterion:
Highest Similarity Percentage: A score from 0 to 100, where 0 indicates a page whose content is unique relative to the other pages on the website, and 100 indicates a page that is a complete duplicate of one or more other pages.
No. of Similar Pages: The number of indexable pages similar to the current page. Botify provides three thresholds of similarity to help identify pages with a majority of duplicated content:
(Score >= 50%): 50% or more of the content is similar
(Score >= 75%): 75% or more of the content is similar
(Score >= 90%): 90% or more of the content is similar
Similar Pages (Sample including Not Indexable Pages): The URLs of the top 15 pages with the most similarities to the current page, including their similarity percentage.
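The sketch below shows how metrics of this kind could be derived from per-page sets of core-content n-grams. The similarity definition used here (the share of a page's n-grams that also occur on another page) is an assumption for illustration, not Botify's documented formula, and the language, domain, protocol, and indexability filters described above are omitted. Integers stand in for n-grams to keep the sample data short.

```python
def similarity_pct(grams_a, grams_b):
    """Percentage of page A's n-grams that also occur on page B (assumed definition)."""
    if not grams_a:
        return 0.0
    return 100 * len(grams_a & grams_b) / len(grams_a)


def similarity_report(url, pages, thresholds=(50, 75, 90), sample_size=15):
    """Summarize how similar `url` is to every other page in `pages`.

    `pages` maps a URL to its set of core-content n-grams.
    """
    scores = {
        other: similarity_pct(pages[url], grams)
        for other, grams in pages.items()
        if other != url
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {
        "highest_similarity_pct": ranked[0][1] if ranked else 0.0,
        "similar_pages": {t: sum(s >= t for s in scores.values()) for t in thresholds},
        "sample": ranked[:sample_size],
    }


pages = {
    "/a": {1, 2, 3, 4},
    "/b": {1, 2, 3, 4},  # full duplicate of /a
    "/c": {1, 2, 3, 9},  # shares 3 of /a's 4 n-grams
    "/d": {7, 8},        # nothing in common with /a
}
report = similarity_report("/a", pages)
print(report["highest_similarity_pct"])  # 100.0
print(report["similar_pages"])           # {50: 2, 75: 2, 90: 1}
```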
Advanced users can find a more granular explanation of a page's similarity score with these metrics related to n-grams:
No. of Unique Word Sequences (Found on this page only): The total number of n-grams found only on the current page.
% of Unique Word Sequences (Found on this page only): The percentage of n-grams unique to the current page (unique n-grams / total n-grams).
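% of Unique Word Sequences (Found on this page and…): The percentage of the current page's n-grams that are also found on the following number of other pages: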
Only 1 Other
Between 2 and 4 Others
Between 5 and 9 Others
Between 10 and 99 Others
Between 100 and 999 Others
Between 1000 and 9999 Others
More than 9999 Others
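One way to picture this breakdown is as a histogram over the number of other pages that share each n-gram. The sketch below is illustrative only: it assumes a precomputed mapping from each n-gram to the number of other pages containing it, and the bucket labels, function name, and sample data are hypothetical.

```python
from collections import Counter

# (label, lowest count, highest count) of OTHER pages sharing an n-gram.
BUCKETS = [
    ("this page only", 0, 0),
    ("only 1 other", 1, 1),
    ("between 2 and 4 others", 2, 4),
    ("between 5 and 9 others", 5, 9),
    ("between 10 and 99 others", 10, 99),
    ("between 100 and 999 others", 100, 999),
    ("between 1000 and 9999 others", 1000, 9999),
    ("more than 9999 others", 10000, float("inf")),
]


def bucket_percentages(page_grams, other_page_counts):
    """Return the percentage of a page's n-grams that falls in each bucket."""
    tally = Counter()
    for gram in page_grams:
        others = other_page_counts.get(gram, 0)  # absent means found nowhere else
        for label, low, high in BUCKETS:
            if low <= others <= high:
                tally[label] += 1
                break
    total = len(page_grams)
    return {
        label: (100 * tally[label] / total if total else 0.0)
        for label, _, _ in BUCKETS
    }


page_grams = {"g1", "g2", "g3", "g4"}              # placeholder n-grams
other_page_counts = {"g2": 1, "g3": 3, "g4": 250}  # g1 appears on no other page
result = bucket_percentages(page_grams, other_page_counts)
print(result["this page only"])              # 25.0
print(result["between 100 and 999 others"])  # 25.0
```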
Similarity Scores to Confirm Pages are Similar
Sometimes, you may want to ensure the content is similar between multiple pages. Botify tracks content uniqueness between a page and its known AMP, mobile, and canonical versions.
Similarity with AMP Page: Compare the current page with its known AMP page (templates excluded).
Similarity with AMP Page (Ignore Nothing): Compare the current page with its known AMP page (templates included).
Similarity with Canonical Page: Compare the current page with its known canonical (templates excluded). The canonical page is identified from the page's rel="canonical" metadata.
Similarity with Canonical Page (Ignore Nothing): Compare the current page with its known canonical (templates included).
Similarity with Mobile Page: Compare the current page with its known mobile version (templates excluded) when Botify crawls both desktop and mobile versions on separate URLs.
Similarity with Mobile Page (Ignore Nothing): Compare the current page with its known mobile version (templates included).
FAQ
If a page has the same words but in a different order, will the similarity score be the same?
No, though the difference is small. Because the similarity score is based on n-grams (sequences of words), changing the order of the words changes which n-grams are unique and, therefore, the similarity score.
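A small sketch shows why word order matters. It uses 2-word sequences and made-up sentences purely for illustration; the n-gram size Botify uses is not stated here.

```python
def word_ngrams(text, n=2):
    """Return the set of unique n-word sequences in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


original = "red running shoes for trail use"
reordered = "running shoes red for trail use"

a, b = word_ngrams(original), word_ngrams(reordered)
same_words = sorted(original.split()) == sorted(reordered.split())

print(same_words)        # True: both texts contain exactly the same words
print(a == b)            # False: the word sequences differ
print(len(a & b), "of", len(a), "sequences shared")  # 3 of 5 sequences shared
```

In such a short text, moving one word affects a large share of the sequences. On a full-length page, a local reordering only touches the few sequences that span the moved words.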
Is meta information, such as the meta title and meta description, included in the similarity score?
As a general rule, anything visible to users is included. The meta title is included in the similarity score, but the meta description and structured data are not. Image "alt" attributes and image filenames are also included. Botify also includes metrics dedicated to HTML tags, such as "Title Quality" and "Meta Description Quality", that are available in the filters' "HTML Tags" section and in SiteCrawler's Content report.
Can I see exactly what was considered a page "template"?
No, we do not offer a visual representation of what our algorithm considers a template or page core content.
What does the metric "Evaluated" mean?
Content Quality is only evaluated on useful pages. Botify excludes error pages, redirects, etc. The "Evaluated" metric lets you filter for only the URLs that have content quality metrics.
Does Content Quality work with CJK languages? (Chinese, Japanese, Korean)
Yes, Botify supports CJK languages. These languages do not have words similar to those of the English language, so instead of looking for "n-grams" of words (i.e., sequences of words), Botify looks for sequences of characters (i.e., symbols).
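The character-based variant can be sketched the same way as word n-grams, sliding a window over characters instead of words. The window size of 2, the whitespace handling, and the sample string are assumptions made for this illustration.

```python
def char_ngrams(text, n=2):
    """Return the set of unique n-character sequences, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}


grams = char_ngrams("東京タワー")
print(len(grams))  # 4 sequences: 東京, 京タ, タワ, ワー
```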