Botify Glossary

📘 This article defines terms used in Botify.

Overview

This is a list of commonly used acronyms, abbreviations, and terms you may find throughout Botify. Some terms are unique to Botify, and others are common in the SEO industry.

Terms and Definitions

A-C

Active Page: An active page in Botify is a page that has received at least one organic visit over a specific period. In SiteCrawler, active pages are any URLs that have received one or more organic visits during the 30 days before the crawl completion date, as reported by the third-party analytics data integrated with the project (Adobe, GA, Piano). In LogAnalyzer and RealKeywords, the number of active pages, either via organic visits shown in logs or clicks seen from Google Search Console, is updated depending on the selected date range.

Tracking active pages can be especially useful when investing in specific site sections because you can easily identify the visit volume and the number of active pages by segment. Active pages are also a valuable performance metric, as increasing both the number of total and active pages can have a significantly more positive result than simply increasing the number of visits to a fixed set of pages.

AMP Pages: The Accelerated Mobile Pages (AMP) project was announced in 2015 as a new web framework designed to serve faster and lighter pages to users to reduce load times, specifically for mobile users. Originally designed for news publishers, AMP is now a framework made available to any sites that want to leverage its light and simple AMP-HTML templates, as well as the Google network of servers that can be used to load pages almost instantly for most users.

While there is no positive SEO ranking factor associated with AMP pages, there are still significant user experience benefits to consider when sites are deciding whether to use the AMP framework for their pages. Users who receive content faster and can engage with more pages on a site are more likely to convert (i.e., complete the desired action, such as multiple page views, making a purchase, or signing up for a newsletter).

Anchor Text: Anchor text is the text component of a link on the web that is visible to users. The text in a link is used both by users and search engines to get more information about what they can expect to see on the page that is being linked. Whether the link points to another page on the same site, or to an entirely different website, anchor text is usually the contextual indicator that describes the relationship between two pages.

In practice, anchor text should be descriptive and informative so users know what to expect when clicking the link. The same practice is relevant for search engines, which use anchor text as a signal to assign relevance to pages that are being linked to. As a best practice, sites should avoid generic anchor text like “click here” and opt for more specific language to the benefit of users and search engines.
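
For example, both links in the sketch below point to the same hypothetical URL; only the anchor text changes:

    <!-- Generic anchor text: tells users and search engines little about the target -->
    <a href="https://www.example.com/guides/structured-data">Click here</a>

    <!-- Descriptive anchor text: sets clear expectations for the linked page -->
    <a href="https://www.example.com/guides/structured-data">Read our guide to structured data</a>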

API: Application Programming Interface

Botify Indicator: Metric available in Botify that provides information about a page or a group of pages generated by data from the Botify crawler and/or additional data sources (for example, LogAnalyzer, web analytics, or Google Search Console).

Blacklisted Domain: A domain that has been removed from a search engine's index. Domains may be blacklisted due to malicious activity, violations of a search engine's terms and conditions, or other documented abusive behavior. Blacklisted domains are excluded from Botify crawls.

Branded Keywords: Keywords and phrases that include a company’s name and products and/or variations of them (including misspellings). The indicator for these keywords is “Branded Keyword” (singular) in Botify.

Canonical Tags: Canonical tags are HTML attributes that help search engines identify and consolidate duplicate content. When a URL has a <link> tag with the rel="canonical" attribute in the <head> section of the HTML, search engines use that signal to identify the authoritative version of a piece of content on the web. Canonical references can point to the page they are on (self-referencing), pages elsewhere on the same site, or pages on an entirely different domain. Canonical tags are valuable tools when you need multiple versions of the same or very similar content on your website without having to worry about search engines indexing duplicate content.
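
As a minimal sketch (the URL is hypothetical), the tag sits in the page's <head>:

    <head>
      <!-- Signals that the authoritative version of this content lives at
           the URL below; it may also point to the current page itself -->
      <link rel="canonical" href="https://www.example.com/products/blue-widget" />
    </head>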

A URL (page A) with a canonical tag pointing to another page on the site (page B) is a signal to search engines to consolidate all of the signals about page A to page B. In addition, when a human user searches for content on page A, a search engine might serve page B instead since the pages have been identified as candidates for consolidation. For normal browsing on a website, canonical tags do not have consequences for human users outside of organic search.

Canonical tags are hints to most search engines. Search engines rely on many signals, including their own analysis of the content on each page, to determine which pages should be consolidated in their index. Therefore, it is possible for a search engine to ignore canonical signals altogether or, conversely, to consolidate two pages that do not have a canonical relationship.

CDN: Content Delivery Network

CLP: Custom Landing Pages

CMS: Content Management System

Click Potential: A metric calculated to estimate additional clicks you may receive if a page ranks in the top three positions in search engine results pages. Read the full description in the How Click Potential is Calculated article.

Compliant Page: See Indexable.

Crawl (or crawling): The process in which search engines and Botify “spider” through a website to access, render (if possible), and analyze its pages’ HTML, CSS, and JavaScript.

Crawl Budget: The amount of crawling resources that search engines allocate to a site. Search engines have finite resources and can typically crawl only about 50% of an enterprise website, so crawl budget may be optimized to improve indexing and grow organic traffic.

Crawl Depth: A Botify setting indicating how many clicks away from the start page (typically the homepage) the crawler is allowed to go when exploring the site.

Crawl Ratio: The percentage of URLs in your website structure crawled by a search engine robot.

Crawled URLs: Crawled URLs in SiteCrawler are the number of pages on a website that Botify successfully crawled and analyzed. The size of the site and the limit on the number of crawled URLs in the analysis settings determine the number of crawled URLs. When the number of crawled URLs in an analysis is lower than the limit defined in the settings, we can assume that Botify successfully identified every page in the site's linked structure that the settings allow.

CRM: Customer Relationship Management (software)

CTR: Click-Through Rate

CX: Customer Experience

CS: Customer Success

D-H

Discovered URLs: In SiteCrawler, discovered URLs are the number of unique URLs the crawler identified in the analysis. When the number of discovered URLs is higher than the number of crawled URLs, there were still pages in the queue to be crawled but the crawl was either stopped or reached the URL limit defined in the analysis settings. When the number of discovered URLs is much larger than the number of crawled URLs, there are likely large sections of the site that have yet to be crawled by Botify, and you should restart the crawl with a higher limit. When the number is inflated by unnecessary URL parameters, subdomains, or alternate versions that do not need to be part of the analysis, you may want to reduce the number of discovered URLs via the analysis settings.

Duplicate Content: Duplicate content generally refers to pages within a website that have the same or very similar content. Technical duplicates commonly occur when a website is configured to serve the same content on two different URLs, such as URLs with a tracking parameter that does not change the content on the page. Duplicate content may also include pages intended to have unique content but not properly differentiated from similar pages on the site. Publishing sites, for example, may unintentionally create duplicate content by creating a “Best of 2019” and “Best of 2018” post, where there is a significant overlap in the content from both years.

Search engines typically use canonical tags and other signals to consolidate duplicate pages on a website properly, but unmanaged duplication can be a significant waste of crawl budget. Any kind of duplicate content can make it significantly more difficult for site owners to monitor web traffic.

Emulated Device: Whether the URL was crawled as a mobile, desktop, or tablet device.

Event DOM Content Received: The browser has fully parsed the HTML document and built the DOM tree, but external resources such as images (<img>) and stylesheets are not yet loaded.

Event Load Received: The browser has loaded all mandatory resources (images, styles, etc.). This does not indicate the page is ready, since scripts may still be running and loading third-party resources or making API calls.

First Contentful Paint (P2): When useful content appears on the page. This is the first time users can start consuming page content.

First Image Paint: When the first images appear on the page (part two of First Contentful Paint).

First Meaningful Paint (P3): When the page's primary content appears on the screen. This is a primary metric for the user-perceived loading experience that indicates when the browser has started to render the page.

First Paint (P1): The moment that any change in the page becomes visible to a browser.

FTP: File Transfer Protocol

GDPR: General Data Protection Regulation

Google Analytics/GA: A web analytics tool that allows users to track and report website traffic.

Google Data Studio (GDS): See Looker Studio.

Google Search Console (GSC): A tool that measures your site's Google organic search traffic and performance, enabling you to fix issues and discover some of the terms users are searching on Google that lead to clicks to your website. RealKeywords connects to the GSC API for more robust reporting and analytics.

Hreflang Tags: Hreflang tags are an HTML attribute that tells search engines about alternate versions of content in another language. The tags indicate the specific country-language combination that should be associated with each URL referenced. Hreflang tags most commonly appear in a webpage's <head> section with multiple language variations but can also be listed in XML sitemaps and HTTP response headers.

When sites serve users in multiple geographies that use different languages, hreflang tags are the strongest signal to a search engine that two pages in different languages are alternate versions. For example, if a page has an English and a French version, the hreflang attribute can tell search engines that when users search in French, the French version of the page should surface in search results instead of the English page.

International targeting works across domains, so sites can designate example.com, example.fr, and example.de, and search engines will recognize those URLs as alternate versions of the original on example.com. Pages using hreflang annotations should have a self-referencing hreflang attribute and an hreflang attribute pointing to every other version of the same page in a different language. Optionally, sites may set an x-default language to signal the version of the page to be served when there is no specific language match for the user.
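
A sketch of what these annotations can look like in the <head> of the English page, using hypothetical domains:

    <head>
      <!-- Self-referencing annotation for the English version -->
      <link rel="alternate" hreflang="en" href="https://example.com/page" />
      <!-- French and German alternate versions on other domains -->
      <link rel="alternate" hreflang="fr" href="https://example.fr/page" />
      <link rel="alternate" hreflang="de" href="https://example.de/page" />
      <!-- Optional fallback when no language matches the user -->
      <link rel="alternate" hreflang="x-default" href="https://example.com/page" />
    </head>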

HTML Extract: A Botify feature that enables you to create custom fields from data found when Botify crawled your site, such as product prices and stock quantities, using CSS selectors or regular expressions.
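
As an illustration, suppose a product page contains the hypothetical fragment below; a CSS selector or a regular expression like the ones shown could then capture the price into a custom field:

    <!-- Hypothetical fragment of a product page -->
    <div class="product-info">
      <span class="price">49.99</span>
      <span class="stock">12 in stock</span>
    </div>

    CSS selector:        span.price
    Regular expression:  <span class="price">([0-9.]+)</span>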

In the Same Zone: A combination of protocol and language. For example, if both the HTTP and HTTPS protocols and both the EN-GB and EN-US languages exist, we would only look at URLs sharing one protocol (HTTP or HTTPS) and one language (EN-GB or EN-US).

I-M

Inactive Page: Pages on a website that have not received organic visits over the last 30 days in SiteCrawler, or the date range selected in LogAnalyzer or RealKeywords. When pages are inactive, it is generally a sign that they are either non-indexable or not performing well enough in search results to earn organic traffic. Finding all pages that are indexable but not active can be a valuable opportunity to understand which pages on a website are eligible to receive organic visits but have not received any over the given time period.

Indexable: URLs that are eligible to be indexed by a search engine. Indexable URLs served a 200 status code, did not have a noindex meta tag, did not have a canonical tag pointing to a different URL, and had a content type of text/html when Botify crawled them. URLs that are indexable by this definition have an opportunity to be crawled and indexed by search engines and potentially earn organic traffic from search results. The number of indexable URLs on a site can vary widely but is usually a good representation of the number of pages on a site with SEO opportunities.

Inlinks/backlinks: Internal linking is the foundation of how search engines discover and rank content. Whether analyzing internal links within the same website or between two separate websites, search engines use these signals to determine the relevance and importance of each page.

Within a single website, it is common to see that the pages a site links to most often are also the pages that search engines crawl most often and the pages that receive the most organic visitors. This is due to search engines' ability to determine the relevance of a page from linking signals within a site. When a page is linked very often, it is seen as an important part of the site.

The same can be said for incoming links from other websites, often called backlinks. A backlink is a link from one site to another site. Search engines use backlinks as a vote of confidence in the page being linked since pages linked to more often are also likely to be the most popular with users. Backlinks are typically regarded as one of the most important ranking factors in SEO but are among the hardest to influence.

IP: Internet Protocol

Is Main Keyword: For a given URL+Keyword combination, equal to "Yes" if the keyword is the keyword generating the most clicks for the URL.

Is Main URL: For a given URL+Keyword combination, equal to "Yes" if the URL is the URL generating the most clicks for the keyword.

JS: JavaScript

JS Rendering Attempted: Whether JavaScript rendering was attempted during Botify's crawl.

JS Rendering Successful: Whether page rendering was successful during Botify's crawl.

JS Render Time: The total time to render the page (not tied to the other timing metrics).

Keyword Position for URL: Botify sorts keywords by the number of clicks for a given URL (i.e., position) to compute this metric. The Main Keyword generates the most clicks and has a "Keyword Position for URL" equal to 1. The keyword generating the second most clicks has a "Keyword Position for URL" equal to 2, etc. If there are no clicks for the given URL, keywords are ordered by impressions.

Load Time: HTML load time can be separated into two key metrics: Time to First Byte (TTFB) and HTML Load Time. These metrics measure how long it takes a user or search engine bot to receive the HTML document from your website. Time to First Byte is the number of milliseconds it takes to receive the first byte of HTML from your website; HTML Load Time is the number of milliseconds it takes to receive the last byte, making it an accurate measure of the time it takes to receive the full HTML document of a webpage.

These two key load time metrics do not include the time it might take a browser to request additional images, JavaScript, CSS files, or tracking pixels. Many factors can impact web performance, including whether your site uses a CDN, your users' geographic locations, and the size of the downloaded content. If your site is experiencing slow HTML load times, check that your site is hosted in a way that makes it easily and quickly accessible for users in all of your target markets, and ensure the HTML documents on your website are not excessively large.

Looker Studio: A data visualization tool that enables you to create custom reports and illustrations of your data. Formerly known as Google Data Studio.

Main Keyword: The Keyword bringing the most traffic for the selected dimension(s).

Main URL: The URL bringing the most traffic for the selected dimension(s).

Meta Descriptions: Meta descriptions are HTML attributes that give search engines a snippet of content to display on a search engine results page. Although the meta description is typically not visible on the webpage itself, it appears in search engine results pages to give users an idea of what to expect before they reach a page. Although they are not an explicit ranking factor in organic search, meta descriptions are a valuable way to encourage more users to click on a given site from search results. Sites often use promotional language or a call to action in their meta descriptions so users skimming the results know what to expect from the website.

Meta descriptions should be no longer than roughly 155-160 characters for the full snippet to be displayed on a search engine results page. Longer descriptions may be truncated or replaced with other content on the page that the search engine sees as more relevant.
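
A simple example, with hypothetical copy, of how the attribute appears in a page's <head>:

    <head>
      <!-- Shown as the snippet in search results; aim for roughly 155-160 characters -->
      <meta name="description" content="Shop our full range of blue widgets with free shipping on orders over $50 and easy 30-day returns." />
    </head>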

Mobile Pages: With an increasing number of users worldwide browsing the web on mobile devices, making content accessible to those users is becoming more important. Along with users switching to a mobile-first world, Google now uses the mobile version of a website as the primary content when determining how pages rank, in what is called the "Mobile First Index".

Most modern websites make their content accessible to users on mobile devices in various ways - responsive design, dynamic serving, and separate mobile websites. Each type of mobile page has its own SEO considerations, but responsive design is generally the most common and the least error-prone implementation.

Responsive design is the most common way of serving content to mobile users and involves coding a website in a way that allows the content to be resized to fit any screen. Mobile, desktop, and tablet users with screens of any size receive the same HTML for a given URL, and styling elements change font sizes, borders, and other factors to make the site appear correctly.

Dynamic serving websites have multiple versions of an HTML document specific to the type of device requesting the page. These websites detect what type of device is requesting a page and serve the corresponding HTML document for that screen size and device type so that users can access content across different devices in an optimized way.

Finally, separate mobile sites leverage a different subdomain to host the mobile content, like m.example.com. Separate mobile sites rely on search engines to serve the correct version of a page in search results depending on the user's device, and on redirects to ensure that users are sent to the site appropriate for their device. Separate mobile sites are decreasing in popularity as new technologies make responsive design much easier to implement without maintaining two versions of a website.
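
As a rough sketch of the signals involved in each approach (all URLs hypothetical):

    <!-- Responsive design: one HTML document that adapts to any screen -->
    <meta name="viewport" content="width=device-width, initial-scale=1" />

    <!-- Dynamic serving: an HTTP response header telling caches and crawlers
         that the HTML varies by requesting device -->
    Vary: User-Agent

    <!-- Separate mobile site: the desktop page at www.example.com/page ... -->
    <link rel="alternate" media="only screen and (max-width: 640px)"
          href="https://m.example.com/page" />
    <!-- ... and the mobile page at m.example.com/page points back -->
    <link rel="canonical" href="https://www.example.com/page" />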

N-R

Noindex Tagging: A noindex meta tag is a directive that tells search engines to exclude a page from their index. Most major search engines treat noindex as a directive that must be followed, making it one of the quickest and most effective ways to remove a page from a search engine index. Noindex tags can be served in an HTML document's <head> tag or in an HTTP response header via the X-Robots-Tag.

The noindex tag is commonly used to keep content that is not intended for search engines from being indexed. You might find a noindex tag on private internal documents, shopping cart pages, or pages in a staging environment that has yet to go live. The most common form is a "robots" noindex, which sends the directive to any search engine bot that encounters it. You can also specify individual search engine bots that you want to follow the directive, like a "Googlebot" or "Bingbot" noindex.
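
Both delivery methods can be sketched as follows:

    <!-- In the HTML document's <head>: a directive for all crawlers -->
    <meta name="robots" content="noindex" />

    <!-- Or scoped to a single crawler -->
    <meta name="googlebot" content="noindex" />

    <!-- As an HTTP response header (useful for non-HTML files such as PDFs) -->
    X-Robots-Tag: noindex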

No. Keywords to Reach 90% Page Clicks: The number of keywords needed to make up at least 90% of the traffic to the given URL.

No. of Calls that Returned a JSON: The number of resources that returned a JSON response (typically from an API) used to populate content on the page.

No. of CSS Found: The number of CSS files found by the rendering process.

No. of Fonts Found: The number of font files found by the rendering process.

No. of Images Found: The number of image files found by the rendering process (e.g., .jpg, .png).

No. of Media Found: The number of media files found by the rendering process that are not CSS, fonts, or images (e.g., .mp4, .mp3).

No. of Resources Allowed by JS Settings: The number of resources allowed by the JavaScript settings configured in Botify. By default, Botify does not allow analytics resources.

No. of Allowed Resources Blocked by Robots.txt: The number of resources allowed by JavaScript settings but blocked by robots.txt. This metric helps identify specific override settings.

No. of Resources Failed to Fetch: The number of resources Botify could not fetch because of a bad status code, an invalid URL, or an invalid method.

No. of Resources Executed: The number of resources Botify executed to render the page (e.g., JS, CSS).

No. of Resources Found: The number of all resources found in the page source code (e.g., JS, IMG), including those excluded by the settings.

No. of Resources Not Cacheable: The number of resources Botify crawled that disallowed caching.

No. of Websocket Calls: The number of WebSocket calls, which enable two-way communication between a server and a browser. Googlebot's rendering service does not support WebSocket calls.

No. of XHR Calls: The number of calls made via XHR (XMLHttpRequest). XHR calls are typically used by AJAX processes, and Googlebot has historically had trouble with them.

No. URLs to Reach 90% Keyword Clicks: The number of URLs needed to make up at least 90% of the traffic for the given keyword.

Non-branded Keywords: Keywords that do not include any part of a brand or product name as identified in your Keyword settings, including misspellings. Within Botify, the indicator of these keywords is “branded = no.”

Non-Compliant Page: See Non-indexable.

Non-Indexable: A non-indexable page is a page that is not technically eligible to be indexed by a search engine. URLs are non-indexable if they have a non-200 status code, a noindex meta tag, a canonical tag that points to a different URL, or a content type other than text/html.

If a page is marked as non-indexable, it does not necessarily mean the page is broken or set up incorrectly; it is simply a high-level indicator that the page is not likely to be indexed by search engines. Many websites use canonical and noindex tags properly to keep pages from being included in search engines' indices or to consolidate duplicate content. You should monitor these pages closely to ensure they do not constitute a significant waste of crawl budget and that the signals are in place intentionally.

NPS: Net Promoter Score

Orphan Pages: Pages within a website that are not linked within the site’s linking structure, making them difficult for internet bots to find and crawl. Orphan pages shown in Botify may be found in an external data source (e.g., web server log files, website analytics, Google Search Console) but not by the Botify crawler because the URL is not linked in the website structure defined by the crawler settings.

PA: Page Authority

PageRank/PR: Google's PageRank definition, applied to the pages crawled by Botify.

Page Depth: A Botify indicator that reveals how many clicks are needed to reach the page from the start URL (usually the homepage), using the shortest path available to reach the page.

Page Templates: Page templates are typically the sections of pages repeated across all pages or all pages of the same type. Templates typically incorporate the main menus, footer links, and navigational links on a website's main pages. Well-structured templates make it easy for users and search engines to find and navigate the most important pages on your website efficiently.

Templated content is typically treated as separate from the primary content of a page by search engines. The content and links within a page template are not weighted as heavily when search engines rank individual pages, as they are typically repeated across most pages on the site. However, search engines use signals within page templates like top-level navigation links to better understand the structure of a website.

PDP: Product Details Page

PLP: Product Landing Page

POC: Proof of Concept

Project: A collection for a single or multi-domain website or website section, in which successive analyses are automatically compared to visualize trends and measure changes for each indicator and each URL.

QA: Quality Assurance

QC: Quality Control

Regex (or regular expression): A sequence of characters that defines a search pattern. Used in search engines, word-processing "find" functions, programming languages, and more, regex is a preset, standard syntax for denoting patterns that match text.
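
For instance, the short hypothetical pattern below matches any URL carrying a utm_ tracking parameter, a common way to filter URLs during crawl analysis:

    # Matches URLs containing a utm_ query parameter, e.g.
    # https://www.example.com/page?utm_source=newsletter
    [?&]utm_[a-z]+=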

Render Budget: As with “crawl budget”, this indicates the amount of a website’s JavaScript that Googlebot renders after crawling the site.

Redirects (including chains): Redirects are a class of HTTP status codes (3xx) indicating a URL has been moved permanently or temporarily by the website. The two most common forms are 301 (permanent) and 302 (temporary) redirects, which are effective ways to manage the content on a website and ensure that users and search engines do not find old or temporarily unavailable content.

Redirects have consequences for both users and search engines. When a user clicks on a link that redirects to another page, they must wait for the server to process the first request and then be redirected to a second URL, resulting in an additional request to the website before they get the final content. This added request and waiting period can add significant latency, especially on mobile, for users expecting to get content quickly. The same latency issues apply to search engines as well. Additionally, search engines encountering 3xx redirects on a website may not have a clear signal for which URL version should be indexed.

These negative consequences can be multiplied when a URL redirects to another page that also redirects, which is known as a redirect chain. Pages that redirect multiple times add to the significant waiting period for users and become increasingly likely to be ignored by search engines as the number of redirects in the chain increases.
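
A simplified sketch of the exchange with hypothetical URLs: the first response carries only the redirect, and the requester must make a second request to reach the content:

    GET /old-page HTTP/1.1
    Host: www.example.com

    HTTP/1.1 301 Moved Permanently
    Location: https://www.example.com/new-page

    GET /new-page HTTP/1.1
    Host: www.example.com

    HTTP/1.1 200 OK
    Content-Type: text/html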

RK: RealKeywords (formerly Botify Keywords)

Robots.txt: A robots.txt file is a resource hosted on a website that provides crawling directives to search engines. It tells search engines which sections, or specific pages, of a site they may crawl. In addition, the directives can prevent search engines from crawling specific resources like images, CSS files, or JavaScript files.

Robots.txt files are not used to keep pages out of a search engine's index; they simply tell search engines not to crawl them. Most major search engines and web-archiving tools respect the robots.txt standard, and Google announced in 2019 that it is working toward making the robots exclusion protocol an official web standard.

The file must live at the root of a domain (example.com/robots.txt) and has an allow/disallow syntax that supports wildcard matches to give site owners complete control over which folders, pages, or resources they want search engines to be able to crawl.
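
A short hypothetical file illustrating the syntax:

    # Rules for all crawlers
    User-agent: *
    # Block internal search results and any URL with a session parameter
    Disallow: /search/
    Disallow: /*?sessionid=
    # Re-allow a subfolder within the blocked section
    Allow: /search/popular/

    # Rules for a specific crawler
    User-agent: Googlebot
    Disallow: /staging/

    Sitemap: https://www.example.com/sitemap.xml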

ROI: Return on Investment

ROAS: Return on Ad Spend

S-Z

Status Codes: HTTP status codes are a numeric value a server returns in response to a request. The specific response code, consisting of three numbers, gives the requester (like a browser or a search engine bot) more information about what to expect, whether content, a redirect, or an error. The first number in a status code is used to classify the response into the following families:

  • 1xx informational response – the request was received

  • 2xx successful – the request was successfully received, understood, and accepted

  • 3xx redirection – further action is needed before the request is completed

  • 4xx client error – the request cannot be fulfilled due to an error by the requester

  • 5xx server error – the server failed to fulfill a valid request

Each three-digit status code has a specific and universal meaning so browsers and other requesters know exactly what to expect or not expect from a request.

2xx and 3xx status codes indicate normal responses and are the most common to find on a website. Users might encounter 4xx errors, like a 404, when a URL is mistyped or a link contains a misspelled URL, and 5xx errors, like a 503, when a site is down for maintenance or the server is overwhelmed by the traffic it receives.

SEO: Search Engine Optimization

SFTP: Secure File Transfer Protocol

SLA: Service Level Agreement

Start URL: The entry point(s) for a crawler or a user. In Botify, all start URLs must be in allowed domains. When using multiple start URLs in Botify, the start URLs will all be at depth 0, which may impact the reported depth structure.

SSH: Secure Shell

Title Tags: HTML title tags (<title>) signal the main topic of a webpage and describe to users what they can expect to see when browsing search results. Title tags are visible to users in two places: on the tab within a browser and as the main snippet in organic search results. Search engines use title tags as an indicator to understand the primary topic of a page and display them in search results to assist users. Title tags should include a webpage's primary keywords or topics and ideally be around 70 characters so the full title tag appears on search engine results pages.
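
A minimal example with hypothetical copy:

    <head>
      <!-- Shown in the browser tab and as the main snippet in search results;
           around 70 characters avoids truncation -->
      <title>Blue Widgets with Free Shipping | Example Store</title>
    </head>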

Thin Content: In SEO, thin content typically refers to pages with little to no content compared to similar pages covering the same topic. Thin content can range from poorly merchandised category pages on an e-commerce site with one or even zero products to articles on a publisher site that are not long enough to cover the topic comprehensively. There is no specific threshold at which content becomes "too thin"; the term simply describes pages that underperform, or are not indexed, because they lack content.

When identifying thin content, a good place to start is the word count of each page on your site: review the bottom 25% to see whether those pages need more content. That said, some pages with very little content can still perform well in organic search because they accurately and descriptively cover the concepts they are intended to rank for.

UI: User Interface

URL Parameter: A parameter name and value inserted in the URL's query string (after a "?").

URL Structure: A Uniform Resource Locator (URL) is the standard method that the modern internet uses to name resources and files on the web. There are several parts of a URL, illustrated here with https://www.example.com/files/new_file?trackId=1:

Protocol - https://

Subdomain - www

Domain - example.com

Path - /files/new_file

Query string - ?trackId=1

Each part of a URL tells the browser exactly which file it is requesting, helping users quickly locate what they are looking for. Search engines use URLs for the same reasons and look at different parts of a URL, including the content and keywords mentioned, to determine where content lives within a site structure.

The folder structure of a URL path should ideally reflect the site’s hierarchy so that it is clear to users what section of the website they are in. The number of folders in a URL, however, does not necessarily have any bearing on how a search engine may interpret or evaluate the content on a given page. A URL with five different folder paths and a URL with one path have an equal opportunity to perform well in organic search results. When search engines evaluate site structure, one of the most important factors is the distance from the site's home page, often called “depth”. Websites should ensure that URLs are descriptive and easy to read for users and search engines and that strategic pages are not too many clicks from the homepage.

User Agent/UA: A user agent string allows network protocol peers to identify the application type, operating system, and software vendor or version of the requesting software. The user agent used by Botify's crawler can be configured in the advanced settings.

URL Rewriting (or URL Manipulation): Changing any part of a given URL. As a Botify setting, this indicates how the crawler can be set up to rewrite URLs so that it behaves as if it had seen the rewritten URL, instead of the original URL, in any link or tag.

UX: User Experience

XML Sitemap: XML Sitemaps are the authoritative list of the most important URLs on a website, served to search engines. XML Sitemaps can also list image and video files to give search engines quick access to important assets on a site. Search engines use XML Sitemaps as a source of information to crawl sites intelligently and quickly identify new content. Aside from listing URLs, you can also include important information, such as when a page was last updated, how frequently it changes, and any alternate versions of the page in another language.

XML Sitemaps can contain up to 50,000 URLs, images, or video files. For sites requiring a larger set of URLs, you can use an XML Sitemap index, which consists of up to 50,000 additional linked sitemaps. XML Sitemaps are most useful for very large sites or sites that consistently publish new content or update content frequently. New sites may also benefit from XML Sitemaps because search engines can quickly crawl and prioritize a website's full contents. Visit the SiteCrawler Sitemaps report to compare how Botify's crawler saw your site vs. what is included in your sitemaps.
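
A minimal sketch of a sitemap containing a single URL entry with the optional fields described above (all values hypothetical):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/products/blue-widget</loc>
        <!-- Optional hints for crawlers -->
        <lastmod>2024-01-15</lastmod>
        <changefreq>weekly</changefreq>
      </url>
    </urlset>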

