📘 This is a reference for segment rules and syntax for defining your project segmentation in Botify.
Overview
Use the following as a guide when creating your project segmentation.
Basic Structure
A list of rules defines segments. Each rule specifies a pattern that is part of the URLs to be included in the segment.
For example:
URLs with /products/ can be grouped into a segment with product pages.
URLs with /category/ can be grouped into a segment with navigation pages.
Each rule begins with the name of the segment, preceded by @, then lists the conditions for a page to be placed in the segment:
@product
path /products/*
Each set of segment rules must start with a line that defines the segment name between square brackets. For example, for a set of segment rules called "pagetype":
[segment:pagetype]
List all possible values for that set of segment rules, with one or more rules for each value. Each named segment (value) contains one or several conditions to be met for a URL to be placed in the segment.
For example, consider the following simple segment settings:
[segment:pagetype]
@product
path /products/*
@navigation
path /categories/*
@forum
path /forum/*
Botify will evaluate the rules for each URL and place the URL in the first segment where it meets all conditions (i.e., the first "match"). The segment rules are processed from top to bottom; therefore, the order of the rules is important.
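The first-match behavior can be illustrated with a short Python sketch. This is not Botify's implementation: the first_match helper and the sample URLs are hypothetical, and a trailing * is assumed to mean "starts with".

```python
from urllib.parse import urlsplit

# Hypothetical mirror of the [segment:pagetype] rules above:
# (segment name, path prefix), in the order they are written.
RULES = [
    ("product", "/products/"),
    ("navigation", "/categories/"),
    ("forum", "/forum/"),
]

def first_match(url):
    """Return the first segment whose path prefix matches, else None."""
    path = urlsplit(url).path
    for segment, prefix in RULES:
        if path.startswith(prefix):  # assumed meaning of "path <prefix>*"
            return segment
    return None

print(first_match("https://www.example.com/products/blue-chair"))  # product
print(first_match("https://www.example.com/about"))                # None
```

A URL that matches no rule falls through the whole list; how Botify labels such URLs is not covered by this sketch.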
Segment Rule Operators
A segment is defined by one or more lines of the following form, each defining a condition to be met:
<selector> <url_pattern>
Selector: The part of the URL that you want to match.
Pattern: How the selected part of the URL must look (a character string with variable parts, using wildcards or a regular expression).
Selectors
The following selectors specify which part of the URL should be tested:
url tests the entire URL string.
protocol tests the protocol only.
domain (or host) tests the domain name only.
path tests the part of the URL that begins after the domain name and stops before the query string, if any.
query tests the query string, which is the part after the question mark.
path-query tests the path and the query string together.
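To make the selectors concrete, the sketch below splits a URL into roughly the same parts using Python's standard library. The URL and the selector_values helper are hypothetical, and the mapping is an approximation: for instance, urlsplit's netloc would also include a port number if one were present.

```python
from urllib.parse import urlsplit

def selector_values(url):
    """Split a URL into the parts the selectors above describe (approximation)."""
    parts = urlsplit(url)
    path_query = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "url": url,
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "path-query": path_query,
    }

# Hypothetical URL, used only for illustration.
values = selector_values("https://www.example.com/product/page.html?productid=12345")
for name, value in values.items():
    print(f"{name:<10} {value}")
```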
For example, a product segment for URLs whose path begins with /product/ and whose query string begins with productid= could be defined like this:
@product
path /product/*
query productid=*
Note that all conditions must be met for a URL to be placed in the segment (each line must be true). There is an implied "AND" operator between each line.
Logical Operators
Use the following logical operators in your segment rules:
AND
"AND" is automatically implied between each line of a segment rule. For example, the following rule matches URLs where the protocol must be HTTPS and the path must begin with /product/:
@product
protocol https
path /product/*
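As a rough illustration of the implied "AND", the following Python sketch treats each line of the rule as one condition and requires all of them to hold. The in_product_segment helper and the URLs are hypothetical, and the conditions only approximate how Botify evaluates the rule.

```python
from urllib.parse import urlsplit

def in_product_segment(url):
    """Both conditions must hold, mirroring the implied AND between lines."""
    parts = urlsplit(url)
    conditions = [
        parts.scheme == "https",             # protocol https
        parts.path.startswith("/product/"),  # path /product/*
    ]
    return all(conditions)

print(in_product_segment("https://www.example.com/product/42"))  # True
print(in_product_segment("http://www.example.com/product/42"))   # False
print(in_product_segment("https://www.example.com/blog/42"))     # False
```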
OR
There are two ways to implement an "OR" statement. You can specify it explicitly in the rule, as shown here:
@navigation
or (
path */category/*
path */subcategory/*
)
Or you can write several rules - one for each possibility - with the same segment name (value). Each rule describes a possible way of placing a URL in the segment. Using the "OR" example above, the following would use two rules instead of one:
@navigation
path */category/*
@navigation
path */subcategory/*
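The equivalence of the two forms can be checked with a small Python sketch. It assumes both patterns have the form *...* and that this means "contains"; the helper names and sample paths are hypothetical.

```python
def explicit_or(path):
    """One rule with an or ( ... ) group: either condition is enough."""
    return "/category/" in path or "/subcategory/" in path

def two_rules(path):
    """Two rules sharing the segment name 'navigation', checked in order."""
    rules = [
        ("navigation", "/category/"),
        ("navigation", "/subcategory/"),
    ]
    for _segment, fragment in rules:
        if fragment in path:
            return True
    return False

for path in ["/shop/category/chairs/", "/shop/subcategory/stools/", "/blog/post-1"]:
    print(path, explicit_or(path), two_rules(path))
```

Both functions return the same value for every path, which is why either style can be used for the same segment.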
NOT
You can specify that the URL must not match a pattern by adding "not" between the selector and the pattern.
For example:
path not /forum/*
Order of Rules
Botify applies segment rules in the order defined in the Segment Rules Editor and adds a page to the first segment the URL matches. This means the same rules in a different order may produce a different result. List the most specific rules first, followed by the more generic rules.
For example, suppose you want to:
Define a segment for URLs containing a tracking parameter (a query string containing from=), across all page templates, because you know these URLs are duplicates.
For pages without tracking parameters, separate product pages (such as http://www.mywebsite.com/catalog/product-description_item-123456.html) from navigation pages (category pages with lists of products, with URLs such as http://www.mywebsite.com/category/garden-furniture_C2-29/).
It is important to note that tracking parameters can appear on both product and navigation pages.
You can define the above segments with the following rules:
[segment:pagetype]
@duplicates
query *from=*
@products
path /catalog/*
@categories
path /category/*
However, if you do this:
[segment:pagetype]
@products
path /catalog/*
@duplicates
query *from=*
@categories
path /category/*
Then, the “duplicates” segment will not cover duplicate products (such as http://www.mywebsite.com/catalog/product-description_item-123456.html?from=promotion_home). These will be in the products segment because it is the first segment that matches the URL.
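The effect of rule order can be reproduced with a simplified evaluator. The sketch below is an approximation written for illustration, not Botify's engine: wildcard_match and classify are hypothetical helpers, only the path and query selectors are modeled, and a pattern without wildcards is assumed to require an exact match.

```python
from urllib.parse import urlsplit

def wildcard_match(pattern, value):
    """Match a pattern whose only wildcards are a leading and/or trailing *."""
    core = pattern.strip("*")
    if pattern.startswith("*") and pattern.endswith("*"):
        return core in value
    if pattern.startswith("*"):
        return value.endswith(core)
    if pattern.endswith("*"):
        return value.startswith(core)
    return value == core

def classify(url, rules):
    """Return the first segment whose conditions all match (None if none does)."""
    parts = urlsplit(url)
    values = {"path": parts.path, "query": parts.query}
    for segment, conditions in rules:
        if all(wildcard_match(pattern, values[sel]) for sel, pattern in conditions):
            return segment
    return None

duplicates = ("duplicates", [("query", "*from=*")])
products = ("products", [("path", "/catalog/*")])
categories = ("categories", [("path", "/category/*")])

url = ("http://www.mywebsite.com/catalog/"
       "product-description_item-123456.html?from=promotion_home")
print(classify(url, [duplicates, products, categories]))  # duplicates
print(classify(url, [products, duplicates, categories]))  # products
```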
Here is another example:
If your website has product pages (http://www.mywebsite.com/catalog/product-description_item-123456.html) and review pages associated with products (http://www.mywebsite.com/catalog/product-description_item-123456_reviews-1.html), you may want to separate product reviews from actual product pages (those with the "Add to cart" link).
The following rules will separate the two and place reviews in a subsegment inside the product segment:
[segment:pagetype]
@product/reviews
path /catalog/*
path *_reviews-*
@product
path /catalog/*
However, if you do this:
[segment:pagetype]
@product
path /catalog/*
@product/reviews
path /catalog/*
path *_reviews-*
All reviews will be in the Product segment, and the Product/reviews subsegment will be empty.
An alternative is to make the rules mutually exclusive:
[segment:pagetype]
@product
path /catalog/*
path not *_reviews-*
@product/reviews
path /catalog/*
path *_reviews-*
However, this approach rapidly results in much more complex rules and longer processing time.
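To see why mutually exclusive rules remove the dependency on order, consider this Python sketch. The helper names are hypothetical, and the string checks only approximate the wildcard patterns in the rules above.

```python
def is_product(path):
    # path /catalog/*  AND  path not *_reviews-*
    return path.startswith("/catalog/") and "_reviews-" not in path

def is_review(path):
    # path /catalog/*  AND  path *_reviews-*
    return path.startswith("/catalog/") and "_reviews-" in path

CHECKS = {"product": is_product, "product/reviews": is_review}

def classify(path, order):
    """Return the first segment in 'order' whose check passes."""
    for segment in order:
        if CHECKS[segment](path):
            return segment
    return None

paths = [
    "/catalog/product-description_item-123456.html",
    "/catalog/product-description_item-123456_reviews-1.html",
]
for path in paths:
    # Both orders give the same answer because no path satisfies both rules.
    print(classify(path, ["product", "product/reviews"]),
          classify(path, ["product/reviews", "product"]))
```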
Wildcards
Wildcards (*) can be used at the start and/or the end of a pattern, but not in the middle.
For instance, these are correct:
path /product/*
path *.html
query *productid=*
This is incorrect:
path /categories/*.html
You can specify the above with these two lines:
path /categories/*
path *.html
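The two-line form works because both conditions apply to the same path. A minimal Python sketch of that logic follows; the helper name and sample paths are hypothetical, and startswith/endswith are assumed to approximate the trailing and leading wildcards.

```python
def in_categories_html(path):
    # path /categories/*  AND  path *.html
    return path.startswith("/categories/") and path.endswith(".html")

print(in_categories_html("/categories/chairs/index.html"))  # True
print(in_categories_html("/categories/chairs/"))            # False
print(in_categories_html("/products/chair.html"))           # False
```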
Regular Expressions
If wildcards (*) are not enough to define the pattern you need, you can use regular expressions (regex) instead by adding rx: before the string to match. For example, to specify that the URL path ends with -id followed by numbers and nothing else:
path rx:-id[\d]+$
👉 Regular expressions are limited to 500 characters, and CJK characters are supported.
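The pattern above can be tried out in Python before adding it to a rule. This is only a sanity check under the assumption that the pattern behaves the same way in Python's re module as in Botify; the sample paths are hypothetical.

```python
import re

# Same pattern as in the rule: "-id", then one or more digits, then end of string.
ID_SUFFIX = re.compile(r"-id[\d]+$")

def ends_with_id(path):
    """True when the path ends with -id followed only by digits."""
    return ID_SUFFIX.search(path) is not None

print(ends_with_id("/product/blue-chair-id12345"))       # True
print(ends_with_id("/product/blue-chair-id12345.html"))  # False
print(ends_with_id("/product/blue-chair-id"))            # False
```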
Comments
Add comments to segment rules in lines that start with #.
Subsegments
You can define subsegments for any segment. To define a subsegment, use the parent segment’s name, followed by a forward slash (/) and the desired name of the subsegment.
The "parent" segment below has 2 subsegments: child1 and child2.
@parent/child1
<rules for child1>
@parent/child2
<rules for child2>
Subsegments help to provide visibility over all URLs in the parent segment, along with the distribution between the two children, and allow filtering directly on one of the children.
You can create a rule to place URLs in the parent itself without being in one of the child segments:
@parent
<rules for parent (URLs which belong to parent but not to child1 or child2)>
You can also define several levels of children. There could be:
@parent/child2/grandchildA
<rules for grandchildA>
In the following example, an e-commerce website with its own products and a marketplace defines subsegments to enable them to distinguish between the two:
@product/marketplace
path /marketplace-product/*
@product/direct
path /product/*
If your e-commerce website has a meta-products page that lists all offers for a given product, direct or marketplace, these could be placed in the parent without a subsegment:
@product
path /meta-product/*
Alternatively, you could choose to create a third child and leave the parent empty (no rule for the parent itself):
@product/meta
path /meta-product/*
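One way to picture how subsegments roll up into their parent is to count each URL once for its own segment and once for every ancestor. The Python sketch below does this for a hypothetical list of assigned segment names; the rollup helper is an illustration, and Botify's reports may aggregate differently.

```python
from collections import Counter

def rollup(segments):
    """Count URLs per segment, crediting each ancestor segment as well."""
    counts = Counter()
    for name in segments:
        parts = name.split("/")
        for depth in range(1, len(parts) + 1):
            counts["/".join(parts[:depth])] += 1
    return counts

# Hypothetical result of classifying four URLs.
assigned = [
    "product/marketplace",
    "product/marketplace",
    "product/direct",
    "product/meta",
]
counts = rollup(assigned)
print(counts["product"])              # 4
print(counts["product/marketplace"])  # 2
```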
Pagination and Facets
To easily filter listing pages by pagination, use the following flag segments, where p1 is the first listing page, and p+ is all other listing pages:
flag p1
flag p+
Similarly, use the facets flag to identify facet levels:
flag facets_0
flag facets_1
flag facets_2
flag facets_3
flag facets_4+
Warning Segments
When defining segments, you can flag them as a "warning" for URLs you want to pay special attention to, such as URL patterns that should not exist on your website. For example, warning segments can be used to track malformed URLs, URL patterns from a previous version of your website, or pages with a non-secure protocol if your website is secure.
In LogAnalyzer reports, specific graphs focus on "warning" segments, and this warning flag can also be used as a field filter.
An optional line can be added to specify that the segment is a "warning" segment (there is no other allowed value than "warning" for the flag line):
flag warning
This line can be anywhere in the segment rule:
@category/sort_parameter
query *sort=*
flag warning
This subsegment isolates and flags category pages with a sort parameter in their URL as a "warning" because they generate pseudo-duplicates of pages with the default sorting.
Part of the URL Must be Empty
To create a segment rule requiring a part of the URL to be empty, enter two single quotes '' for the value. For example, for URLs with no query string:
query ''
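In Python terms, the condition above corresponds roughly to checking that the parsed query string is empty. The helper name and URLs below are hypothetical, and the sketch assumes that a URL ending in a bare question mark also counts as having an empty query.

```python
from urllib.parse import urlsplit

def has_empty_query(url):
    """True when the URL has no query string."""
    return urlsplit(url).query == ""

print(has_empty_query("https://www.example.com/category/chairs/"))            # True
print(has_empty_query("https://www.example.com/category/chairs/?sort=price"))  # False
```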