URL Enrichment
To better understand how users are engaging with content, Lytics first needs to understand that content itself. One way Lytics does this is by analyzing the URLs that are passed to Lytics to determine the topics that best describe the URL.
Lytics already knows which URLs a user has engaged with. By associating topics with URLs, Lytics is also able to understand which topics a user has engaged with. In doing so, the Lytics Content Affinity Engine can find relevant content for users, as well as find relevant users for content.
How Data Comes into Lytics
When Lytics receives data about actions taken by a customer, it is called an event. Each event has fields that store pieces of information describing the event.
An example of an event is a web page view. After the Lytics JavaScript tag is added to a web page, any time that web page is viewed, an event is sent to Lytics. A web page view event includes fields that describe when the event occurred, which browser was used, and the URL that was viewed. The event tells Lytics that a specific person viewed a specific web page at a specific time. The event does not include any details on what information was displayed on the web page.
Events come into Lytics on a data stream. The data stream is just a label assigned to the event that identifies where the data came from. Lytics applies some processing logic to all events, regardless of the data stream the event came in on. Other processing logic is only applied to events that come in on certain data streams.
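As a rough illustration of the description above, an event can be pictured as a stream label plus a set of fields. This is only a sketch, not the Lytics event format: the Event class, the stream name "default", and the field names timestamp and browser are assumptions made for the example. Only the field name url appears in this document.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Event:
    """Hypothetical in-memory picture of an event; not the Lytics wire format."""
    stream: str  # label identifying where the data came from
    fields: Dict[str, Any] = field(default_factory=dict)


# A web page view: who/when/where details, but nothing about the page content.
page_view = Event(
    stream="default",  # assumed stream name, for illustration only
    fields={
        "url": "https://www.example.com/blog/post-1",
        "timestamp": "2024-01-15T10:30:00Z",  # assumed field name
        "browser": "Firefox",                 # assumed field name
    },
)
```

Note that nothing in the event describes what the page says, which is why URL enrichment is needed.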
Events with URLs
When Lytics receives an event with a URL in it (specifically, when an event with a field named url comes in on any data stream), Lytics determines whether the URL is new. A new URL is one that Lytics has not previously handled.
Lytics then creates a new event and writes that event to the data stream lytics_content_enrich, called the content enrichment stream. An LQL query named lytics_content handles events written to the content enrichment stream. This results in a new entity being created in the content table.
Lytics listens for events with new URLs on the content enrichment stream. When a new event is available, Lytics runs the URL enrichment process.
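The flow in this section can be sketched as follows. This is one possible reading of the steps above, not Lytics' implementation: the in-memory set and list, the function name forward_url_event, and the is_new flag are all hypothetical. Only the field name url and the stream name lytics_content_enrich come from the text.

```python
from typing import Any, Dict, List, Optional, Set

CONTENT_ENRICH_STREAM = "lytics_content_enrich"  # stream name from the text

seen_urls: Set[str] = set()               # hypothetical store of handled URLs
enrich_stream: List[Dict[str, Any]] = []  # stands in for the enrichment stream


def forward_url_event(fields: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Write an event to the enrichment stream when the inbound event has a url."""
    url = fields.get("url")
    if not url:
        return None  # no url field, so there is nothing to forward

    is_new = url not in seen_urls  # "new" means not previously handled
    seen_urls.add(url)

    # is_new is an assumed marker; the text does not say how newness is recorded.
    event = {"stream": CONTENT_ENRICH_STREAM, "url": url, "is_new": is_new}
    enrich_stream.append(event)
    return event
```

In this sketch, only events whose is_new flag is true would go on to trigger the URL enrichment process.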
URL Enrichment Process
Data enrichment is a common practice in Lytics. It refers to adding data to inbound data to improve its quality. This process is also used in user profile enrichment.
Enrichment is handled by components called enrichers. Each enricher performs a specific task. A common task for an enricher is to associate topics with a URL, but there are other tasks that enrichers can perform.
Whatever its specific purpose, an enricher may add data to the inbound event. After the enrichers run, another new event is written to the content enrichment stream.
This time, the new event is not enriched because the URL is not new. But the event includes all of the data that was previously added during the enrichment process, so when the query lytics_content runs, it is able to map that new data to the corresponding entity in the content table.
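A minimal sketch of this two-pass flow, under stated assumptions: the enricher shown here (which simply treats URL path segments as topics) is a hypothetical stand-in, not how Lytics derives topics, and the dictionary named content_table only imitates the effect of the lytics_content query.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlparse

Enricher = Callable[[str], Dict[str, Any]]

content_table: Dict[str, Dict[str, Any]] = {}  # stand-in for the content table


def path_topic_enricher(url: str) -> Dict[str, Any]:
    """Hypothetical enricher: uses non-empty URL path segments as topics."""
    segments = [seg for seg in urlparse(url).path.split("/") if seg]
    return {"topics": segments}


def enrich(event: Dict[str, Any], enrichers: List[Enricher]) -> Dict[str, Any]:
    """Return a new event carrying whatever data the enrichers added."""
    enriched = dict(event)  # copy, so the original event is left untouched
    for enricher in enrichers:
        enriched.update(enricher(event["url"]))
    return enriched


def upsert_content(event: Dict[str, Any]) -> Dict[str, Any]:
    """Imitates the query step: map event data onto the entity for its URL."""
    entity = content_table.setdefault(event["url"], {})
    entity.update({k: v for k, v in event.items() if k != "url"})
    return entity
```

The first event creates the entity; the second, enriched event updates the same entity because both are keyed by the same URL.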
Enrichers
The specific enrichers that Lytics uses depend on how your account is configured. The account setting enrich_content_sources controls which enrichers are used. Your Lytics representative can help you change the enrichers that are enabled on your account.
One enricher that is always used is the meta enricher.
Meta Enricher
The meta-enrichment process begins with Lytics sending a request for the URL. The response allows Lytics to collect some information to improve the efficiency of the overall enrichment process. Examples of information collected are:
- Status code - This is data returned from the web server that handled the request. It tells Lytics whether the URL is valid and accessible on the server. This is important because Lytics is able to generate content recommendations, and you don't want Lytics to include URLs that will result in a 404 or other errors.
- Meta tags - Lytics can read data from certain meta tags to associate topics with a URL. This logic runs during the meta-enrichment process.
- Canonical URL - The content on a web page may be accessible using multiple URLs. For example, a product online may appear in multiple categories. The canonical URL is used to associate the multiple product pages with one another. This is an important value to ensure Lytics doesn't process the same content multiple times, just because the URL is different.
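To make the three items above concrete, here is a small sketch that pulls them out of an HTTP response that has already been fetched. It is an illustration only: the class and function names are hypothetical, the document does not say which meta tags Lytics reads (this sketch collects every named meta tag), and treating only 2xx status codes as usable is an assumption.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional, Tuple


class MetaExtractor(HTMLParser):
    """Collects named <meta> tags and the <link rel="canonical"> href."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr = dict(attrs)
        if tag == "meta" and attr.get("name") and attr.get("content") is not None:
            self.meta[attr["name"].lower()] = attr["content"]
        elif tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")


def summarize_response(status_code: int, html: str) -> Dict[str, Any]:
    """Gather status, meta tags, and canonical URL from a fetched page."""
    parser = MetaExtractor()
    parser.feed(html)
    return {
        "status_code": status_code,
        "ok": 200 <= status_code < 300,  # assumption: only 2xx counts as usable
        "meta": parser.meta,
        "canonical_url": parser.canonical,
    }
```

Two URLs whose responses report the same canonical_url could then be treated as the same piece of content.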
URL Normalization
As the Content Affinity Engine ingests web-based content, it attempts to resolve duplicate URLs and create links between documents, much like a search engine would. As such, the Content Affinity Engine does things like respect robots.txt directives, resolve canonical URLs when present, etc.
Lytics attempts to sanitize URLs as much as possible before ingesting them into the Content Affinity Engine. Sanitization includes removing all URL parameters and cleaning URL syntax. This happens via an LQL function called urlmain. If any of your web properties implement routing through URL parameters (like domain.com?pageId=123), you'll need to update your queries to instead use urlminusqs.
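The following Python sketch shows why removing every URL parameter is a problem for sites that route through parameters. It does not reproduce the behavior of urlmain or urlminusqs; the two helper functions are hypothetical and only contrast "drop all parameters" with "keep the parameters that identify a page". Check the LQL function reference for what each function actually does.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_all_params(url: str) -> str:
    """Drop the whole query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def keep_params(url: str, keep: Iterable[str]) -> str:
    """Drop every query parameter except the ones named in keep."""
    wanted = set(keep)
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in wanted
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


page_a = "https://domain.com?pageId=123&utm_source=mail"
page_b = "https://domain.com?pageId=456&utm_source=mail"

# Stripping everything makes two different pages look identical.
collapsed = strip_all_params(page_a) == strip_all_params(page_b)

# Keeping the routing parameter preserves the distinction.
distinct = keep_params(page_a, ["pageId"]) != keep_params(page_b, ["pageId"])
```

Here both collapsed and distinct evaluate to True: the tracking parameter is removed either way, but only the second approach keeps the two pages apart.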