Content Curation

Content curation on Lytics involves scanning your website and other content to ingest topics and build content affinities. Properly setting up the curation process is key to enabling use cases such as promoting relevant ads and delivering targeted web content.

This document gives an overview of the concepts and considerations to keep in mind while curating your content during the early stages of implementation. It will also help you determine whether you need to plan for any custom content curation.


Once the Lytics JavaScript tag is installed, Lytics automatically begins crawling the content on your website using Natural Language Processing (NLP) engines. Put simply, this means that:

  • Lytics will index your content. Out of the box, content includes your web pages and images. By indexing it, Lytics ends up with a list of all your content.

  • Lytics will crawl the content via NLP providers (such as Google NLP or Diffbot) and extract topics associated with that content.

Over time, as users interact with your content, Lytics identifies user content affinity levels for various topics. While this functionality begins working right away without you having to do anything, there are various things to consider to ensure Lytics is bringing in the content you’ll need to execute use cases.

What domains should Lytics scan?

By default, Lytics will scan all content on your domain as well as the referral page from which the user arrived at your website. To turn off referral scanning, whitelist your domain by navigating to Account Settings > Content. Consider the following when deciding whether to have Lytics scan referral domains:

  • Pro: Track the topics your users are interested in prior to coming to your website to understand more about them.
  • Con: Topics completely unrelated to your brand may show up.

If Lytics should only scan certain sections of your website, you can customize this using whitelisting and blacklisting as well. For example, if you have a blog section and your other pages aren’t relevant for gauging what users are interested in, you can whitelist the path /blog/.

Are there any paths that should be avoided?

A website may have sections that should not be scanned for topics, such as password reset pages or any pages hidden behind a log-in (e.g. /password-reset/ or /admin/). These paths can be blacklisted in your Account Settings under Content.
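
To reason about how whitelisted and blacklisted paths interact before changing your settings, a small sketch can help. The code below is an illustration only, not how Lytics evaluates its settings: the function name should_scan, the use of simple prefix matching, and the rule that a blacklisted path always wins are all assumptions.

```python
from urllib.parse import urlparse


def should_scan(url, allowed_prefixes=(), blocked_prefixes=()):
    """Return True if the URL's path passes the hypothetical path rules.

    Assumptions (not confirmed Lytics behavior): rules are plain path
    prefixes, and a blocked prefix takes priority over an allowed one.
    """
    path = urlparse(url).path or "/"
    # A blacklisted prefix always excludes the page.
    if any(path.startswith(prefix) for prefix in blocked_prefixes):
        return False
    # With no whitelist, everything that is not blacklisted is scanned.
    if not allowed_prefixes:
        return True
    return any(path.startswith(prefix) for prefix in allowed_prefixes)
```

For example, should_scan("https://example.com/pricing", allowed_prefixes=("/blog/",)) returns False, because only paths under /blog/ are whitelisted in that call.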

Which NLP service should I be using?

Out of the box, Lytics uses Google NLP, which pulls from Google's knowledge graph and taxonomy. If you determine you're not getting enough topics from your content, Lytics can use Diffbot in addition to Google NLP. Diffbot makes looser associations between topics and content, so you'll bring in more topics, but you may be slightly surprised by what you see!

If your brand is international, you may need to consider which languages are supported by each NLP service. Please consult the provider documentation for a list of languages supported by Google NLP and Diffbot. Another option is to turn NLP completely off and use only custom topics.

For more details on each service Lytics uses, see NLP services.


If you aren’t seeing the content you expected, note that it may take some time for Lytics to crawl all of your content. By default, monthly limits exist for scanning new content (described below). If Lytics scans all new content without having reached the limit, Lytics will move on to older content until it is all scanned. Please allow time for this.

Also, consider how far back Lytics should scan. For example, is it really necessary to scan content from 2 years ago? If people are no longer interacting with that content, use Account Settings to set a date to start the scan from.

Content Enrichment limits

Lytics will scan and enrich up to 20,000 URLs per month by default. This limit is designed to act as a guard rail to ensure good filter hygiene is in place. Most accounts do not publish close to 20,000 pieces of distinct content per month. If you believe your account is hitting this limit, please check with your Lytics Implementation team. Once confirmed, you can consider the following options:

  • Are there any domains or content paths that you can blacklist? This will likely be part of the solution. See the section above for more information.
  • Do you need Lytics to increase the limit?
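
Before raising the question of limits, it can help to estimate how many distinct URLs you actually publish per month. The sketch below assumes you can export a list of (url, publish date) pairs from your CMS or sitemap; the function name months_over_limit and the input shape are hypothetical, and the count is only an approximation of what Lytics would scan.

```python
from collections import defaultdict
from datetime import date

MONTHLY_LIMIT = 20_000  # default enrichment limit described above


def months_over_limit(published, limit=MONTHLY_LIMIT):
    """Return {"YYYY-MM": distinct_url_count} for months above the limit.

    `published` is an iterable of (url, publish_date) pairs, which is an
    assumed export format and not something Lytics provides.
    """
    urls_by_month = defaultdict(set)
    for url, published_on in published:
        # A set keeps each URL only once per month.
        urls_by_month[published_on.strftime("%Y-%m")].add(url)
    return {
        month: len(urls)
        for month, urls in urls_by_month.items()
        if len(urls) > limit
    }
```

An empty result suggests the volume of new content alone is unlikely to explain hitting the limit, which points back to filtering domains and paths.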

Robot directives

Your domain likely has a robot directive that provides instructions to crawlers on how to crawl or index your content. Lytics will follow these directives. While this is typically not an issue, it’s worth reviewing your directives when troubleshooting any missing content or information.
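
When troubleshooting missing content, you can test specific URLs against your directives yourself. The sketch below assumes the directives live in a robots.txt file and uses Python's standard urllib.robotparser module. The sample rules, the blocked_urls helper, and the "*" user agent are assumptions for illustration; the user agent Lytics crawls with is not stated here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /password-reset/
"""


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt rules disallow."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Any URL this returns is one a directive-respecting crawler would skip, which makes it a good first place to look when expected content is missing.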


You may want to build a collection of content based on publish date or author. Consider the following if this is the case:

If you have metadata on your website, does it use Open Graph? Open Graph tags will populate the following default values in Lytics: title, image, published_time, description, and lytics:topics. You can review the lytics_content query in your account to confirm that all the Open Graph tags you need are being picked up.

If you are not using Open Graph, Lytics may not be picking up any meta tags automatically. The quickest way to check that Lytics is bringing in what’s needed is to view a webpage’s source code and explore the tags.
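
Rather than reading page source by hand, you can list a page's meta tags with a short script. This is a minimal sketch using Python's standard html.parser module. The MetaTagCollector class, the collect_meta_tags helper, and the sample HTML are hypothetical, and the og: and article: property names follow the Open Graph protocol rather than anything confirmed about which tags Lytics reads.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta> tags keyed by their property or name attribute."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph uses property="..."; many other tags use name="...".
        key = attr.get("property") or attr.get("name")
        if key and attr.get("content") is not None:
            self.tags[key] = attr["content"]


def collect_meta_tags(html):
    """Return a dict of meta tag names to content values found in html."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.tags


# Made-up page source for illustration.
SAMPLE = """
<html><head>
<meta property="og:title" content="Spring Hiking Guide">
<meta property="article:published_time" content="2024-03-01T09:00:00Z">
<meta name="author" content="Jane Doe">
</head><body></body></html>
"""
```

Calling collect_meta_tags(SAMPLE) returns a dictionary you can compare against the tags you expect; in this sample there is no og:image entry, which is the kind of gap worth catching early.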

Alternatively, in the UI, you can check by navigating to Content > Collections. Try to build content collections by author or publish date as these are the most commonly used filters. If the content doesn’t come up as expected, you may need to curate your tags.

Custom topics

For many users outside of publishing who may not have rich content, NLP-derived topics aren’t enough. To accommodate this, Lytics can add custom topics via the metadata. The easiest way to do this is by adding the lytics:topics meta tag. Read more on providing custom topics.

If you already have topics in your metadata under a different meta tag than the one above, Lytics may be able to bring those in as well by making a change to your account settings. Speak with your implementation team about this.

Once this setting is changed and content is rescanned, you will be able to build content collections with these topics, and users' affinities will be generated for them.

Total number of topics

In the Lytics UI, you will see a max of 500 topics. Lytics keeps all of your topics, but only the top 500 are surfaced in the UI.

  • As you blacklist topics, Lytics will backfill to show 500.
  • You can whitelist topics to ensure that they make the top 500.

NOTE: If you choose to whitelist topics, make sure that the topics actually exist, either as custom topics or as topics picked up by NLP. Whitelisting is case sensitive. For example, whitelisting ABBA is different from whitelisting Abba.
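
The interaction between the 500-topic cap, blacklisting, and whitelisting can be sketched in a few lines. This is a simplified model, not the actual Lytics logic: how Lytics ranks topics is not described here, so the input is assumed to be an already ranked list, and the function name visible_topics is hypothetical.

```python
def visible_topics(ranked_topics, blocked=(), pinned=(), limit=500):
    """Return the topics that would be surfaced under this simple model.

    ranked_topics: topic names ordered from most to least relevant
    (an assumed ordering; the real ranking is not specified).
    """
    blocked = set(blocked)
    known = set(ranked_topics)
    # Whitelisted topics count only if they exist with the exact same casing.
    result = [t for t in pinned if t in known and t not in blocked]
    seen = set(result)
    # Fill the remaining slots in rank order, skipping blacklisted topics,
    # which is how lower-ranked topics backfill the list.
    for topic in ranked_topics:
        if len(result) >= limit:
            break
        if topic in blocked or topic in seen:
            continue
        result.append(topic)
        seen.add(topic)
    return result[:limit]
```

With ranked topics ["Music", "ABBA", "Travel"] and a limit of 2, whitelisting "Abba" has no effect in this model because only "ABBA" exists, while blacklisting "Music" lets "Travel" backfill the second slot.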

Other content

If you have other content on your site outside of web pages or images (e.g. PDFs) that you’d like to derive topics from and have them generate affinities, Lytics will need to develop a plan to bring those in using our APIs. Read more on adding new documents.

Document properties

All topics, whether NLP-derived or custom, allow for two things:

  1. Building a content collection with the topics.
  2. Assigning affinities to users for those topics.

There are instances where you may want to build collections based on a topic, but not have them generate affinities. For example, a collection of featured sale items, SKUs, genres, etc. You can set these as document properties in your metadata to allow for this. Read more on customizing document properties.