Manually Assigning Topics

In most cases, topic extraction automatically assigns the expected topics to your content. If it does not, or if you would like to expand the topics assigned to a piece of content, Lytics allows you to assign topics manually.

Content is stored in an entity called a document. Each document is a collection of fields, each storing a specific piece of information about that content. A document may have multiple fields that store its topics. Manually assigning topics involves updating one of those fields.

Viewing Topics Assigned to a Document

Each document is assigned a URL as a unique identifier. You can use the Lytics Content API to retrieve a document and view the topics assigned to it.

# Get the information about the URL for 
# the Lytics website home page
curl -s -XGET 'https://api.lytics.io/api/content/doc?urls=www.lytics.com' \
    -H "Authorization: $LIOKEY"

This will return a JSON object of the requested document:

{
  "data": {
    "total": 1,
    "urls": [{
      "url": "www.lytics.com",
      "https": false,
      "title": "",
      "description": "",
      "topics": ["CDP", "Customer Data"],
      "topic_relevances": {
        "CDP": 1,
        "Customer Data": 1
      },
      "primary_image": "",
      "author": "",
      "created": "2018-10-24T23:10:06Z",
      "id": "-7169839995045099096",
      "stream": "",
      "updated": "2018-10-24T23:14:09Z",
      "fetched": "2018-10-24T23:14:09Z"
    }]
  },
  "message": "success",
  "status": 200
}

The response shows the topics assigned to the requested content and the relevance of each topic, on a scale from 0 to 1:

"topics": ["CDP", "Customer Data"],
"topic_relevances": {
  "CDP": 1,
  "Customer Data": 1
},

Assigning Topics Manually

Manually assigning topics can be done in several ways:

  1. Data import
  2. Lytics Content Corpus API
  3. Lytics Topic Curation API

Data Import

The data (in either CSV or JSON format) can be sent to Lytics using any of the methods available for importing data, including CSV file or JSON file integrations and the Lytics Bulk Upload API. Just be sure that you send the data to the correct data stream: lytics_content_enrich. The uploaded data must be formatted in the following ways:

CSV

url,topic_Portland,topic_Oregon
https://www.lytics.com,1,.96

JSON

{
    "url": "https://www.lytics.com",
    "topic_Portland": 1,
    "topic_Oregon": 0.96
}
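
As a minimal sketch of producing a record in the JSON shape above, the helper below prints one object for a URL, a topic, and a relevance. The function name topic_record is hypothetical and not part of any Lytics tooling. It assumes the URL and topic contain no characters that need JSON escaping. It also rewrites a relevance such as .96 to 0.96, because JSON numbers need a digit before the decimal point.

```shell
#!/bin/sh
# topic_record URL TOPIC RELEVANCE
# Hypothetical helper: prints one JSON object shaped like the example above.
# Assumes URL and TOPIC need no JSON escaping (no quotes or backslashes).
topic_record() {
  # Accept only plain decimals between 0 and 1, and normalize ".96" to "0.96".
  tr_rel=$(awk -v r="$3" 'BEGIN {
    if (r ~ /^[0-9]*\.?[0-9]+$/ && r + 0 <= 1) { printf "%g", r + 0 } else { exit 1 }
  }') || {
    echo "relevance must be a number between 0 and 1: $3" >&2
    return 1
  }
  printf '{"url": "%s", "topic_%s": %s}\n' "$1" "$2" "$tr_rel"
}

# Example: one record per topic, ready to be written to a JSON file for import.
topic_record 'https://www.lytics.com' Portland 1
topic_record 'https://www.lytics.com' Oregon .96
```

Each call emits a separate object with a single topic field. Whether you send one object per topic or combine the fields into one object, as in the example above, depends on how your import is set up.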

When data is sent to the data stream lytics_content_enrich, the LQL function urlmain is applied to the url value. You can see this in the query lytics_content. The result is that :// and everything before it is removed.

In other words, uploading the following data gives the same result as the data above. This is important to understand because whenever you need to look up a URL, you should leave out the protocol and the :// that follows it:

CSV

url,topic_Portland,topic_Oregon
www.lytics.com,1,.96

JSON

{
    "url": "www.lytics.com",
    "topic_Portland": 1,
    "topic_Oregon": 0.96
}
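
If you need to predict how a URL will be stored before you look it up, the following sketch approximates the effect described above in plain shell. It only removes :// and everything before it. It is not the urlmain function itself, which may normalize URLs in other ways. The function name strip_protocol is hypothetical.

```shell
#!/bin/sh
# strip_protocol URL
# Hypothetical helper: drops "://" and everything before it, which is the
# effect this page describes. It does not reproduce urlmain itself.
strip_protocol() {
  # ${1#*://} removes the shortest leading match of "*://".
  # A value without "://" is printed unchanged.
  printf '%s\n' "${1#*://}"
}

strip_protocol 'https://www.lytics.com'   # prints www.lytics.com
strip_protocol 'www.lytics.com'           # prints www.lytics.com
```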

Content Corpus API

The content corpus endpoint can be used to associate topics with a URL. The corpus API does not allow you to specify the relevance; topics are assigned a relevance of 1.

The following command demonstrates how to use this API to set topics on content:

curl -s -XPOST "https://api.lytics.io/api/content/corpus" \
  -H "Authorization: $LIOKEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url":"www.lytics.com",
    "topics":["Portland", "Oregon"]
}'
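
When you have more than a couple of URLs, it can help to wrap the request in a small function. The sketch below builds the same payload as the command above from its arguments. The names corpus_payload and set_corpus_topics are hypothetical, and the payload builder assumes the URL and topics contain no characters that need JSON escaping.

```shell
#!/bin/sh
# corpus_payload URL TOPIC...
# Hypothetical helper: prints the JSON body used in the corpus request above.
# Assumes no argument contains quotes or backslashes.
corpus_payload() {
  cp_url=$1
  shift
  cp_topics=
  for cp_topic in "$@"; do
    # Append each topic as a quoted string, comma-separated after the first.
    cp_topics="${cp_topics:+$cp_topics, }\"$cp_topic\""
  done
  printf '{"url": "%s", "topics": [%s]}\n' "$cp_url" "$cp_topics"
}

# set_corpus_topics URL TOPIC...
# Sends the payload to the corpus endpoint shown above. Requires $LIOKEY.
set_corpus_topics() {
  curl -s -XPOST "https://api.lytics.io/api/content/corpus" \
    -H "Authorization: $LIOKEY" \
    -H "Content-Type: application/json" \
    -d "$(corpus_payload "$@")"
}

# Print the payload without sending anything:
corpus_payload www.lytics.com Portland Oregon
```

Keeping the payload builder separate from the request makes it possible to inspect the body before anything is sent.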

Topic Curation API

The topic curation endpoint can be used to add topics to content. However, this approach is a bit more complicated because you must know an identifier for the content you want to add new topics to.

By default, the following fields are identifiers on the content table:

  • contentid - This value only applies to email content.
  • fbid - This value only applies to Facebook content.
  • hashedurl - Lytics does not use the content's URL itself as an identifier. A URL is a string that can be quite long, so for performance reasons Lytics hashes the URL and uses the hashed value as the identifier. A hash is a way of converting a string into a number.

How to Generate a Hash for a URL

There are many hash functions available, but Lytics uses a specific one when it hashes URLs: sip hash.

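The following command demonstrates how to use the Lytics query test evaluation endpoint to generate a sip hash for a URL. The string to hash is passed in the value query parameter. Replace the placeholder below with the URL-encoded value you want to hash, such as https://www.lytics.com: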

# The authorization header uses double quotes so the shell expands $LIOKEY
curl --request POST \
  --url 'https://api.lytics.io/api/query/_test?value=what%20you%20want%20to%20hash' \
  --header "authorization: $LIOKEY" \
  --header 'content-type: text/plain' \
  --data 'SELECT hash.sip(`value`) AS hashed FROM test INTO test BY hashed ALIAS test'

The result of this command will be something like the following. The value of the hashed field is the hash:

{
  "data": {
    "_created": "2018-11-05T22:00:15.688307117Z",
    "_modified": "2018-11-05T22:00:15.688307117Z",
    "hashed": "7394646926640356587"
  },
  "message": "success",
  "status": 200
}
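
If you want to reuse the hash in a script, you need to pull it out of the response. The sketch below does that with sed and assumes the response is formatted like the example above, with the hashed field as a quoted integer. The function name extract_hashed is hypothetical. A real JSON parser is more robust if one is available.

```shell
#!/bin/sh
# extract_hashed
# Hypothetical helper: reads a response like the one above on stdin and
# prints the value of its "hashed" field. The pattern allows a leading
# minus sign, since identifiers in this API can be negative.
extract_hashed() {
  sed -n 's/.*"hashed"[[:space:]]*:[[:space:]]*"\(-\{0,1\}[0-9][0-9]*\)".*/\1/p'
}

# Example with a trimmed copy of the response shown above:
printf '%s\n' '{' '  "data": {' '    "hashed": "7394646926640356587"' '  }' '}' \
  | extract_hashed
```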

If you are using the Visual Studio Code Extension for Lytics, there is a command that you can use to generate a sip hash without having to write any API calls.

Setting Topics Using Hashed URL

Above you determined the sip hash for https://www.lytics.com is 7394646926640356587. The following command will associate the topic CDP with this hashed URL at a relevance of 1.0:

# The authorization header uses double quotes so the shell expands $LIOKEY
curl --request POST \
  --url 'https://api.lytics.io/api/content/doc/hashedurl/7394646926640356587/topic/CDP?relevance=1' \
  --header "authorization: $LIOKEY" \
  --header 'content-type: application/json'
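
Topic names such as "Customer Data" contain spaces, which must be percent-encoded when they appear in the request path. The sketch below builds the endpoint URL from a hash and a topic and wraps the request shown above. The names topic_url and set_topic are hypothetical. Only spaces are encoded here, so other reserved characters in a topic name would need additional handling.

```shell
#!/bin/sh
# topic_url HASH TOPIC
# Hypothetical helper: prints the curation endpoint URL for a hashed URL
# and a topic. Only spaces in the topic are percent-encoded.
topic_url() {
  tu_topic=$(printf '%s' "$2" | sed 's/ /%20/g')
  printf 'https://api.lytics.io/api/content/doc/hashedurl/%s/topic/%s\n' "$1" "$tu_topic"
}

# set_topic HASH TOPIC RELEVANCE
# Sends the same request as the command above. Requires $LIOKEY.
set_topic() {
  curl --request POST \
    --url "$(topic_url "$1" "$2")?relevance=$3" \
    --header "authorization: $LIOKEY" \
    --header 'content-type: application/json'
}

# Print the URL without sending anything:
topic_url 7394646926640356587 'Customer Data'
```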

Removing Topics Manually

When a topic is associated with a document, a new field is created on the entity. The field stores a value from zero (no relevance) to one (highest relevance).

In Lytics, you cannot delete fields from documents, so technically there is no way to detach a topic from a content entity. Instead, you set the relevance to zero. Because zero indicates no relevance, this effectively removes the topic from the document.

Removing a topic is not the same as blacklisting a topic. Blacklisting a topic acknowledges that a topic may be relevant but is too generic to be useful. For example, at Lytics we blacklist the topic "data" because that topic is relevant on almost all of our content, and for that reason it is not useful at all.

Topics can be removed from content using one of the following approaches.

Data Import

As described above, CSV or JSON data can be sent to Lytics. The following examples demonstrate how to remove a topic from content by setting the relevance for the topic to zero:

CSV

url,topic_Portland
https://www.lytics.com,0

JSON

{
    "url": "https://www.lytics.com",
    "topic_Portland": 0
}
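
To remove several topics from the same URL in one upload, you can generate the CSV rather than write it by hand. The following is a minimal sketch. The function name removal_csv is hypothetical, and it assumes that neither the URL nor the topic names contain commas.

```shell
#!/bin/sh
# removal_csv URL TOPIC...
# Hypothetical helper: prints a two-line CSV that sets the relevance of
# every listed topic to zero for the given URL.
removal_csv() {
  rc_header=url
  rc_row=$1
  shift
  for rc_topic in "$@"; do
    rc_header="$rc_header,topic_$rc_topic"
    rc_row="$rc_row,0"
  done
  printf '%s\n%s\n' "$rc_header" "$rc_row"
}

# Write the file to upload through any of the import methods:
removal_csv 'https://www.lytics.com' Portland Oregon > remove_topics.csv
```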

Topic Curation API

The topic remove endpoint allows you to remove a topic associated with content.

This API sets the relevance of the topic to zero. It does not actually delete the topic from the content.

Above you determined the sip hash for https://www.lytics.com is 7394646926640356587. The following command will remove the topic CDP from this hashed URL:

# The authorization header uses double quotes so the shell expands $LIOKEY
curl --request DELETE \
  --url 'https://api.lytics.io/api/content/doc/hashedurl/7394646926640356587/topic/CDP' \
  --header "authorization: $LIOKEY" \
  --header 'content-type: application/json'