Custom Data Ingestion

Preparing your custom data for ingestion is a crucial part of onboarding with Lytics. This document gives an overview of the formatting requirements for custom data sources sent to Lytics as batch CSV or JSON through S3 or SFTP import workflows, or as batch or real-time imports through our collection APIs.

File naming

When you have a recurring bulk import, from S3 or SFTP for example, you must follow naming conventions consistently to ensure your data is ingested and displayed correctly. Files may be compressed in Zip format before ingestion if needed.

  • Keep file naming consistent by settling on one convention for casing and spacing.
    • E.g. all lower-case, with underscores for spaces: file_source_1_date.
  • Name each successive file with an identical 'root' followed by a YYYYMMDD suffix, such as filetitle-20191119.csv.
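
The naming rules above can be sketched as a small helper. This is a minimal illustration, not a Lytics utility: the function name `build_file_name` and the choice of a hyphen before the date are assumptions taken from the example file name.

```python
import re
from datetime import date


def build_file_name(root, day, extension="csv"):
    """Return '<root>-<YYYYMMDD>.<extension>' with a normalized root."""
    # Lower-case the root and replace runs of whitespace with underscores.
    normalized = re.sub(r"\s+", "_", root.strip().lower())
    # date objects accept strftime codes inside format specs.
    return f"{normalized}-{day:%Y%m%d}.{extension}"


print(build_file_name("File Source 1", date(2019, 11, 19)))
```

Generating every file name through one helper like this keeps the root identical from one delivery to the next.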

Field formatting

  • Phone numbers should be standardized. Lytics suggests normalizing phone numbers in a format such as 12223334444.
  • Omit double quotes within field values, or escape them.
  • Omit newlines.
  • Keep all free-form text in quotes if possible.
  • Avoid page breaks or special characters.
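
The following sketch shows one way to apply these rules before writing a CSV file. It rests on assumptions: the helper names are hypothetical, a bare 10-digit number is treated as missing the country code 1, and embedded double quotes are escaped by doubling them, which is the standard CSV convention used by Python's csv module.

```python
import csv
import io
import re


def normalize_phone(raw, default_country_code="1"):
    """Strip everything but digits, e.g. '(222) 333-4444' -> '12223334444'."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: a bare 10-digit number is missing its country code.
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits


def clean_text(raw):
    """Collapse newlines, form feeds (page breaks), and tabs into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def to_csv(rows, fieldnames):
    """Serialize rows with every field quoted; embedded quotes are doubled."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = {
    "phone": normalize_phone("(222) 333-4444"),
    "note": clean_text('Said "hi"\nthen left'),
}
print(to_csv([row], ["phone", "note"]))
```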

Headers

  • Keep headers consistent across your organization and your vendors by settling on one convention for casing and spacing.
  • Column headers must match the sample file exactly.
  • When adding a new source, review current mappings and headers to determine if any headers need to be mapped or consolidated into the same field in Lytics.
    • E.g. if the field mobile comes in from source A, and the field cell comes in from source B, it’s likely these should be mapped to the same Lytics field, which will require an LQL modification.
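
To illustrate the consolidation idea, the sketch below renames source-specific headers to one shared name before the data leaves your system. Within Lytics itself the mapping is handled in LQL as described above; this is only a hypothetical pre-processing step, and the alias table and the target name `phone` are assumptions.

```python
# Hypothetical alias table: source-specific header -> shared field name.
HEADER_ALIASES = {
    "mobile": "phone",  # as sent by source A
    "cell": "phone",    # as sent by source B
}


def normalize_headers(record):
    """Return a copy of the record with consistent, consolidated headers."""
    normalized = {}
    for header, value in record.items():
        # Apply one casing and spacing convention first.
        key = header.strip().lower().replace(" ", "_")
        # Then fold known aliases into the shared name.
        normalized[HEADER_ALIASES.get(key, key)] = value
    return normalized


print(normalize_headers({"Mobile": "12223334444", "First Name": "Ada"}))
print(normalize_headers({"cell": "12223334444", "first_name": "Ada"}))
```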

Timestamps

Lytics can ingest records in an order that differs from the order in which the events they describe occurred. For this to work, each JSON or CSV record must have a timestamp associated with it. For workflow-based imports, select the key or column that contains the event timestamp when you configure the import workflow. For Collection API imports, use the timestamp field URL parameter.

NOTE: All imports require timestamps in the following format: YYYY-MM-DDTHH:MM:SS. If an explicit timestamp is not specified, the data will be timestamped by Lytics upon ingestion.
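
As a minimal sketch, a timestamp in this shape can be produced with Python's standard `strftime`. It assumes two-digit seconds (HH:MM:SS) and applies no time zone conversion; the function name `format_timestamp` is hypothetical.

```python
from datetime import datetime


def format_timestamp(moment):
    """Render a datetime as YYYY-MM-DDTHH:MM:SS (assumes two-digit seconds)."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


print(format_timestamp(datetime(2014, 4, 5, 13, 7, 9)))
```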

When Lytics looks for new files, it selects them by last modified date first. If multiple files have the same last modified date, the date stamp in the file name is used to select the next file to import.
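
The selection rule can be sketched as a two-part sort key. This is an illustration only, not the Lytics implementation: it assumes that older files are picked first, and the `CandidateFile` type and function names are hypothetical.

```python
import re
from typing import NamedTuple


class CandidateFile(NamedTuple):
    name: str
    last_modified: float  # e.g. seconds since the epoch


def date_stamp(name):
    """Return the last 8-digit run in the file name, or '' if there is none."""
    stamps = re.findall(r"\d{8}", name)
    return stamps[-1] if stamps else ""


def import_order(files):
    """Sort by last modified date, then by the date stamp in the file name."""
    # Assumption: older files are imported before newer ones.
    return sorted(files, key=lambda f: (f.last_modified, date_stamp(f.name)))


files = [
    CandidateFile("filetitle-20191120.csv", 100.0),
    CandidateFile("filetitle-20191119.csv", 100.0),
    CandidateFile("filetitle-20191121.csv", 50.0),
]
for candidate in import_order(files):
    print(candidate.name)
```

Because YYYYMMDD stamps sort correctly as plain strings, a consistent suffix is enough to break ties between files with the same last modified date.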

JSON formatting

JSON file formatting varies slightly depending on the method and nature of the data to import. See examples for bulk and real-time imports below.

Bulk

Files imported via S3, SFTP, or bulk collection should be newline delimited, meaning each object represents a single record/event, and there is a newline separating them.

{"event":"register","date":"2014/04/05"}
{"event":"login","date":"2014/04/05"}
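
A minimal sketch of producing newline-delimited output like the sample above with Python's standard `json` module; the function name `to_ndjson` is hypothetical.

```python
import json


def to_ndjson(records):
    """Serialize each record as one compact JSON object per line."""
    # json.dumps never emits raw newlines, so each record stays on one line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


records = [
    {"event": "register", "date": "2014/04/05"},
    {"event": "login", "date": "2014/04/05"},
]
print(to_ndjson(records), end="")
```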

Real-time

Files sent to collection should be formatted as regular JSON, where each record/event is an object in an array of objects:

[
  {"event":"register","date":"2014/04/05"},
  {"event":"login","date":"2014/04/05"}
]
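
If the same records already exist in newline-delimited form, they can be converted to the array shape shown above. This is a sketch using only Python's standard `json` module; the function name `ndjson_to_array` is hypothetical, and sending the result to a collection endpoint is left out.

```python
import json


def ndjson_to_array(ndjson_text):
    """Parse one JSON object per line and return a JSON array string."""
    records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    return json.dumps(records, indent=2)


ndjson_text = (
    '{"event":"register","date":"2014/04/05"}\n'
    '{"event":"login","date":"2014/04/05"}\n'
)
print(ndjson_to_array(ndjson_text))
```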

NOTE: Lytics has a limited ability to parse nested data. Objects may contain other objects and arrays, but they may not contain nested arrays of objects.
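
A record can be checked for this pattern before it is sent. The sketch below reflects one reading of the note, namely that any array inside a record that holds objects is a problem; the function name `find_arrays_of_objects` is hypothetical.

```python
def find_arrays_of_objects(value, path=""):
    """Return dotted paths to every array that holds at least one object."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            found.extend(find_arrays_of_objects(child, child_path))
    elif isinstance(value, list):
        if any(isinstance(item, dict) for item in value):
            found.append(path)
        else:
            # Arrays of scalars are fine; keep looking inside arrays of arrays.
            for item in value:
                found.extend(find_arrays_of_objects(item, path))
    return found


record = {
    "event": "login",
    "tags": ["a", "b"],
    "user": {"id": 1, "devices": [{"os": "ios"}]},
}
print(find_arrays_of_objects(record))
```

An empty result means the record contains only scalars, objects, and arrays of scalars.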