Custom Data Ingestion
Preparing your custom data for ingestion is a crucial part of onboarding with Lytics. This document gives an overview of the formatting requirements for custom data sources sent to Lytics as batch CSV or JSON files through S3 or SFTP import workflows, as well as batch or real-time imports through our collection APIs.
File naming
When you have a recurring bulk import, from S3 or SFTP for example, you must consistently follow naming conventions to ensure your data is ingested and displayed correctly.
- Keep file naming consistent by determining casing and spacing.
  - E.g. all lower-case, use underscores for spaces: file_source_1_date.
- Name each successive file with an identical 'root' along with a YYYYMMDD suffix, such as filetitle-20191119.csv.
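The naming convention above can be sketched as a small helper. This is only an illustration: the function name build_file_name, its parameters, and the choice of a hyphen before the date (copied from the filetitle-20191119.csv example) are assumptions, not something Lytics provides.

```python
from datetime import date


def build_file_name(root: str, day: date, extension: str = "csv") -> str:
    """Return a name such as 'filetitle-20191119.csv' (hypothetical helper)."""
    # Normalize the root: lower-case, with underscores instead of spaces.
    normalized_root = "_".join(root.lower().split())
    # Append the date as a YYYYMMDD suffix, then the file extension.
    return f"{normalized_root}-{day:%Y%m%d}.{extension}"


print(build_file_name("filetitle", date(2019, 11, 19)))
```

Generating names from one function, rather than typing them by hand, keeps the root and the date format identical from one export to the next.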
File compression
If needed, files may be compressed using the zip format prior to ingestion. The zip file will be decompressed and deleted after the content has been ingested.
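As a minimal sketch, a file can be zipped before upload with Python's standard zipfile module. The helper name zip_for_upload is hypothetical, and placing a single data file in each archive is an assumption made for simplicity.

```python
import zipfile
from pathlib import Path


def zip_for_upload(source: Path) -> Path:
    """Compress one data file into a .zip archive stored next to it."""
    # The archive keeps the same root name, with a .zip extension.
    archive = source.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname stores only the file name, without local directories.
        zf.write(source, arcname=source.name)
    return archive


# Example: zip_for_upload(Path("filetitle-20191119.csv"))
```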
Field formatting
- Phone numbers should be standardized. Lytics suggests normalizing phone numbers in a format such as 12223334444.
- Omit double quotes or escape quotes.
- Omit newlines.
- Keep all free-form text in quotes if possible.
- Avoid page breaks or special characters.
For more details, see the basic rules for CSV files.
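The field rules above could be applied in a pre-processing step like the following sketch. The helper names are hypothetical, and treating 10-digit numbers as North American (prefixing the country code 1) is an assumption to adjust for your own data.

```python
import csv
import io
import re


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Reduce a phone number to digits only, e.g. '12223334444'."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: a bare 10-digit number lacks its country code.
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits


def clean_text(value: str) -> str:
    """Replace newlines and other whitespace runs with single spaces."""
    return " ".join(value.split())


def to_csv(rows, fieldnames) -> str:
    """Serialize rows with every field quoted; the csv module escapes
    embedded double quotes by doubling them."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = {"phone": normalize_phone("(222) 333-4444"), "note": clean_text("called\ntwice")}
print(to_csv([row], ["phone", "note"]))
```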
Headers
- Keep headers consistent across your organization and your vendors by determining casing and spacing.
- Column headers have to match the sample file exactly.
- When adding a new source, review current mappings and headers to determine if any headers need to be mapped or consolidated into the same field in Lytics.
  - E.g. if the field mobile comes in from source A, and the field cell comes in from source B, it’s likely these should be mapped to the same Lytics field, which will require an LQL modification.
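Before sending a new file, its header row can be compared against the sample file to catch mismatches such as mobile versus cell. The sketch below is illustrative only: the function names are hypothetical, and the comparison is case-sensitive and ignores column order, which is an assumption about what "exactly" requires.

```python
import csv


def read_header(lines):
    """Return the first CSV row (the header) from an iterable of lines."""
    return next(csv.reader(lines), [])


def header_differences(sample_header, candidate_header):
    """Return (missing, unexpected) header names relative to the sample."""
    missing = [name for name in sample_header if name not in candidate_header]
    unexpected = [name for name in candidate_header if name not in sample_header]
    return missing, unexpected


sample = read_header(["email,mobile,created_at"])
candidate = read_header(["email,cell,created_at"])
print(header_differences(sample, candidate))  # (['mobile'], ['cell'])
```

A non-empty result indicates headers that either need renaming at the source or a mapping on the Lytics side.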
Timestamps
Lytics can ingest data in a different order than the order in which the events described in that data occurred. For this to work, each JSON or CSV record must have a timestamp associated with it. For workflow-based imports, select the key or column that contains the event timestamp when you configure the import workflow. For Collection API imports, use the timestamp field URL parameter.
When Lytics looks for new files, it will choose first based on the file's last modified date. If multiple files have the same last modified date, the date stamp in the file name is used to select the next file to import.
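To reason about which file will be picked up next, the selection rule can be modeled as a two-part sort key. This is a rough model, not Lytics' implementation: the RemoteFile type is hypothetical, and oldest-first ordering and an 8-digit stamp directly before the extension are assumptions.

```python
import re
from typing import NamedTuple


class RemoteFile(NamedTuple):
    name: str
    last_modified: str  # ISO 8601 date such as "2019-11-20" (assumed format)


def import_order(files):
    """Sort by last modified date, then by the YYYYMMDD stamp in the name."""
    def sort_key(f):
        # Look for 8 digits immediately before the file extension.
        match = re.search(r"(\d{8})(?=\.\w+$)", f.name)
        name_stamp = match.group(1) if match else ""
        return (f.last_modified, name_stamp)

    return sorted(files, key=sort_key)


files = [
    RemoteFile("filetitle-20191119.csv", "2019-11-20"),
    RemoteFile("filetitle-20191118.csv", "2019-11-20"),
    RemoteFile("filetitle-20191120.csv", "2019-11-19"),
]
for f in import_order(files):
    print(f.name)
```

In this model, re-uploading an old file changes its last modified date and therefore its position, which is one reason to keep the date stamp in the name accurate.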
JSON formatting
JSON file formatting varies slightly depending on the method and nature of the data to import. See examples for bulk and real-time imports below.
Bulk
Files imported via S3, SFTP, or bulk collection should be newline-delimited: each line contains one JSON object representing a single record/event, and a newline separates consecutive objects.
{"event":"register","date":"2014/04/05"}
{"event":"login","date":"2014/04/05"}
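A newline-delimited file like the one above can be produced with Python's standard json module, as in this minimal sketch. The helper name to_ndjson is hypothetical.

```python
import json


def to_ndjson(records) -> str:
    """Serialize each record as one compact JSON object per line."""
    return "".join(
        json.dumps(record, separators=(",", ":")) + "\n" for record in records
    )


records = [
    {"event": "register", "date": "2014/04/05"},
    {"event": "login", "date": "2014/04/05"},
]
print(to_ndjson(records), end="")
```

Because json.dumps never emits raw newlines inside an object, each record is guaranteed to stay on a single line.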
Real-time
Files sent to collection should be formatted as regular JSON, where each record/event is an object in an array of objects:
[
{"event":"register","date":"2014/04/05"},
{"event":"login","date":"2014/04/05"}
]
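If the same records already exist in newline-delimited form, they can be converted to the array format with a short sketch like this one. The helper name ndjson_to_array is hypothetical, and sending the resulting payload to a collection endpoint is left out because the request details depend on your account setup.

```python
import json


def ndjson_to_array(ndjson_text: str) -> str:
    """Convert newline-delimited JSON into a single JSON array string."""
    # Parse each non-empty line as its own JSON object.
    records = [
        json.loads(line) for line in ndjson_text.splitlines() if line.strip()
    ]
    return json.dumps(records, separators=(",", ":"))


bulk = (
    '{"event":"register","date":"2014/04/05"}\n'
    '{"event":"login","date":"2014/04/05"}\n'
)
print(ndjson_to_array(bulk))
```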