
Databricks: Import Table

Import user data directly from your Databricks database table into Lytics, resulting in new user profiles or updates to fields on existing profiles.

Integration Details

  • Implementation Type: Server-side.
  • Implementation Technique: Databricks database connection.
  • Frequency: One-time or scheduled batch (can be hourly, daily, weekly, or monthly depending on configuration).
  • Resulting data: Raw events, user profiles, and user fields.

This integration uses the Databricks Go SQL driver to establish a connection to your Databricks database and imports data by querying the table selected during configuration. Once started, the job will (see the sketch after this list):

  1. Create a temporary view containing a snapshot of the database table to query against.
    • A column is added for consistent ordering of the data.
    • Only rows with a timestamp after the last import (or after the Since Timestamp, on the first run) are included.
  2. Query a batch of rows from the temporary view.
  3. Emit the rows to the data stream.
  4. Repeat steps 2 and 3 until the entire temporary view has been read.
  5. Once all the rows are imported, if the job is configured to run continuously, it will sleep until the next run. The time between runs can be selected during configuration.
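
The sketch below illustrates this loop using the Databricks Go SQL driver (github.com/databricks/databricks-sql-go) with the standard database/sql package. It is a minimal sketch, not the Lytics implementation: the DSN placeholders, the my_database.my_table and updated_at names, the batch size, and the emitToStream helper are all hypothetical.

    package main

    import (
        "context"
        "database/sql"
        "fmt"
        "log"

        _ "github.com/databricks/databricks-sql-go" // registers the "databricks" driver
    )

    func main() {
        ctx := context.Background()

        // Hypothetical DSN; the real job builds this from the stored Authorization.
        db, err := sql.Open("databricks", "token:<ACCESS_TOKEN>@<HOSTNAME>:443/<HTTP_PATH>")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Pin a single session: temporary views are session-scoped, so the
        // snapshot must be created and read on the same connection.
        conn, err := db.Conn(ctx)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        // Step 1: snapshot the table into a temporary view, adding a row-number
        // column for consistent ordering. Only rows updated after the last
        // import (or the configured Since Timestamp) are included.
        since := "2024-01-01 00:00:00"
        _, err = conn.ExecContext(ctx, fmt.Sprintf(`
            CREATE OR REPLACE TEMPORARY VIEW import_snapshot AS
            SELECT *, row_number() OVER (ORDER BY updated_at) AS _rn
            FROM my_database.my_table
            WHERE updated_at > '%s'`, since))
        if err != nil {
            log.Fatal(err)
        }

        // Steps 2-4: read the view in fixed-size batches, emitting each row,
        // until the entire snapshot has been read.
        const batchSize = 1000
        for offset := 1; ; offset += batchSize {
            rows, err := conn.QueryContext(ctx, fmt.Sprintf(
                "SELECT * FROM import_snapshot WHERE _rn >= %d AND _rn < %d",
                offset, offset+batchSize))
            if err != nil {
                log.Fatal(err)
            }
            cols, _ := rows.Columns()
            n := 0
            for rows.Next() {
                vals := make([]any, len(cols))
                ptrs := make([]any, len(cols))
                for i := range vals {
                    ptrs[i] = &vals[i]
                }
                if err := rows.Scan(ptrs...); err != nil {
                    log.Fatal(err)
                }
                emitToStream(cols, vals) // step 3: send the row to the data stream
                n++
            }
            rows.Close()
            if n < batchSize {
                break
            }
        }
    }

    // emitToStream stands in for step 3: emitting one row to the Lytics data stream.
    func emitToStream(cols []string, vals []any) {
        fmt.Println(cols, vals)
    }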

Fields

Fields imported through Databricks require custom data mapping via LQL (Lytics Query Language). For assistance mapping your custom data to Lytics user fields, please contact Lytics Support.

Configuration

Follow these steps to set up and configure a Databricks import table job in the Lytics platform. If you are new to creating jobs in Lytics, see the Jobs Dashboard documentation for more information.

  1. Select Databricks from the list of providers.
  2. Select the Import Table job type from the list.
  3. Select the Authorization you would like to use or create a new one.
  4. Enter a Label to identify this job you are creating in Lytics.
  5. (Optional) Enter a Description for further context on your job.
  6. Complete the configuration steps for your job.
  1. Using the Database dropdown menu, select the database you would like to import data from.
  2. Using the Table dropdown menu, select the table you want to import data from.
  3. From the Timestamp Column input, select the timestamp column to order the events. This column is also used to determine updated rows when running continuously.
  4. (Optional) From the Record Timestamp Column input, select the timestamp column to use as the event timestamp. If left blank, the Timestamp Column will be used.
  5. (Optional) In the Since Timestamp text box, enter the earliest timestamp to import records from; only records whose Timestamp Column value is after this date will be imported. Use the yyyy-mm-dd HH:MM:SS format in UTC, for example 2024-01-01 00:00:00.
  6. (Optional) From the Stream input, select or enter the name of the data stream you want to add data to. If the data stream does not exist, it will be created. If left blank, the data will go to the databricks_{TABLE_NAME} stream.
  7. (Optional) Select the Keep Updated checkbox to repeatedly run this import on a schedule.
  8. (Optional) From the Import Frequency input, choose how often a repeated import should run.
  9. (Optional) From the Time of Day input, select the time of day to start the import.
  10. (Optional) From the Timezone input, select the timezone for the Time of Day setting.
  11. (Optional) In the Query Timeout numeric field, enter the maximum time, in minutes, that a query is allowed to run.
  12. Click Start Import.
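
For reference, the scheduling inputs combine as follows: Keep Updated re-runs the import at the chosen Import Frequency, starting at the Time of Day interpreted in the selected Timezone. The sketch below illustrates that behavior for a daily frequency; the nextRun helper and its parameters are invented for this example and are not part of the Lytics job.

    package main

    import (
        "fmt"
        "time"
    )

    // nextRun returns the next daily run time at the given hour (Time of Day)
    // in the given IANA timezone (Timezone).
    func nextRun(now time.Time, hour int, tz string) (time.Time, error) {
        loc, err := time.LoadLocation(tz)
        if err != nil {
            return time.Time{}, err
        }
        local := now.In(loc)
        next := time.Date(local.Year(), local.Month(), local.Day(), hour, 0, 0, 0, loc)
        if !next.After(local) {
            next = next.AddDate(0, 0, 1) // today's start time has passed: run tomorrow
        }
        return next, nil
    }

    func main() {
        // A daily import starting at 02:00 in New York time.
        next, err := nextRun(time.Now(), 2, "America/New_York")
        if err != nil {
            panic(err)
        }
        fmt.Println("next import at:", next)
    }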