Convert Tetra Data to Lakehouse Tables

To automatically create open-format Lakehouse tables from data coming into the Tetra Data Platform (TDP), create a Tetra Data Pipeline that uses the ids-to-lakehouse protocol. After you enable the pipeline, any data that meets the trigger conditions you set is automatically converted into Lakehouse tables.

To backfill historical data into Lakehouse tables, create a Bulk Pipeline Process Job that uses an ids-to-lakehouse pipeline and is scoped by the appropriate IDSs and historical time ranges.

What Happens Next

After your data is converted to Lakehouse tables, you can transform the data and run SQL queries on it in the Lakehouse.

You can also use Tetraflow pipelines to define and schedule data transformations in familiar SQL syntax and generate custom, use case-specific Lakehouse tables that are optimized for downstream analytics applications.
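As a hedged illustration, a Tetraflow-style transformation might be expressed as a SQL statement like the following. This is a minimal sketch only; the table and column names (lakehouse.chromatography_results, analytics.sample_summary, and so on) are hypothetical placeholders, not part of any actual TDP schema:

```sql
-- Hypothetical sketch: build a use case-specific summary table
-- from a Lakehouse source table. All table and column names
-- are placeholders; substitute your own schema.
CREATE TABLE analytics.sample_summary AS
SELECT
  sample_id,
  instrument_name,
  AVG(peak_area) AS mean_peak_area,
  COUNT(*)       AS injection_count
FROM lakehouse.chromatography_results
GROUP BY sample_id, instrument_name;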

To view Lakehouse tables after you create them, see Query SQL Tables in the TDP.
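For example, a first query to inspect a newly created table might look like the following sketch. The table name lakehouse.my_ids_table is a placeholder; actual table names depend on your IDS and pipeline configuration:

```sql
-- Hypothetical sketch: preview a few rows of a newly created
-- Lakehouse table. The table name is a placeholder.
SELECT *
FROM lakehouse.my_ids_table
LIMIT 10;
```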

🚧

IMPORTANT

When querying data in your Lakehouse tables, make sure that you apply Lakehouse data access best practices. Applying these patterns ensures that downstream datasets use the latest records and that SQL query results return the most current data, whether you're running SQL queries on the new tables directly or setting up a Tetraflow pipeline.
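As one hedged example of such a pattern, when a table can contain multiple versions of the same record, a query might keep only the most recent record per file. The following sketch assumes hypothetical table and column names (lakehouse.lcms_results, file_id, ingest_timestamp); adapt it to your own schema and to the documented best practices:

```sql
-- Hypothetical sketch: keep only the latest record for each file.
-- Table and column names are placeholders; substitute your own
-- Lakehouse table schema.
SELECT *
FROM (
  SELECT
    t.*,
    ROW_NUMBER() OVER (
      PARTITION BY file_id
      ORDER BY ingest_timestamp DESC
    ) AS row_rank
  FROM lakehouse.lcms_results t
) ranked
WHERE row_rank = 1;
```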

Create an ids-to-lakehouse Pipeline

To create an ids-to-lakehouse pipeline, do the following:

  1. Sign in to the Tetra Data Platform (TDP).

  2. On the Pipeline Manager page, create a new pipeline and define its trigger conditions by selecting the IDS data that you want to transform to Lakehouse tables. You can do either of the following:

    • To convert all of your IDS data to Lakehouse tables, select the IDS trigger type only.

      - or -

    • To convert IDS data from a specific source, select the IDS trigger and a Source Type trigger.

  3. Choose Next.
  4. In the Select Protocol section, select the ids-to-lakehouse protocol. Don't change the protocol's default Configuration settings.

  5. Choose Next.
  6. Set email notifications.
  7. Enter the remaining details needed to finish creating your pipeline.
  8. Choose Create.

📘

NOTE

If the ENABLED toggle is set to active when you choose Create, then the pipeline will run as soon as the configured trigger conditions are met.

For more information about configuring pipelines, see Set Up a Pipeline.

Backfill Historical Data Into Lakehouse Tables

To backfill historical data into Lakehouse tables, create a Bulk Pipeline Process Job that uses an ids-to-lakehouse pipeline and is scoped to the appropriate IDSs and, in the Date Range field, the historical time ranges that you want to backfill.