Convert Tetra Data to Lakehouse Tables

To automatically create open-format Lakehouse tables (Delta Tables) from your Tetra Data, create a Tetra Data Pipeline that uses the ids-to-delta protocol. After you enable the pipeline, any Intermediate Data Schema (IDS) data that meets the trigger conditions you set is automatically converted into Lakehouse tables. You can then transform the data and run SQL queries on it in the Lakehouse.
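For example, after the ids-to-delta pipeline produces a Lakehouse table, you can query it with standard SQL. The following sketch is illustrative only; the table and column names are hypothetical placeholders that depend on the IDSs you convert.

```sql
-- Hypothetical example: table and column names depend on the IDSs you convert.
SELECT
  uuid,
  file_id,
  created_at
FROM lakehouse.lcuv_empower_v16
WHERE created_at >= DATE '2024-01-01'
LIMIT 100;
```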

After your data is converted to Lakehouse tables, you can also use Tetraflow pipelines to define and schedule data transformations in a familiar SQL language and generate custom, use case-specific Lakehouse tables that are optimized for downstream analytics applications.
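Tetraflow pipelines express transformations in SQL. The exact Tetraflow pipeline configuration is beyond the scope of this example, but conceptually, a transformation that builds a use case-specific table from converted IDS data might look like the following sketch. The table and column names are hypothetical.

```sql
-- Hypothetical sketch of a use case-specific table built from a converted IDS table.
-- Table and column names are placeholders.
CREATE TABLE analytics.injection_summary AS
SELECT
  sample_name,
  injection_id,
  MAX(peak_area) AS max_peak_area,
  COUNT(*)       AS peak_count
FROM lakehouse.lcuv_empower_v16_peaks
GROUP BY sample_name, injection_id;
```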

To view Lakehouse tables after you create them, see Query SQL Tables in the TDP.

To backfill historical data into Lakehouse tables, contact your customer success manager (CSM). For more information, see Backfill Historical Data Into Lakehouse Tables in this topic.

📘

NOTE

The Data Lakehouse Architecture is available to all customers as part of an early adopter program (EAP) and will continue to be updated in future TDP releases. If you are interested in participating in the early adopter program, please contact your customer success manager (CSM).

Create an ids-to-delta Pipeline

To create an ids-to-delta pipeline, do the following:

  1. Sign in to the Tetra Data Platform (TDP).

  2. On the Pipeline Manager page, create a new pipeline and define its trigger conditions by selecting the IDS data that you want to transform to Lakehouse tables. You can do either of the following:

    • To convert all of your IDS data to Lakehouse tables, select the IDS trigger type only.

      - or -

    • To convert IDS data from a specific source, select the IDS trigger and a Source Type trigger.

  3. Choose Next.

  4. In the Select Protocol section of the Pipeline Manager page, select the ids-to-delta protocol. Don't change the protocol's default Configuration settings.

  5. Choose Next.

  6. Set email notifications.

  7. Enter the remaining details needed to finish creating your pipeline.

  8. Choose Create.

📘

NOTE

If the ENABLED toggle is set to active when you choose Create, then the pipeline will run as soon as the configured trigger conditions are met.

For more information, see Set Up a Pipeline.

🚧

IMPORTANT

When querying data in your Lakehouse tables, make sure that you apply Lakehouse data access best practices. These patterns ensure that downstream datasets use the latest records and that SQL queries return the most current data, whether you're running SQL queries on the new tables or setting up a Tetraflow pipeline.

Backfill Historical Data Into Lakehouse Tables

To backfill historical data into Lakehouse tables, contact your CSM or account manager. They'll help you backfill your historical Tetra Data into two types of Lakehouse tables (Delta Tables):

  • IDS Lakehouse tables, which are the new nested-format Delta Tables. To access the new IDS Lakehouse tables, customers will eventually need to update their existing queries to align with the new table structure.
  • Normalized IDS Lakehouse tables, which transform IDS Lakehouse tables from the new, nested Delta Table structure to a normalized structure that mirrors the legacy Amazon Athena table structure in the Data Lakehouse. They reduce the need for rewriting existing queries and make it easier to point downstream analytics applications to Tetra Data in the Lakehouse.

To create these tables, your CSM or account manager will help you create Bulk Pipeline Process Jobs that use an ids-to-delta pipeline and are scoped by the appropriate IDSs and historical time ranges.
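To illustrate the difference between the two table shapes, the hypothetical queries below show how the same fields might be retrieved from a nested IDS Lakehouse table (using dot notation into struct fields) and from a normalized table that mirrors the flattened, legacy Athena-style layout. All table and column names are assumptions for illustration only.

```sql
-- Nested IDS Lakehouse table (hypothetical names): struct fields are accessed with dot notation.
SELECT
  file_id,
  sample.name    AS sample_name,
  sample.barcode AS sample_barcode
FROM lakehouse.lcuv_empower_v16;

-- Normalized IDS Lakehouse table (hypothetical names): the same fields appear as flat columns,
-- mirroring the legacy Athena table structure.
SELECT
  file_id,
  sample_name,
  sample_barcode
FROM lakehouse.lcuv_empower_v16_samples;
```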

🚧

IMPORTANT

In TDP v4.1.0, the ids-to-delta pipeline runs in "append" mode only, which means that new data is added to the tables without overwriting the previous data. This behavior creates multiple copies of each row (record). It also means that if an IDS is updated, the newest version of the IDS is appended to the Delta Table rather than updating the existing record. Downstream systems that use the IDS should deduplicate the IDS records by applying the strategy described in Data Deduplication.
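As a minimal sketch of that approach, downstream queries can deduplicate an append-mode table with a window function that keeps only the most recent row per record. The partitioning and ordering columns below (uuid, modified_at) and the table name are hypothetical; replace them with the keys described in Data Deduplication.

```sql
-- Hypothetical deduplication sketch: keep only the latest row per record.
-- Replace uuid and modified_at with the keys described in Data Deduplication.
SELECT *
FROM (
  SELECT
    t.*,
    ROW_NUMBER() OVER (
      PARTITION BY uuid
      ORDER BY modified_at DESC
    ) AS row_num
  FROM lakehouse.lcuv_empower_v16 AS t
) deduplicated
WHERE row_num = 1;
```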