Convert Tetra Data to Lakehouse Tables
To automatically create open-format Lakehouse tables (Delta Tables) from your Tetra Data, create a Tetra Data Pipeline that uses the ids-to-delta protocol. After you enable the pipeline, any Intermediate Data Schema (IDS) data that meets the trigger conditions you set is automatically converted into Lakehouse tables. You can then transform and run SQL queries on the data in the Lakehouse by using any of the following methods:
- A third-party tool and a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) driver endpoint
- The TDP user interface
- Databricks Delta Sharing
- Snowflake Data Sharing
After your data is converted to Lakehouse tables, you can also use Tetraflow pipelines to define and schedule data transformations in a familiar SQL language and generate custom, use case-specific Lakehouse tables that are optimized for downstream analytics applications.
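For example, once the pipeline has produced Lakehouse tables, you can query them from any client that supports a JDBC or ODBC connection. The following minimal sketch assumes a Python environment with the pyodbc package and a preconfigured ODBC data source; the DSN, table name, and column names are hypothetical placeholders, not actual TDP identifiers.

```python
# Minimal sketch: querying a Lakehouse table over an ODBC endpoint with pyodbc.
# The DSN, table name, and column names are placeholders; substitute the
# connection details and schema from your own TDP environment.
import pyodbc

# Connect through a preconfigured ODBC data source name (DSN).
conn = pyodbc.connect("DSN=tetra_lakehouse", autocommit=True)
cursor = conn.cursor()

# Example query against a hypothetical IDS Lakehouse table.
cursor.execute(
    """
    SELECT file_id, source_type, ingested_at
    FROM lakehouse.example_ids_table
    ORDER BY ingested_at DESC
    LIMIT 10
    """
)

for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```

The same pattern applies to any other SQL client or BI tool that can connect through a JDBC or ODBC driver endpoint.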
To view Lakehouse tables after you create them, see Query SQL Tables in the TDP.
To backfill historical data into Lakehouse tables, please contact your customer success manager (CSM).
NOTE
The Data Lakehouse Architecture is available to all customers as part of an early adopter program (EAP) and will continue to be updated in future TDP releases. If you are interested in participating in the early adopter program, please contact your customer success manager (CSM).
Create an ids-to-delta Pipeline
To create an ids-to-delta pipeline, do the following:
1. On the Pipeline Manager page, create a new pipeline and define its trigger conditions by selecting the IDS data that you want to transform to Lakehouse tables. You can do either of the following:
   - To convert all of your IDS data to Lakehouse tables, select the IDS trigger type only.
   - To convert IDS data from a specific source, select the IDS trigger and a Source Type trigger.
2. Choose Next.
3. In the Select Protocol section of the Pipeline Manager page, select the ids-to-delta protocol. Don't change the protocol's default Configuration settings.
4. Choose Next.
5. Finalize the details you need to finish creating your pipeline.
6. Choose Create.
NOTE
If the ENABLED toggle is set to active when you choose Create, then the pipeline will run as soon as the configured trigger conditions are met.
For more information, see Set Up a Pipeline.
IMPORTANT
When querying data in your Lakehouse tables, make sure that you apply Lakehouse data access best practices. These patterns ensure that downstream datasets use the latest records and that SQL queries return the most current data, whether you're querying the new tables directly or setting up a Tetraflow pipeline.
Backfill Historical Data Into Lakehouse Tables
To backfill historical data into Lakehouse tables, contact your CSM or account manager. They'll help you backfill your historical Tetra Data into two types of Lakehouse tables (Delta Tables):
- IDS Lakehouse tables, which are the new nested-format Delta Tables. To access the new IDS Lakehouse tables, you'll eventually need to update your existing queries to align with the new table structure.
- Normalized IDS Lakehouse tables, which transform IDS Lakehouse tables from the new, nested Delta Table structure to a normalized structure that mirrors the legacy Amazon Athena table structure in the Data Lakehouse. They reduce the need for rewriting existing queries and make it easier to point downstream analytics applications to Tetra Data in the Lakehouse.
To create these tables, your CSM or account manager will help you create Bulk Pipeline Process Jobs that use an ids-to-delta pipeline and are scoped by the appropriate IDSs and historical time ranges.
IMPORTANT
In TDP v4.1.0, the ids-to-delta pipeline runs in "append" mode only, which means new data is added to the tables without overwriting the previous data. This behavior creates multiple copies of each row (record). It also means that if an IDS is updated, the newest version of the IDS is appended to the Delta Table rather than updated in place. Downstream systems using the IDS should deduplicate the IDS records by applying the strategy described in Data Deduplication.
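As an illustration of that deduplication pattern, the sketch below keeps only the most recently appended row for each record by ranking rows with a SQL window function. It reuses the hypothetical pyodbc connection and table from the earlier example; the partition key and ordering column are assumptions, so follow the Data Deduplication guidance for the columns that apply to your IDS.

```python
# Minimal deduplication sketch, assuming the same hypothetical ODBC DSN as above.
# Table and column names (example_ids_table, file_id, ingested_at) are
# placeholders; use the key and version/timestamp columns from your own IDS.
import pyodbc

conn = pyodbc.connect("DSN=tetra_lakehouse", autocommit=True)
cursor = conn.cursor()

# Keep only the newest appended row per file_id by ranking rows with a
# window function and selecting rank 1.
cursor.execute(
    """
    SELECT file_id, source_type, ingested_at
    FROM (
        SELECT
            file_id,
            source_type,
            ingested_at,
            ROW_NUMBER() OVER (
                PARTITION BY file_id
                ORDER BY ingested_at DESC
            ) AS row_rank
        FROM lakehouse.example_ids_table
    ) ranked
    WHERE row_rank = 1
    """
)

latest_records = cursor.fetchall()
print(f"{len(latest_records)} deduplicated records")

cursor.close()
conn.close()
```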