Convert Tetra Data to Lakehouse Tables
To automatically create open-format Lakehouse tables from data coming into the Tetra Data Platform (TDP), create a Tetra Data Pipeline that uses the ids-to-lakehouse protocol. After you enable the pipeline, any data that meets the trigger conditions you set is automatically converted into Lakehouse tables.
IMPORTANT
When configuring ids-to-lakehouse pipelines, make sure that you select the Run on Deleted Files checkbox on the pipeline configuration page. This setting ensures that any upstream file delete events are accurately propagated to the downstream Lakehouse tables.
To backfill historical data into Lakehouse tables, create a Bulk Pipeline Process Job that uses an ids-to-lakehouse pipeline and is scoped by the appropriate IDSs and historical time ranges.
What Happens Next
After your data is converted to Lakehouse tables, you can then transform and run SQL queries on the data in the Lakehouse by using any of the following methods:
- A third-party tool and a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) driver endpoint
- The TDP user interface
- Databricks Delta Sharing
- Snowflake Data Sharing
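For example, if you go the JDBC/ODBC route, you can query Lakehouse tables from a short script through the driver endpoint. The following is a minimal sketch only: the DSN name (tdp_lakehouse), table name, and column names are placeholders, and the actual connection details come from your own TDP Lakehouse connection settings.

```python
# Minimal sketch: query a Lakehouse table over an ODBC driver endpoint.
# The DSN ("tdp_lakehouse") and the table/column names are placeholders;
# substitute the connection details and table names from your TDP environment.
import pyodbc

# Connect through an ODBC driver endpoint configured as a DSN.
conn = pyodbc.connect("DSN=tdp_lakehouse", autocommit=True)

try:
    cursor = conn.cursor()
    # Example query against a hypothetical IDS-derived Lakehouse table.
    cursor.execute(
        """
        SELECT file_id, created_at
        FROM lakehouse_example_table
        ORDER BY created_at DESC
        LIMIT 10
        """
    )
    for row in cursor.fetchall():
        print(row)
finally:
    conn.close()
```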
You can also use Tetraflow pipelines to define and schedule data transformations in familiar SQL and generate custom, use-case-specific Lakehouse tables that are optimized for downstream analytics applications.
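As an illustration of the kind of transformation you might define, the sketch below builds a use-case-specific summary table from an IDS-derived table. The table and column names are hypothetical, and this is not the Tetraflow definition format; whether you can create tables over a driver endpoint also depends on your permissions. The snippet only shows the general shape of such a SQL transformation.

```python
# Illustration only: the table and column names are hypothetical, and this is
# not the Tetraflow definition format. It only sketches the kind of SQL
# transformation you might define to produce a use-case-specific table.
import pyodbc

TRANSFORMATION_SQL = """
CREATE TABLE IF NOT EXISTS analytics_sample_summary AS
SELECT
    sample_id,
    COUNT(*)       AS measurement_count,
    AVG(peak_area) AS avg_peak_area
FROM lakehouse_example_table
GROUP BY sample_id
"""

# Running the statement over an ODBC connection is shown here only to keep the
# sketch self-contained; in the TDP, you would define and schedule this kind of
# transformation in a Tetraflow pipeline instead.
conn = pyodbc.connect("DSN=tdp_lakehouse", autocommit=True)
try:
    conn.cursor().execute(TRANSFORMATION_SQL)
finally:
    conn.close()
```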
To view Lakehouse tables after you create them, see Query SQL Tables in the TDP. For best practices on querying data in the Lakehouse, see Query Lakehouse Tables.
Create an ids-to-lakehouse Pipeline
To create an ids-to-lakehouse pipeline, do the following:
- On the Pipeline Manager page, create a new pipeline and define its trigger conditions by selecting the IDS data that you want to transform to Lakehouse tables. You can do either of the following:
  - To convert all of your IDS data to Lakehouse tables, select the IDS trigger type only.
  -or-
  - To convert IDS data from a specific source, select the IDS trigger and an IDS Type trigger.
- Click Select Protocol.
- In the Select Protocol section, select the ids-to-lakehouse protocol. Don't change the protocol's default Configuration settings.
- Select the RUN ON DELETED FILES checkbox. This setting ensures that any upstream file delete events are accurately propagated to the downstream Lakehouse tables.
- (Optional) Set email notifications for successful and failed pipeline executions.
- Choose Save Changes.
NOTE
If the Pipeline Enabled toggle is set to active when you choose Save Changes, then the pipeline will run as soon as the configured trigger conditions are met.
For more information about configuring pipelines, see Set Up and Edit a Pipeline.
Backfill Historical Data Into Lakehouse Tables
To backfill historical data into Lakehouse tables, create a Bulk Pipeline Process Job that uses an ids-to-lakehouse pipeline and is scoped in the Date Range field by the appropriate IDSs and historical time ranges.