Create Lakehouse Tables
To automatically create open-format Lakehouse tables from data coming into the Tetra Data Platform (TDP), use one of the following protocols in a Tetra Data Pipeline:
- ids-to-lakehouse, which converts Tetra Data (IDS data) in the TDP to Lakehouse tables
- direct-to-lakehouse, which converts structured data outside of the TDP to Lakehouse tables
After you enable the pipeline, any data that meets the trigger conditions you set is automatically converted into Lakehouse tables.
IMPORTANT
When configuring ids-to-lakehouse pipelines, make sure that you select the Run on Deleted Files checkbox on the pipeline configuration page. This setting ensures that any upstream file delete events are accurately propagated to the downstream Lakehouse tables. The direct-to-lakehouse protocol doesn't handle file deletion events. If an input file is deleted, the corresponding data in the output Lakehouse table isn't deleted.
To backfill historical Tetra Data into Lakehouse tables, create a Bulk Pipeline Process Job that uses an ids-to-lakehouse pipeline and is scoped by the appropriate IDSs and historical time ranges.
What Happens Next
After your data is converted to Lakehouse tables, you can then transform and run SQL queries on the data in the Lakehouse by using any of the following methods:
- A third-party tool and a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) driver endpoint
- The TDP user interface
- Databricks Delta Sharing
- Snowflake Data Sharing
You can also use Tetraflow pipelines to define and schedule data transformations using familiar SQL and to generate custom, use case-specific Lakehouse tables that are optimized for downstream analytics applications.
To view Lakehouse tables after you create them, see Query SQL Tables in the TDP. For best practices on querying data in the Lakehouse, see Query Lakehouse Tables.
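Each of the access methods listed above ultimately resolves to a standard SQL query against a Lakehouse table. The following sketch shows the kind of query you might run from a JDBC/ODBC client or the TDP user interface. The table and column names are hypothetical placeholders; the actual names depend on your organization and the data you convert (see Query SQL Tables in the TDP).
Example Lakehouse Query (sketch)
SELECT id, sample_name, measurement_value
FROM plate_reader_results          -- hypothetical table created by a Lakehouse pipeline
WHERE created_at >= DATE '2024-01-01'
LIMIT 100;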
Create an ids-to-lakehouse Pipeline
To create an ids-to-lakehouse pipeline that converts Tetra Data (IDS data) in the TDP to Lakehouse tables, do the following:
- On the Pipeline Manager page, create a new pipeline and define its trigger conditions by selecting the IDS data that you want to transform to Lakehouse tables. You can do either of the following:
- To convert IDS data from a specific source, select IDS Type > is, and then select the specific IDS name to create the trigger. This approach is best when a specific dataset has different latency needs than your other datasets.
- or -
- To convert all of your IDS data to Lakehouse tables, select File Category > is > IDS to create the trigger. This approach is best when your data latency requirements are similar across all of your datasets.
- Click Select Protocol.
- In the Select Protocol section, select the latest version of the ids-to-lakehouse protocol. Don't change the protocol's default Configuration settings unless you want to create Normalized IDS tables (see the next step).
- (Optional) To have your pipeline transform IDS Lakehouse tables from the new, nested Delta Table structure to a normalized structure that mirrors the legacy Amazon Athena table structure, enable the normalizedIdsEnabled toggle in the Configuration section. Then, configure the Normalized IDS Frequency field to determine how often the normalized tables are updated.
IMPORTANT
Creating normalized Lakehouse tables introduces higher data processing costs and latency (a minimum of 20 minutes) than the default nested Lakehouse tables.
- Select the RUN ON DELETED FILES checkbox. This setting ensures that any upstream file delete events are accurately propagated to the downstream Lakehouse tables.
- (Optional) Set email notifications for successful and failed pipeline executions.
- Choose Save Changes.
NOTE
If the Pipeline Enabled toggle is set to active when you choose Save Changes, then the pipeline will run as soon as the configured trigger conditions are met.
For more information about configuring pipelines, see Set Up and Edit a Pipeline.
Create a direct-to-lakehouse Pipeline
NOTE
The direct-to-lakehouse protocol doesn't handle file deletion events. If an input file is deleted, the corresponding data in the output Lakehouse table isn't deleted.
To create a direct-to-lakehouse pipeline that converts structured data outside of the TDP to Lakehouse tables, do the following:
- Sign in to the Tetra Data Platform (TDP).
- On the Pipeline Manager page, create a new pipeline and define its trigger conditions by doing the following:
- Choose Select Trigger.
- From the left menu, select Source Type. Then, in the middle menu, select is.
- In the right menu, select your data source.
- (Optional) To add another condition to your trigger, choose Add Field. Or, to nest another trigger condition, choose Add Field Group. For more information about how to configure triggers, see Select Trigger Conditions.
- Click Select Protocol.
- In the Select Protocol section, select the latest version of the direct-to-lakehouse protocol.
- In the Configuration section, do the following:
IMPORTANT
Don't configure Advanced Settings fields without coordinating with TetraScience. These settings are for fine-tuning latency-sensitive or large-scale processing use cases.
- For Lakehouse Target Table (required), enter the name of your destination table in the Data Lakehouse.
- For Transform Output Schema (required), enter the schema in data build tool (dbt) YAML format. For more information, see the dbt Labs YAML Style Guide. For an example schema, see the sketch that follows the Transform SQL Query Example below.
- For Transform SQL Query (required), enter an SQL query that uses DuckDB dialect to transform the input data. Make sure that the query references the input file in the following format:
{{ env_var("S3_INPUT_PATH") }}
Transform SQL Query Example
SELECT
  id,
  name,
  UPPER(name) as name_upper,
  created_at,
  -- Map the status string to an integer flag
  CASE
    WHEN status = 'active' THEN 1
    ELSE 0
  END as is_active,
  temperature_celsius,
  -- Convert Celsius to Fahrenheit
  temperature_celsius * 9.0 / 5.0 + 32.0 as temperature_fahrenheit
-- DuckDB reads the input file directly from the S3 path that the pipeline supplies
FROM read_csv('{{ env_var("S3_INPUT_PATH") }}', auto_detect=true)
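The Transform Output Schema field describes the columns that the query above produces, using dbt YAML format. The following is a minimal sketch that mirrors the columns in the query example; the exact properties the protocol expects (for example, whether data types or descriptions are required) may differ, so treat this as an assumption and confirm the format against the dbt Labs YAML Style Guide and your protocol version.
Transform Output Schema Example (sketch)
version: 2
models:
  - name: lab_temperature_readings   # hypothetical Lakehouse Target Table name
    description: Structured instrument data loaded by a direct-to-lakehouse pipeline
    columns:
      - name: id
        data_type: integer
      - name: name
        data_type: varchar
      - name: name_upper
        data_type: varchar
      - name: created_at
        data_type: timestamp
      - name: is_active
        data_type: integer
      - name: temperature_celsius
        data_type: double
      - name: temperature_fahrenheit
        data_type: double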
- (Optional) Set email notifications for successful and failed pipeline executions.
- Choose Save Changes.
NOTE
If the Pipeline Enabled toggle is set to active when you choose Save Changes, then the pipeline will run as soon as the configured trigger conditions are met. All direct-to-lakehouse tables are written to the org_name__tss__external schema.
For more information about configuring pipelines, see Set Up and Edit a Pipeline.
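To confirm that a direct-to-lakehouse pipeline is writing data, you can query its target table through any of the SQL access methods described in What Happens Next. The following is a minimal sketch that assumes an organization named org_name and a Lakehouse Target Table named lab_temperature_readings; substitute your own organization name and the table name that you entered in the pipeline configuration.
Example Query (sketch)
SELECT id, name, temperature_fahrenheit
FROM org_name__tss__external.lab_temperature_readings   -- direct-to-lakehouse tables land in the org_name__tss__external schema
LIMIT 10;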
Backfill Historical Data Into Lakehouse Tables
To backfill historical data into Lakehouse tables, create a Bulk Pipeline Process Job that uses an ids-to-lakehouse pipeline and is scoped in the Date Range field by the appropriate IDSs and historical time ranges.
Documentation Feedback
Do you have questions about our documentation or suggestions for how we can improve it? Start a discussion in TetraConnect Hub. For access, see Access the TetraConnect Hub.
NOTE
Feedback isn't part of the official TetraScience product documentation. TetraScience doesn't warrant or make any guarantees about the feedback provided, including its accuracy, relevance, or reliability. All feedback is subject to the terms set forth in the TetraConnect Hub Community Guidelines.