Use Snowflake to Access Tetra Data
IMPORTANT
The ability to access data through Snowflake is available through an early adopter program (EAP) currently and is activated for customers through coordination with TetraScience. For more information, or to activate this feature in your TDP environment, contact your customer success manager (CSM).
The Tetra Data Platform (TDP) provides the ability to access your Tetra Data directly through Snowflake, in addition to the TDP SQL interface.
Architecture
The following diagram shows an example workflow for accessing Tetra Data through Snowflake.
The diagram shows the following process:
- Tetra components connect to the Amazon Simple Storage Service (Amazon S3) bucket that stores Intermediate Data Schemas (IDSs) and their associated Amazon Athena tables. Tetra Data that’s stored in the Tetra Scientific Data Cloud is exposed to a Snowflake database in a External Staging Account that TetraScience manages.
- A Snowflake database Direct Share securely shares data to a consumer Snowflake database in an External Sharing Account that's managed by the customer. One schema for each IDS version is shared along with a metadata schema. Each schema has its own queryable data views.
- Users can then access their organization’s Tetra Data directly through Snowflake, in addition to the TDP SQL interface.
Prerequisites
To access Tetra Data through Snowflake, a Business Critical Edition Snowflake account is required in the same AWS Region as your TDP deployment.
Configure Snowflake to Access Tetra Data
To set up access to your Tetra Data through Snowflake, do the following.
Share Your Snowflake Account Information with TetraScience
TetraScience runs the automations required to allow Snowflake to access Tetra Data in your environment. To coordinate the deployment, contact your CSM and provide them with the following information about your Snowflake environment:
- Snowflake Organization ID
- Snowflake External Sharing Account ID
- Snowflake External Sharing Account AWS Region
Accept the Direct Share
To view Tetra Data in Snowflake after the integration is activated, you must explicitly accept the Direct Share from the integration by doing the following:
- Sign in to Snowsight with your Snowflake External Sharing Account as a user with ACCOUNTADMIN permissions.
- Select Data. Then, choose Private Sharing.
- Choose Shared With You.
- In the Direct Shares section, find the following Direct Share:
TDP_EXT_VIEW_SHARE_[ORG-SLUG]
- Next to the name of the Direct Share, select the blue button. A Get Data dialog appears.
- In the Get Data dialog, for Database name, enter a short name for the shared database (for example, TETRA_SHARE). This is the database that’s used to query IDS data.
- For Which roles, in addition to ACCOUNTADMIN, can access this database?, select any additional roles in your Snowflake account that you want to grant database access to.
- Select the Get Data button. You can now query Tetra Data in Snowflake by using the shared database.
Access Tetra Data in Snowflake
You can access your Tetra Data in Snowflake after the configuration is set up by using any of the available Snowflake database connection options.
For a complete list of connection options, see Connect to Snowflake in the Snowflake Documentation.
Schema Types
Two schemas are created in Snowflake for each IDS version:
- IDS schemas: Contain queryable views of each IDS version’s data that map 1:1 to the existing SQL tables in the TDP through Amazon Athena (available views differ between each IDS)
NOTE
IDS schema names in Snowflake use the following format:
<CLIENT | COMMON | PRIVATE>_<IDS-NAME>_<IDS-MAJOR-VERSION>_DATA
- Metadata schemas: Contain queryable views of the TPD metadata (
METADATA_FILE_INFO
) and attributes (METADATA_FILE_ATTRIBUTE
) associated with each file in the TDP across all file categories
Schema Structure
In the TDP, multiple SQL tables are created for each IDS that you're using, based on the sections (partitions) within each IDS. Knowing how these SQL tables are structured and how they’re converted into views in Snowflake can help you find specific data when running SQL queries.
For more information about how IDS files are converted into SQL tables, see TDP Athena SQL Table Structure. For information about how TDP Athena tables are converted into views in Snowflake, see Differences Between IDS Schema Views and TDP SQL Tables.
Differences Between Schema Views and TDP SQL Tables
To make TDP SQL tables queryable data views in Snowflake, the TDP creates and shares one schema per IDS version along with a metadata schema. Each of these schema types has its own queryable data views that align with the SQL tables in the TDP, but are organized somewhat differently.
IDS and Metadata schema views in Snowflake differ from TDP SQL tables in the following ways:
- Each IDS version has its own schema.
- Each IDS version’s
root
tables, subtables,datacubes
, anddatacubes_data
tables are all converted into views within the same IDS schema. For more information, see IDS-Specific Tables. file_info_v1
andfile_attribute_v1
tables are converted into views namedMETADATA_FILE_INFO
andMETADATA_FILE_ATTRIBUTE
, and are located in a single metadata schema for the entire shared database.- The
METADATA_FILE_INFO
andMETADATA_FILE_ATTRIBUTE
views include the following additional columns, which aren’t included in their associatedfile_info_v1
andfile_attribute_v1
tables:file_name
: Shows the name of the underlying filefile_modified_at
: Shows when was the file info entry was last modifiedloaded_at
: Shows when Snowflake registered that the file was created or updated- (For
METADATA_FILE_ATTRIBUTE
tables only)org_slug
: Shows the organizational slug associated with the file attribute entry
IMPORTANT
TDP SQL table names sometimes appear differently in views, because of Snowflake’s Identifier requirements. For example, in the
METADATAFILE_ATTRIBUTE
view, the value column is converted tovalue_
,group
displays asts_group
, and2d_barcode
displays asts_2d_barcode
.
Updated 3 months ago