Use Snowflake to Access Tetra Data

🚧

IMPORTANT

The ability to access data through Snowflake is available through an early adopter program (EAP) currently and is activated for customers through coordination with TetraScience. For more information, or to activate this feature in your TDP environment, contact your customer success manager (CSM).

The Tetra Data Platform (TDP) provides the ability to access your Tetra Data directly through Snowflake, in addition to the TDP SQL interface.

Architecture

The following diagram shows an example workflow for accessing Tetra Data through Snowflake.

Tetra Snowflake Integration workflow example

TDP/Snowflake workflow example

The diagram shows the following process:

  1. Tetra components connect to the Amazon Simple Storage Service (Amazon S3) bucket that stores Intermediate Data Schemas (IDSs) and their associated Amazon Athena tables. Tetra Data that’s stored in the Tetra Scientific Data Cloud is exposed to a Snowflake database in a External Staging Account that TetraScience manages.
  2. A Snowflake database Direct Share securely shares data to a consumer Snowflake database in an External Sharing Account that's managed by the customer. One schema for each IDS version is shared along with a metadata schema. Each schema has its own queryable data views.
  3. Users can then access their organization’s Tetra Data directly through Snowflake, in addition to the TDP SQL interface.

Prerequisites

To access Tetra Data through Snowflake, a Business Critical Edition Snowflake account is required in the same AWS Region as your TDP deployment.

Configure Snowflake to Access Tetra Data

To set up access to your Tetra Data through Snowflake, do the following.

Share Your Snowflake Account Information with TetraScience

TetraScience runs the automations required to allow Snowflake to access Tetra Data in your environment. To coordinate the deployment, contact your CSM and provide them with the following information about your Snowflake environment:

  • Snowflake Organization ID
  • Snowflake External Sharing Account ID
  • Snowflake External Sharing Account AWS Region

Accept the Direct Share

To view Tetra Data in Snowflake after the integration is activated, you must explicitly accept the Direct Share from the integration by doing the following:

  1. Sign in to Snowsight with your Snowflake External Sharing Account as a user with ACCOUNTADMIN permissions.
  2. Select Data. Then, choose Private Sharing.
  3. Choose Shared With You.
  4. In the Direct Shares section, find the following Direct Share: TDP_EXT_VIEW_SHARE_[ORG-SLUG]
  5. Next to the name of the Direct Share, select the blue button. A Get Data dialog appears.
  6. In the Get Data dialog, for Database name, enter a short name for the shared database (for example, TETRA_SHARE). This is the database that’s used to query IDS data.
  7. For Which roles, in addition to ACCOUNTADMIN, can access this database?, select any additional roles in your Snowflake account that you want to grant database access to.
  8. Select the Get Data button. You can now query Tetra Data in Snowflake by using the shared database.

Access Tetra Data in Snowflake

You can access your Tetra Data in Snowflake after the configuration is set up by using any of the available Snowflake database connection options.

For a complete list of connection options, see Connect to Snowflake in the Snowflake Documentation.

Schema Types

Two schemas are created in Snowflake for each IDS version:

  • IDS schemas: Contain queryable views of each IDS version’s data that map 1:1 to the existing SQL tables in the TDP through Amazon Athena (available views differ between each IDS)

📘

NOTE

IDS schema names in Snowflake use the following format:

<CLIENT | COMMON | PRIVATE>_<IDS-NAME>_<IDS-MAJOR-VERSION>_DATA

  • Metadata schemas: Contain queryable views of the TPD metadata (METADATA_FILE_INFO) and attributes (METADATA_FILE_ATTRIBUTE) associated with each file in the TDP across all file categories

Schema Structure

In the TDP, multiple SQL tables are created for each IDS that you're using, based on the sections (partitions) within each IDS. Knowing how these SQL tables are structured and how they’re converted into views in Snowflake can help you find specific data when running SQL queries.

For more information about how IDS files are converted into SQL tables, see TDP Athena SQL Table Structure. For information about how TDP Athena tables are converted into views in Snowflake, see Differences Between IDS Schema Views and TDP SQL Tables.

Differences Between Schema Views and TDP SQL Tables

To make TDP SQL tables queryable data views in Snowflake, the TDP creates and shares one schema per IDS version along with a metadata schema. Each of these schema types has its own queryable data views that align with the SQL tables in the TDP, but are organized somewhat differently.

IDS and Metadata schema views in Snowflake differ from TDP SQL tables in the following ways:

  • Each IDS version has its own schema.
  • Each IDS version’s root tables, subtables, datacubes, and datacubes_data tables are all converted into views within the same IDS schema. For more information, see IDS-Specific Tables.
  • file_info_v1 and file_attribute_v1 tables are converted into views named METADATA_FILE_INFO and METADATA_FILE_ATTRIBUTE, and are located in a single metadata schema for the entire shared database.
  • The METADATA_FILE_INFO and METADATA_FILE_ATTRIBUTE views include the following additional columns, which aren’t included in their associated file_info_v1 and file_attribute_v1 tables:
    • file_name: Shows the name of the underlying file
    • file_modified_at: Shows when was the file info entry was last modified
    • loaded_at: Shows when Snowflake registered that the file was created or updated
    • (For METADATA_FILE_ATTRIBUTE tables only) org_slug: Shows the organizational slug associated with the file attribute entry

🚧

IMPORTANT

TDP SQL table names sometimes appear differently in views, because of Snowflake’s Identifier requirements. For example, in the METADATAFILE_ATTRIBUTE view, the value column is converted to value_, group displays as ts_group, and 2d_barcode displays as ts_2d_barcode.