TDP Athena SQL Table Structure Overview
Before you determine which tables to query in the Tetra Data Platform (TDP), TetraScience recommends that you understand the structure of the SQL tables (which contain your file's data). This page describes the structure and naming conventions used for the database and tables.
Normalization of Names
All Athena database, table, and column names are normalized. Because of the character restriction of S3 and Athena, all non-alphanumeric characters are converted to underscores. S3 and Athena still allow some special characters; however for simplicity, TetraScience converts all special characters to underscores.
Primary Database Name
Each organizationās primary database name is the normalized organization slug (organization short name is org slug). The normalization process occurs when the database is automatically created. For example, if the org slug was āpharmacorp-uatā, then the normalized form changes to āpharmacorp_uatā.
Aggregated Databases
As of Tetra Data Platform (TDP) version 3.0, aggregated databases are no longer used.
Table Structure
In the TDP, Athena tables are created based on the TetraScience IDS designed for each specific instrument or use case.
Table Guidelines
The automatic creation of Athena tables from a particular IDS follows these general guidelines:
- Each key whose value is not an array of objects or an array of arrays -> becomes a column in the root table.
- Keys that contain an array of objects or an array of arrays -> become separate tables and are linked to the parent table using the
parent_uuid
. - Keys that contain an array of literals -> become columns in the table.
You should be aware of these guidelines because they may help you identify the data you are interested in reviewing.
This diagram shows an example of how an IDS fileās data is converted into Athena tables:
Diagram Notes
- For improved readability, click on the diagram to increase its size.
- For simplicity, the
<IDS name>_<IDS version>_
prefix is omitted for each table name.

IDS to Athena table diagram
Table Name Conventions
The name of the root table is defined in athena.json. Each table name uses this format:
<IDS name>_<IDS version>_<root field name>_<child field name 1>_<child field name 2>_ā¦
NOTE
Private client namespace tables have
client_<orgSlug>_
prepended to the beginning of the table name. For example, for a table namedms_masslynx_v2_root
in the common namespace, the same table in the private namespace for a company with the org slug, "pharmacorp" would beclient_pharmacorp_ms_masslynx_v2_root
.
Root tables are important to identify because they typically contain a primary key and foreign keys that are in other tables. Root tables are essential to using joins when you define your queries.
For example, for a table named: ms_masslynx_v2_root
, where:
- IDS Name =
ms-masslynx
- IDS Version =
v2
- Root Field Name =
root
In this example, the root table is named root
. However in some cases, the root table may be named differently. For example, the root field name for Empower tables is injection
because injection
is the root field in the IDS. Therefore, the name of the table is: lcuv_empower_v2_injection
.
Column Names
You can use the Tetra Data Platform (TDP) to view the columns in each table and the table datatype. For column names, all non-alphanumeric characters are converted to underscores because of the character restriction of S3 and Athena. S3 and Athena still allow some special characters; however for simplicity, TetraScience converts all special characters to underscores.
This example shows column names in a root
table:

root table example
This example shows column names in a method_instrument_detector_channels
table:

method_instrument_detector_channels table example
Third-Party Tools and Athena
If you are using a third-party tool to query Athena tables, you may need to declare which database to connect to upfront.
- If you do this, then you can use the table names as they appear in the system.
- If do do not declare the database upfront, then you must add the database name in your query, for example:
demo_uat.lcuv_empower_v2_injection
.
Datacube Tables
Depending on the instrument you are using, you may or may not have datacube tables in your SQL table list.
datacubes
table creation is handled differently than general Athena table creation. There are only two datacube tables:
<IDS name>_<IDS version>_datacubes
<IDS name>_<IDS version>_datacubes_data
For example, Empower v2 datacubes tables are:
lcuv_empower_v2_datacubes
lcuv_empower_v2_datacubes_data
This diagram shows an example of how IDS datacubes are converted into Athena tables:
Diagram Notes
- For improved readability, click on the diagram to increase its size.
- For simplicity, the
<IDS name>_<IDS version>_
prefix is omitted for each table name.

IDS datacubes to Athena table
This example shows a datacubes
table:

datacubes
table
The datacubes
table contains all metadata and:
- Each measureās name and unit
- Each dimensionās name and unit
Each measureās name and unit, and each dimensionās name and unit, becomes an individual column. How the columns are auto-generated depends on the number of measures and dimensions defined in the IDS file.
This example shows a datacubes_data
table:

datacubes_data
table
The datacubes_data
table stores measure and dimension values. Similarly to name and unit, each measure and dimension value becomes an individual column.
Metadata Tables
TetraScience creates platform metadata from all files in the Data Lake, and stores it in the metadata_v1
table in your SQL table list.

metadata_v1 table
This information displays as columns in the metadata_v1
table:
file_id
file_path
ids_type
ids_version
To display the metadata from the Tetra Data Platform (TDP) user interface:
- Click Search Files from the left side menu.
- Select a JSON file to review.
- Click View JSON Details to open the file and view its metadata from the Elasticsearch document dialog.
Partitions
To speed up query time, you can specify a partition in athena.json where S3 stores the data differently. A partition divides the data into groups. And ideally, these groups are the ones you would use in your where
clause to filter data and speed up the query process.
Do not select a partition key that is too granular; for example do not use an UUID. An effective partition key should divide the data into groups into similar sizes. For example, an effective partition key for Empower is: "project name".
Due to the character restrictions on the partition value, all non-alphanumeric characters in the value are converted to double underscores (__).
For example:
- The 'project_name' column contains the original value.
- Because 'project_name' was used as the partition key, all of its values are normalized in the
project
column.

Partition
Athena and Partitions
When you use a partition, Athena will add a new virtual column to the table. The column name will be the partition key, and the value will be the values of this key. However, this extra column is not added into the original data.
If you need to add a new partition to the data model, then you must get the latest version.
Use UNION
Clause to Avoid Duplicate Results
UNION
Clause to Avoid Duplicate ResultsIf you change the version of the IDS that you are using in a pipeline, new version-specific tables are created which include the processing result. If you query the results, it may be possible that you see duplicate information from results generated by the old and new versions of IDS.
To avoid having duplicate results and return a distinct set of values, TetraScience recommends that you use the UNION
clause. The UNION
clause enables you to combine two separate SELECT
statements to produce a result set as a union of both the SELECT
statements. To use the UNION
clause:
- The fields used in both
SELECT
statements must be in same order, have the same number, and have the same data type. - The
UNION
clause produces distinct values in the result set
SELECT column_name(s) FROM lcuv_empower_v8_injection UNION SELECT column_name(s) FROM lcuv_empower_v9_injection;
Result set consists of distinct values.
where:
lcuv_empower_v8_injection
= table generated by the first IDS version 8
lcuv_empower_v9_injection
= table generated by the second IDS version 9
Updated about 1 month ago