TDP Athena SQL Table Structure

Understanding the structure of the SQL tables that hold your file's data can be quite helpful to determine which tables you want to query. This section presents the structure and naming conventions for the database and tables.

About the Primary Database Name

Each organization’s primary database name is the normalized organization slug (organization short name). The normalization process, which occurs when the database is automatically created, converts all dashes to underscores. For example, if the org slug is “pharmacorp-uat”, the normalized form would be “pharmacorp_uat“.

📘

NOTE:

Aggregated databases are no longer used.
In versions of the TDP prior to 3.0, each organization also had an aggregated database. The name of that database was the primary database name plus _aggregated. For example “demo_uat_aggregated“. The aggregated database used Apache Parquet, which stored data more compactly. Table names and table column names were the same as the primary database.

General Table Structure

In the TDP, Athena tables are created based on the TetraScience IDS designed for each specific instrument or use case.

There are several general rules that guide the automatic creation of Athena tables from a particular IDS. Note that this is not something that you need to do, but it is helpful to be aware of these guidelines as they can help you identify the data you are interested in reviewing.

  • Each key whose value is not an array of objects or array of arrays, is a column in the root table.
  • Keys that contain an array of objects or array of arrays, become separate tables and are linked to the parent table using parent_uuid.
  • Keys that contain an array of literals, become columns in a table.

The diagram below shows an example of how an IDS file’s data is converted into Athena tables. We will use this graphic to illustrate concepts in this section. Note the "datacubes" field is covered near the end of this article in another section. If the diagram is too small on your screen, click it to make it bigger.

10361036

IDS to Athena table diagram

📘

NOTE:

For each table name, the <IDS name>_<IDS version>_ prefix is omitted in this graphic for simplicity.

Table Naming Conventions

Each table name has the format like this:
<IDS name>_<IDS version>_<root field name>_<child field name 1>_<child field name 2>_…

For example, for the table named "ms_masslynx_v2_root":

  • IDS Name: ms-masslynx
  • IDS Version: v2
  • Root Field Name: root

In the example above, the root table is simply named "root". But in some cases, the root table is named differently. For example, the root field name for for empower tables is "injection" because "injection is the root field in the IDS. Therefore the name of the table is: lcuv_empower_v2_injection.

Root tables are important to identify because they typically contain a primary key and foreign keys that are in other tables; understanding this is essential to using joins to provide context for your queries.

Column Names

Examples of columns in tables appear below.

25302530

"root" table

15081508

"method_instrument_detector_channels" table

In the TDP, you can see the columns each table has as well as its datatype. For column names, all nonalphanumeric characters are converted to underscores due to the character restriction of S3 and Athena (S3 and Athena still allow some special characters, but for simplicity, TetraScience converts all special characters to underscores).

📘

NOTE:

If you are using a third-party tool to query Athena tables, you might need to declare the database to connect to upfront. If you do this, you can use the table names as they appear in the system. If not, you will need to add the database name in your query like this: "demo_uat"."lcuv_empower_v2_injection". For more information on using third-party tools to query Athena tables, see TBD.

Datacube Tables

Depending on the instrument you are using, you may or may not have datacubes tables in your SQL table list.

datacubes table creation is handled differently than general Athena table creation. There are only two datacube tables:

  • <IDS name>_<IDS version>_datacubes
  • <IDS name>_<IDS version>_datacubes_data

For example, Empower v2 “datacubes” tables are:

  • lcuv_empower_v2_datacubes
  • lcuv_empower_v2_datacubes_data

Reference the following graphic as you read more about the datacubes and datacubes data tables.

961961

IDS datacubes to Athena tables diagram

📘

NOTE:

For each table name, the <IDS name>_<IDS version>_ prefix is omitted in this graphic for simplicity.

  • The datacubes table contains all metadata, each measure’s name and unit, as well as each dimension’s name and unit. As illustrated below, each measure’s name and unit and each dimension’s name and unit will be an individual column. These columns will be auto-generated depends on the number of measures and the number of dimensions defined in the IDS file.

  • The datacubes_data table stores measure and dimension values. And, like name and unit, each measure and dimension value will be an individual column.

23122312

"datacubes" table

10381038

"datacubes_data" table

Metadata Table

Platform metadata that TetraScience creates is stored in the metadata_v1 table in your SQL table list.

11671167

metadata_v1 table

In metadata_v1, you will see file_id, file_path, ids_type, ids_version as the columns. This metadata is displayed in the when you click the "Preview" button on the JSON files on "Search Files" screen on the platform.

Viewing the Database and Tables

You can easily see the tables for common instruments as well as your organization's custom tables in two places in the TDP:

  • Data Screen
  • SQL Tables Screen
    For information on how to view the tables in TDP, see this article.

You can also view them using third-party tools. For information on how to do that see this article.