Query Lakehouse Tables

This guide explains how to effectively query nested data structures within Lakehouse tables.

For information about the different ways you can access your Lakehouse tables and apply these best practices, see Lakehouse Data Access.

Best Practices For Querying Lakehouse Tables

When querying Lakehouse tables, it's recommended that you do the following:

  • Use aliases to improve readability when working with deeply nested structures.
  • Use UNNEST for array processing when you need to work with individual array elements.
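For example, aliasing the table keeps deeply nested paths short and readable. The following sketch reuses the table and columns from the examples later in this guide; the alias name `t` is illustrative:

```sql
-- Alias the table so nested paths stay compact and unambiguous.
SELECT
  t.file_metadata.file_id,
  t.project.name AS project_name
FROM lcuv_empower_v17 AS t
LIMIT 10;
```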

For more best practices on querying hierarchical data, see Use CROSS JOIN and UNNEST in the Amazon Athena User Guide.

Working With Nested Data

Lakehouse tables contain columns that may represent complex nested structures, including the following:

  • Objects: collections of key-value pairs
  • Arrays: ordered lists of values
  • Combinations of objects and arrays at multiple levels of nesting
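As a sketch of how each kind of structure is accessed, the following query uses column names that appear in the examples later in this guide (the exact schema of your table may differ):

```sql
-- Hypothetical sketch: accessing each kind of nested structure.
SELECT
  file_metadata.file_id,   -- object field (key-value pair)
  runs[1].injection.id,    -- object inside the first element of an array
  results                  -- array that itself combines objects and arrays
FROM lcuv_empower_v17
LIMIT 5;
```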

Basic Dot Notation Access

Accessing Object Fields

To access fields within object columns, use dot notation. To navigate through multiple levels of nested objects, chain the dot notation, as shown for the injection_id alias in the following example.

Dot Notation Query Example
```sql
SELECT
  file_metadata.file_id,
  runs[1].injection.id AS injection_id,
  systems[1].name AS system_name,
  project.name AS project
FROM
  lcuv_empower_v17
LIMIT 10;
```

Results

| file_id | injection_id | system_name | project |
| --- | --- | --- | --- |
| ac7c32e3-0593 | 1937 | System A | Site_Working_Projects_X |
| b6c244b1-8799 | 1931 | System B | Site_Working_Projects_Y |
| 332f4d40-72af | 2033 | System A | Site_Working_Projects_Z |

Querying Array Elements

Array Unnesting with UNNEST

To transform array elements into rows, use the UNNEST function. UNNEST expands nested values into individual rows, so you can work with each element directly and see what your end-state data will look like for your analytics use case.

UNNEST Function Example Query

The following query demonstrates using the UNNEST function to analyze peak data for the Empower IDS.

```sql
SELECT
  file_metadata.file_id,
  rp_element.analyte AS analyte,
  rp_element.area.value AS area_value,
  rp_element.area.percent.value AS area_percent,
  rpw_element.value AS peak_width
FROM lcuv_empower_v17
CROSS JOIN UNNEST(results) AS t1(result_element)
CROSS JOIN UNNEST(result_element.peaks) AS t2(rp_element)
CROSS JOIN UNNEST(rp_element.widths) AS t3(rpw_element);
```

Results

| file_id | analyte | area_value | area_percent | peak_width |
| --- | --- | --- | --- | --- |
| 6cfec553 | Lithium | 26527.0 | 16.17 | 0.3272604 |
| 6cfec553 | Sodium | 37879.0 | 23.09 | 0.59 |
| 6cfec553 | Sodium | 37879.0 | 23.09 | 0.313613 |

Working with Normalized Lakehouse Tables

When configuring ids-to-lakehouse pipelines, you can choose to create normalized Lakehouse tables (Normalized IDS and Normalized Datacubes tables). These tables transform the new, nested Delta table structure into a normalized structure that mirrors the legacy Amazon Athena table structure in the Data Lakehouse, reducing the need to rewrite existing queries that use the legacy table structure.

The normalized IDS Lakehouse tables are structured exactly the same as the legacy Amazon Athena SQL Tables.

The normalized datacubes Lakehouse tables differ slightly from the legacy Amazon Athena datacubes tables. For more information, see Datacubes Lakehouse Tables Best Practices.

Datacubes Lakehouse Tables Best Practices

The updated datacubes Lakehouse table schema streamlines data access, improves query performance, and provides a more robust data model for managing data at scale.

The datacubes Lakehouse table schema removes the more complex file_metadata struct column that's part of the original <ids_name>_datacubes tables from the GA version of the Lakehouse in favor of four new, top-level columns:

  • file_metadata_file_id (STRING): A unique identifier for each file.
  • file_metadata_ids_path (STRING): The path of the data source.
  • file_metadata_created_at_timestamp (BIGINT): The creation timestamp of the file.
  • file_metadata_created_at (TIMESTAMP): The creation time of the file.

This change is designed to improve data access efficiency by reducing the need for costly joins and nested field lookups.

The original file_metadata column will be retained for backward compatibility, but will no longer be populated with data for new rows. Instead, it will have a NULL value for all newly inserted records. For tables created with this new release, the file_metadata column will always be NULL.

For best performance, customers should update their queries to use one or more of the new top-level columns in their WHERE or ON clauses. For example, instead of querying on t.file_metadata.file_id, use t.file_metadata_file_id.
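For example, a filter on the nested field can be rewritten to use the top-level column instead. The table name below is illustrative, and the file_id value is a placeholder:

```sql
-- Before: nested field lookup (file_metadata is NULL for newly inserted rows)
-- SELECT * FROM my_datacubes_table t
-- WHERE t.file_metadata.file_id = '<file_id>';

-- After: filter on the top-level column, avoiding the nested lookup
SELECT *
FROM my_datacubes_table t
WHERE t.file_metadata_file_id = '<file_id>';
```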

Documentation Feedback

Do you have questions about our documentation or suggestions for how we can improve it? Start a discussion in TetraConnect Hub. For access, see Access the TetraConnect Hub.

📘 NOTE

Feedback isn't part of the official TetraScience product documentation. TetraScience doesn't warrant or make any guarantees about the feedback provided, including its accuracy, relevance, or reliability. All feedback is subject to the terms set forth in the TetraConnect Hub Community Guidelines.