Query Lakehouse Tables

This guide explains how to effectively query nested data structures within Lakehouse tables.

For information about the different ways you can access your Lakehouse tables and apply these best practices, see Lakehouse Data Access.

Best Practices For Querying Lakehouse Tables

When querying Lakehouse tables, it's recommended that you do the following:

  • Use aliases to improve readability when working with deeply nested structures.
  • Use UNNEST for array processing when needing to work with individual elements.

For more best practices on querying hierarchical data, see Use CROSS JOIN and UNNEST in the Amazon Athena User Guide.

Working With Nested Data

Lakehouse tables contain columns that may represent complex nested structures, including the following:

  • Objects: collections of key-value pairs
  • Arrays: ordered lists of values
  • Combinations of objects and arrays at multiple levels of nesting

Basic Dot Notation Access

Accessing Object Fields

To access fields within object columns use dot notation. To navigate through multiple levels of nested objects, use chained dot notation as in the injection_id column in the following example.

Dot Notation Query Example
sql  
SELECT  
  file_metadata.file_id,  
  runs[1].injection.id as injection_id,  
  systems[1].name as system_name,  
  project.name as project  
FROM  
  lcuv_empower_v17 limit 10;

Results

file_idinjection_idsystem_nameproject
ac7c32e3-05931937System ASite_Working_Projects_X
b6c244b1-87991931System BSite_Working_Projects_Y
332f4d40-72af2033System ASite_Working_Projects_Z

Querying Array Elements

Array Unnesting with UNNEST

To transform arrays into rows, use the UNNEST function. The UNNEST function enables you to create rows out of nested values, so you can better understand what your end state data will look like for your analytics use case.

UNNEST Function Example Query

The following query demonstrates using the UNNEST function to analyze peak data for the Empower IDS.

sql  
SELECT 
        file_metadata.file_id,
        rp_element.analyte AS analyte,
        rp_element.area.value AS area_value,
        rp_element.area.percent.value AS area_percent,
        rpw_element.value AS peak_width,
    FROM lcuv_empower_v17
    CROSS JOIN UNNEST(results) AS t1(result_element)
    CROSS JOIN UNNEST(result_element.peaks) AS t2(rp_element)
    CROSS JOIN UNNEST(rp_element.widths) AS t3(rpw_element);

Results

file_idanalytearea_valuearea_percentpeak_width
6cfec553Lithium26527.016.170.3272604
6cfec553Sodium37879.023.090.59
6cfec553Sodium37879.023.090.313613