Query Lakehouse Tables
This guide explains how to effectively query nested data structures within Lakehouse tables.
For information about the different ways you can access your Lakehouse tables and apply these best practices, see Lakehouse Data Access.
Best Practices For Querying Lakehouse Tables
When querying Lakehouse tables, it's recommended that you do the following:
- Use aliases to improve readability when working with deeply nested structures.
- Use UNNEST for array processing when needing to work with individual elements.
For more best practices on querying hierarchical data, see Use CROSS JOIN and UNNEST in the Amazon Athena User Guide.
Working With Nested Data
Lakehouse tables contain columns that may represent complex nested structures, including the following:
- Objects: collections of key-value pairs
- Arrays: ordered lists of values
- Combinations of objects and arrays at multiple levels of nesting
Basic Dot Notation Access
Accessing Object Fields
To access fields within object columns use dot notation. To navigate through multiple levels of nested objects, use chained dot notation as in the injection_id
column in the following example.
Dot Notation Query Example
sql
SELECT
file_metadata.file_id,
runs[1].injection.id as injection_id,
systems[1].name as system_name,
project.name as project
FROM
lcuv_empower_v17 limit 10;
Results
file_id | injection_id | system_name | project |
---|---|---|---|
ac7c32e3-0593 | 1937 | System A | Site_Working_Projects_X |
b6c244b1-8799 | 1931 | System B | Site_Working_Projects_Y |
332f4d40-72af | 2033 | System A | Site_Working_Projects_Z |
Querying Array Elements
Array Unnesting with UNNEST
To transform arrays into rows, use the UNNEST function. The UNNEST function enables you to create rows out of nested values, so you can better understand what your end state data will look like for your analytics use case.
UNNEST Function Example Query
The following query demonstrates using the UNNEST function to analyze peak data for the Empower IDS.
sql
SELECT
file_metadata.file_id,
rp_element.analyte AS analyte,
rp_element.area.value AS area_value,
rp_element.area.percent.value AS area_percent,
rpw_element.value AS peak_width,
FROM lcuv_empower_v17
CROSS JOIN UNNEST(results) AS t1(result_element)
CROSS JOIN UNNEST(result_element.peaks) AS t2(rp_element)
CROSS JOIN UNNEST(rp_element.widths) AS t3(rpw_element);
Results
file_id | analyte | area_value | area_percent | peak_width |
---|---|---|---|---|
6cfec553 | Lithium | 26527.0 | 16.17 | 0.3272604 |
6cfec553 | Sodium | 37879.0 | 23.09 | 0.59 |
6cfec553 | Sodium | 37879.0 | 23.09 | 0.313613 |
Updated 1 day ago