Use SQL and Athena to Query Data
NOTE
You can use SQL and Amazon Athena queries to explore or analyze tabular data across one or more datasets. To look for specific datasets and corresponding files by using a search engine, see How to Search Files in the Data Lake.
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using a standard SQL dialect. Athena allows users to run ad-hoc queries and get results in seconds search on properly structured data.
TetraScience Data Lake uses S3 as a storage for both primary files and processed files, and not all files stored on S3 are available to query. The process for preparing data for querying with Athena is the following:
- Data connectors and data pipelines collect data from instruments, CRO, and enterprise databases
- Then, data is standardized into the Intermediate Data Schema (IDS)
- Finally, data is prepared so that it is ready to be queried through Athena
After your data is harmonized and converted to IDS format, contents are automatically processed to populate databases and tables using a schema created and managed by TetraScience. You can then run standard SQL ad hoc queries against these tables and retrieve results. Because Athena executes queries in parallel, you can execute complex queries and large datasets quickly and efficiently.
Amazon Athena Pricing
Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. Athena is priced per query. For details, see the Athena documentation for more details.
You can query your data stored in these tables using two methods:
- Tetra Data Platform (TDP) user interface: With this method, no connection details or third-party tool set up is required. After the TDP has been configured for your organization, you are ready to run any standard SQL query.
- Use a third-party tool (for example, Tableau to query data): This method is useful when you want to query the data, and visualize it to perform further analysis.
IMPORTANT
If you're using a third-party tool to query Athena tables, you must declare which database that you want to connect to up front.
To declare your database up front, make sure that you use the table names as they appear in the TDP. If you don't declare the database up front, then you must add the database name in your query in the following format:
org_slug.tablename
(for example:demo_uat.lcuv_empower_v2_injection
).
Updated 2 months ago