Use SQL and Athena to Query Data

📘

NOTE

You can use SQL and Amazon Athena queries to explore or analyze tabular data across one or more datasets. To look for specific datasets and corresponding files by using a search engine, see How to Search Files in the Data Lake.

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using a standard SQL dialect. Athena allows users to run ad-hoc queries and get results in seconds search on properly structured data.

TetraScience Data Lake uses S3 as a storage for both primary files and processed files, and not all files stored on S3 are available to query. The process for preparing data for querying with Athena is the following:

  1. Data connectors and data pipelines collect data from instruments, CRO, and enterprise databases
  2. Then, data is standardized into the Intermediate Data Schema (IDS)
  3. Finally, data is prepared so that it is ready to be queried through Athena

After your data is harmonized and converted to IDS format, contents are automatically processed to populate databases and tables using a schema created and managed by TetraScience. You can then run standard SQL ad hoc queries against these tables and retrieve results. Because Athena executes queries in parallel, you can execute complex queries and large datasets quickly and efficiently.

📘

Amazon Athena Pricing

Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. Athena is priced per query. For details, see the Athena documentation for more details.

You can query your data stored in these tables using two methods:

  • Tetra Data Platform (TDP) user interface - With this method, no connection details or third-party tool set up is required. After the TDP has been configured for your organization, you are ready to run any standard SQL query.
  • Use a third-party tool (for example, Tableau to query data) - This method is useful when you want to query the data, and visualize it to perform further analysis.