Querying Data Using SQL and AWS Athena

AWS Athena is an interactive query service that lets you use standard SQL to view and analyze data in your organization's Tetra Data Lake S3 bucket.

TetraScience has built its Data Lake on AWS S3. TetraScience's Data Connectors and Data Pipelines collect data from instruments, CROs, and enterprise databases, standardize it into TetraScience's Intermediate Data Schema (IDS), and prepare it so that it can be queried through Athena.

After your data is harmonized and converted to IDS format, its contents are automatically processed to populate databases and tables that use a schema already created by TetraScience. You can then run standard ad hoc SQL queries against these tables and get results. Because Athena executes queries in parallel, it can run complex queries over large datasets efficiently.

📘

NOTE:

AWS Athena charges for each query, based on the amount of data the query scans. See the AWS Athena documentation for details.

You can query the data stored in these tables in two ways:

  • Use the TDP interface. With this method, you don't need to worry about connection details or third-party tool setup. Once the TDP has been configured for your organization, you can run any standard SQL query.
  • Use a third-party tool, such as Tableau, to query the data. This method is helpful if you want to visualize the data and analyze it further, not just query it.
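For the second method, a script can also stand in for the third-party tool. The following is a minimal Python sketch, not a TetraScience-provided client. It assumes you have been given AWS credentials that can reach Athena for your organization, and that `client` behaves like the object returned by `boto3.client("athena")`. The function name `run_athena_query` and its parameters are hypothetical, as are any database name, table name, and S3 output location you pass in. The sketch submits a query, polls until Athena finishes, and returns the first page of results.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run one SQL statement through an Athena client and return rows as dicts.

    `client` is assumed to behave like boto3.client("athena").
    `database` and `output_location` are placeholders supplied by the caller.
    """
    # Submit the query. Athena runs it asynchronously and returns an ID.
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Fetch only the first page of results. For SELECT statements,
    # the first row holds the column names.
    result = client.get_query_results(QueryExecutionId=query_id)
    rows = result["ResultSet"]["Rows"]
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]
```

With `boto3` installed, a call might look like `run_athena_query(boto3.client("athena"), "SELECT * FROM my_table LIMIT 10", "my_database", "s3://my-results-bucket/prefix/")`, where every name in the call is a placeholder. Null cells come back as `None`, and larger result sets would need pagination, which this sketch leaves out.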

The following sections explain how the SQL tables are structured, how to use the TDP to view them, and how to connect using third-party tools.

There is also specific information on how the Waters Empower database tables are structured.