Use SQL and Athena to Query Data

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

TetraScience has designed its Data Lake using AWS S3 where:

  1. Data connectors and data pipelines collect data from instruments, CRO, and enterprise databases
  2. Then, data is standardized into the Intermediate Data Schema (IDS)
  3. Finally, data is prepared so that it is ready to be queried through Athena

After your data is harmonized and converted to IDS format, contents are automatically processed to populate databases and tables using a schema already created by TetraScience. You can then run standard SQL ad hoc queries against these tables and retrieve results. Because Athena executes queries in parallel, you can execute complex queries and large datasets quickly and efficiently.


Amazon Athena Pricing

Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. Athena is priced per query. For details, see the Athena documentation for more details.

You can query your data stored in these tables using two methods:

  • Tetra Data Platform (TDP) user interface - With this method, no connection details or third-party tool set up is required. After the TDP has been configured for your organization, you are ready to run any standard SQL query.
  • Use a third-party tool (for example, Tableau to query data) - This method is useful when you want to query the data, and visualize it to perform further analysis.