Data Warehouse and Relational Database

TetraScience also provides features and services to pipeline your data into Data Warehouse and SQL Data Base solutions. Some of the most common ones include AWS Redshift, AWS RDS and AWS Aurora. This is optional and not provided out of the box. If you need such capability, please let your Account Manager and Solution Architect know!

AWS RedshiftAWS Aurora RDSAWS MySQL/PostgreSQL RDS
Use Cases In contrast to an OLTP database, an OLAP database is designed to process large datasets quickly to answer questions about data. The name reflects this purpose: Online Analytic Processing. Common use cases for an OLAP database are:

- What’s the customer lifetime value of my e-commerce application?
- What’s the median duration of play time for my mobile game?
- What’s the conversion rate for various landing pages based on the referrer?

An OLAP database is optimized for scanning and performing computations across many rows of data for one or multiple columns. To improve performance, OLAP databases are designed to to be columnar. Instead of organizing data as rows, the underlying data in an OLAP database is organized column by column. Columnar based storage allows for better compression and easier sequential reads, features necessary for scanning large amounts of data quickly.

Other popular OLAP databases include

- HP Vertica
- Teradata
- IBM Netezza
- KDB+
OLTP is what most people thinks of databases. It stands for Online Transactional Processing and is designed to serve as a persistent state store for front-end applications. They excel at quickly looking up specific information as well as transactional procedures like INSERT, UPDATE, or DELETE. Some common tasks asked of OLTP systems include situations like:

- What is the name of the current user, when given an email address?
- What is the last stage that a player was in for my mobile game?
- Update the billing addresses for a set of clients

These type of problems require a system that can look up and update one or more columns within one or many rows. The strength of OLTPs is that they support fast writes. A typical workload for OLTP is both frequent reads and writes, but the reads tend to be more of looking up a specific value rather than scanning all values to compute an aggregate. Common OLTP systems are:

- PostgreSQL
- Amazon Aurora
Similar to AWS Aurora
Cost HighMediumLow
Storage Limit PetaByte Scale16 TB16 TB
Concurrent Connections 501000s
(Aurora is designed for super high throughput transactions)

Our recommendation is that you start with the less opinionated and cost-effective solution such as AWS RDS MySQL and RDS PostgreSQL. This allows you to use pretty much most of the common tools and frameworks available and just hit the ground running.

After you become more familiar and confident of your business need and the types of operations you would like to perform, you can then transition to AWS RedShift or AWS Aurora. At that point, you will be able to better understand the type of queries and aggregations you would like to perform and can optimize your use cases by designing the right table in AWS Redshift.

How can you move data from TetraScience Data Lake to Data Warehouse and Relational Database

TetraScience delivery team will provide AWS Glue ETL pipelines to move the latest data from S3 to your data warehouse. We will handle edge cases such as when the data is re-processed (scientists may recalculate the peaks on the chromatogram) and your data warehouse needs to contain the latest changes. Note that this will incur an extra cost based on workload and traffic.


For those of you who would like to use Oracle databases, we recommend you consider AWS Oracle RDS. Here are some answers to your common questions.

Q: Does Oracle RDS provides license?
A: You will need to bring your own Oracle license in order to use the Oracle Enterprise Edition. This is the Bring Your Own License (BYOL) model. However, there is also a License Included model. For more information, please refer to this document.

Q: Can the data be encrypted using Oracle's Transparent Data Encryption?
A: Yes. This is compatible with AWS RDS Oracle.

Q: Does RDS oracle support Spring Data JPA.
A: Yes. This is compatible with AWS RDS Oracle. Reference:
Using Amazon Web Services Relational Database Service for your Spring Boot App