Data Warehouse and Relational Database

TetraScience provides features and services to pipeline your data into Data Warehouse and SQL Database solutions. The most commonly used are:

  • Amazon Redshift
  • Amazon Relational Database Service (Amazon RDS)
  • Amazon Aurora

📘

Optional Features and Services

These features and services are optional and are not provided out-of-the-box. To use these features and services, please contact your Account Manager and Solution Architect.

Amazon RedshiftAmazon Aurora RDSAWS MySQL/PostgreSQL RDS
WebsiteLinkLinkLink
Use Cases In contrast to an OLTP (Online Transactional Processing) database, an OLAP (Online Analytic Processing) database is designed to process large datasets quickly to answer questions about data. Common use cases for an OLAP database are:

- What’s the customer lifetime value of my e-commerce application?
- What’s the median duration of play time for my mobile game?
- What’s the conversion rate for various landing pages based on the referrer?

An OLAP database is optimized for scanning and performing computations across many rows of data for one or multiple columns. To improve performance, OLAP databases are designed to be columnar. Instead of organizing data as rows, the underlying data in an OLAP database is organized by column. Columnar based storage allows for better compression and easier sequential reads necessary for scanning large amounts of data quickly.

Other popular OLAP databases include:

- HP Vertica
- Teradata
- IBM Netezza
- KDB+
OLTP (Online Transactional Processing) is the most commonly thought of database. OLTP is designed to serve as a persistent state store for front-end applications. They quickly look up specific information as well as transactional procedures, such as INSERT, UPDATE, or DELETE. Some common tasks performed by OLTP systems include:

- What is the name of the current user, when given an email address?
- What is the last stage of play for my mobile game?
- Update the billing addresses for a set of clients

These problems require a system that can look up and update one or more columns within one or many rows. The strength of OLTPs is that they support fast writes. A typical workload for OLTP is both frequent reads and writes, but the reads tend to be more of looking up a specific value rather than scanning all values to compute an aggregate. Common OLTP systems are:

- MySQL
- PostgreSQL
- Amazon Aurora
Similar to Amazon Aurora
Cost HighMediumLow
Storage Limit PetaByte Scale16 TB16 TB
Concurrent Connections 501000s
(Aurora is designed for super high throughput transactions)
100s

TetraScience recommends that you begin with the less complicated and most cost-effective solution, for example Amazon RDS MySQL and RDS PostgreSQL. This enables you to use most of the available common tools and frameworks to quickly get started.

After you become more familiar and confident of your business needs and the types of operations you want to perform, you can then transition to Amazon RedShift or Amazon Aurora. At that point, you are more confident with the type of queries and aggregations you want to perform, and can better optimize your use cases by designing the correct table in Amazon Redshift.

How to Move Data from TetraScience Data Lake to Data Warehouse and Relational Database

The TetraScience delivery team will provide AWS Glue ETL pipelines to move the latest data from Amazon Simple Storage Service (Amazon S3) to your data warehouse. We will handle edge cases, such as when the data is re-processed (for example, scientists may recalculate the peaks on the chromatogram) and your data warehouse must contain the latest changes.

📘

AWS Glue ETL Pipelines

Use of the AWS Glue ETL pipelines will incur an extra cost based on workload and traffic.

AWS Oracle RDS

To use Oracle databases, TetraScience recommends that you consider Amazon RDS for Oracle, and you review the information in this table:

QuestionAnswer
Does Amazon RDS for Oracle provide licenses?- For the Bring Your Own License (BYOL) model, you will need to bring your own Oracle license to use the Oracle Enterprise Edition.
- However, there is also a License Included model. For more information, see this document.
Can the data be encrypted using Oracle's Transparent Data Encryption?Yes, this is compatible with Amazon RDS for Oracle.
Does Amazon RDS for Oracle support Spring Data JPA?Yes, this is compatible with Amazon RDS for Oracle. Reference:
Using Amazon Web Services Relational Database Service for your Spring Boot App