Data Warehouse and Relational Database

TetraScience provides features and services to pipeline your data into Data Warehouse and SQL Database solutions. The most commonly used are:

  • AWS Redshift
  • AWS RDS
  • AWS Aurora

📘

Optional Features and Services

These features and services are optional and are not provided out-of-the-box. To use these features and services, please contact your Account Manager and Solution Architect.

AWS Redshift

AWS Aurora RDS

AWS MySQL/PostgreSQL RDS

Website

Link

Link

Link

Use Cases

In contrast to an OLTP (Online Transactional Processing) database, an OLAP (Online Analytic Processing) database is designed to process large datasets quickly to answer questions about data. Common use cases for an OLAP database are:

  • What’s the customer lifetime value of my e-commerce application?
  • What’s the median duration of play time for my mobile game?
  • What’s the conversion rate for various landing pages based on the referrer?

An OLAP database is optimized for scanning and performing computations across many rows of data for one or multiple columns. To improve performance, OLAP databases are designed to be columnar. Instead of organizing data as rows, the underlying data in an OLAP database is organized by column. Columnar based storage allows for better compression and easier sequential reads necessary for scanning large amounts of data quickly.

Other popular OLAP databases include:

  • HP Vertica
  • Teradata
  • IBM Netezza
  • KDB+

OLTP (Online Transactional Processing) is the most commonly thought of database. OLTP is designed to serve as a persistent state store for front-end applications. They quickly look up specific information as well as transactional procedures, such as INSERT, UPDATE, or DELETE. Some common tasks performed by OLTP systems include:

  • What is the name of the current user, when given an email address?
  • What is the last stage of play for my mobile game?
  • Update the billing addresses for a set of clients

These problems require a system that can look up and update one or more columns within one or many rows. The strength of OLTPs is that they support fast writes. A typical workload for OLTP is both frequent reads and writes, but the reads tend to be more of looking up a specific value rather than scanning all values to compute an aggregate. Common OLTP systems are:

  • MySQL
  • PostgreSQL
  • Amazon Aurora

Similar to AWS Aurora

Cost

High

Medium

Low

Storage Limit

PetaByte Scale

16 TB

16 TB

Concurrent Connections

50

1000s
(Aurora is designed for super high throughput transactions)

100s

TetraScience recommends that you begin with the less complicated and most cost-effective solution, for example AWS RDS MySQL and RDS PostgreSQL. This enables you to use most of the available common tools and frameworks to quickly get started.

After you become more familiar and confident of your business needs and the types of operations you want to perform, you can then transition to AWS RedShift or AWS Aurora. At that point, you are more confident with the type of queries and aggregations you want to perform, and can better optimize your use cases by designing the correct table in AWS Redshift.

How to Move Data from TetraScience Data Lake to Data Warehouse and Relational Database

The TetraScience delivery team will provide AWS Glue ETL pipelines to move the latest data from S3 to your data warehouse. We will handle edge cases, such as when the data is re-processed (for example, scientists may recalculate the peaks on the chromatogram) and your data warehouse must contain the latest changes.

📘

AWS Glue ETL Pipelines

Use of the AWS Glue ETL pipelines will incur an extra cost based on workload and traffic.

AWS Oracle RDS

To use Oracle databases, TetraScience recommends that you consider AWS Oracle RDS, and you review the information in this table:

Question

Answer

Does Oracle RDS provides license?

  • For the Bring Your Own License (BYOL) model, you will need to bring your own Oracle license to use the Oracle Enterprise Edition.
  • However, there is also a License Included model. For more information, see this document.

Can the data be encrypted using Oracle's Transparent Data Encryption?

Yes, this is compatible with AWS RDS Oracle.

Does RDS oracle support Spring Data JPA?

Yes, this is compatible with AWS RDS Oracle. Reference:
Using Amazon Web Services Relational Database Service for your Spring Boot App


Did this page help you?