Overview (Archived)

DEPRECATED PAGE

This page is deprecated. The new deployment pages are here:

Introduction

This document outlines the deployment requirements and procedure for clients who choose to perform a single-tenant deployment in their private AWS environments.

The TetraScience Data Integration Platform is a multi-tenant SaaS platform that can be configured for use by a single tenant only and deployed into any AWS account that meets the installation requirements. The AWS account can be controlled and managed by the customer or by TetraScience.

We strongly recommend that clients use our multi-tenant SaaS platform. It is the most reliable, secure, and cost-effective option, it saves clients significant effort, and it has the most up-to-date features. With multi-tenant, the only component that needs to be deployed is the TetraScience DataHub, which is also required for single-tenant deployments.

The TetraScience Data Integration Platform is deployed from two CloudFormation stacks, each containing multiple nested stacks. The stacks are packaged, versioned, and made available to clients as AWS Service Catalog products.
Application code is containerized and runs on Amazon Elastic Container Service (ECS) and Elastic Kubernetes Service (EKS). Some code also runs in AWS Lambda. Data is stored in S3, while resource-intensive auxiliary services such as Elasticsearch, EKS, and the Postgres database have their own dedicated clusters. This diagram from the Security section shows the interaction between components.

A full list of currently required AWS services is included below. New features and updates may require additional AWS services not mentioned in the list.

The TetraScience Data Integration Platform is not a simple application and cannot be treated as such. It is a collection of tightly coupled open- and closed-source software that relies heavily on AWS services for reliability, ease of use, scalability, and low cost. While certain feature modules may be disabled, the core platform is fixed.

Installation Requirements

  1. A dedicated greenfield AWS account
  2. Deployment is performed by a user with AWS full admin privileges
  3. At least two /24 private subnets in different AZs
  4. At least two /28 public subnets in different AZs (if the deployment will be accessible from the Internet)
  5. VPC DHCP options allow resolving Route53 domains
  6. All AWS Service Endpoints are reachable from the subnets
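
The subnet requirements above can be checked before deployment with a small script. The following is a minimal sketch, not part of the TetraScience tooling: the `Subnet` record and `check_subnets` function are hypothetical names, and it assumes that "/24" and "/28" are minimum sizes, so a larger subnet (for example a /23) also satisfies the rule.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    """Hypothetical description of one subnet in the target VPC."""
    cidr: str     # e.g. "10.0.1.0/24"
    az: str       # availability zone name
    public: bool  # True if the subnet is routed to an Internet gateway


def check_subnets(subnets, internet_facing):
    """Return a list of problems; an empty list means the layout passes."""
    problems = []

    def qualifying_azs(public, max_prefix):
        # Assumption: a subnet qualifies when it is at least as large as
        # required, i.e. its prefix length is <= max_prefix.
        return {
            s.az
            for s in subnets
            if s.public == public
            and ipaddress.ip_network(s.cidr).prefixlen <= max_prefix
        }

    if len(qualifying_azs(False, 24)) < 2:
        problems.append("need at least two /24 private subnets in different AZs")
    if internet_facing and len(qualifying_azs(True, 28)) < 2:
        problems.append("need at least two /28 public subnets in different AZs")
    return problems
```

Calling `check_subnets(layout, internet_facing=True)` on a layout that has only private subnets reports the missing public subnets, while the same layout passes when `internet_facing` is `False`.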

Security and AWS IAM

We use AWS Identity and Access Management (IAM) to manage permissions for the deployed services. In a nutshell, AWS IAM controls who (authenticated entity/user) can do what (permission) and to whom (resource).

IAM Policies define what can be done and to whom.

  • Policies can be grouped in IAM Roles; they can also be attached to a User, or to another entity as an inline policy.
  • Roles can be assigned to IAM Users.
  • Code running inside AWS (like ECS containers or Lambda functions) has its permissions restricted via the same mechanisms (IAM Roles and Policies).

In our case, single-tenant deployments from AWS Service Catalog need to be performed by a user with Admin rights in the target AWS account. During deployment, CloudFormation creates various Roles and Policies, which are then attached to our resources, allowing, for instance, a service to publish messages to a queue, or a Lambda function to write data to an S3 bucket.
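
To make the "who can do what to whom" model concrete, here is a minimal sketch of a policy document of the kind described above, built as a Python dictionary. It is an illustration only and is not taken from the TetraScience templates: the function name, the `Sid` values, and the example ARNs are hypothetical, and it assumes the queue is an SQS queue.

```python
import json


def example_resource_policy(queue_arn, bucket_arn):
    """Build an illustrative IAM policy document with two narrow grants."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # "what": send messages; "to whom": one specific queue
                "Sid": "PublishToQueue",
                "Effect": "Allow",
                "Action": ["sqs:SendMessage"],
                "Resource": queue_arn,
            },
            {
                # "what": write objects; "to whom": objects in one bucket
                "Sid": "WriteToBucket",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARNs, used only for illustration.
    policy = example_resource_policy(
        "arn:aws:sqs:us-east-1:123456789012:example-queue",
        "arn:aws:s3:::example-bucket",
    )
    print(json.dumps(policy, indent=2))
```

The "who" is not part of the document itself. It is determined by the Role, User, or other entity the policy is attached to.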

When assigning permissions, we follow the principle of least privilege, giving each entity the minimum permissions required to do the job, and no more:

  • We do NOT create any IAM users via CloudFormation
  • We do NOT grant any cross-account permissions
  • All roles and policies are defined in CloudFormation templates, which are shared with customers and can be inspected at will.

By default, TetraScience engineers have no access to the deployed infrastructure or data. However, we strongly recommend giving them at least read-only access to CloudWatch logs for troubleshooting and root cause analysis.
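
As an illustration of what read-only CloudWatch logs access could look like, the sketch below builds a policy document limited to read actions on selected log groups. It is an assumption about one reasonable shape for such a grant, not the policy TetraScience requires: the function names, the chosen action list, and the log group ARNs passed in are all hypothetical choices for this example.

```python
# CloudWatch Logs actions that only read or list data.
READ_ONLY_LOG_ACTIONS = (
    "logs:DescribeLogGroups",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "logs:FilterLogEvents",
)


def read_only_logs_policy(log_group_arns):
    """Build an illustrative read-only policy for the given log groups."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlyCloudWatchLogs",
                "Effect": "Allow",
                "Action": list(READ_ONLY_LOG_ACTIONS),
                "Resource": list(log_group_arns),
            }
        ],
    }


def grants_only_reads(policy):
    """Heuristic check: every allowed action name starts with a read verb."""
    read_verbs = ("Describe", "Get", "Filter", "List")
    for statement in policy["Statement"]:
        for action in statement["Action"]:
            # "logs:GetLogEvents" -> "GetLogEvents"
            verb = action.split(":", 1)[1]
            if not verb.startswith(read_verbs):
                return False
    return True
```

How this policy is attached, for example to a role that support engineers are allowed to use, is a decision for the account owner and is outside the scope of this sketch.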

Currently Used AWS Services

The target environment must have the ability to support any and all AWS services. The Required AWS Service topic provides more details on the major services required for the deployment of the Data Integration Platform.

Commonly Encountered Issues

  • Custom DNS resolvers: Some organizations configure custom DNS resolvers (via DHCP) for EC2 instances in their VPCs. For the TetraScience platform to function correctly, it's essential that the DNS resolver delegate to AWS's Route53 resolver for domain names such as *.internal that are outside the organization's private cloud/on-premises network.

  • Rollback: Some organizations install TetraScience using an IAM user that has permissions to create certain resources, but not destroy them. This can be a problem during rollback scenarios, because it becomes impossible to undo the upgrade and leaves the system in an intermediate state that requires manual maintenance. Organizations should provide a user with full permissions, or plan ahead for manual workarounds.
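
For the custom DNS resolver issue above, it helps to know where the AWS-provided resolver lives. Inside a VPC it is reachable at the base of the VPC's primary IPv4 range plus two, and also at the link-local address 169.254.169.253. The sketch below computes that address and decides which names a custom resolver should forward to it. The function names are hypothetical, and the suffix list contains only the `.internal` example from the text; a real deployment may need more.

```python
import ipaddress

# Link-local address of the Amazon-provided DNS server inside a VPC.
LINK_LOCAL_RESOLVER = "169.254.169.253"


def vpc_resolver_address(primary_vpc_cidr):
    """Return the Amazon-provided DNS address: VPC network base plus two."""
    network = ipaddress.ip_network(primary_vpc_cidr)
    return str(network.network_address + 2)


def should_forward_to_aws(hostname, suffixes=(".internal",)):
    """Return True if a custom resolver should delegate this name to AWS.

    Assumption: forwarding is decided purely by domain suffix.
    """
    name = hostname.rstrip(".").lower()
    return any(name.endswith(suffix) for suffix in suffixes)
```

For a VPC whose primary range is `10.0.0.0/16`, `vpc_resolver_address` returns `10.0.0.2`, which is the target a custom resolver would use as its forwarder for names such as `ip-10-0-0-5.ec2.internal`.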