Disaster Recovery

Disaster recovery sites (DR sites) provide the infrastructure required for resuming operations on a secondary site following a disaster that has greatly impaired the main production site. To preserve data and restore service following a catastrophic event that renders a TDP production site inoperable, TetraScience creates DR sites in a second AWS Region in a different geography (DR Region) for each Tetra hosted deployment. All data within each TDP environment, including all user files and platform state, is replicated to the DR Region.

For deployments in the European Union (EU), data is not replicated to AWS Regions outside of the EU. For US deployments, data is not replicated to Regions outside of the United States.

🚧

IMPORTANT

Legacy customer hosted TDP deployments require customers to create and manage their own DR sites for disaster recovery services to be available. For more information, see Disaster Recovery for Customer Hosted Deployments in the TetraConnect Hub. For access, see Access the TetraConnect Hub.

Disaster Recovery Service Objectives

The Recovery Time Objective (RTO) for TDP deployments with DR sites configured is 12 hours. This is the maximum likely time period in which the production environment is unavailable because of a disaster.

The Recovery Point Objective (RPO) for TDP deployments with DR sites configured varies between 15 minutes and 12 hours, depending on the data type. This is the maximum likely time period in which your data may be lost because of a disaster.

Standard RPO Time Frame for TDP Data Types

The following table shows the standard RPO time frame for each TDP data type:

Data Type | Underlying AWS Service | RPO Value
Raw and processed files in the TDP | Amazon Simple Storage Service (Amazon S3) | 15 minutes
Configurations such as pipeline settings, user permissions, and event history | Amazon Relational Database Service (Amazon RDS) and Amazon Elastic Container Service (Amazon ECS) | 12 hours
File indexing and search functionality | Amazon OpenSearch Service | 6 hours
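
As a worked illustration of how these RPO values translate into concrete points in time, the following minimal Python sketch subtracts each RPO from a disaster timestamp. The dictionary keys and function names are hypothetical labels chosen for this example; they are not TDP identifiers or part of any TetraScience API.

```python
from datetime import datetime, timedelta

# RPO values copied from the table above. The keys are illustrative
# labels for this sketch only, not TDP identifiers.
RPO_BY_DATA_TYPE = {
    "files (Amazon S3)": timedelta(minutes=15),
    "configurations (Amazon RDS and Amazon ECS)": timedelta(hours=12),
    "search index (Amazon OpenSearch Service)": timedelta(hours=6),
}


def expected_recovery_points(disaster_time):
    """Return, per data type, the latest time up to which data is
    expected to be recoverable if a disaster occurs at disaster_time."""
    return {
        data_type: disaster_time - rpo
        for data_type, rpo in RPO_BY_DATA_TYPE.items()
    }


def worst_case_rpo():
    """Return the largest RPO across all data types."""
    return max(RPO_BY_DATA_TYPE.values())


if __name__ == "__main__":
    disaster = datetime(2024, 1, 1, 12, 0)  # example timestamp
    for data_type, point in expected_recovery_points(disaster).items():
        print(f"{data_type}: recoverable up to {point:%Y-%m-%d %H:%M}")
```

For a disaster at 12:00, this places the expected recovery point at 11:45 for files, 06:00 for the search index, and 00:00 for configurations, which matches the 15-minute to 12-hour range stated above.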

Architecture for TDP Disaster Recovery

During disaster recovery, a TDP environment is restored from a geo-replicated, automated backup. File stores are available in a read-only configuration in the secondary DR site. Connectivity settings from the primary production site are mirrored in the DR site.

The following diagram shows an example TDP disaster recovery configuration:

Figure: TDP DR architecture

For more information about data backups, see TDP Availability and Resilience.

Recovery Actions

The TDP installation process is fully automated by using Infrastructure as Code (IaC). The recovery procedure, performed in the DR Region, is similar to a new TDP installation for all stateless components. Once the recovery environment has been created, the recovery procedure uses the replicated data that is available in the DR Region.

Recovery Action | Initiated By | Description
Stop data replication from PROD environment to DR site | TetraScience | Make sure that data is no longer replicated from the PROD environment to the DR site by deactivating data replication from the source Amazon S3 bucket.
Perform a new deployment of the TDP based on the data and configurations in the DR site | TetraScience | The TDP is deployed in a newly provisioned environment and linked with the persisted data and configurations from the DR site.
Reinstall any Tetra Hubs and Data Hubs along with their Agents and Connectors | Customer | Any existing, on-premises Tetra Hubs and Data Hubs along with their Agents and Connectors must be reinstalled by the customer.
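
The first recovery action, deactivating replication on the source Amazon S3 bucket, could look roughly like the sketch below. This is an assumption about one possible approach, not TetraScience's actual procedure: it sets every replication rule's `Status` to `Disabled` through the boto3 S3 client calls `get_bucket_replication` and `put_bucket_replication`. The function names and the bucket name in the usage note are hypothetical.

```python
import copy


def with_rules_disabled(replication_config):
    """Return a copy of an S3 replication configuration in which
    every rule has Status set to "Disabled". The input is not modified."""
    disabled = copy.deepcopy(replication_config)
    for rule in disabled.get("Rules", []):
        rule["Status"] = "Disabled"
    return disabled


def stop_replication(s3_client, bucket):
    """Disable all replication rules on the source bucket.

    s3_client is expected to behave like a boto3 S3 client
    (assumption: the caller has permission to read and write the
    bucket's replication configuration).
    """
    current = s3_client.get_bucket_replication(Bucket=bucket)
    updated = with_rules_disabled(current["ReplicationConfiguration"])
    s3_client.put_bucket_replication(
        Bucket=bucket, ReplicationConfiguration=updated
    )
    return updated
```

With boto3 installed and credentials configured, this would be called as `stop_replication(boto3.client("s3"), "example-source-bucket")`. Disabling rules rather than deleting the configuration keeps the original rules available for re-enabling later; that is a design choice of this sketch.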

Disaster Recovery Testing

A disaster recovery test is performed for each major and minor TDP release.

The disaster recovery test consists of recovering a TDP environment in the DR Region from the replicated data, and then performing data validation with the data in the Tetra hosted production environment.

📘

NOTE

All TDP environments continue to run normally and are not affected by the disaster recovery test in any way.
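
The data validation step of the disaster recovery test could be approximated by comparing file manifests from the two environments. The sketch below is a hypothetical illustration, not the validation TetraScience performs: it assumes each environment can produce a mapping of object key to checksum, and it reports keys that are missing, mismatched, or unexpected in the recovered environment.

```python
def compare_manifests(production, recovered):
    """Compare {object key: checksum} manifests from the production
    environment and the recovered DR environment."""
    missing = sorted(key for key in production if key not in recovered)
    mismatched = sorted(
        key for key in production
        if key in recovered and production[key] != recovered[key]
    )
    unexpected = sorted(key for key in recovered if key not in production)
    return {
        "missing": missing,
        "mismatched": mismatched,
        "unexpected": unexpected,
    }


def is_valid(report):
    """Return True when the report contains no discrepancies."""
    return not any(report.values())


if __name__ == "__main__":
    # Example manifests with made-up keys and checksums.
    prod = {"raw/a.csv": "111", "raw/b.csv": "222", "ids/c.json": "333"}
    dr = {"raw/a.csv": "111", "raw/b.csv": "999"}
    report = compare_manifests(prod, dr)
    print(report)
    print("valid:", is_valid(report))
```

A real comparison would likely need to tolerate files written within the RPO window, since those may not yet have been replicated when the recovery environment is created.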