High Availability

TDP takes full advantage of the fault tolerance built into AWS services. The TDP design closely follows AWS best practices for each platform component:

  • Datalake files: stored in AWS S3, which is designed for 99.999999999% durability and 99.99% availability over a given year, and to sustain the concurrent loss of data in two facilities.
  • RDS database: Multi-AZ deployment. The data is continuously and synchronously replicated to a standby instance in another availability zone. The database automatically fails over to the standby in case of an infrastructure problem.
  • Elasticsearch: configured by default with 3 master nodes and 2 data nodes spread across 2 availability zones. An infrastructure failure in one AZ will not impact the cluster.
  • ECS services: all important platform services run at least 2 instances, each in its own availability zone, so the failure of a single instance will not impact the overall platform.
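
To put the percentages above into concrete terms, the following is a minimal Python sketch of the arithmetic behind them. It is an illustration only, not part of TDP. The function names are hypothetical, the 10,000,000 object count is an example value, and the 99% per-instance availability used for the redundancy calculation is an assumed figure that does not come from TDP or AWS. The redundancy formula also assumes that instances in different availability zones fail independently, which is a simplification.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime budget per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def expected_annual_object_loss(object_count: int, durability_percent: float) -> float:
    """Average number of objects lost per year at a given durability."""
    return object_count * (1 - durability_percent / 100)


def combined_availability(instance_availability: float, instances: int) -> float:
    """Availability of a service that stays up while at least one instance is up.

    Assumes instance failures are independent (a simplification).
    The argument is a fraction between 0 and 1, not a percentage.
    """
    return 1 - (1 - instance_availability) ** instances


if __name__ == "__main__":
    # S3 figures quoted in the list above.
    downtime = max_downtime_minutes_per_year(99.99)
    print(f"99.99% availability allows about {downtime:.2f} minutes of downtime per year")

    # Example object count (assumption, chosen for illustration).
    loss = expected_annual_object_loss(10_000_000, 99.999999999)
    print(f"Expected objects lost per year out of 10,000,000: {loss:.6f}")

    # Assumed per-instance availability of 99% (not a TDP or AWS figure).
    two_instances = combined_availability(0.99, 2)
    print(f"Two independent 99% instances: {two_instances:.4%} combined availability")
```

Under these assumptions, 99.99% availability corresponds to a budget of roughly 52.6 minutes of unavailability per year, and eleven nines of durability corresponds to an average loss of about one object every 10,000 years for every 10,000,000 objects stored. The redundancy calculation shows why running a second instance in another availability zone helps: the service is only unavailable when both instances are down at the same time.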