Tetra Amazon S3 Connector

The Tetra Amazon S3 Connector is a standalone, containerized application that automatically uploads files (objects) from your organization's Amazon Simple Storage Service (Amazon S3) buckets to the Tetra Data Platform (TDP) whenever an object is created. Amazon S3 is an object storage service that offers high scalability, data availability, security, and performance.

Design Overview

The Tetra Amazon S3 Connector learns about new S3 objects by receiving ObjectCreated event notifications through an Amazon Simple Queue Service (Amazon SQS) queue. The source bucket publishes these events to an Amazon Simple Notification Service (Amazon SNS) topic, which forwards them to the queue. The Connector polls the queue and uploads each new object to the TDP. Access permissions are configured through AWS Identity and Access Management (IAM).
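Because the event passes through SNS before it reaches SQS, the SQS message body nests the S3 event inside an SNS envelope. SNS stores the S3 event notification as a JSON string in the "Message" field of its own JSON document. The following Python sketch shows one way to unwrap such a body. It is an illustration under stated assumptions, not the Connector's implementation. It assumes standard SNS delivery with raw message delivery disabled, although it also accepts a raw S3 event. The function names are hypothetical. The sketch relies on the documented S3 behavior that object keys in event notifications are URL-encoded.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_records(sqs_body: str) -> list[dict]:
    """Unwrap an SQS message body into its list of S3 event records (hypothetical helper).

    Assumes the SNS subscription delivers the standard SNS JSON envelope, whose
    "Message" field holds the S3 event as a JSON string. With raw message
    delivery enabled, the body is already the S3 event itself.
    """
    body = json.loads(sqs_body)
    if body.get("Type") == "Notification" and "Message" in body:
        body = json.loads(body["Message"])
    # S3 test events (s3:TestEvent) have no "Records" key.
    return body.get("Records", [])


def object_location(record: dict) -> tuple[str, str]:
    """Return (bucket, key); S3 URL-encodes object keys in event notifications."""
    s3 = record["s3"]
    return s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


# Usage: an SNS-wrapped ObjectCreated event for a key containing a space.
s3_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "raw/run+01.csv"}},
        }
    ]
}
sqs_body = json.dumps({"Type": "Notification", "Message": json.dumps(s3_event)})

for record in extract_s3_records(sqs_body):
    print(record["eventName"], *object_location(record))
# prints: ObjectCreated:Put example-bucket raw/run 01.csv
```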

Architecture

The following diagram shows an example Tetra Amazon S3 Connector workflow:

Example Tetra Amazon S3 Connector workflow

The diagram illustrates the following workflow:

  1. An object is uploaded to the source Amazon S3 bucket, or an object in the S3 bucket has its metadata modified.
  2. The bucket sends an ObjectCreated event notification to an Amazon SNS topic.
  3. The SNS topic sends a message that contains the S3 event to an Amazon SQS queue that is subscribed to the SNS topic.
  4. The Connector continuously polls the SQS queue to receive messages by using long polling.
  5. When the Connector receives a message, it checks that the message contains an ObjectCreated event. It also checks the S3 object's key against an optional set of configured path patterns. If no patterns are configured, all keys are accepted.
  6. If the message doesn't contain an ObjectCreated event, or if the object key doesn't match a configured path pattern, the Connector removes the message from the queue without uploading the object.
    -or-
    If the message passes the required filters, then the Connector gets the object from the source S3 bucket and streams the data to the TDP Data Lake by using the platform's data acquisition service.
  7. After an object is uploaded successfully, the Connector deletes the event message from the SQS queue. If the upload fails, the Connector doesn't delete the message. The message becomes visible on the queue again after the queue's visibility timeout expires, and the Connector can then receive it again.
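Steps 5 through 7 amount to a small decision procedure for each event. The Python sketch below models that procedure under several assumptions. Path patterns are treated as shell-style globs matched with the standard `fnmatch` module; the Connector's actual pattern syntax may differ, and with `fnmatch`, `*` also matches `/`. The `upload` argument is a hypothetical callable that stands in for fetching the object and streaming it to the TDP. Each message is assumed to carry one record. The sketch returns the action the Connector would take on the message rather than calling SQS.

```python
import fnmatch
from enum import Enum
from typing import Callable, Optional, Sequence


class Outcome(Enum):
    """What happens to the SQS message after the event is handled."""
    DELETED_FILTERED = "deleted without upload (filtered out)"
    DELETED_UPLOADED = "deleted after successful upload"
    KEPT_FOR_RETRY = "left on queue; visible again after visibility timeout"


def key_matches(key: str, patterns: Optional[Sequence[str]]) -> bool:
    """Accept every key when no patterns are configured (the step 5 default)."""
    if not patterns:
        return True
    # Assumption: shell-style globs; "*" also matches "/" with fnmatch.
    return any(fnmatch.fnmatchcase(key, pattern) for pattern in patterns)


def handle_record(
    event_name: str,
    bucket: str,
    key: str,
    patterns: Optional[Sequence[str]],
    upload: Callable[[str, str], None],  # hypothetical stand-in for the TDP upload
) -> Outcome:
    # Step 6: wrong event type or non-matching key -> remove the message.
    if not event_name.startswith("ObjectCreated:") or not key_matches(key, patterns):
        return Outcome.DELETED_FILTERED
    try:
        upload(bucket, key)  # Step 6: get the object from S3 and stream it to the TDP
    except Exception:
        # Step 7: on failure, leave the message so SQS makes it visible again.
        return Outcome.KEPT_FOR_RETRY
    return Outcome.DELETED_UPLOADED  # Step 7: success -> delete the message


# Usage with two hypothetical upload functions.
def ok_upload(bucket: str, key: str) -> None:
    pass


def failing_upload(bucket: str, key: str) -> None:
    raise ConnectionError("TDP unreachable")


for name, key, uploader in [
    ("ObjectCreated:Put", "raw/run1.csv", ok_upload),
    ("ObjectCreated:Put", "raw/run1.csv", failing_upload),
    ("ObjectCreated:Copy", "logs/app.log", ok_upload),
    ("ObjectRemoved:Delete", "raw/run1.csv", ok_upload),
]:
    outcome = handle_record(name, "example-bucket", key, ["raw/*.csv"], uploader)
    print(name, key, "->", outcome.value)
```

Returning an outcome instead of deleting the message directly makes the retry behavior in step 7 explicit. Only a failed upload leaves the message on the queue, so the object is attempted again once the visibility timeout expires.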

Operational Guides

For installation and operational instructions, see the Tetra Amazon S3 Connector v1 Operational Guide.