Tetra Amazon S3 Connector

The Tetra Amazon S3 Connector is a standalone, containerized application that automatically uploads files (objects) from your organization's Amazon Simple Storage Service (Amazon S3) buckets to the Tetra Data Platform (TDP) whenever an object is created. Amazon S3 is an object storage service that offers high scalability, data availability, security, and performance.

Design Overview

The Tetra Amazon S3 Connector learns about new objects by receiving S3 ObjectCreated events from an Amazon Simple Queue Service (Amazon SQS) queue. Amazon S3 publishes each event to an Amazon Simple Notification Service (Amazon SNS) topic, the topic delivers the event to the subscribed SQS queue, and the Connector then uploads the new object to the TDP. Access permissions are configured through AWS Identity and Access Management (IAM).
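As a rough illustration of the first hop (S3 to SNS), the sketch below builds a bucket notification configuration that publishes all ObjectCreated events to a topic. The function name and the topic ARN are hypothetical placeholders, not values from the Connector or the Operational Guide; only the `TopicConfigurations` shape and the `s3:ObjectCreated:*` event name come from the S3 notification API.

```python
import json


def build_topic_notification(topic_arn):
    """Return an S3 notification configuration that publishes every
    ObjectCreated event on a bucket to the given SNS topic."""
    return {
        "TopicConfigurations": [
            {
                "TopicArn": topic_arn,
                # The wildcard covers Put, Post, Copy, and
                # CompleteMultipartUpload events.
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


# Placeholder ARN (assumption), not taken from any real account.
EXAMPLE_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:example-connector-topic"

config = build_topic_notification(EXAMPLE_TOPIC_ARN)
print(json.dumps(config, indent=2))
```

If you use boto3, a dictionary of this shape can be passed as the `NotificationConfiguration` argument of the S3 client's `put_bucket_notification_configuration` call. Note that applying a notification configuration replaces the bucket's existing one, so merge it with any notifications already in place.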

Architecture

The following diagram shows an example Tetra Amazon S3 Connector workflow:

The diagram illustrates the following workflow:

1. An object is uploaded to the source Amazon S3 bucket, or an object in the S3 bucket has its metadata modified.
2. An ObjectCreated event is sent to an Amazon SNS topic.
3. The SNS topic sends a message that contains the S3 event to an Amazon SQS queue that is subscribed to the SNS topic.
4. The Connector continuously polls the SQS queue for messages by using long polling.
5. When the Connector receives a message, it checks that the message includes an ObjectCreated event. It also checks the S3 object's key against an optional set of configured path patterns. If no patterns are configured, all keys are accepted.
6. If the message has the wrong event type or a key that doesn't match any pattern, the Connector removes the message from the queue.
   -or-
   If the message passes the required filters, the Connector gets the object from the source S3 bucket and streams the data to the TDP Data Lake by using the platform's data acquisition service.
7. After an object uploads successfully, the Connector deletes the event message from the SQS queue. If the upload fails, the message is not deleted and becomes available on the queue again after the queue's visibility timeout period.
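The polling, filtering, and delete-on-success steps above can be sketched roughly as follows. This is not the Connector's implementation: `extract_object_keys`, `key_matches`, `poll_once`, the `upload` callback, and the `FakeQueue` stand-in are all hypothetical names, and the sketch assumes glob-style path patterns and an SNS subscription without raw message delivery (so the S3 event is a JSON string in the envelope's `Message` field). The `sqs` argument only needs `receive_message` and `delete_message`, which a boto3 SQS client provides with the same keyword arguments.

```python
import fnmatch
import json
from urllib.parse import unquote_plus


def extract_object_keys(sqs_body):
    """Return (bucket, key) pairs for ObjectCreated records in an SQS body."""
    envelope = json.loads(sqs_body)
    event = json.loads(envelope.get("Message", "{}"))
    pairs = []
    for record in event.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated:"):
            s3 = record["s3"]
            # S3 URL-encodes object keys in event notifications.
            pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def key_matches(key, patterns):
    """Accept every key when no patterns are configured."""
    return not patterns or any(fnmatch.fnmatchcase(key, p) for p in patterns)


def poll_once(sqs, queue_url, patterns, upload):
    """Receive one batch, upload matching objects, delete handled messages.

    Returns the number of objects uploaded.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
    )
    uploaded = 0
    for message in response.get("Messages", []):
        targets = [
            (bucket, key)
            for bucket, key in extract_object_keys(message["Body"])
            if key_matches(key, patterns)
        ]
        try:
            for bucket, key in targets:
                upload(bucket, key)
                uploaded += 1
        except Exception:
            # Leave the message on the queue; SQS makes it visible again
            # after the queue's visibility timeout.
            continue
        # Reached for successful uploads and for filtered-out messages.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return uploaded


class FakeQueue:
    """In-memory stand-in for the two SQS client calls used above."""

    def __init__(self, bodies):
        self.messages = [{"Body": b, "ReceiptHandle": str(i)} for i, b in enumerate(bodies)]
        self.deleted = []

    def receive_message(self, **kwargs):
        return {"Messages": list(self.messages)}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self.deleted.append(ReceiptHandle)


def make_body(event_name, key):
    """Build an SNS-wrapped S3 event body for the demo."""
    record = {"eventName": event_name,
              "s3": {"bucket": {"name": "example-bucket"}, "object": {"key": key}}}
    return json.dumps({"Type": "Notification", "Message": json.dumps({"Records": [record]})})


queue = FakeQueue([
    make_body("ObjectCreated:Put", "raw/run+1.csv"),
    make_body("ObjectCreated:Put", "tmp/skip.txt"),
    make_body("ObjectRemoved:Delete", "raw/old.csv"),
])
uploads = []
count = poll_once(queue, "example-queue-url", ["raw/*"],
                  lambda bucket, key: uploads.append((bucket, key)))
print(count, uploads, queue.deleted)
```

In the demo, only the first message results in an upload, but all three messages are deleted: the second fails the path filter and the third is not an ObjectCreated event. If `upload` raised an exception instead, the message would stay on the queue for redelivery.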

Prerequisites

Before you can create and use a Tetra Amazon S3 Connector, you must have the following:

- An active TDP environment
- Knowledge of Python
- An active AWS account
- AWS Command Line Interface (AWS CLI) installed and configured
- An Amazon S3 bucket
- An Amazon Simple Queue Service (Amazon SQS) queue
- An Amazon Simple Notification Service (Amazon SNS) topic with an Access Policy that grants SNS:Publish permissions to the S3 bucket
- An AWS Identity and Access Management (IAM) role that the Connector can assume (or another IAM permissions method), which grants the following permissions:

  Amazon S3 Bucket Permissions
  - s3:GetObject
  - s3:GetBucketLocation
  - s3:ListBucket
  - s3:GetObjectVersion
  - s3:GetObjectAttributes

  Amazon SQS Queue Permissions
  - sqs:DeleteMessage
  - sqs:GetQueueUrl
  - sqs:ReceiveMessage
  - sqs:GetQueueAttributes
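To make the policy prerequisites above more concrete, the following sketch assembles two JSON policy documents: an IAM policy with the listed S3 and SQS actions, and an SNS topic access policy that lets the S3 service publish to the topic. The function names, statement IDs, bucket name, account ID, and ARNs are hypothetical placeholders. Scoping the object-level actions to `bucket/*` and the bucket-level actions to the bucket ARN is an assumption about how you might scope the policy, not a requirement stated here.

```python
import json


def build_connector_policy(bucket, queue_arn):
    """IAM policy granting the S3 and SQS actions listed in the prerequisites."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSourceObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion", "s3:GetObjectAttributes"],
                "Resource": f"{bucket_arn}/*",  # object-level actions
            },
            {
                "Sid": "InspectSourceBucket",
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,  # bucket-level actions
            },
            {
                "Sid": "ConsumeEventQueue",
                "Effect": "Allow",
                "Action": [
                    "sqs:DeleteMessage",
                    "sqs:GetQueueUrl",
                    "sqs:ReceiveMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            },
        ],
    }


def build_topic_access_policy(topic_arn, bucket, account_id):
    """SNS topic access policy that lets S3 publish events from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowS3Publish",
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "SNS:Publish",
                "Resource": topic_arn,
                "Condition": {
                    # Restrict publishing to the source bucket and its owner.
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket}"},
                    "StringEquals": {"aws:SourceAccount": account_id},
                },
            }
        ],
    }


# Placeholder values (assumptions), not real resources.
policy = build_connector_policy(
    "example-bucket", "arn:aws:sqs:us-east-1:111122223333:example-queue"
)
topic_policy = build_topic_access_policy(
    "arn:aws:sns:us-east-1:111122223333:example-topic", "example-bucket", "111122223333"
)
print(json.dumps(policy, indent=2))
print(json.dumps(topic_policy, indent=2))
```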

NOTE
These permissions work with Amazon S3 buckets that use the default encryption, which is server-side encryption with Amazon S3 managed keys (SSE-S3). If your bucket uses server-side encryption with AWS Key Management Service (AWS KMS) keys, you must also add the required AWS KMS key permissions to the IAM policy. For more information, see Using IAM policies with AWS KMS in the AWS documentation.
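As a hedged illustration of the KMS case, the sketch below appends a `kms:Decrypt` statement to an existing policy document. `kms:Decrypt` is the action that reading SSE-KMS encrypted objects requires; whether the Connector needs any further KMS actions is not stated here, so treat this as a starting point. The function name, statement ID, and key ARN are hypothetical.

```python
import copy
import json


def with_kms_decrypt(policy, key_arn):
    """Return a copy of the policy with a kms:Decrypt statement for one key."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    updated["Statement"].append(
        {
            "Sid": "DecryptSourceObjects",
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": key_arn,
        }
    )
    return updated


# Placeholder key ARN and an empty base policy (assumptions for the demo).
EXAMPLE_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
base_policy = {"Version": "2012-10-17", "Statement": []}

extended_policy = with_kms_decrypt(base_policy, EXAMPLE_KEY_ARN)
print(json.dumps(extended_policy, indent=2))
```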

Operational Guides

For installation and operational instructions, see the Tetra Amazon S3 Connector v1 Operational Guide.
