Data Acquisition Security

This page describes how the Tetra Data Platform handles security, including its data flow, access management policies, and encryption details.

DataHub

The DataHub is the on-premises software component of the Tetra Data Platform. It securely transfers data to TetraScience through configurations called Data Connectors. Each Data Connector pulls or receives data from a single data source in your environment, and a single DataHub can host multiple Data Connectors.

The DataHub primarily uses these two AWS services:

  • AWS Systems Manager (SSM)
  • AWS Internet of Things (IoT)

AWS Systems Manager (SSM)

AWS Systems Manager Agent (SSM Agent) is Amazon software that runs on Amazon EC2 instances and on hybrid instances (on-premises servers and virtual machines configured for Systems Manager). SSM Agent processes requests from the Systems Manager service in the cloud, configures the machine as specified in each request, and sends status and execution information back to the service. This lets AWS Systems Manager remotely and securely manage on-premises servers and virtual machines (VMs) in a hybrid environment.

AWS Internet of Things (IoT)

AWS IoT enables Internet-connected devices to connect to the AWS Cloud and lets applications in the cloud interact with Internet-connected devices.

The DataHub uses an X.509 certificate to connect to AWS IoT using TLS mutual authentication protocols. Other AWS services do not support certificate-based authentication, but they can be called using AWS credentials in AWS Signature Version 4 format. The Signature Version 4 algorithm normally requires the caller to have an access key ID and a secret access key. AWS IoT has a credentials provider that allows you to use the built-in X.509 certificate as the unique device identity to authenticate AWS requests. This eliminates the need to store an access key ID and a secret access key on your device.

The credentials provider authenticates a caller using an X.509 certificate and issues a temporary, limited-privilege security token. The token can be used to sign and authenticate any AWS request. This way of authenticating your AWS requests requires you to create and configure an AWS Identity and Access Management (IAM) role and attach appropriate IAM policies to the role so that the credentials provider can assume the role on your behalf.
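The following is a minimal Python sketch of how a device could call the AWS IoT credentials provider over mutual TLS and parse the temporary credentials it returns. It assumes the credentials provider's HTTP interface (a `/role-aliases/<role-alias>/credentials` path, where the role alias points at the IAM role, plus an `x-amzn-iot-thingname` header). The endpoint, role alias, thing name, and file paths are hypothetical placeholders, and this is not the DataHub's actual client code.

```python
import json
import ssl
import urllib.request
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TemporaryCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: str
    expiration: datetime


def credentials_url(endpoint: str, role_alias: str) -> str:
    """Build the credentials provider URL for an IAM role alias (placeholder values)."""
    return f"https://{endpoint}/role-aliases/{role_alias}/credentials"


def parse_credentials(body: bytes) -> TemporaryCredentials:
    """Parse the JSON document returned by the credentials provider."""
    creds = json.loads(body)["credentials"]
    return TemporaryCredentials(
        access_key_id=creds["accessKeyId"],
        secret_access_key=creds["secretAccessKey"],
        session_token=creds["sessionToken"],
        # Expiration is an ISO 8601 UTC timestamp such as 2024-01-01T12:00:00Z.
        expiration=datetime.fromisoformat(creds["expiration"].replace("Z", "+00:00")),
    )


def fetch_credentials(endpoint: str, role_alias: str, thing_name: str,
                      cert_file: str, key_file: str, ca_file: str) -> TemporaryCredentials:
    """Request temporary credentials, authenticating with the device certificate."""
    context = ssl.create_default_context(cafile=ca_file)
    # The X.509 certificate and private key identify the device (mutual TLS).
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    request = urllib.request.Request(
        credentials_url(endpoint, role_alias),
        headers={"x-amzn-iot-thingname": thing_name},
    )
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return parse_credentials(response.read())
```

In a real deployment the endpoint would come from `aws iot describe-endpoint --endpoint-type iot:CredentialProvider`, and the returned access key ID, secret access key, and session token would be handed to an AWS SDK rather than used directly.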

The following diagram illustrates the credentials provider workflow.

Figure: Credentials provider workflow

When the DataHub is activated after installation, an AWS IoT X.509 certificate with an organization-specific policy is downloaded to the DataHub machine. The DataHub requests temporary credentials every 30 minutes, and each set of credentials is valid for 1 hour. The IoT certificate created for each DataHub can be revoked from the Tetra Data Platform if necessary.
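Because each set of credentials is valid for twice the refresh interval, consecutive credentials overlap by 30 minutes. The sketch below, using hypothetical helper names, models that schedule with the two durations stated above and reports any interval in which no valid credentials exist.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Durations stated on this page.
REFRESH_INTERVAL = timedelta(minutes=30)  # how often the DataHub requests credentials
CREDENTIAL_LIFETIME = timedelta(hours=1)  # how long each set of credentials is valid


def refresh_times(start: datetime, count: int) -> List[datetime]:
    """Return the times of the first `count` scheduled credential requests."""
    return [start + i * REFRESH_INTERVAL for i in range(count)]


def coverage_gaps(issue_times: List[datetime], until: datetime) -> List[Tuple[datetime, datetime]]:
    """Return intervals before `until` during which no issued credentials are valid.

    `issue_times` holds the times of successful credential requests; coverage
    is measured from the first successful request onward.
    """
    if not issue_times:
        raise ValueError("need at least one successful credential request")
    times = sorted(issue_times)
    gaps: List[Tuple[datetime, datetime]] = []
    covered_until = times[0]
    for issued in times:
        if issued > covered_until:
            # The previous credentials expired before this request succeeded.
            gaps.append((covered_until, issued))
        covered_until = max(covered_until, issued + CREDENTIAL_LIFETIME)
    if covered_until < until:
        gaps.append((covered_until, until))
    return gaps
```

With this schedule, a single failed refresh leaves no gap, because the previous credentials remain valid for another 30 minutes; only two consecutive failures produce a 30-minute window without valid credentials.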

This procedure summarizes the steps used to securely retrieve temporary credentials so that the DataHub can communicate directly with a specific set of AWS resources and services for your organization:

  1. The AWS IoT device makes an HTTPS request to the credentials provider for a security token. The request includes the device X.509 certificate for authentication.
  2. The credentials provider forwards the request to the AWS IoT authentication and authorization module to validate the certificate and verify that it has permission to request the security token.
  3. If the certificate is valid and has permission to request a security token, the AWS IoT authentication and authorization module returns success. Otherwise, it sends an exception to the device.
  4. After successfully validating the certificate, the credentials provider invokes the AWS Security Token Service (AWS STS) to assume the IAM role that you created for it.
  5. AWS STS returns a temporary, limited-privilege security token to the credentials provider.
  6. The credentials provider returns the security token to the device.
  7. The device uses the security token to sign an AWS request with AWS Signature Version 4.
  8. The requested service invokes IAM to validate the signature and authorize the request against access policies attached to the IAM role that you created for the credentials provider.
  9. If IAM validates the signature successfully and authorizes the request, the request succeeds. Otherwise, IAM sends an exception.
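Step 7 relies on Signature Version 4, which derives a signing key from the temporary secret access key through a chain of HMAC-SHA256 operations scoped to a date, region, and service. The Python sketch below shows only that key derivation and the final signature computation; building the canonical request and the string to sign is omitted, and in practice an AWS SDK performs the whole process. The function names are illustrative.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of msg under key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_sigv4_signing_key(secret_access_key: str, date_stamp: str,
                             region: str, service: str) -> bytes:
    """Derive the SigV4 signing key; date_stamp is formatted as YYYYMMDD.

    With credentials-provider tokens, secret_access_key is the temporary
    secret returned by AWS STS, not a long-lived key stored on the device.
    """
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_string_to_sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex-encoded signature placed in the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Scoping the key to a single day, region, and service means a leaked signing key is far less useful than the secret access key itself.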

Encryption

Every organization on the Tetra Data Platform is automatically provisioned with a separate AWS KMS (Key Management Service) key. AWS KMS uses the Advanced Encryption Standard (AES) algorithm in Galois/Counter Mode (GCM), known as AES-GCM. AWS KMS uses this algorithm with 256-bit secret keys. Each KMS key automatically rotates yearly.

Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS) is used to encrypt all data at rest.
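This page does not say whether SSE-KMS is requested on each upload or configured as a bucket's default encryption. The sketch below shows the per-request variant using boto3-style `put_object` arguments (`ServerSideEncryption="aws:kms"` and `SSEKMSKeyId`). The bucket name, key layout, and helper names are hypothetical, and the client is passed in so the example runs without AWS access.

```python
from typing import Any, Dict


def sse_kms_put_kwargs(bucket: str, org_slug: str, relative_key: str,
                       body: bytes, kms_key_id: str) -> Dict[str, Any]:
    """Build arguments for an S3 PutObject call that requests SSE-KMS."""
    return {
        "Bucket": bucket,
        # Objects land under the organization's top-level orgSlug prefix.
        "Key": f"{org_slug}/{relative_key.lstrip('/')}",
        "Body": body,
        # Ask S3 to encrypt the object at rest with the given KMS key.
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_id,
    }


def upload_encrypted(s3_client: Any, bucket: str, org_slug: str, relative_key: str,
                     body: bytes, kms_key_id: str) -> Dict[str, Any]:
    """Upload with a boto3-style client, for example boto3.client("s3")."""
    kwargs = sse_kms_put_kwargs(bucket, org_slug, relative_key, body, kms_key_id)
    return s3_client.put_object(**kwargs)
```

For an SSE-KMS upload, S3 calls `kms:GenerateDataKey` on the caller's behalf, which is why a role that uploads encrypted objects needs that permission on the organization's key.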

Sensitive configuration parameters, such as usernames and passwords for Data Connectors, are stored in AWS Systems Manager Parameter Store and encrypted with your organization-specific KMS key. Parameter Store provides secure, hierarchical storage for configuration data management and secrets management.
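As an illustration of hierarchical storage, the sketch below builds a parameter name from an orgSlug and a Data Connector ID and reads it with a boto3-style SSM client. The `/<orgSlug>/data-connectors/<id>/<field>` layout and the helper names are hypothetical; `get_parameter(Name=..., WithDecryption=True)` is the boto3 SSM call for reading an encrypted (SecureString) parameter, and the caller needs `kms:Decrypt` on the key that encrypted it.

```python
from typing import Any


def connector_parameter_name(org_slug: str, connector_id: str, field: str) -> str:
    """Build a hierarchical parameter name (hypothetical naming scheme)."""
    for segment in (org_slug, connector_id, field):
        if not segment or "/" in segment:
            raise ValueError(f"invalid path segment: {segment!r}")
    return f"/{org_slug}/data-connectors/{connector_id}/{field}"


def read_connector_secret(ssm_client: Any, org_slug: str, connector_id: str, field: str) -> str:
    """Read an encrypted parameter with a boto3-style client, e.g. boto3.client("ssm")."""
    response = ssm_client.get_parameter(
        Name=connector_parameter_name(org_slug, connector_id, field),
        WithDecryption=True,  # Parameter Store decrypts the value with its KMS key
    )
    return response["Parameter"]["Value"]
```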

Permission to encrypt and decrypt with your organization’s KMS key is granted only through the IAM roles and policies described below.

Identity and Access Management

Security and access controls for data and AWS resources are strictly enforced through AWS Identity and Access Management (IAM) policies and roles. All IAM users, policies, and roles for your organization are generated automatically using AWS CloudFormation. All infrastructure, including these IAM resources, can be modified only through TetraScience’s version control, code review, and automated build system.

This is an example of an IAM policy for a DataHub:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "UploadToS3",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::${bucket}/YOUR_ORGANIZATION_ORG_SLUG/*"
      ]
    },
    {
      "Sid": "AccessKMSKey",
      "Effect": "Allow",
      "Action": [
        "kms:Encrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey"
      ],
      "Resource": [
        "arn:aws:kms:::key/YOUR_ORGANIZATION_KEY_ID"
      ]
    },
    {
      "Sid": "DownloadDataConnectorImages",
      "Effect": "Allow",
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:ListImages",
        "ecr:BatchGetImage",
        "ecr:DescribeImages",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetRepositoryPolicy"
      ],
      "Resource": [
        "arn:aws:ecr:us-east-1:xxxxxxxxxxxx:repository/data-connector-*"
      ]
    }
  ]
}

This DataHub policy allows the DataHub to:

  1. Upload data only to TetraScience’s S3 bucket, under your organization’s top-level folder
  2. Encrypt data, using only your organization’s KMS key
  3. Download Data Connector images so that it can collect new data from your environment
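To see why the upload statement confines the DataHub to its own folder, the sketch below evaluates a heavily simplified version of IAM matching: Allow statements only, with `*` and `?` wildcards handled by `fnmatch.fnmatchcase`. It ignores Deny statements, conditions, policy variables, and every other part of real IAM evaluation, and the bucket name and orgSlug substituted for the placeholders are hypothetical.

```python
import fnmatch
from typing import Any, Dict, List, Union


def _as_list(value: Union[str, List[str]]) -> List[str]:
    """IAM allows Action and Resource to be a string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def is_allowed(policy: Dict[str, Any], action: str, resource: str) -> bool:
    """Simplified check: True if some Allow statement matches action and resource.

    fnmatch's "*" (any characters, including "/") and "?" behave like IAM
    wildcards; its "[...]" syntax has no IAM equivalent and is not used here.
    """
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive; resource ARNs are case-sensitive.
        action_ok = any(fnmatch.fnmatchcase(action.lower(), pattern.lower())
                        for pattern in _as_list(statement["Action"]))
        resource_ok = any(fnmatch.fnmatchcase(resource, pattern)
                          for pattern in _as_list(statement["Resource"]))
        if action_ok and resource_ok:
            return True
    return False  # anything not explicitly allowed is denied by default


# The UploadToS3 statement with a hypothetical bucket name and orgSlug filled in.
DATAHUB_UPLOAD_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UploadToS3",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": ["arn:aws:s3:::ts-example-bucket/exampleco/*"],
        }
    ],
}
```

The slash before the wildcard matters: `exampleco/*` does not match keys under a different organization such as `exampleco2/`, whereas a pattern like `exampleco*` would.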

Each organization also has an automatically created IAM policy that can be attached to an existing IAM user. A user with this policy attached can:

  1. Access data for your organization only, within a predefined TetraScience bucket
  2. Decrypt those S3 objects with your organization-specific KMS key
  3. Access the Glue database and tables for your organization
  4. Read results written by Athena to a predefined folder in the Athena results bucket; the folder name must match your orgSlug


Note

The Tetra Data Platform can also create this user for you. This behavior is configured at the environment level with the AthenaCreateIamUser deployment parameter.

The automatically created IAM policy is named using the pattern ts-athena-<aws-region>-<environment>-<organization-slug>-policy.

For example: ts-athena-us-east-2-production-tetrascience-policy
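The naming pattern is simple string construction; the sketch below also checks the result against IAM's policy-name rules (1 to 128 characters drawn from letters, digits, and `+=,.@_-`). The function names are hypothetical, and the ARN helper assumes the policy uses the default IAM path `/`.

```python
import re

# IAM policy names: 1-128 ASCII letters, digits, and + = , . @ _ -
_IAM_POLICY_NAME = re.compile(r"[\w+=,.@-]{1,128}", re.ASCII)


def athena_policy_name(aws_region: str, environment: str, org_slug: str) -> str:
    """Return ts-athena-<aws-region>-<environment>-<organization-slug>-policy."""
    name = f"ts-athena-{aws_region}-{environment}-{org_slug}-policy"
    if not _IAM_POLICY_NAME.fullmatch(name):
        raise ValueError(f"not a valid IAM policy name: {name!r}")
    return name


def customer_managed_policy_arn(account_id: str, policy_name: str) -> str:
    """ARN of a customer managed policy, assuming the default path "/"."""
    return f"arn:aws:iam::{account_id}:policy/{policy_name}"
```

With the resulting ARN, the policy can be attached to an existing user with `aws iam attach-user-policy --user-name <user> --policy-arn <arn>`.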

Attaching the policy to an existing user is covered in a separate guide. This is an example of the automatically created policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3Access",
      "Effect": "Allow",
      "Action": [
        "s3:Get*"
      ],
      "Resource": [
        "arn:aws:s3:::TS_ATHENA_BUCKET/ORG_SLUG/**"
      ]
    },
    {
      "Sid": "S3ResultsAccess",
      "Effect": "Allow",
      "Action": [
        "s3:Get*"
      ],
      "Resource": [
        "arn:aws:s3:::TS_ATHENA_RESULTS_BUCKET/ORG_SLUG",
        "arn:aws:s3:::TS_ATHENA_RESULTS_BUCKET/ORG_SLUG/**"
      ]
    },
    {
      "Sid": "AccessKMSKey",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey"
      ],
      "Resource": [
        "arn:aws:kms:::key/YOUR_ORGANIZATION_KEY_ID"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "athena:GetQueryResults",
        "athena:GetTable",
        "athena:GetTables",
        "athena:RunQuery",
        "athena:StartQueryExecution",
        "athena:StopQueryExecution"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "GluePermissions",
      "Effect": "Allow",
      "Action": [
        "glue:Get*"
      ],
      "Resource": [
        "arn:aws:glue:*:*:catalog",
        "arn:aws:glue:*:*:database/ORG_SLUG",
        "arn:aws:glue:*:*:table/ORG_SLUG/*"
      ]
    }
  ]
}

Organization and Infrastructure Provisioning

When you create a new organization on the Tetra Data Platform, an orgSlug is created and assigned to your organization. An orgSlug is a unique identifier used to logically separate data and data access. For example, if your company is called Example Company, your orgSlug might be exampleco. The orgSlug is usually hidden from your organization. However, if your organization needs direct access to data in S3, or access through Amazon Athena, you will see references to it.
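The places where an orgSlug surfaces can be summarized as a mapping from the slug to resource patterns. The sketch below mirrors the patterns in the Athena access policy on this page, using a single trailing `*` (which in IAM already matches across `/`); the function name and bucket names are hypothetical placeholders for the TetraScience-managed buckets.

```python
from typing import Dict


def org_scoped_resources(org_slug: str, data_bucket: str, results_bucket: str) -> Dict[str, str]:
    """Return the resource patterns in which an orgSlug appears."""
    return {
        # Organization data lives under the orgSlug prefix of a shared bucket.
        "s3_data": f"arn:aws:s3:::{data_bucket}/{org_slug}/*",
        # Athena results are read from a folder named after the orgSlug.
        "s3_athena_results": f"arn:aws:s3:::{results_bucket}/{org_slug}/*",
        # The Glue database and its tables are named after the orgSlug.
        "glue_database": f"arn:aws:glue:*:*:database/{org_slug}",
        "glue_tables": f"arn:aws:glue:*:*:table/{org_slug}/*",
    }
```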

All infrastructure, including organization-specific IAM policies, roles, and users, can be modified only through version control, code review, and TetraScience’s automated build system. Direct changes to infrastructure, resources, or application code are prohibited. The following diagram shows how infrastructure changes flow through CloudFormation:

Figure: Infrastructure changes through CloudFormation

Data Isolation

S3

The Tetra Data Platform uses several S3 buckets to store raw data, transformed and standardized data, and data pipeline artifacts. Each S3 bucket is shared among all organizations; the top-level orgSlug key prefix partitions and isolates each organization’s data. This isolation is enforced through IAM policies.

Why Not One S3 Bucket per Organization?

By default, AWS allows only 100 buckets per account. Although this limit can be raised by request, S3 bucket names are also globally unique across all AWS customers. Even if buckets are created through CloudFormation or another automated mechanism, keeping every bucket endpoint secure and avoiding naming collisions is a significant maintenance burden. In addition, IAM policies written for a dedicated bucket would be fundamentally the same as the folder-level (prefix-level) policies that TetraScience already provides.