Deployment (Archived)


❗️
DEPRECATED PAGE
This page is deprecated. The new deployment pages are here:

Overview (starting point for the deployment documentation): https://developers.tetrascience.com/docs/single-tenant-overview

Requirements: https://developers.tetrascience.com/docs/requirements-for-deploying-the-tetra-data-platform

Parameters: https://developers.tetrascience.com/docs/deployment-parameters-single-tenant

Deployment: https://developers.tetrascience.com/docs/deployment-single-tenant

Post Deployment: https://developers.tetrascience.com/docs/post-deployment-single-tenant

Deployment Common Issues: https://dash.readme.com/project/data-integration-platform/v1.0/docs/installation-troubleshooting-single-tenant

Security: https://dash.readme.com/project/data-integration-platform/v1.0/docs/security-and-aws-iam

Required AWS Services: https://dash.readme.com/project/data-integration-platform/v1.0/docs/required-aws-services

VPC Endpoints: https://dash.readme.com/project/data-integration-platform/v1.0/docs/vpc-endpoints

Deployment

Deployment and related activities should be performed by an engineer with solid AWS knowledge and full administrator access to the destination AWS account and, if applicable, the Disaster Recovery AWS account. Permissions must include creating and deleting IAM roles and policies.

Planning for Deployment

Before performing the installation, single-tenant customers should consult with TetraScience to decide which application features and components are in scope:

Disaster Recovery (Y/N) - If enabled, data and backups are automatically replicated to a different AWS account and region.
Existing VPC / Create VPC - Our stack can automatically create a new VPC and related networking resources. If deploying into an existing VPC is desired, networking tasks such as creating subnets and routing are the customer's responsibility.
Public / Private Endpoint - Will the application be exposed to the Internet or not?
DNS entries - Should DNS entries be automatically created in AWS Route53 during deployment, or will the customer manage DNS separately?
Webserver Certificate - Should an HTTPS certificate be created automatically during deployment, or will the customer supply one? Only RSA certificates with 1024-bit or 2048-bit keys are supported.
Enable Anylink service (Y/N)
Enable Egnyte Integration (Y/N)
Enable Box Integration (Y/N)
Sizing - Based on the customer's usage estimates, TetraScience will advise on the values of the sizing parameters used at deployment.
EKS worker nodes AMI - The default option, which we strongly recommend, is to use the AWS-provided EKS-optimized images. It is also possible to use an AMI provided by the client, which must be fully compatible with the AWS-provided one. The client assumes all risks of running a custom image, which can cause operational instability as well as unusual errors and delays during deployment.

CloudFormation Parameters

Below is the full list of parameters that have to be entered at deployment time:

Data Layer:

Parameter	Default Value	Details
CFTemplateBucket	ts-platform-artifacts	Prefix of the S3 bucket where artifacts are stored. Do not change default.
CFTemplateVersion		Must match the version of the ServiceCatalog product being installed
InfrastructureName		Customer specific. All encompassing name for the created infrastructure. Used as a root for naming. Validate with TetraScience.
Environment	production	Used internally by TetraScience. Do not change default.
IAMRolePrefix		Optional string for prefixing all created IAM roles. Leave empty if not used.
IAMBoundaryPolicy		ARN for a boundary policy that will be attached to all created roles. Leave empty if not used.
EnableDR	false	Set to true if Disaster Recovery should be implemented
DRAWSAccountId		ID of the AWS account used for Disaster Recovery. Leave empty if EnableDR is false.
DRDatalakeKMSKey		ARN of the KMS key used to encrypt data in DR. Leave empty if EnableDR is false. If EnableDR is true, see the Disaster Recovery section below.
DRDatalakeBucket		Name of the Datalake bucket for Disaster Recovery. Leave empty if EnableDR is false. If EnableDR is true, see the Disaster Recovery section below.
DRStreamBucket		Name of the Stream bucket for Disaster Recovery. Leave empty if EnableDR is false. If EnableDR is true, see the Disaster Recovery section below.
DRBackupBucket		Name of the Backup bucket for Disaster Recovery. Leave empty if EnableDR is false. If EnableDR is true, see the Disaster Recovery section below.
DRLocalArtifactsBucket		Name of the artifacts bucket used for Disaster Recovery. Leave empty if EnableDR is false. If EnableDR is true, see the Disaster Recovery section below.
EnableElasticsearch	true	Do not change default.
EnableLogging	false	Set to false. The parameter is deprecated and will be removed in the next release.
EsMasterInstanceType	t3.medium.elasticsearch	Instance type for the Elasticsearch master nodes. Validate value with TetraScience.
EsDatanodeInstanceType	m4.large.elasticsearch	Instance type for the Elasticsearch data nodes. Validate value with TetraScience.
EsDatanodeInstanceCount	2	Number of Elasticsearch data node instances in the cluster. Validate value with TetraScience.
EsDatanodeVolumeSize	100	EBS volume size in GB for Elasticsearch. Validate value with TetraScience.
EsBackupInterval	6	How often (in hours) to back up Elasticsearch to S3.
InstanceTypeRDS	db.t2.medium	RDS instance class for the PostgreSQL database. The default value should be enough in most cases.
RDSBackupInterval	24	How often (in hours) to back up the database.
RDSBackupSchedule	0 1 * ?	Backup schedule in CloudWatch Events cron format. The default runs every day at 1 AM UTC.
RDSBackupRetentionDays	30	Number of days to keep DB snapshots before deleting them. There is a limit of 100 snapshots per database.
RDSSnapShot		Leave empty for a standard install. To be used only when recovering from an actual disaster.
CreateVPC	true	If true, a new VPC is created for the application, together with subnets, security groups, and NAT gateways.
VpcCIDR		CIDR block to use for the VPC. If CreateVPC is false, it should match the existing VPC to be used. For example, 10.200.0.0/16.
VPCID		ID of the existing VPC. Leave empty if CreateVPC is true.
PublicSubnetIds		Comma-delimited list of subnet IDs. Leave empty if CreateVPC is true.
PrivateSubnetIds		Comma-delimited list of subnet IDs. Leave empty if CreateVPC is true.
IsolatedSubnetIds		Comma-delimited list of subnet IDs used for Windows workers. Leave unchanged if CreateVPC is true.
LogsEndpoint		FQDN of the Logs endpoint used by Windows workers. If the workers run in isolated subnets, use a VPC endpoint.
MonitoringEndpoint		FQDN of the Monitoring endpoint used by Windows workers. If the workers run in isolated subnets, use a VPC endpoint.
SqsEndpoint		FQDN of the SQS endpoint used by Windows workers. If the workers run in isolated subnets, use a VPC endpoint.
CloudformationEndpoint		FQDN of the CloudFormation endpoint used by Windows workers. If the workers run in isolated subnets, use a VPC endpoint.
NotificationEmail		Email address subscribed to alerts via SNS. It should be a group email, so that participants can easily be added or removed.
SourceNotificationEmail		Used in the "From" field of pipeline notification emails. Must be verified with SES.
LogRetentionDays	90	Number of days to retain logs in CloudWatch.
LambdaPrefix		Leave empty. Used internally by TetraScience.
STBucket		Leave empty in a normal installation. Used only for DR recovery.
DLBucket		Leave empty in a normal installation. Used only for DR recovery.

Service Layer:

Parameter	Default Value	Details
CFTemplateVersion	v1.0.0	Must match the version of the ServiceCatalog product being installed
Branch	master	ECR repo suffix. Do not change default.
DataStack		Name of the Data Layer main stack. Can be obtained from the CloudFormation interface.
EnableLogging	false	Set to true if the ES Logging cluster in DataLayer was created.
ClusterType	Fargate	Do not change default.
InstanceTypeECS	t2.large	Legacy. No longer used.
		Domain name used by the web UI.
MinCapacity		Minimum number of ECS containers for . Set to 0 if is not used.
MaxCapacity		Maximum number of ECS containers that can scale to, in case of load. Set to 0 if is not used.
ConnectorMaxMemory	2048	Memory limit for Docker containers running on the datahub machines.
TaskThroughput	20	Number of files that can be processed in parallel.
EnableWinTaskScriptService	true	Enables Windows EC2-based workers.
WindowsInstanceType	t3.medium	Instance type for Windows workers.
PublicDomain		Domain name used by the web UI. It does not have to be exposed on the Internet; it can be company-internal.
ExposedOnInternet	false	Set to true if the application should be accessible from Internet
NoDNSWeb	false	Set to true if public DNS records are NOT to be created.
PublicDomainZoneId		Public Domain Route53 Zone Id. If left empty, a public DNS hosted zone will be created, unless NoDNSWeb is set to true.
Certificate		ARN of the TLS/SSL certificate registered with ACM. See details in the Pre Deployment Tasks section. If empty, the deployment tries to create a certificate automatically via ACM and waits for DNS certificate validation, unless NoDNSWeb is set to true, in which case HTTPS is disabled and the application is deployed using unencrypted HTTP. Certificate validation requires PublicDomainZoneId to reference the zone containing the NS entries for the domain.
PrivateDomain	ts-dip.internal	Used for ECS inter-service communication. It can be changed to any name, but the default should work just fine.
MinCapacity	2	Minimum number of ECS containers for . Set to 0 if is not used.
MaxCapacity	4	Max number of ECS containers to scale out to, in case of heavy load.
LambdaPrefix		Leave empty. Used internally by TetraScience.
AthenaCreateIamUser	false	Enables IAM user creation for Athena access at org creation. Leaving false will restrict service permissions so that IAM users cannot be created from the platform at runtime.
UserAuditLogGroupSuffix	user-action-audit-log	Legacy. Do not change the default value.

Service Parameters and Secrets in SSM

Containers running in ECS need runtime parameters. Because these parameters may contain sensitive data, such as OAuth tokens, they are stored encrypted in SSM Parameter Store, the AWS service used for secrets management. The parameters are not shared with TetraScience, so single-tenant customers must create them following this procedure.

Parameter	Details	Needed only if
/tetrascience/production/ECS/ts-service-link-file/BOX_CLIENT_ID	Box OAuth 2.0 custom app client ID. See the Box.com integration section below for details.	Box integration is enabled
/tetrascience/production/ECS/ts-service-web/INT_BOX_CLIENT_ID	Same value as above	Box integration is enabled
/tetrascience/production/ECS/ts-service-link-file/BOX_CLIENT_SECRET	Box OAuth 2.0 custom app client secret.	Box integration is enabled
/tetrascience/production/ECS/ts-service-web/INT_EGNYTE_CLIENT_ID	Egnyte client ID	Egnyte integration is enabled
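The parameters in the table can also be created with the AWS CLI. The sketch below is one way to do it, assuming the environment segment of every path is `production` (the Data Layer's Environment default) and that the AWS managed key `alias/aws/ssm` is acceptable for encryption. The helper function names are hypothetical; call only the ones for the integrations you enabled.

```shell
# Sketch: create the Box/Egnyte integration parameters as encrypted SecureStrings.
# Assumption: the environment segment of each path is "production".
SSM_BASE="/tetrascience/production/ECS"

# put_secret NAME VALUE - create or overwrite one SecureString parameter.
# SecureString uses the AWS managed key alias/aws/ssm unless --key-id is added.
put_secret() {
  aws ssm put-parameter --name "$1" --value "$2" --type SecureString --overwrite >/dev/null
}

# put_box_params CLIENT_ID CLIENT_SECRET (values from the Box custom app)
put_box_params() {
  put_secret "$SSM_BASE/ts-service-link-file/BOX_CLIENT_ID" "$1" &&
    put_secret "$SSM_BASE/ts-service-web/INT_BOX_CLIENT_ID" "$1" &&
    put_secret "$SSM_BASE/ts-service-link-file/BOX_CLIENT_SECRET" "$2"
}

# put_egnyte_params CLIENT_ID
put_egnyte_params() {
  put_secret "$SSM_BASE/ts-service-web/INT_EGNYTE_CLIENT_ID" "$1"
}
```

Secrets typed directly on the command line end up in shell history; reading them first (for example with `read -rs BOX_CLIENT_SECRET`) and then calling `put_box_params "$BOX_CLIENT_ID" "$BOX_CLIENT_SECRET"` avoids that.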

Pre Deployment Tasks

Deployer Privileges
Confirm the user performing the deployment has Full Administrator Rights, via the AWS managed IAM policy AdministratorAccess or equivalent. Anything less will likely cause the deployment to fail, requiring manual cleanup and causing lengthy delays. The application components will run with minimal privileges and administrator access is required only for deployment and upgrade sessions.
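A quick CLI check of the deployer's access is sketched below. It only detects the AWS managed AdministratorAccess policy attached directly to an IAM user or role; administrator rights granted through IAM groups, inline policies, or IAM Identity Center permission sets are not detected, so a negative result is not conclusive. `aws sts get-caller-identity` shows which identity the CLI is using. The function name is hypothetical.

```shell
# Sketch: check whether the AWS managed AdministratorAccess policy is attached
# directly to an IAM user or role. Admin rights granted via groups, inline
# policies, or permission sets are NOT detected.
ADMIN_POLICY_ARN="arn:aws:iam::aws:policy/AdministratorAccess"

# Usage: has_admin_policy user|role NAME
# Returns 0 if attached, 1 if not, 2 on bad arguments.
has_admin_policy() {
  local kind="$1" name="$2" attached
  case "$kind" in
    user)
      attached=$(aws iam list-attached-user-policies --user-name "$name" \
        --query 'AttachedPolicies[].PolicyArn' --output text) ;;
    role)
      attached=$(aws iam list-attached-role-policies --role-name "$name" \
        --query 'AttachedPolicies[].PolicyArn' --output text) ;;
    *)
      echo "kind must be 'user' or 'role'" >&2
      return 2 ;;
  esac
  # Text output separates ARNs with tabs; split them and look for an exact match.
  printf '%s\n' $attached | grep -qx "$ADMIN_POLICY_ARN"
}
```

For example, `has_admin_policy role DeploymentAdmin && echo ok`, where DeploymentAdmin is a placeholder role name.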
AWS CloudTrail
Confirm that AWS CloudTrail is configured to save events in an S3 bucket.
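One way to confirm this from the CLI is sketched below: it lists each trail visible in the current region with its S3 bucket and whether it is currently logging. Trail ARNs are used rather than names so that multi-region trails created elsewhere can also be queried. The function name is hypothetical.

```shell
# Sketch: list CloudTrail trails with their S3 bucket and logging status.
check_cloudtrail() {
  local trails arn bucket logging
  trails=$(aws cloudtrail describe-trails \
    --query 'trailList[].[TrailARN,S3BucketName]' --output text)
  if [ -z "$trails" ]; then
    echo "No CloudTrail trails found" >&2
    return 1
  fi
  # One line per trail: ARN and bucket, separated by whitespace.
  echo "$trails" | while read -r arn bucket; do
    logging=$(aws cloudtrail get-trail-status --name "$arn" --query IsLogging --output text)
    echo "$arn bucket=$bucket logging=$logging"
  done
}
```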
DHCP Options
Make sure the VPC's DHCP option set contains the entry domain-name-servers = AmazonProvidedDNS (only if the CreateVPC parameter in the Data Layer is set to false).
If site policies dictate that client-internal, non-AWS DNS servers must be used, the following manual workaround can be applied:
a) Get the RDS endpoint from the Data Layer outputs and inject it into all ECS containers via SSM Parameter Store, from where it ultimately reaches the POSTGRES_HOST environment variable.
b) Deploy the Service Layer following the normal procedure.
c) Create a zone for ts-dip.internal in the client's DNS and delegate authority for the zone to the AWS DNS servers; check Route 53 to get the server names.
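The DHCP requirement above can be checked with the CLI. A minimal sketch with a hypothetical function name; it relies on the fact that a VPC with no DHCP option set associated reports the ID `default` and uses the Amazon-provided DNS server.

```shell
# Sketch: check whether a VPC's DHCP option set lists AmazonProvidedDNS.
# Usage: vpc_uses_amazon_dns vpc-0123456789abcdef0
vpc_uses_amazon_dns() {
  local vpc_id="$1" dopt_id servers
  dopt_id=$(aws ec2 describe-vpcs --vpc-ids "$vpc_id" \
    --query 'Vpcs[0].DhcpOptionsId' --output text) || return 1
  # A VPC with no DHCP option set associated reports "default" and uses the
  # Amazon-provided DNS server.
  if [ "$dopt_id" = "default" ]; then
    echo "$vpc_id: no DHCP option set (AmazonProvidedDNS)"
    return 0
  fi
  servers=$(aws ec2 describe-dhcp-options --dhcp-options-ids "$dopt_id" \
    --query "DhcpOptions[0].DhcpConfigurations[?Key=='domain-name-servers'].Values[].Value" \
    --output text) || return 1
  echo "$vpc_id: $dopt_id domain-name-servers=${servers:-<none>}"
  # Split the tab-separated server list and look for an exact match.
  printf '%s\n' $servers | grep -qx AmazonProvidedDNS
}
```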
VPC and networking infrastructure (only if CreateVPC parameter in DataLayer is set to false):
The deployment VPC needs to provide for the platform's exclusive use:
at least 2 (preferably 3) private /24 or larger subnets in different AZs
at least 2 (preferably 3) public /28 or larger subnets in different AZs (used only for NAT Gateways and not required if Internet traffic will flow via the corporate network)
All AWS service endpoints must be reachable from all the VPC subnets; VPC endpoints may be required. If the platform will be accessed from the Internet, the AWS account should have at least 3 available Elastic IP addresses.
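The subnet and Elastic IP requirements can be reviewed with a few CLI helpers. This is a sketch with hypothetical function names: list the VPC's subnets with their Availability Zones and CIDR blocks, check a CIDR against a minimum size, and count the Elastic IPs already allocated so the result can be compared with the account's Elastic IP quota (5 per Region by default).

```shell
# Sketch: helpers for checking an existing VPC against the requirements above.

# Prints one line per subnet: SUBNET_ID  AVAILABILITY_ZONE  CIDR
# Usage: list_vpc_subnets vpc-0123456789abcdef0
list_vpc_subnets() {
  aws ec2 describe-subnets --filters "Name=vpc-id,Values=$1" \
    --query 'Subnets[].[SubnetId,AvailabilityZone,CidrBlock]' --output text
}

# Usage: subnet_meets_size CIDR MAX_PREFIX
# A larger subnet has a smaller prefix, so 10.0.0.0/23 satisfies a /24 minimum.
subnet_meets_size() {
  local prefix="${1##*/}"
  [ "$prefix" -le "$2" ]
}

# Number of Elastic IPs already allocated in the current region.
count_elastic_ips() {
  aws ec2 describe-addresses --query 'length(Addresses)' --output text
}
```

For example, `subnet_meets_size 10.0.1.0/24 24` succeeds for a private subnet, and `subnet_meets_size 10.0.2.0/28 28` for a public one.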
Log Policy
The Elasticsearch application logs need to be sent to CloudWatch. To allow this, run the following AWS CLI command as an administrator against the deployment AWS account and Region:

aws logs put-resource-policy --policy-name es2cloudwatch --policy-document '{ "Version": "2012-10-17", "Statement": [{ "Sid": "eslogs", "Effect": "Allow", "Principal": { "Service": "es.amazonaws.com"}, "Action":[ "logs:PutLogEvents","logs:PutLogEventsBatch","logs:CreateLogStream"],"Resource": "arn:aws:logs:*:*:*:*"}]}' --region <Region>

AWS Service-Linked Roles
Service-linked IAM roles must be created, if not already present in the destination AWS account.
Below are the CLI commands required:

ECS Service:

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com

ElasticSearch Service:

aws iam create-service-linked-role --aws-service-name es.amazonaws.com
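The commands above fail if a role already exists. The sketch below makes them safe to re-run by checking for the role first. The role names AWSServiceRoleForECS and AWSServiceRoleForAmazonElasticsearchService are assumed to be the AWS default names for these service-linked roles; the function names are hypothetical.

```shell
# Sketch: create each service-linked role only if it does not exist yet.
# Role names are assumed to be the AWS default names for these services.
ensure_service_linked_role() {
  local service="$1" role_name="$2"
  if aws iam get-role --role-name "$role_name" >/dev/null 2>&1; then
    echo "$role_name already exists"
  else
    aws iam create-service-linked-role --aws-service-name "$service"
  fi
}

ensure_all_service_linked_roles() {
  ensure_service_linked_role ecs.amazonaws.com AWSServiceRoleForECS
  ensure_service_linked_role es.amazonaws.com AWSServiceRoleForAmazonElasticsearchService
}
```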

EC2 KeyPair
The key pair is used to allow administrator access to the EKS worker nodes. Follow the AWS documentation.
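A minimal CLI sketch for creating the key pair and saving the private key with owner-only permissions. The function name and the convention of writing the key to `<name>.pem` in the current directory are assumptions.

```shell
# Sketch: create an EC2 key pair and save the private key locally.
# Usage: create_worker_keypair my-eks-workers us-east-1
create_worker_keypair() {
  local name="$1" region="$2" keyfile="$1.pem"
  if [ -e "$keyfile" ]; then
    echo "$keyfile already exists; refusing to overwrite" >&2
    return 1
  fi
  # umask 077 inside the subshell makes the new file readable only by its owner.
  ( umask 077 && aws ec2 create-key-pair --key-name "$name" --region "$region" \
      --query KeyMaterial --output text > "$keyfile" ) || { rm -f "$keyfile"; return 1; }
  echo "Saved private key to $keyfile"
}
```

AWS keeps only the public key, so the saved file is the only copy of the private key.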
TLS Web Certificate
The application can generate its own certificate using AWS ACM, but this requires the DNS domain it will use to be hosted in Route 53 in the destination AWS account. If that is not the case, or if automatic generation is not wanted, the customer must obtain or self-generate an RSA TLS certificate with a 1024-bit or 2048-bit key and import it into AWS ACM using this procedure. The certificate should cover both the platform's domain and its api. subdomain (for example, example.com and api.example.com). The certificate ARN is used as input for the Service Layer.
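If the customer self-generates the certificate, the sketch below shows one way to do it with OpenSSL (1.1.1 or later, for `-addext`) and then import it into ACM in the deployment region. `example.com` is a placeholder and the function names are hypothetical. A self-signed certificate is not trusted by browsers, so a certificate issued by a corporate or public CA is usually preferable; in that case only the import step applies, adding `--certificate-chain fileb://chain.pem`.

```shell
# Sketch: self-generate a 2048-bit RSA certificate for DOMAIN and api.DOMAIN,
# then import it into ACM. Requires OpenSSL 1.1.1+ (for -addext).
# Usage: make_self_signed_cert example.com
make_self_signed_cert() {
  local domain="$1"
  openssl req -x509 -newkey rsa:2048 -nodes -sha256 -days 365 \
    -keyout "$domain.key" -out "$domain.crt" \
    -subj "/CN=$domain" \
    -addext "subjectAltName=DNS:$domain,DNS:api.$domain"
}

# Usage: import_cert_to_acm example.com us-east-1
# Prints the certificate ARN, the value for the Service Layer Certificate parameter.
import_cert_to_acm() {
  aws acm import-certificate --region "$2" \
    --certificate "fileb://$1.crt" --private-key "fileb://$1.key" \
    --query CertificateArn --output text
}
```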
Configure AWS SES (Simple Email Service)
The platform uses AWS SES to send notification emails, such as pipeline result status. The sender address must be a valid email address verified with SES using this procedure. A support ticket must also be raised with AWS to take SES out of sandbox mode, as documented here.
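The SES steps can be started and checked from the CLI. A sketch with hypothetical function names; the sandbox check uses the `sesv2` commands, so it needs an AWS CLI version that includes them. Taking SES out of the sandbox still requires the AWS support request described above.

```shell
# Sketch: SES checks for the sender address ($1) in region ($2).

# Sends a verification email to the address; the link in it must be clicked.
verify_sender() {
  aws ses verify-email-identity --email-address "$1" --region "$2"
}

# Prints Pending, Success, Failed, ... for the address.
sender_status() {
  aws ses get-identity-verification-attributes --identities "$1" --region "$2" \
    --query "VerificationAttributes.\"$1\".VerificationStatus" --output text
}

# Prints True once production access is granted, False while still in the sandbox.
ses_production_access() {
  aws sesv2 get-account --region "$1" --query ProductionAccessEnabled --output text
}
```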
DisasterRecovery - PreInstall Component (optional)
The optional Disaster Recovery component requires another AWS account (DR account) in addition to the main account where the product will be installed. A small CloudFormation stack (dr.yml) provided by TetraScience must be deployed in the DR account, in the AWS DR Region, which should be different from the main region. The stack requires these parameters:

Parameter	Default Value	Details
InfrastructureName		Customer specific. All encompassing name for the created infrastructure. Used as a root for naming. Validate with TetraScience. The same value must be used in the main product.
Environment	dr	Do not change
ProdAWSAccountId		AWS account number where the main product will be installed

After deployment, the stack generates 3 output values, which are used as parameters for the Data Layer.
Box.com Integration (optional)
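The DR stack can also be deployed from the CLI. This sketch assumes DR-account credentials are active (for example via `AWS_PROFILE`), that dr.yml is in the current directory, and that `<InfrastructureName>-dr` is an acceptable stack name (an illustrative choice). `CAPABILITY_NAMED_IAM` is included on the assumption that the template creates IAM resources. `aws cloudformation deploy` sends local templates inline, which works only up to 51,200 bytes; larger templates need `--s3-bucket`.

```shell
# Sketch: deploy dr.yml in the DR account and print the stack outputs, whose
# values feed the Data Layer DR* parameters.
# Usage: deploy_dr_stack INFRASTRUCTURE_NAME PROD_ACCOUNT_ID DR_REGION
deploy_dr_stack() {
  local infra_name="$1" prod_account_id="$2" dr_region="$3"
  local stack="$infra_name-dr"   # hypothetical naming convention
  aws cloudformation deploy --region "$dr_region" \
    --template-file dr.yml \
    --stack-name "$stack" \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameter-overrides "InfrastructureName=$infra_name" Environment=dr \
      "ProdAWSAccountId=$prod_account_id" || return 1
  aws cloudformation describe-stacks --region "$dr_region" --stack-name "$stack" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' --output text
}
```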

Log in to your Box account
From the left side menu choose "Dev Console"
Click "Create New App", choose "Custom App" and click "Next"
Select "Standard Oauth 2.0 (User Authentication)"
Choose an appropriate name for the new app and click "Create App"
Click on "View Your App"
From "OAuth 2.0 Credentials" copy "Client ID" and "Client Secret" values

Send Details to TetraScience
TetraScience needs to receive the following data before sharing the ServiceCatalog product:

AWS Account ID where the product will be installed
AWS Region where the product will be installed
IAM username or role of the administrator who will perform the installation
The above should be sent for each environment, if the client requires multiple installations (test and prod, for instance). From a technical point of view, TetraScience will treat each of these installations as a separate production client.
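The account ID, region, and caller identity can be read from the CLI session the administrator will use. A sketch with a hypothetical function name. When the CLI runs under an assumed role, `get-caller-identity` returns an `assumed-role` session ARN, from which the underlying IAM role name has to be taken.

```shell
# Sketch: print the account, region, and caller ARN of the current CLI session.
print_install_details() {
  echo "Account ID: $(aws sts get-caller-identity --query Account --output text)"
  # Prefer the region environment variables, then the profile's configured region.
  echo "Region:     ${AWS_REGION:-${AWS_DEFAULT_REGION:-$(aws configure get region)}}"
  echo "Caller ARN: $(aws sts get-caller-identity --query Arn --output text)"
}
```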

Import AWS Service Catalog Portfolio
Log into the AWS account and region where the deployment will be performed, as the administrator who will perform the installation. In AWS Service Catalog, navigate to Administration and, under Portfolios, select the Imported tab. From Actions, select Import portfolio and enter the code received from TetraScience. From the portfolio list, select the newly imported portfolio and then the Users, Groups, and Roles tab. Add the IAM account of the admin user previously shared with TetraScience to the list.
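The same steps can be performed with the AWS CLI. This sketch assumes the code received from TetraScience is the shared portfolio ID (an ID starting with `port-`) and that the administrator is identified by an IAM user or role ARN; the function name is hypothetical.

```shell
# Sketch: accept the shared portfolio and grant the installing admin access to it.
# Usage: import_portfolio PORTFOLIO_ID PRINCIPAL_ARN REGION
import_portfolio() {
  local portfolio_id="$1" principal_arn="$2" region="$3"
  aws servicecatalog accept-portfolio-share --region "$region" \
    --portfolio-id "$portfolio_id" || return 1
  aws servicecatalog associate-principal-with-portfolio --region "$region" \
    --portfolio-id "$portfolio_id" --principal-arn "$principal_arn" --principal-type IAM
}
```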

Performing the Deployment

Data Layer
From the AWS Service Catalog web interface, select the Data Layer product from the Products list. Select Launch product and choose the latest version from the list of available ones. Enter a suitable name and click Next. Fill in the parameters, consulting the table above. Keep clicking Next until you reach the Review stage. Double-check the parameter values and, if satisfied, click Launch. The deployment then starts and takes about two to three hours, depending on the parameters and AWS backend load.
Service Layer
Service Layer can be installed only after a successful Data Layer installation. The procedure is similar to the one for Data Layer.
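As an alternative to the console, either layer can be launched with the CLI and the provisioning record polled until it finishes. A sketch with hypothetical function names; product and artifact IDs can be found with `aws servicecatalog search-products` and `aws servicecatalog list-provisioning-artifacts --product-id ...`. The `Key=...,Value=...` shorthand cannot carry values that contain commas (such as PublicSubnetIds); those need the JSON form of `--provisioning-parameters`.

```shell
# Sketch: CLI alternative to launching a product from the Service Catalog console.
# Usage: launch_product PRODUCT_ID ARTIFACT_ID NAME Key=P1,Value=V1 [Key=P2,Value=V2 ...]
# Prints the provisioning record ID.
launch_product() {
  local product_id="$1" artifact_id="$2" name="$3"
  shift 3
  aws servicecatalog provision-product \
    --product-id "$product_id" \
    --provisioning-artifact-id "$artifact_id" \
    --provisioned-product-name "$name" \
    --provisioning-parameters "$@" \
    --query RecordDetail.RecordId --output text
}

# Usage: wait_for_record RECORD_ID
# Polls until the record leaves its in-progress states; prints the final
# status and returns 0 only for SUCCEEDED.
wait_for_record() {
  local record_id="$1" status
  while :; do
    status=$(aws servicecatalog describe-record --id "$record_id" \
      --query RecordDetail.Status --output text) || return 1
    case "$status" in
      CREATED|IN_PROGRESS|IN_PROGRESS_IN_ERROR) sleep "${POLL_SECONDS:-60}" ;;
      *) echo "$status"; [ "$status" = SUCCEEDED ]; return ;;
    esac
  done
}
```

For example, `rec=$(launch_product prod-abc pa-xyz acme-data-layer Key=InfrastructureName,Value=acme) && wait_for_record "$rec"`, where all IDs and names are placeholders.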

Post Deployment Tasks

Alert Email Subscription Confirmation
Alert emails are sent via AWS SNS to the address configured during the Data Layer deployment. SNS requires the subscription to be confirmed and sends an email with the subject "AWS Notification - Subscription Confirmation". The link in that email must be clicked for notifications to work.
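To see whether confirmation is still outstanding, list the alert topic's subscriptions: SNS reports a SubscriptionArn of `PendingConfirmation` until the link is clicked. A sketch with a hypothetical function name; the topic ARN must be looked up, for example in the SNS console.

```shell
# Sketch: print the endpoints (email addresses) whose subscription to the
# alert topic has not been confirmed yet.
# Usage: pending_alert_subscriptions arn:aws:sns:REGION:ACCOUNT_ID:TOPIC_NAME
pending_alert_subscriptions() {
  aws sns list-subscriptions-by-topic --topic-arn "$1" \
    --query "Subscriptions[?SubscriptionArn=='PendingConfirmation'].Endpoint" \
    --output text
}
```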
Disaster Recovery for Database
If Disaster Recovery is in scope, another small CloudFormation stack, from the template snapshots_tool_rds_dest.json, must be installed in the DR AWS account, in the same AWS Region as the main deployment. The stack takes the following parameters:

Parameter	Default	Details
CodeBucket	DEFAULT_BUCKET	Do not change. Where to get lambda code from.
CrossAccountCopy	TRUE	Do not change.
DeleteOldSnapshots	TRUE	No reason to keep snapshots in this region, since they are stored and managed in the DR Region.
DestinationRegion		Disaster Recovery AWS region. For instance us-east-2.
KmsKeyDestination		ARN of the KMS key in the destination DR region. Enter the value of DRRDSKMSKey output of the DR stack installed during pre deployment.
KMSKeySource		ARN of KMS key in the main AWS account and region used to encrypt RDS snapshots. Can be obtained from the AWS KMS interface; the key alias is ts-rds-production
LambdaCWLogRetention	7	Number of days to retain Lambda function logs in CloudWatch
LogLevel	INFO	Log verbosity for functions
RetentionDays	7	How many days to keep a snapshot
SnapshotPattern	ts-platform.*	What snapshots to include. Do not change.
SourceRegionOverride	NO	Do not change.
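A CLI sketch for this step: it looks up KMSKeySource by its alias in the main account, then creates the stack in the DR account in the main deployment's region, passing only the three parameters that have no fixed default (the others are assumed to keep the template defaults listed above). Assumptions: AWS CLI profiles named `main` and `dr`, the template in the current directory, and the stack name `ts-rds-snapshots-dest`, all hypothetical. `CAPABILITY_IAM` is a guess based on the stack creating Lambda functions; if CloudFormation asks for `CAPABILITY_NAMED_IAM`, use that instead.

```shell
# Sketch: create the RDS snapshot-copy stack in the DR account.
# Assumptions (hypothetical): profiles "main" and "dr", the template file in the
# current directory, and the stack name ts-rds-snapshots-dest.

# Look up the ARN of the RDS snapshot key in the main account by its alias.
rds_source_key_arn() {
  aws kms describe-key --profile main --region "$1" \
    --key-id alias/ts-rds-production --query KeyMetadata.Arn --output text
}

# Usage: create_rds_dr_stack MAIN_REGION DR_REGION DR_RDS_KMS_KEY_ARN
create_rds_dr_stack() {
  local main_region="$1" dr_region="$2" dest_key_arn="$3" source_key_arn
  source_key_arn=$(rds_source_key_arn "$main_region") || return 1
  # The stack goes into the DR account but in the MAIN deployment's region.
  aws cloudformation create-stack --profile dr --region "$main_region" \
    --stack-name ts-rds-snapshots-dest \
    --template-body file://snapshots_tool_rds_dest.json \
    --capabilities CAPABILITY_IAM \
    --parameters \
      "ParameterKey=DestinationRegion,ParameterValue=$dr_region" \
      "ParameterKey=KmsKeyDestination,ParameterValue=$dest_key_arn" \
      "ParameterKey=KMSKeySource,ParameterValue=$source_key_arn"
}
```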

EKS Endpoint Access Control
The AWS EKS endpoint is by default exposed to the Internet, posing a security risk. To mitigate this, the EKS cluster endpoint can be configured to work in Private mode, using this procedure. It is currently not possible to make the endpoint private from within CloudFormation templates. Once the option is made available by AWS, TetraScience will include it in the product and this manual step will no longer be required.
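With the AWS CLI, the endpoint access mode can be changed and checked as sketched below (hypothetical function names; the cluster name can be found with `aws eks list-clusters`). After the change, kubectl access only works from hosts that can reach the VPC, for example over VPN or through a bastion, so run follow-up administration from such a host.

```shell
# Sketch: switch the EKS API endpoint to private-only access.
# Usage: make_eks_endpoint_private CLUSTER_NAME REGION  (prints the update ID)
make_eks_endpoint_private() {
  aws eks update-cluster-config --region "$2" --name "$1" \
    --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true \
    --query update.id --output text
}

# Prints "<public> <private>" access flags, e.g. False and True once applied.
# Usage: eks_endpoint_access CLUSTER_NAME REGION
eks_endpoint_access() {
  aws eks describe-cluster --region "$2" --name "$1" \
    --query 'cluster.resourcesVpcConfig.[endpointPublicAccess,endpointPrivateAccess]' \
    --output text
}
```

The update runs asynchronously; its progress can be followed with `aws eks describe-update --name CLUSTER_NAME --update-id UPDATE_ID`.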
Elasticsearch HTTPS Enforcement
Elasticsearch, by default, also allows plain HTTP connections. To allow only HTTPS, run the following command from a terminal:

aws es update-elasticsearch-domain-config --domain-name <domain_name> --domain-endpoint-options EnforceHTTPS=true
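To confirm the change took effect, the domain configuration can be queried; a sketch with a hypothetical function name (domain names can be listed with `aws es list-domain-names`).

```shell
# Sketch: succeed only if the domain enforces HTTPS.
# Usage: es_https_enforced DOMAIN_NAME
es_https_enforced() {
  [ "$(aws es describe-elasticsearch-domain-config --domain-name "$1" \
      --query 'DomainConfig.DomainEndpointOptions.Options.EnforceHTTPS' \
      --output text)" = "True" ]
}
```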

Disabling [email protected] user
[email protected] user is created by default and has access to all organizations in the setup.
You may want to disable this user due to security concerns. To do this:

Log in as [email protected]
Switch to TetraScience org
Find [email protected] in the list of users
Disable the user
You will be logged out automatically and will no longer be able to log in as that user.

This operation cannot be reversed from the portal. The only way to re-enable [email protected] is to update the user status directly in the database.

Generate secure credentials for organization
Each organization uses unique IAM roles, policies, and KMS keys. These components must be generated on a fresh installation and whenever a new organization is created:

Go to Accounts > Manage Organizations.
For the TetraScience (current) org, click the AWS button.

Use S3 gateway VPC endpoint
We highly recommend enabling an S3 gateway VPC endpoint, which reduces data transfer costs between the VPC and S3. Instructions.
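A CLI sketch for creating the gateway endpoint (hypothetical function names). The endpoint only serves traffic from subnets whose route tables it is attached to, so pass the route tables of the platform's subnets.

```shell
# Sketch: create an S3 gateway endpoint and attach it to the given route tables.
# Usage: create_s3_gateway_endpoint VPC_ID REGION RTB_ID [RTB_ID ...]
create_s3_gateway_endpoint() {
  local vpc_id="$1" region="$2"
  shift 2
  aws ec2 create-vpc-endpoint --region "$region" \
    --vpc-id "$vpc_id" \
    --vpc-endpoint-type Gateway \
    --service-name "com.amazonaws.$region.s3" \
    --route-table-ids "$@" \
    --query VpcEndpoint.VpcEndpointId --output text
}

# Lists the VPC's route tables, to choose which ones the endpoint should serve.
# Usage: list_route_tables VPC_ID REGION
list_route_tables() {
  aws ec2 describe-route-tables --region "$2" --filters "Name=vpc-id,Values=$1" \
    --query 'RouteTables[].RouteTableId' --output text
}
```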

Single Sign-on (Optional)

To enable SSO for a deployment, set up an AWS Cognito user pool and connect it to your identity provider.

Setting up Cognito

When setting up your Cognito user pool ensure the following:

Go to General Settings > Attributes. Ensure email, given name, and family name are checked, and add a custom attribute named groups with a maximum length of 2048 and mutable checked.
Go to General Settings > App clients. Click Show details, then Set attribute read write permissions, and under readable attributes check email, family name, given name, and custom:groups.
Go to App integration > App client settings. Set the callback and logout URLs to the deployment domain: {domain}/login/sso as the callback URL and {domain}/logout as the logout URL. Under Allowed OAuth Scopes, check email, openid, and profile.
Go to Federation > Attribute mapping. Map your identity provider group membership attribute to your custom:groups attribute. Map given and family names also.
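The custom groups attribute from the first step can also be added with the CLI, as sketched below (hypothetical function name; the user pool ID is a placeholder). Cognito does not allow custom attributes to be removed or changed once added, so check the name and length before running it. The remaining console steps are left as described.

```shell
# Sketch: add the custom groups attribute (read back as custom:groups) to an
# existing user pool. Custom attributes cannot be removed or changed once added.
# Usage: add_groups_attribute us-east-1_EXAMPLE
add_groups_attribute() {
  aws cognito-idp add-custom-attributes --user-pool-id "$1" \
    --custom-attributes 'Name=groups,AttributeDataType=String,Mutable=true,StringAttributeConstraints={MaxLength=2048}'
}
```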

Setting up the platform

Gather the following variables and set them in AWS Systems Manager Parameter Store.

Attribute name	Where to find it	Param store location
SSO_DOMAIN	Cognito > App Integration > Domain name	/tetrascience/{environment}/ECS/ts-service-user-org/SSO_DOMAIN and /tetrascience/{environment}/ECS/ts-service-web/SSO_DOMAIN
SSO_CLIENT_ID	Cognito > General Settings > App Clients > App client Id	/tetrascience/{environment}/ECS/ts-service-user-org/SSO_CLIENT_ID and /tetrascience/{environment}/ECS/ts-service-web/SSO_CLIENT_ID
SSO_REDIRECT_URI	Cognito > App Integration > App Client Settings > Callback URL	/tetrascience/{environment}/ECS/ts-service-user-org/SSO_REDIRECT_URI
SSO_CLIENT_SECRET	Cognito > General Settings > App Clients > App client Secret	/tetrascience/{environment}/ECS/ts-service-user-org/SSO_CLIENT_SECRET (set as SecureString)
SSO_GROUPS_ATTRIBUTE	Cognito > General Settings > Attributes > Custom Attributes	/tetrascience/{environment}/ECS/ts-service-user-org/SSO_GROUPS_ATTRIBUTE
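The table can be turned into a short script. This sketch assumes `{environment}` is `production` unless the variable `ENVIRONMENT` says otherwise, stores only the client secret as a SecureString (as the table specifies), and uses hypothetical function names. With AWS CLI v1, parameter values starting with `http://` or `https://` (such as the redirect URI) are fetched as URLs unless `cli_follow_urlparam` is set to false; AWS CLI v2 does not do this.

```shell
# Sketch: write the SSO settings to Parameter Store.
# ENVIRONMENT defaults to "production" (the Data Layer's Environment default).
# Usage: put_sso_params DOMAIN CLIENT_ID REDIRECT_URI CLIENT_SECRET GROUPS_ATTRIBUTE
put_param() {
  # $1 = name, $2 = value, $3 = String or SecureString
  aws ssm put-parameter --name "$1" --value "$2" --type "$3" --overwrite >/dev/null
}

put_sso_params() {
  local base="/tetrascience/${ENVIRONMENT:-production}/ECS"
  put_param "$base/ts-service-user-org/SSO_DOMAIN" "$1" String &&
    put_param "$base/ts-service-web/SSO_DOMAIN" "$1" String &&
    put_param "$base/ts-service-user-org/SSO_CLIENT_ID" "$2" String &&
    put_param "$base/ts-service-web/SSO_CLIENT_ID" "$2" String &&
    put_param "$base/ts-service-user-org/SSO_REDIRECT_URI" "$3" String &&
    put_param "$base/ts-service-user-org/SSO_CLIENT_SECRET" "$4" SecureString &&
    put_param "$base/ts-service-user-org/SSO_GROUPS_ATTRIBUTE" "$5" String
}
```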

After setting the variables into Parameter Store, restart ts-service-user-org and ts-service-web.
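One way to restart the two services is to force a new deployment of each ECS service, which replaces the running tasks; this assumes the services read the parameters when their tasks start. The sketch also assumes the ECS services are named exactly ts-service-user-org and ts-service-web in a cluster whose name you supply; verify with `aws ecs list-clusters` and `aws ecs list-services --cluster CLUSTER`. The function name is hypothetical.

```shell
# Sketch: restart ECS services by forcing a new deployment of each one.
# Usage: restart_services CLUSTER SERVICE [SERVICE ...]
restart_services() {
  local cluster="$1" svc
  shift
  for svc in "$@"; do
    # Prints the service name once the new deployment has been requested.
    aws ecs update-service --cluster "$cluster" --service "$svc" \
      --force-new-deployment --query service.serviceName --output text || return 1
  done
}
```

For example, `restart_services my-cluster ts-service-user-org ts-service-web`, where my-cluster is a placeholder.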

SSO login will become available at {domain}/login/sso.

Setting up your Organization

Click the Single sign-on button. In the modal, for each org role (admin, member, readonly), fill in the identity provider group whose members should receive that role. For example, if all users who belong to a group named "admin group" should be in the org admin role, enter the value "admin group" in the input box under "admin". Save.

Repeat for each role you wish to map to an SSO identity provider group.

