Deployment Size Options for TDP v3.6.x

Four standard EnvironmentSize settings are available for Tetra Data Platform (TDP) version 3.6.x: Small, Medium, Large, and X-Large. Each size setting can process a different number of files. Custom environment sizes are also available on request.

The following T-Shirt Sizing for TDP v3.6.x Environments table shows the approximate number of files that each standard environment size can process. For more information, contact your customer success manager (CSM).

📘

NOTE

TDP v3.6.x includes significant performance improvements over previous TDP versions. To explore ways to optimize your TDP deployment further, contact your CSM.

T-Shirt Sizing for TDP v3.6.x Environments

| Key Performance Indicator (KPI) | Definition | Small* | Medium* | Large* | X-Large |
| --- | --- | --- | --- | --- | --- |
| File capacity | The maximum file count that has been validated for each TDP environment size. File counts include all file types and versions stored in Amazon Simple Storage Service (Amazon S3). | ~20 million files | ~100 million files | ~200 million files | ~500 million files |
| Concurrent workflows | The number of workflows that can be run in parallel during a given amount of time. Note: There's a one- to two-hour ramp-up period for workflows to reach their peak state. | 600 workflows | 1,000 workflows | 2,000 workflows** | 2,000 workflows** |
| File registration rate per hour | The rate at which files can be processed by the TDP each hour. | ~100,000 files | ~300,000 files | ~500,000 files | ~700,000 files |
| Workflow creation rate per hour | The rate at which trigger conditions are checked and respective workflows are created by the TDP each hour. | ~100,000 workflows | ~200,000 workflows | ~350,000 workflows | ~350,000 workflows |
| SearchEql API request rate per minute | The number of API requests the Search files via Elasticsearch Query Language (/searchEql) endpoint can handle each minute. Note: Results are based on a 5.7 MB response size. | ~100 requests | ~150 requests | ~380 requests | ~500 requests |

* Custom environment sizes are available based on specific workloads. Contact your CSM to review your platform configurations.

** Your virtual private cloud (VPC) must have enough IPs to spin up the required number of containers.
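
As a rough illustration of how the sizing table can be read, the following Python sketch picks the smallest standard size whose approximate limits cover a stated workload. The limits are copied from the table above. The dictionary keys and the `smallest_size` function are hypothetical names used only for this example and aren't part of the TDP.

```python
from typing import Optional

# Approximate limits copied from the T-Shirt Sizing table, smallest size first.
# The key names are illustrative and are not TDP parameters.
SIZES = {
    "Small": {
        "files": 20_000_000,
        "concurrent_workflows": 600,
        "registrations_per_hour": 100_000,
        "workflows_per_hour": 100_000,
        "search_requests_per_minute": 100,
    },
    "Medium": {
        "files": 100_000_000,
        "concurrent_workflows": 1_000,
        "registrations_per_hour": 300_000,
        "workflows_per_hour": 200_000,
        "search_requests_per_minute": 150,
    },
    "Large": {
        "files": 200_000_000,
        "concurrent_workflows": 2_000,
        "registrations_per_hour": 500_000,
        "workflows_per_hour": 350_000,
        "search_requests_per_minute": 380,
    },
    "X-Large": {
        "files": 500_000_000,
        "concurrent_workflows": 2_000,
        "registrations_per_hour": 700_000,
        "workflows_per_hour": 350_000,
        "search_requests_per_minute": 500,
    },
}


def smallest_size(**needs: int) -> Optional[str]:
    """Return the first standard size whose limits cover every stated need."""
    for name, limits in SIZES.items():
        if all(limits[kpi] >= value for kpi, value in needs.items()):
            return name
    # No standard size fits; a custom size would need to be discussed with a CSM.
    return None


if __name__ == "__main__":
    print(smallest_size(files=50_000_000, concurrent_workflows=800))
```

Because the table values are approximate, treat the result as a starting point for a conversation with your CSM rather than as a sizing decision.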

Performance Testing

To create the TDP v3.6.x environment-sizing estimates, 100,000 Empower RAW JSON files (5 MB each) were tested by using the common/empower-raw-to-ids:v3.10.3 protocol.

Performance tests were designed to measure the TDP's KPIs independent of file size and the complexity of any specific task script. This approach was chosen because workflow completion time can vary based on file size and task script definitions.

We selected the Empower to IDS generation pipeline to validate performance because it's one of the most commonly used pipelines. The runtime of the test pipeline was controlled by limiting the size of the intermediate data schema (IDS) files.

📘

NOTE

KPIs for specific workflows will vary based on the number of steps in each associated task script and the runtime of each step.
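
To show why step count and step runtime matter, the following Python sketch gives a back-of-the-envelope estimate of steady-state workflow completions per hour. It applies Little's law (throughput equals concurrency divided by average time in the system) and caps the result at the workflow creation rate. It assumes that steps run one after another, and it ignores the ramp-up period and any scheduling overhead. The function name and the example step runtimes are hypothetical, and this isn't how the published KPIs were measured.

```python
from typing import Sequence


def estimated_completions_per_hour(
    concurrent_workflows: int,
    step_runtimes_s: Sequence[float],
    creation_rate_per_hour: float,
) -> float:
    """Rough steady-state estimate; assumes sequential steps and no overhead."""
    workflow_runtime_s = sum(step_runtimes_s)
    if workflow_runtime_s <= 0:
        raise ValueError("total workflow runtime must be positive")
    # Little's law: throughput = concurrency / average time in the system.
    by_concurrency = concurrent_workflows * 3600 / workflow_runtime_s
    # Workflows can't complete faster than they are created.
    return min(by_concurrency, creation_rate_per_hour)


if __name__ == "__main__":
    # Small environment limits from the sizing table, with three made-up steps.
    print(estimated_completions_per_hour(600, [30, 60, 30], 100_000))
```

In this sketch, doubling the total step runtime halves the estimate until the workflow creation rate becomes the limiting factor.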

/searchEql API Best Practices for Increasing Performance

To help optimize performance when using the Search files via Elasticsearch Query Language (/searchEql) endpoint, make sure that you do the following:

  • Don’t use a wildcard prefix (*) in searches. Queries that start with a wildcard are less efficient, take longer to run, and require more computing resources. For more information, see Wildcard Searches.
  • Don’t fetch all fields when querying a high number of files.
  • Keep response sizes near 5 MB each. To help reduce response sizes, do the following:
    • Fetch only the fields that you need.
    • Include the FileCategory and IDS Version in your query parameters.
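
The following Python sketch shows one way these practices could be applied when building a request body. It uses standard Elasticsearch Query DSL features (`_source` filtering and `term` and `prefix` clauses inside a `bool` filter). The field names `category`, `idsVersion`, and `file.path`, the example values, and the `build_query` function are assumptions made for illustration. Check the field names against your own index mapping before you use them.

```python
from typing import Any, Dict, Sequence


def has_leading_wildcard(term: str) -> bool:
    """Return True if the search term starts with a wildcard character."""
    # Elasticsearch wildcard patterns use * and ? as wildcard characters.
    return term.startswith(("*", "?"))


def build_query(
    file_category: str,
    ids_version: str,
    path_prefix: str,
    fields: Sequence[str],
    size: int = 100,
) -> Dict[str, Any]:
    """Build a request body that filters early and returns only selected fields."""
    if has_leading_wildcard(path_prefix):
        raise ValueError("search terms must not start with a wildcard")
    if not fields:
        raise ValueError("list the fields to fetch instead of fetching all fields")
    return {
        "size": size,
        # Source filtering: return only the listed fields.
        "_source": list(fields),
        "query": {
            "bool": {
                "filter": [
                    # Hypothetical field names; adjust them to your mapping.
                    {"term": {"category": file_category}},
                    {"term": {"idsVersion": ids_version}},
                    {"prefix": {"file.path": path_prefix}},
                ]
            }
        },
    }


if __name__ == "__main__":
    print(build_query("IDS", "v1.0.0", "empower/", ["fileId", "file.path"]))
```

Filtering on the category and IDS version narrows the result set before any fields are returned, and the short field list keeps each response smaller.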