TDP v3.5.0 Release Notes

Release date: 12 June 2023 (Last updated: 20 March 2024)

Quality Management

TetraScience is committed to creating quality software. Software is developed and tested by using the ISO 9001-certified TetraScience Quality Management system. This system ensures the quality and reliability of TetraScience software while maintaining data confidentiality and integrity.

What's New

TetraScience has released its next version of the Tetra Data Platform (TDP), version 3.5.0. This release focuses on shortening time to value for customers by making primary data and TDP artifacts more accessible and simpler to manage.

Customers can now use the Artifacts page to quickly search for Intermediate Data Schemas (IDS) and see each one’s associated protocols, task scripts, and documentation. They can also now search for artifacts referencing a specific instrument.

The new Bulk Edit of Labels feature helps customers add, remove, or update labels for more than one file at a time. A redesigned Label Details page provides clear descriptions of each label that can be edited, while also listing the values that have been set for each label and the count of files where each label appears. Improved search functionality for labels also makes it simpler for customers to filter attributes based on specific labels. The new details page for agents includes configuration details and diagnostics for each TDP agent as well.

It’s now also possible to deploy connectors independently from a TDP release by using the new Pluggable Connector framework that’s currently in beta release. The Tetra Hub, which is also in beta release, gives customers the option to release new functionalities or patches without needing to upgrade the entire TDP.

Enhancements have also been made to the Organization Settings, My Account, and File Details pages, along with several other performance and security improvements.

New Functionality

New functionalities are features that weren’t previously available in the TDP. The following are new functionalities introduced in TDP version 3.5.0.

Details Page for Agents

The new details page for agents provides general information about each TDP agent, including its configuration details and diagnostics. The new page includes the previous Agent Configure and Agent Log pages.

The following are new functionalities provided by the new details page for agents:

  • Agent-level configurations, including Amazon Simple Storage Service (Amazon S3) Destination IDs and Path Configurations information, are now located on the Configuration tab.
  • If the Tetra File-Log Agent (FLA) is version 4.3 or higher, local archive and delete is enabled.
  • File paths configured in the Tetra File-Log Agent now include default label names and values. Customers can suggest new names and values by contacting their customer success manager (CSM).

For more information, see Cloud Configuration of Tetra Agents.

Artifact Management*

TDP version 3.5.0 improves the way artifacts are managed so that customers can quickly find IDSs, protocols, and task scripts, and then see how they relate to one another. This enhanced search and relationship identification functionality can help customers use Tetra Data or create self-service pipelines more efficiently.

Customers can now do the following on the Artifacts page:

  • Search for IDSs and see their related protocols, task scripts, and corresponding documentation
  • Search for artifacts that are related to a specific instrument

For more information, see View Artifact Information.

Attribute Management and Label Details*

The Attribute Management page now includes tables for metadata, tags, and labels that help customers sort by and search for specific values. Labels now also support a description field, which helps customers share what each label represents with other users, along with how each label can be interpreted. A new Label Details page outlines the different values that have been set for each label in the system and shows the number of files where those labels and values appear.

For more information, see Manage and Apply Attributes.

NOTE

It’s recommended that customers use labels as the primary attribute to store each file’s contextual information, because future versions of the TDP will use the label attribute only. The delete functionality for labels, metadata, and tags has been removed from the TDP user interface (UI), but customers can still delete these attributes by using the TetraScience API. For more information, see Metadata and Tag Deprecation.

Bulk Edit of Labels

Users can now add, remove, or update labels for more than one file at a time by using the new Bulk Edit of Labels feature. This feature can help customers quickly fix a large number of files that have incorrect or incomplete labels, without needing to update each file manually. The feature can also help streamline after-the-fact data enrichment by making it simpler to add labels to multiple files at the same time. Bulk Edit of Labels is available through the TDP UI.

For more information, see Edit Labels in Bulk.

NOTE

Before running a bulk label edit operation on 500,000 or more files, customers must contact their CSM to verify the action.

(Beta Release) Pluggable Connectors*

The new Pluggable Connector framework makes it possible for connectors to be released independent of a TDP release. Pluggable Connectors offer the following benefits:

  • More flexibility when deploying upgrades, because deployments can happen outside of a TDP version release
  • Ability to expedite the connector development process by introducing a common connector framework
  • More streamlined health monitoring and troubleshooting options, because customers can use Amazon CloudWatch to track each connector’s activity logs and performance metrics

Pluggable Connectors are in beta release currently and may require changes in future TDP releases. For more information, customers must contact their CSM.

(Beta Release) Tetra Hub*

The new Tetra Hub replaces the Tetra Data Hub. It gives customers the option to release new functionalities or patches without needing to upgrade the entire TDP, which can help reduce overhead and accelerate implementation. The Tetra Hub also sunsets the use of the Tetra Generic Data Connector (GDC), simplifying deployment.

Tetra Hub is in beta release currently and may require changes in future TDP releases. For more information, customers must contact their CSM.

*These features are for usability, supportability, or troubleshooting, and do not affect Intended Use for validation purposes. Beta Release features are not suitable for GxP use.

Enhancements

Enhancements are modifications to existing functionality that improve performance or usability, but don't alter the function or intended use of the system.

Organization Settings

The new Organization Settings page replaces the previous Organization Details page. The page is now located under Administration within the navigation pane and includes the following new elements:

  • High-level summaries of each organization
  • Organization settings and the ability to edit each setting
  • Two tabs that provide information about login users and service users for each organization

For more information, see Organizations in the Tetra Data Platform.

My Account

Multiple pages are now combined into the new My Account page so that customers can do all of the following in one place:

  • View account information
  • Change a password
  • Copy a personal JSON Web Token (JWT)
  • View information about the organizations that customers have accounts for
  • Switch to another organization

For more information, see View Your Account Details.

(Beta Release) Self-Service Pipelines for Multi-Tenant Customers

Multi-tenant customers can now create their own custom, self-service pipelines (SSPs). To use this feature, multi-tenant customers must do the following:

  • Contact their CSM to activate the SSP for their account
  • Upgrade to the latest version of the TDP
  • Upgrade to the latest versions of the TetraScience Software Development Kit (SDK) and TetraScience Command Line Interface (CLI)
  • Build custom artifacts for their SSPs by using the latest TetraScience product versions

SSPs for multi-tenant customers are in beta release currently and may require changes in future TDP releases. For more information, multi-tenant customers must contact their CSM.

For single-tenant customers to use the new, more secure design, they must upgrade to the latest TetraScience product versions before building custom artifacts for their SSPs. SSPs are activated for single-tenant customers by default, so there is no need for them to contact their CSM to start using the feature.

NOTE

Existing SSPs and task scripts built with the previous design will continue to work during the deprecation period for the old SSP design. Deprecation for the previous SSP design is tentatively scheduled for the second half of 2024.

File Details

The redesigned File Details page helps customers see version-specific information about each file and quickly find the resources related to each file. The page includes details about how files were created as well as information about each file’s associated pipelines, workflows, and other related files. Customers can also now access a file’s related resources directly from that file’s File Details page, or by selecting the Show Related Files button.

For more information, see How to Search Files in the Data Lake.

Workflow Logs Management

To reduce the amount of time and effort needed to collect workflow logs, customers can now download logs for a specific workflow from the Workflow Details page.

For more information, see Using the Workflow Panel to View Workflows, Workflow Histories, Logs, File Properties, and to Manage Files.

Audit Trail

New audit trail entries have been added and some unnecessary audit trail entries have been removed from the Audit Trail page.

For more information, see Audit Trail.

Increased Memory Options for Running Pipelines

Customers can now configure custom memory settings for each step of a pipeline from within the Pipeline Manager. This enhancement eliminates the need to publish a new version of a protocol to specifically change the protocol’s memory and compute requirements. A protocol’s memory and compute needs can now be configured as part of the pipeline.

For more information, see Set Up and Edit Pipelines.

NOTE

If a pipeline is configured to retry upon failure, it now retries three times automatically. Each pipeline retry will use double the memory of the previous attempt to run the pipeline, up to 120 GB. Compute and CPU capacity will also increase based on the amount of memory used for each retry. This increase in memory and compute usage can increase the cost of processing files significantly.
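
The retry behavior described in this note can be sketched in a few lines of Python to estimate how much memory each attempt might request. This is an illustrative calculation only, not TDP code: the function name is hypothetical, and it assumes that the 120 GB limit is applied by simple clamping.

```python
from typing import List

MAX_MEMORY_GB = 120  # upper limit stated in the note
MAX_RETRIES = 3      # automatic retries stated in the note


def retry_memory_schedule(initial_gb: float) -> List[float]:
    """Return the memory (in GB) for the first attempt and for each retry.

    Assumes each retry doubles the previous attempt's memory and that the
    result is clamped to MAX_MEMORY_GB (the clamping is an assumption).
    """
    if initial_gb <= 0:
        raise ValueError("initial_gb must be positive")
    schedule = [min(initial_gb, MAX_MEMORY_GB)]
    for _ in range(MAX_RETRIES):
        schedule.append(min(schedule[-1] * 2, MAX_MEMORY_GB))
    return schedule


print(retry_memory_schedule(16))  # [16, 32, 64, 120]
```

A step that starts at 16 GB could therefore reach the 120 GB limit by its third retry, which is why retries can raise processing costs significantly.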

Python Version Options for Task Scripts

Customers can now choose which Python version a task script uses by adding a "runtime" parameter to the script's config.json file. Python versions 3.7, 3.8, 3.9, 3.10, and 3.11 are supported in this release.

For more information, see config.json in Create and Test Scripts.
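
As a rough illustration of the "runtime" parameter, the following Python sketch adds it to a config.json structure after checking the requested version against the versions listed above. The helper name and the value format ("python3.11") are assumptions made for this example. The config.json documentation is the authority on the exact value to use.

```python
import json

# Python versions listed as supported in TDP v3.5.0
SUPPORTED_PYTHON_VERSIONS = ("3.7", "3.8", "3.9", "3.10", "3.11")


def with_runtime(config: dict, version: str) -> dict:
    """Return a copy of a config.json dictionary with "runtime" set."""
    if version not in SUPPORTED_PYTHON_VERSIONS:
        raise ValueError(f"Python {version} is not supported in TDP v3.5.0")
    updated = dict(config)
    # Assumed value format; confirm it against the config.json documentation.
    updated["runtime"] = f"python{version}"
    return updated


# Other config.json fields are omitted here for brevity.
print(json.dumps(with_runtime({}, "3.11"), indent=2))
```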

Protocol and Task Script Build Performance

Protocol and task script build performance has been enhanced. After the initial build, subsequent builds now run up to three times faster.

New Supported Metadata Field for the Tetra File-Log Agent

For Tetra File-Log Agent v4.3.0 and higher, when the S3 Direct Upload option is configured, uploaded data now has file.osFilePath as a supported field. The previous JSON value location trace.ts_os_file_path was moved to file.osFilePath.
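
Integrations that read this value might need to handle files uploaded both before and after the change. The following Python sketch shows one way to do that. The helper name and the sample document shape are hypothetical. Only the field names file.osFilePath and trace.ts_os_file_path come from this release.

```python
from typing import Any, Mapping, Optional


def get_os_file_path(document: Mapping[str, Any]) -> Optional[str]:
    """Return the OS file path from a file metadata document.

    Prefers the new file.osFilePath field and falls back to the previous
    trace.ts_os_file_path location for files uploaded by older agents.
    """
    file_obj = document.get("file") or {}
    trace_obj = document.get("trace") or {}
    return file_obj.get("osFilePath") or trace_obj.get("ts_os_file_path")


# Hypothetical documents for illustration only
newer = {"file": {"osFilePath": "C:\\data\\run1.csv"}}
older = {"trace": {"ts_os_file_path": "C:\\data\\run0.csv"}}
print(get_os_file_path(newer))
print(get_os_file_path(older))
```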

User Interface and Other Improvements

To help improve usability, the following changes were made to the TDP user interface (UI):

  • The name of each configuration now appears on the Pipeline Edit page.
  • When editing a protocol configuration, customers now see the name of the configuration that they’re editing displayed at the top of the Configuration page.
  • The SQL Search page loads more quickly. Column names are now included in the SELECT 100 Rows query so that customers have more information about what is in a table. Column widths in the search results table can now be adjusted, too.
  • The SQL Access page has a more consistent look and feel.
  • Amazon Athena has upgraded its SQL query engine to include features from the Trino and Presto open source projects. The release of Athena engine version 3 is backwards compatible, supports all of the features of Athena engine version 2, and doesn’t change any TDP functionality. For more information, see Athena engine version 3 in the AWS documentation.
  • Customers can now perform a full text search for label names and values. However, this functionality is available only for data starting from the TDP v3.5.0 release. Older data requires re-indexing.
  • The Options button on the Search Files page that provides extended search options was renamed Label & Advanced Filters. The button was also moved to the bottom left of the page.
  • When customers switch organizations within the TDP and then switch back, the organizational listing is now presented by relevance. This can be helpful if a customer has many organizations in their organizational listing.
  • The buttons for activating compliance features have been renamed to make the action of each button clearer.
  • CSV exports produced from the Audit Trails page now contain change reasons for GxP compliance purposes. The change reason request header is now ts-audit-change-reason.
  • Login events have been removed from audit trails. On the Audit Trails page, customers can no longer see failed login attempts by users with a JSON Web Token (JWT) that’s either not valid or expired.
  • The architecture that supports the file information provided by the File Details page is now optimized so that it can scale with the size of the TDP. The page provides improved and consistent performance, no matter how much data the system is processing.
  • The file linking for documentation and image directories in task script and protocol README files was improved by implementing a consistent file upload strategy.
  • On the SQL Search page, the file_info_v1 table now contains created_at and size data for each file returned by a query.
  • The soft delete file operation (when a file remains in the Data Lake, but is marked as deleted) is now optimized for both performance and cost. Any number of delete events can now be supported by the TDP, with no restrictions.
  • On the Agents page, customers can now add metadata, tags, and labels that include approved characters and symbols.
  • On the Health Monitoring page, the Data Lake Files file-count feature has been removed to improve the performance and scaling of the TDP. To track file processing failures, customers can still use logs, metrics, and alerts along with the File Processing Failures section of the Health Monitoring page.
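
For API clients that supply a change reason, the ts-audit-change-reason request header mentioned in the list above can be attached as in the following Python sketch. The helper name is hypothetical, and the sketch makes no assumption about which requests require the header or which other headers a request needs.

```python
from typing import Dict, Mapping

CHANGE_REASON_HEADER = "ts-audit-change-reason"  # header name from this release


def with_change_reason(headers: Mapping[str, str], reason: str) -> Dict[str, str]:
    """Return a copy of the request headers with a change reason added."""
    reason = reason.strip()
    if not reason:
        raise ValueError("A change reason must not be empty")
    merged = dict(headers)
    merged[CHANGE_REASON_HEADER] = reason
    return merged


# The caller supplies its existing headers (authentication and so on).
print(with_change_reason({"Content-Type": "application/json"}, "Corrected label"))
```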

Bug Fixes

The following customer-reported bugs are now fixed:

  • If change reason logging is activated, it now remains activated when the TDP is upgraded. (Issue #2378)
  • On the Pipeline Configuration page, if a value is reset to null, the null value is now handled properly in the underlying code. (Issue #2187)
  • Users can now add metadata with allowed characters and symbols on the Agents page. (Issues #2255 and #2459)
  • An issue that caused Amazon Simple Queue Service (Amazon SQS) messages to sometimes be sent out of order is now resolved. As a result of this fix, file metadata now appears consistently between the Tetra Data Lake and what’s discoverable in search. (Issue #2486)
  • Athena and Elasticsearch reconciliation job runtimes have been improved by applying a catch and retry strategy to status message collectors. Now, when status message collectors encounter an unexpected error, the reconciliation job continues processing rather than timing out. (Issues #2390 and #2464)
  • The number of AWS Systems Manager calls made by the Cellario connector was reduced to keep Parameter Store quotas from being reached in non-production environments. (Issue #2398)
  • Memory usage within the Agents UI was reduced to help prevent it from sometimes crashing web browsers. (Issue #2328)
  • A delete issue was resolved that was occurring in common namespace tables within TDP deployments that have common/client table dual-write enabled. This issue did not affect the majority of TDP deployments.

Other

  • Athena query results now have a 30-day retention policy to reduce storage costs.

Deprecated Features

The following features have been deprecated for this release or are now on a deprecation path:

  • The Cloud Data Connector (CDC) has been removed and the connections are being auto-migrated to an API connection.
  • The Continuous Verification Report has been removed from the TDP.
  • The TetraScience Search Files API endpoint /v1/datalake/search is removed. Customers must use the following Search files via Elasticsearch Query Language endpoint instead: /v1/datalake/searchEql
  • Python version 3.7 support will be deprecated in the second half of 2024. No fixes or security patches will be made for Python v3.7 starting with TDP v3.5.0.
  • The legacy runtime environment for self-service pipelines (SSPs) is scheduled for deprecation during the second half of 2024. The new, more secure SSP runtime environment that’s currently in beta release will replace the legacy SSP runtime environment. Customers should plan on rebuilding and rereleasing their existing protocols to use the new SSP runtime environment before the legacy one is deprecated. More information about migrating to the new SSP runtime environment will be provided during the next TDP release.
  • The nested trace object field was removed from Elasticsearch file schemas for files processed by the Tetra File-Log Agent v4.3 and higher. The trace object field had included the following information, which is now part of the file object field:
    • osFilePath
    • osFolderPath
    • osCreatedUser
    It is not recommended that customers use the previous trace object attributes when configuring integrations, because those attributes can change. Customers should use these new, immutable file object fields instead. For more information, see Elasticsearch File Schema in the TetraScience API Documentation.

For more information about TDP deprecations, see Tetra Product Deprecation Notices.
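
For clients that still call the removed Search Files endpoint, the path change listed above can be sketched in Python as follows. The helper name and the example host are hypothetical. The sketch rewrites the path only and makes no claim about whether the request payload also needs to change.

```python
from urllib.parse import urlsplit, urlunsplit

REMOVED_PATH = "/v1/datalake/search"
REPLACEMENT_PATH = "/v1/datalake/searchEql"


def migrate_search_url(url: str) -> str:
    """Point a removed Search Files URL at the searchEql endpoint."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") == REMOVED_PATH:
        parts = parts._replace(path=REPLACEMENT_PATH)
    return urlunsplit(parts)


# api.example.com is a placeholder host, not a real TetraScience URL.
print(migrate_search_url("https://api.example.com/v1/datalake/search"))
```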

Known and Possible Issues

The following are known and possible issues for the TDP version 3.5.0 release:

  • If an IDS’s protocol doesn’t have a README file, then the ReadMe tab displays as blank on the IDS Details page for that IDS.
  • When editing a user-defined Agent, customers can create a Source Type value that includes uppercase letters, which aren’t supported. Normally, the Source Type value doesn’t allow uppercase letters.
  • If customers modify an existing collection of search queries by adding a new filter condition from one of the Option modals (Basic, Attributes, Data (IDS) Filters, or RAW EQL), but they don't select the Apply button, the previous, existing query is deleted. To modify the filters for an existing collection, customers must select the Apply button in the Options modal before they update the collection. For more information, see How to Save Collections and Shortcuts.
  • Processing a single file can sometimes run a pipeline more than once. When this happens, one file can run duplicate TDP workflows or send duplicate events to downstream systems. The behavior occurs because Amazon Simple Queue Service (Amazon SQS) standard queues provide at-least-once message delivery, but don’t stop messages from being processed multiple times.
  • On the Pipeline Manager page, pipeline trigger conditions that customers set with a text option must match all of the characters that are entered in the text field. This includes trailing spaces, if there are any.
  • File edit and update operations are not supported on metadata and label names (keys) that include special characters. Metadata, tag, and label values can include special characters, but it’s recommended that customers use the approved special characters only. For more information, see Attributes.
  • The File Details page sometimes displays an Unknown status for workflows that are actually either in a Pending or Running status. Output files that are generated by intermediate files within a task script sometimes show an Unknown status, too.
  • Max Parallel Workflows and Pipeline Priority settings sometimes don’t work as expected. The result can be either a slowdown in parallel processing or an increase in processing speed for all pipelines within an organization. This issue occurs only when two or more pipelines in an organization have pending workflows within the same 30-second time period and different Max Parallel Workflows and Pipeline Priority settings. A fix for this issue is in development and testing, and will be available in the next patch release. As a workaround, customers can temporarily configure all active pipelines within an organization to have the same Max Parallel Workflows setting.
  • For applications that use the /file route to access the TDP File Details page, requests sent to access the page return a 404 error code. A fix for this issue is currently in development and testing, and is scheduled for the next patch release. As a workaround, customers can change the URL file path from /file to /files (for example, change <platform-url>/file/UUID to <platform-url>/files/UUID).
  • Protocol step custom memory settings configured through the TDP UI don’t behave as expected when pipelines are retried after receiving out-of-memory errors. Retries that occur after out-of-memory errors ignore these custom memory settings when calculating the next retry attempt’s memory. A fix for this issue is in development and testing, and will be available in the TDP v3.5.2 patch release. To avoid retries that use less memory than what’s configured in the TDP UI, customers can do either of the following as a temporary workaround:
    • Configure protocol step custom memory settings outside of the TDP UI.
      -or-
    • Configure pipeline retry settings to No Retry.
  • Customer-hosted TDP deployment upgrades to TDP v3.5.0 might fail if AWS permissions for creating network access control lists (ACLs) aren’t configured for the TDP. A fix for this issue is in development and testing, and will be available in the TDP v3.5.2 patch release.
  • When installing a Tetra Hub on a host server that already has an AWS Systems Manager registration key, the Amazon ECS container agent startup fails. An AccessDenied error is then logged in the agent’s Amazon CloudWatch Logs. A fix for this issue is currently in development and testing for a future TDP release. As a workaround, customers can move the existing SSM registration key to a backup location prior to Hub installation, so that the installer won’t detect it. To move an existing SSM registration key to a backup location, run the following command in the host server’s terminal:
    mv /var/lib/amazon/ssm/Vault/Store/RegistrationKey \
      /var/lib/amazon/ssm/Vault/Store/RegistrationKey-backup-$(date +%s)
    
  • The Tetra Hub installation script doesn’t detect an existing Amazon Elastic Compute Cloud (Amazon EC2) instance role on a host server if there is one. If there is an existing AWS Identity and Access Management (IAM) role, the Hub’s Amazon ECS service will attempt to use it. The Hub’s Amazon ECS instance registration process fails when this happens. A fix for this issue is currently in development and testing for a future TDP v3.6.x patch release. As a workaround, customers can detach the Amazon EC2 IAM role from the Amazon EC2 instance, and then rerun the Hub installation script. For more information, see Detach an IAM role in the AWS documentation.
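
For the /file route issue listed above, applications that generate links to the File Details page could apply the workaround programmatically. The following Python sketch rewrites /file/UUID links to /files/UUID. The helper name and the example host are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_PREFIX = "/file/"
NEW_PREFIX = "/files/"


def fix_file_details_url(url: str) -> str:
    """Rewrite <platform-url>/file/UUID to <platform-url>/files/UUID."""
    parts = urlsplit(url)
    if parts.path.startswith(OLD_PREFIX):
        parts = parts._replace(path=NEW_PREFIX + parts.path[len(OLD_PREFIX):])
    return urlunsplit(parts)


# tdp.example.com and the UUID are placeholders for illustration.
print(fix_file_details_url(
    "https://tdp.example.com/file/123e4567-e89b-12d3-a456-426614174000"
))
```

URLs that already use the /files route are returned unchanged, so the helper can be applied to every link without first checking its format.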

Security

TetraScience continually monitors and tests the TDP codebase to identify potential security issues. This release includes various security-related enhancements.

Upgrade Considerations

During the upgrade, there might be a brief downtime when users won't be able to access the TDP user interface and APIs. After the upgrade is complete, customers should check their pipelines for failures or cancellations and reprocess those pipelines if any are found. Customers should also do a smoke test of their configuration settings to make sure that there are no failures.

After the upgrade, the TetraScience team verifies that the platform infrastructure is working as expected through a combination of manual and automated tests. If any failures are detected, the issues are immediately addressed or the release is rolled back.

For more information about the release schedule, including the GxP release schedule and timelines, see the Product Release Schedule.

For more details about the timing of the upgrade, customers should contact their CSM.

Other Release Notes

To view other TDP release notes, see Tetra Data Platform (TDP) Release Notes.