TDP v4.2.0 Release Notes

Release date: 18 December 2024

TetraScience has released the next version of the Tetra Data Platform (TDP), version 4.2.0. This release introduces foundational capabilities for enterprise administration teams, such as SQL access credentials for individual users. It also simplifies how customers harmonize and engineer their data through a streamlined UI experience and the ability to create different pipeline versions. Customers can also now track when a label was modified, and the user or action that updated it, through the new File Attribute History tab.

This release also adds several enhancements to the Data Lakehouse Architecture early adopter program (EAP), including making production-ready IDS Delta Tables available in customer-hosted TDP environments. These tables return SQL query results 50%-300% faster than the legacy SQL tables, depending on the query type. Data sharing is now streamlined between the Data Lakehouse and customers' existing Databricks accounts, too.

Here are the details for what’s new in TDP v4.2.0.

Security

TetraScience continually monitors and tests the TDP codebase to identify potential security issues. Various security updates are applied to the following areas on an ongoing basis:

  • Operating systems
  • Third-party libraries

Quality Management

TetraScience is committed to creating quality software. Software is developed and tested following the ISO 9001-certified TetraScience Quality Management System. This system ensures the quality and reliability of TetraScience software while maintaining data integrity and confidentiality.

New Functionality

New functionalities are features that weren’t previously available in the TDP.

GxP Impact Assessment

All new TDP functionalities go through a GxP impact assessment to determine validation needs for GxP installations. New Functionality items marked with an asterisk (*) address usability, supportability, or infrastructure issues, and do not affect Intended Use for validation purposes, per this assessment. Enhancements and Bug Fixes do not generally affect Intended Use for validation purposes, and items marked as either beta release or early adopter program (EAP) are not suitable for GxP use.

Data Integrations

Tetra Empower Agent Cloud Configuration

Customers can now view and modify Tetra Empower Agent project settings directly from the TDP user interface.

System administrators can now perform the following actions from the TDP Agents page, without needing to log in to a remote desktop:

  • Configure Empower project settings
  • Initiate scan requests for Empower projects

This new functionality is available for Tetra Empower Agent v5.3.0 and higher only.

For more information, see TDP UI Configuration in the Tetra Empower Agent User Manual (v5.3.x).

Tetra Empower Agent Project Configurations

Standalone Deployments for Pluggable Connectors*

Customers can now deploy Pluggable Connectors on-premises through Standalone deployments, which don’t require a Tetra Hub. By using a Docker container image and installation script provided by the TDP, customers can now install Pluggable Connectors on any host server that can run Docker containers. The network requirements for Pluggable Connectors deployed in this way are the same as the requirements for Connectors deployed on a Tetra Hub.
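
For illustration, a standalone deployment might be started as in the following sketch, which uses the Docker SDK for Python. The image name, endpoint, and environment variables are placeholders; the actual values come from the TDP-provided container image and installation script.

    # Hypothetical sketch only: the real image name, tag, and required
    # environment variables are supplied by the TDP installation script.
    import docker

    client = docker.from_env()
    client.containers.run(
        "example-registry/tetra-pluggable-connector:latest",  # placeholder image
        detach=True,
        restart_policy={"Name": "always"},  # keep the Connector running across restarts
        environment={"TDP_API_URL": "https://example.tetrascience.com"},  # placeholder
    )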

Standalone deployments of Pluggable Connectors are available on request. To deploy a standalone Connector, contact your CSM or account executive.

For more information, see Connector Deployment Options.

Data Harmonization and Engineering

Pipeline Versioning*

Customers can now create, view, and revert to specific versions of their pipeline configurations on the new Pipeline Versions page. This new functionality makes it easier for developers to maintain and iterate on their organization’s pipelines.

For more information, see Compare and Restore Different Pipeline Versions.

Pipeline Versions page

File Attribute History*

The label history for any files uploaded to the TDP after customers upgrade to TDP v4.2.0 is now accessible on the File Details page in the File Attribute History tab. To help improve attribute traceability, this new tab displays changes made to a file’s labels, including when a label was modified and the user or action that updated it.

The File Attribute History tab doesn’t provide information on either metadata or tags, which are on a deprecation track. All changes to labels, metadata, and tags are still also maintained in the Audit Trail.

For more information, see File Attribute History.

File Attribute History tab

TDP System Administration

SQL Access Credentials for Individual Users*

All Non-Admin users (excluding those with only Auditor policy permissions) can now create and rotate their own SQL access credentials on the My Account page for accessing Tetra Data in third-party applications. This new functionality removes the need for administrators to share SQL credentials and makes it easier for all users to access Tetra Data outside of the TDP.

System administrators can also now do the following on the SQL Access page, in addition to managing their own SQL credentials and Service User SQL credentials:

  • See a list of all users with active SQL credentials
  • Revoke SQL credentials of Login users

For more information, see the following resources:

IMPORTANT

Generating new SQL credentials will break existing SQL connections with third-party applications. New SQL credentials now create an Amazon Athena workgroup, which is appended to SQL users' Connection URL. Customers must copy over this new Connection URL along with the newly generated Access Key and Secret Key when re-establishing connections with third-party applications.
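
For example, a connection re-established with the pyathena Python client might look like the following minimal sketch. All values are placeholders; the actual workgroup and Connection URL come from the SQL Access page.

    # Minimal sketch, assuming the pyathena client library; every value
    # below is a placeholder, not actual TDP output.
    from pyathena import connect

    conn = connect(
        aws_access_key_id="NEW_ACCESS_KEY",      # newly generated Access Key
        aws_secret_access_key="NEW_SECRET_KEY",  # newly generated Secret Key
        region_name="us-east-1",                 # placeholder AWS Region
        work_group="example_tdp_workgroup",      # workgroup from the new Connection URL
        s3_staging_dir="s3://example-bucket/athena-results/",  # placeholder
    )
    conn.cursor().execute("SELECT 1")  # smoke test with the new credentials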

Generate SQL credentials for individual TDP users

Beta Release and EAP New Functionalities

The following new functionalities are behind feature flags as either part of a beta release or early adopter program (EAP) and are not suitable for GxP use. To use these new functionalities, contact your customer success manager (CSM) or account executive.

Data Retention Policies (EAP)

To help reduce data storage and cost, system administrators can now create Data Retention Policies for each organization in their tenant by using the new Data Retention tab on the Organization Settings page. These policies automatically delete an organization’s data from the TDP after a specified time period. Deleted files' metadata is still retained by the TDP for compliance and tracking purposes.

Data Retention Policies are available to all customers as part of an early adopter program (EAP) and will continue to be updated in future TDP releases. To start using Data Retention Policies, contact your CSM or account executive.

Data Retention tab

Enhancements

Enhancements are modifications to existing functionality that improve performance or usability, but don't alter the function or intended use of the system.

Performance and Scale Enhancements

3X Better Database Utilization*

Database utilization in TDP v4.2.0 is three times better than it was in previous platform versions, while still delivering the same throughput performance. This backend performance improvement helps TetraScience scale the TDP rapidly while reducing operational costs. It has no effect on customers’ TDP deployments.

Data Harmonization and Engineering Enhancements

Improved Pipeline Edit UI*

The redesigned Pipeline Edit page makes it easier and more intuitive to set up and manage pipelines.

With the redesigned page, customers can now do the following more easily:

  • Navigate the pipeline creation and editing experience
  • Filter between different artifact types (protocol or Tetraflow)

For more information, see Set Up and Edit Pipelines.

Pipeline Edit page

Improved Attribute Management UI*

A redesigned Attribute Management page makes it easier and more intuitive to view and manage labels, metadata, and tags (attributes).

With the redesigned page, customers can now do the following:

  • View all attributes (labels, metadata, and tags), which are now registered centrally on the Attribute Management page
  • View additional columns for a better attributes management experience

For more information, see Manage and Apply Attributes.

Attribute Management page

Improved Label Functionality*

Label functionality has been improved to better support FAIR data principles through the following enhancements.

Attribute Management Page Enhancements

Improvements to the Attribute Management page include the following:

  • It’s now simpler to view all files with a certain label name and then filter directly to those files.
  • Customers can now filter by label names or label values and are then automatically redirected to the new Search page experience.
  • Customers can now mark labels as inactive so that users can’t select those labels moving forward.

For more information, see Manage and Apply Attributes.

Label Formatting Enhancements

To help avoid potential issues when running SQL queries in OpenSearch, label names can no longer include dot (.) characters.
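
As a hypothetical illustration of the new rule (the helper below is illustrative only and not part of the TDP API):

    # Minimal sketch of a client-side check mirroring the new formatting rule.
    def is_valid_label_name(name: str) -> bool:
        return "." not in name

    is_valid_label_name("instrument.id")  # False: dots are no longer allowed
    is_valid_label_name("instrument_id")  # True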

For more information, see Label Formatting.

TDP System Administration Enhancements

Previously Removed Audit Trail Entities are Now Added to the System Log*

The following entities, which were removed from the Audit Trail in TDP v3.6.0, are now recorded on the System Log page:

  • Auth Token
  • Database Credentials
  • Service User
  • User
  • User Setting

These entities are unrelated to user actions that create, modify, or delete electronic records, so they were removed from the Part 11 Audit Trail to improve usability.

For more information, see Entities and Logged Actions for System Logs.

Beta Release and EAP Feature Enhancements

The following enhancements are behind feature flags as either part of a beta release or early adopter program (EAP) and are not suitable for GxP use. To use these enhancements, contact your customer success manager (CSM) or account executive.

Lakehouse Architecture is Now Available in Customer-Hosted TDP Environments*

Customers that host their own TDP environments can now participate in the Data Lakehouse Architecture early adopter program (EAP). The Data Lakehouse is an open data management architecture that combines the benefits of both data lakes (cost-efficiency and scale) and data warehouses (management and transactions) to enable analytics and AI/ML on all data. It is a highly scalable and performant data storage architecture that breaks data silos and allows seamless, secure data access to authorized users.

The new data storage and management architecture returns SQL query results 50% to 300% faster than the legacy SQL tables. It also provides an AI/ML-ready data storage format that operates seamlessly across all major data and cloud platform vendors.

Lakehouse Delta Tables Can Now be Accessed in Databricks Through Delta Sharing*

Customers can now access their Lakehouse Delta Tables directly through their Databricks account, if they have one. Lakehouse Delta Tables are shared with customer Databricks accounts through Delta Sharing, an open protocol for secure data sharing across organizations.

To set up Delta Sharing, contact your CSM or account executive and provide them with your Databricks account information.
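
Once the share is configured, a shared table can be read with the open-source delta-sharing Python client, as in this minimal sketch. The profile file and the share, schema, and table names are placeholders defined by your Delta Sharing setup.

    # Minimal sketch using the open-source delta-sharing client library.
    import delta_sharing

    profile = "config.share"  # credentials file issued during Delta Sharing setup
    table_url = f"{profile}#example_share.example_schema.example_ids_table"
    df = delta_sharing.load_as_pandas(table_url)  # read the shared Delta Table
    print(df.head())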

Improved Lakehouse Data Ingestion Process*

To help reduce file ingestion errors when backfilling data into the Data Lakehouse, the Lakehouse data ingestion rules no longer skip files that include the following special characters: @, #, $, %, ^, &, *, ], }, !, ?

Infrastructure Updates

The following is a summary of the TDP infrastructure changes made in this release. For more information about specific resources, contact your CSM or account manager.

New Resources

  • AWS services added: 0
  • AWS Identity and Access Management (IAM) roles added: 10 (Service layer: 9 | Data layer: 1)
  • IAM policies added: 0
  • IAM AWS managed policies added: 5 (Service layer: 5 | Data layer: 0)

Removed Resources

  • IAM roles removed: 8 (Service layer: 0 | Data layer: 8)
  • IAM policies removed: 0
  • IAM AWS managed policies removed: 1 (Service layer: 0 | Data layer: 1)

Bug Fixes

The following bugs are now fixed.

Data Harmonization and Engineering Bug Fixes

  • The process for writing IDS data to Amazon Athena tables was improved so that it’s no longer possible to duplicate data when two versions of the same file are processed by multiple pipelines in rapid succession.
  • In the Filters dialog on the Search page, the Select a filter type dropdown now provides the option to filter search results by Tags. The Tag Name filter’s dropdown also now automatically populates with all of an organization’s available tags. (Issue #3916)
  • When customers configure custom memory settings for a new Tetra Data Pipeline, the custom options that display in the Default Memory dropdown can now be selected.
  • When customers define a Source Type trigger for a new Tetra Data Pipeline, the Pipeline Manager page no longer displays the following error message: Critical: TDP service have failed to returned required data (Schemas). Please reload the page or contact your administrator for help.

Deprecated Features

There are no new deprecated features in this release.

For more information about TDP deprecations, see Tetra Product Deprecation Notices.

Known and Possible Issues

The following are known and possible issues for TDP v4.2.0.

Data Harmonization and Engineering Known Issues

  • On the Pipeline Manager page, in the Pipeline Info section, the MAX RETRIES and BASE RETRY DELAY (IN SECONDS) fields still appear even when No retry is selected for a pipeline’s RETRY BEHAVIOR. A fix for this issue is in development and testing and is scheduled for a future release.
  • On the Pipeline Edit page, in the Retry Behavior field, the Always retry 3 times (default) value sometimes appears as a null value.
  • File statuses on the File Processing page can sometimes display differently than the statuses shown for the same files on the Pipelines page in the Bulk Processing Job Details dialog. For example, a file with an Awaiting Processing status in the Bulk Processing Job Details dialog can also show a Processing status on the File Processing page. This discrepancy occurs because each file can have different statuses for different backend services, which can then be surfaced in the TDP at different levels of granularity. A fix for this issue is in development and testing.
  • Logs don’t appear for pipeline workflows that are configured with retry settings until the workflows complete.
  • Files with more than 20 associated documents (high-lineage files) do not have their lineage indexed by default. To identify and re-lineage-index any high-lineage files, customers must contact their CSM to run a separate reconciliation job that overrides the default lineage indexing limit.
  • OpenSearch index mapping conflicts can occur when a client or private namespace creates a backwards-incompatible data type change. For example: If doc.myField is a string in the common IDS and an object in the non-common IDS, then it causes an index mapping conflict, because the common and non-common namespace documents share an index. When these mapping conflicts occur, the files aren’t searchable through the TDP UI or API endpoints. As a workaround, customers can either create distinct, non-overlapping version numbers for their non-common IDSs or update the names of those IDSs. (A hypothetical illustration of this conflict follows this list.)
  • File reprocessing jobs can sometimes show fewer scanned items than expected when either a health check or out-of-memory (OOM) error occurs, but not indicate any errors in the UI. These errors are still logged in Amazon CloudWatch Logs. A fix for this issue is in development and testing.
  • File reprocessing jobs can sometimes incorrectly show that a job finished with failures when the job actually retried those failures and then successfully reprocessed them. A fix for this issue is in development and testing.
  • File edit and update operations are not supported on metadata and label names (keys) that include special characters. Metadata, tag, and label values can include special characters, but it’s recommended that customers use the approved special characters only. For more information, see Attributes.
  • The File Details page sometimes displays an Unknown status for workflows that are either in a Pending or Running status. Output files that are generated by intermediate files within a task script sometimes show an Unknown status, too.
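
As a hypothetical illustration of the index mapping conflict described above (the field name and values are invented):

    # Two IDS documents sharing one OpenSearch index. The first maps
    # doc.myField as a string; the second needs an object mapping for the
    # same path, which produces an index mapping conflict.
    common_ids_doc = {"doc": {"myField": "ambient"}}          # string type
    non_common_ids_doc = {"doc": {"myField": {"value": 25}}}  # object type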

Data Access and Management Known Issues

  • When customers upload a new file on the Search page by using the Upload File button, the page doesn’t automatically update to include the new file in the search results. As a workaround, customers should refresh the Search page in their web browser after selecting the Upload File button. A fix for this issue is in development and testing and is scheduled for a future TDP release.
  • If an IDS file has a field that ends with a backslash \ escape character, the file can’t be uploaded to the Data Lakehouse. These file ingestion failures happen because escape characters currently trigger the Lakehouse data ingestion rules. The errors are recorded on the Files Health Monitoring dashboard in the File Failures section. There is no way to reconcile these file failures at this time. A fix for this issue is in development and testing and is scheduled for TDP v4.2.1.
  • IDS SQL tables with only parent_uuid and uuid columns aren’t generated during ingestion into the Data Lakehouse. A fix for this issue is in development and testing and is scheduled for TDP v4.2.1. The following IDS SQL tables are known to be affected:
    • liquid_handler_to_freezer_cell_inventory_v2_runs
    • mass_spectrometer_mzml_v2_methods
    • mass_spectrometer_mzml_v2_results
    • hardness_tester_pharmatron_8m_v2_results
  • Values returned as empty strings when running SQL queries on SQL tables can sometimes return Null values when run on Lakehouse tables. As a workaround, customers taking part in the Data Lakehouse Architecture EAP should update any SQL queries that specifically look for empty strings to instead look for both empty string and Null values. (See the query sketch after this list.)
  • The Tetra FlowJo Data App doesn’t load consistently in all customer environments.
  • Query DSL queries run on indices in an OpenSearch cluster can return partial search results if the query puts too much compute load on the system. This behavior occurs because the OpenSearch search.default_allow_partial_result setting is configured as true by default. To help avoid this issue, customers should use targeted search indexing best practices to reduce query compute loads. A way to improve visibility into when partial search results are returned is currently in development and testing and scheduled for a future TDP release.
  • Text within the contents of a RAW file that contains escape (\) or other special characters may not always be indexed completely in OpenSearch. A fix for this issue is in development and testing and is scheduled for an upcoming release.
  • If a data access rule is configured as [label] exists > OR > [same label] does not exist, then no file with the defined label is accessible to the Access Group. A fix for this issue is in development and testing and scheduled for a future TDP release.
  • File events aren’t created for temporary (TMP) files, so they’re not searchable. This behavior can also result in an Unknown state for Workflow and Pipeline views on the File Details page.
  • When customers search for labels in the TDP UI’s search bar that include either @ symbols or some unicode character combinations, not all results are always returned.
  • The File Details page displays a 404 error if a file version doesn't comply with the configured Data Access Rules for the user.
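
A minimal sketch of the query adjustment described in the empty-string item above (the table and column names are invented):

    # Before: matches only empty strings (legacy SQL table behavior)
    legacy_query = "SELECT * FROM example_results WHERE sample_name = ''"

    # After: also matches the Null values that Lakehouse tables can return
    lakehouse_query = (
        "SELECT * FROM example_results "
        "WHERE sample_name = '' OR sample_name IS NULL"
    )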

TDP System Administration Known Issues

  • The latest Connector versions incorrectly log the following errors in Amazon CloudWatch Logs:
    • Error loading organization certificates. Initialization will continue, but untrusted SSL connections will fail.
    • Client is not initialized - certificate array will be empty
      These organization certificate errors have no impact and shouldn’t be logged as errors. A fix for this issue is currently in development and testing, and is scheduled for an upcoming release. There is no workaround to prevent Connectors from producing these log messages. To filter out these errors when viewing logs, customers can apply the following CloudWatch Logs Insights query filters when querying log groups. (Issue #2818)
      CloudWatch Logs Insights Query Example for Filtering Organization Certificate Errors
    fields @timestamp, @message, @logStream, @log
    | filter message != 'Error loading organization certificates. Initialization will continue, but untrusted SSL connections will fail.'
    | filter message != 'Client is not initialized - certificate array will be empty'
    | sort @timestamp desc
    | limit 20
    
  • If a reconciliation job, bulk edit of labels job, or bulk pipeline processing job is canceled, then the job’s ToDo, Failed, and Completed counts can sometimes display incorrectly.

Upgrade Considerations

During the upgrade, there might be brief downtime during which users won't be able to access the TDP user interface and APIs.

After the upgrade, the TetraScience team verifies that the platform infrastructure is working as expected through a combination of manual and automated tests. If any failures are detected, the issues are immediately addressed, or the release can be rolled back. Customers can also verify that TDP search functionality continues to return expected results, and that their workflows continue to run as expected.

For more information about the release schedule, including the GxP release schedule and timelines, see the Product Release Schedule.

For more details about the timing of the upgrade, customers should contact their CSM.

Other Release Notes

To view other TDP release notes, see Tetra Data Platform Release Notes.