Release date: 2 November 2023
TetraScience has released its next version of the Tetra Data Platform (TDP), version 3.6.0. This release focuses on supporting scientific outcomes by introducing significant performance and usability improvements, such as the following:
- Self-service pipelines (SSPs) are now available in all TDP deployment environments and work with an updated SDK to make it simpler and more secure to build SSPs
- Improved bulk actions for reprocessing pipeline data, editing labels, and reconciling files
- Improved search functionality to make it easier to find data by file path and content
- 2x-4x performance increases across all TDP deployment sizes
- Support for UTF-8 and Kanji characters in file names, contents, and labels
- Low-code DataWeave scripting is now supported in Pipeline configurations
- General availability of Tetra Hub v2 and the Pluggable Connector Framework (previously in beta release), which allow new Tetra Integration functionalities to be released separately from the TDP
- (Beta release) Basic search user experience for scientists
Here are the details for what’s new in TDP v3.6.0.
Keep in mind the following:
- Items labeled as New functionality include features that weren’t previously available in the TDP.
- Enhancements are modifications to existing functionality that improve performance or usability, but don't alter the function or intended use of the system.
- Features marked with an asterisk (*) are for usability, supportability, or troubleshooting, and do not affect Intended Use for validation purposes. Beta Release features are not suitable for GxP use.
Tetra Integrations automatically collect scientific data from different instruments and applications and centralize that data in the Tetra Scientific Data Cloud. You can also use them to send data to designated systems.
The following are new functionalities and enhancements introduced for data integrations in TDP v3.6.0.
The new Pluggable Connector Framework makes it possible for TetraScience to update and release new Connectors independent of a TDP release. Previously available in beta release only, Pluggable Connectors are now generally available and provide the following benefits:
- Expedite the Connector development process by introducing a common Connector framework
- Flexibility when deploying upgrades, because deployments can happen outside of a TDP version release
- Streamlined health monitoring and troubleshooting options (customers can now use Amazon CloudWatch to track each Connector’s activity logs and performance metrics)
The Tetra KEPServerEX Connector is available as a Pluggable Connector currently. The Tetra AGU SDC Connector and Tetra HRB Cellario Connector are scheduled to be released as Pluggable Connectors in 2024.
For more information, see Tetra Connectors.
A new Connectors page was added to the left navigation menu under Data Sources. The page provides general information about each Pluggable Connector that a customer has created within their organization, including its configuration details and diagnostics.
Each Pluggable Connector also now has a Connector Details page (accessed through the Connectors page) that provides more granular information about individual Connectors. Customers can now edit a Pluggable Connector’s information or change the Connector’s status by using the Connector Details page.
For more information, see Create, Configure, and Update Pluggable Connectors.
Tetra Hub is the on-premises connectivity component of the TDP. It facilitates secure data transfer to the Tetra Scientific Data Cloud through components called Connectors. Tetra Hub v2 gives customers the option to release new Hub functionalities or patches without needing to upgrade the entire TDP, which can help reduce overhead and accelerate implementation. Previously available in beta release only, Hub v2 sunsets the use of the Tetra Generic Data Connector (GDC), simplifying deployment. Hub v2 also offers the following benefits that aren’t available in Hub v1 (previously Tetra Data Hub):
- Hosts Pluggable Connectors
- Acts as a proxy for Tetra IoT Agents
For more information, see Tetra Hub.
Instruments that use the Tetra IoT Agent can now connect to the TDP through an on-premises Tetra Hub v2. Previously, the Tetra IoT Agent could connect directly to AWS through the Tetra IoT Layer only.
For more information, see Configure a Hub as a Proxy for a Tetra IoT Agent.
Infrastructure-level notifications for Tetra Hub v2s are now sent to TetraScience Support by AWS automatically. These alerts contain no sensitive information and indicate a Hub’s state and failure reason only. This information helps the TetraScience team provide timely and effective support.
For more information and a list of the notification types, see Tetra Hub v2 Monitoring and Alarms.
Tetra File-Log Agent events can now be viewed in the new Events Timeline tab on the File Details page, and the new Integration Events tab on the Health Monitoring page.
The following APIs also now return information about Tetra File-Log Agent events:
- Get activity events from all Agents—returns events from all Tetra File-Log Agents.
- Get Agent event types—returns a list of event names that a specific Tetra File-Log Agent generates.
The Get activity events from all Agents and Get Agent event types list endpoints are in beta release currently and may require changes in future TDP releases. For more information, customers must contact their CSM.
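As an illustration of calling the new beta events endpoints, the following Python sketch assembles (but doesn't send) a request. The base URL, path (`/v1/agent-events`), header names, and parameter names are assumptions for illustration only, not the documented API:

```python
from urllib import parse, request

# Hypothetical values -- the real base URL, API path, and auth header
# come from your TDP organization settings and the TetraScience API docs.
BASE_URL = "https://api.example.tetrascience.com"
ORG_SLUG = "my-org"
API_KEY = "ts-api-key-placeholder"

def build_agent_events_request(agent_id=None, page_size=50):
    """Build (but do not send) a GET request for File-Log Agent events."""
    params = {"pageSize": page_size}
    if agent_id:
        params["agentId"] = agent_id  # limit results to one Agent
    url = f"{BASE_URL}/v1/agent-events?{parse.urlencode(params)}"
    return request.Request(url, headers={
        "ts-auth-token": API_KEY,  # hypothetical header name
        "x-org-slug": ORG_SLUG,
    })

req = build_agent_events_request(agent_id="1234")
print(req.full_url)
```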
To help improve usability, the following changes were made to the TDP user interface (UI).
- When creating or updating an Agent in the TDP, customers can more easily select from pre-existing Hubs (v1 or v2) or create necessary Service Users without leaving the Agent Wizard. Also, the Install Agent page is now named Install Agent Locally. For more information, see Create a New Agent.
- The File Upload API endpoint and the Generic Data Connector (GDC) now support specifying labels when uploading files.
- The Agents page now includes an Enabled filter, which by default only displays enabled Agents. The All and No filter options still allow customers to view deactivated Agents when needed. For more information, see Cloud Configuration of Tetra Agents.
- The Archive files with no checksum option now appears on the Path Configurations pane in the TDP UI. Previously it was available in the Tetra File-Log Agent Management Console and API only. This option is available for Agents version 4.3.2 and higher only. For more information, see Configure Tetra File-Log Agent FileWatcher Service.
- Tetra File-Log Agent paths now have a backslash (`\`) appended to them if there's not one already there. Adding a backslash to the Agent's paths ensures consistent behavior with the File-Log Agent Management Console.
- Parent proxy settings are now configurable on the Tetra Hub v2 management console only.
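As a minimal illustration of the new trailing-backslash behavior for Agent paths:

```python
def normalize_agent_path(path: str) -> str:
    """Append a trailing backslash to a File-Log Agent path if it's
    missing, matching the path behavior described for TDP v3.6.0."""
    return path if path.endswith("\\") else path + "\\"

print(normalize_agent_path(r"C:\data\instruments"))    # -> C:\data\instruments\
print(normalize_agent_path("C:\\data\\instruments\\"))  # already normalized; unchanged
```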
TetraScience provides many Tetra Data models as well as options for creating custom schemas. You can use these schematized representations of common scientific data in pipelines to automate data operations and transformations.
The following are new functionalities and enhancements introduced for data harmonization and engineering in TDP v3.6.0.
All TDP deployment environments can now create their own custom, self-service pipelines (SSPs). Previously, only customer-hosted (single tenant) deployments could use SSPs. The new SSP runtime environment also simplifies the way customers define their protocols through a new protocol.yml format.
To start using SSPs in a Tetra-hosted environment, customers must first do the following:
- Upgrade to TDP v3.6.0
- Upgrade to the latest versions of the TetraScience Software Development Kit (SDK 2.0) and TetraScience Command Line Interface (CLI)
- Build custom artifacts for their SSPs by using the latest TetraScience product versions
For more information, see Self-Service Tetra Data Pipelines.
On the IDS Details page, a new ERD tab displays an interactive Entity Relationship Diagram that represents the relational schema for an Intermediate Data Schema (IDS). This new IDS view can help customers quickly understand the relationship between their IDSs' associated Athena tables and create more effective SQL queries.
For more information, see View IDSs and Their Details.
Customers can now input DataWeave scripts directly into pipeline configurations within the TDP UI. This new functionality provides a standard way to pass parameters between a DataWeave script and a task script.
The new DataWeave protocol version isn't backward compatible. The previous protocol expects a `fileUUID` in a parameter. The new protocol expects a DataWeave script.
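To make the incompatibility concrete, the two parameter shapes might be compared as in this sketch; the configuration keys and values are illustrative, not the exact protocol schema:

```python
# Illustrative pipeline step configurations only -- the real protocol.yml
# schema is defined by the TetraScience SDK documentation.

old_style_step = {
    "task": "run-dataweave",
    "parameters": {
        # previous protocol: a fileUUID reference in a parameter
        "fileUUID": "00000000-0000-0000-0000-000000000000",  # placeholder
    },
}

new_style_step = {
    "task": "run-dataweave",
    "parameters": {
        # new protocol: the DataWeave script itself is passed
        "script": "%dw 2.0\noutput application/json\n---\npayload.samples",
    },
}
```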
The new TetraScience SDK 2.0 provides more security when creating and using self-service Tetra Data pipelines (SSPs). SDK 2.0 replaces the legacy SDK. Customers should plan on rebuilding and releasing their existing protocols to use the new SDK 2.0 before the legacy one is deprecated.
For more information, see the TetraScience SDK 2.0 Release Notes.
Existing SSPs and task scripts built with the previous design will continue to work during the deprecation period. The current estimated earliest deprecation date is Q4 of 2024.
The File Processing page now loads 20 times faster than in previous TDP versions.
Customers can now view Connector artifacts by using the Artifacts option in the left TDP navigation menu. These artifacts contain the definition, assets, and code for Tetra Connectors.
For more information, see View Connectors and Their Details.
A new Bulk Reprocess button on the File Processing page provides customers the option to quickly reprocess multiple files by the following criteria:
- WORKFLOW STATE
- FOR LAST (time range in days)
- HOW MANY FILES
- JOB NAME
For more information, see Create a Bulk Pipeline Process Job.
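The Bulk Reprocess criteria can be pictured as a job description, as in this hypothetical sketch (the field names are stand-ins for the form's selections, not an actual API schema):

```python
def build_bulk_reprocess_job(workflow_state, last_days, max_files, job_name):
    """Assemble the criteria shown on the Bulk Reprocess form into a job
    description. Field names are hypothetical; the TDP UI submits the
    equivalent selections for you."""
    if last_days <= 0 or max_files <= 0:
        raise ValueError("time range and file count must be positive")
    return {
        "workflowState": workflow_state,  # e.g. a failed workflow state
        "forLastDays": last_days,         # FOR LAST (time range in days)
        "howManyFiles": max_files,        # HOW MANY FILES
        "jobName": job_name,              # JOB NAME
    }

job = build_bulk_reprocess_job("failed", 30, 500, "retry-failed-uploads")
```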
The Scan for Unprocessed Files action on the File Processing page uses a 30-day time period by default. To scan by a more specific or longer time frame, customers can use the Bulk Pipeline Process page.
Pipeline notification emails now include the associated organization slug and infrastructure name in the subject line and in the body of the email.
To help improve usability, the following changes were made to the TDP UI.
IDS UI Improvements
- On the IDS Details page, the Search Indices option has been removed. Removing this option from the TDP UI safeguards the system from accidental actions. To review indices, customers can contact their CSM.
Data in the Tetra Scientific Data Cloud is easily accessible through search in the TDP user interface, TetraScience API, and SQL queries. This harmonized content is standardized to allow comparisons across data sets, easy access from informatics applications, and reuse by advanced analytics and AI/ML.
The following are new functionalities and enhancements introduced for data access and management in TDP v3.6.0.
The TDP now supports UTF-8 and Kanji characters in file names, contents, and labels.
For more information, see Label Formatting.
The new Tetra Snowflake Integration provides customers the ability to access their Tetra Data directly through Snowflake, in addition to the current TDP SQL interface.
The Tetra Snowflake Integration is in beta release currently and may require changes in future TDP releases. For more information or to activate this functionality, customers must contact their CSM.
The new Basic Search page is designed for scientific users to be able to:
- Quickly search for RAW data in the TDP by the following criteria:
  - Existing search bar
  - File upload date
  - Any populated recommended labels
  - Existing saved searches (also referred to as Collections)
- Create, update, and manage saved searches
- Download multiple files at once with the new Bulk Download option.
The Basic Search page is in beta release currently and may require changes in future TDP releases. This experience is a non-breaking change and is activated for customers on request only.
For more information, see Basic Search (Beta Release).
- The Search feature now returns results based on content in both the primary (RAW) and schematized (`IDS`) versions of files, allowing for more powerful contextual search without metadata, tags, or labels. This enhancement is available for data that's processed after the TDP v3.6.0 upgrade only. To apply this enhancement to historical data, customers must reindex the data by reconciling it, or contact their CSM for support. For more information, see Perform a Basic File Search.
- Customers can now enter a portion of a file path into either the TDP Search bar or the TetraScience API search `query_string` to return results, rather than the entire file path. This enhancement is available for data that's processed after the TDP v3.6.0 upgrade only. To apply this enhancement to historical data, customers must either reindex the data by reconciling it, or contact their CSM for support.
- You can now copy metadata, tags, or labels from an expanded search record and paste the search-ready metadata, tags, and labels string into a search box.
- Nested fields within output (`IDS`) files are now searchable from the Search bar in the TDP UI. For more information, see Nested Types.
- File paths are now indexed in a way that makes it more efficient and cost-effective to run RAW EQL queries. Now, wildcard prefixes aren't needed for text searches. For more information, see Search by Using Elasticsearch Query DSL.
- The scalability of the Search files via Elasticsearch Query Language API endpoint was improved by reducing the resource requirements of requests that return large data sets.
- The Search files via Elasticsearch Query Language API endpoint now returns more specific error codes than the previous 400 status code responses.
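As a sketch of the improved path search, a minimal Query DSL body for the Search files via Elasticsearch Query Language endpoint might look like the following; the query value is illustrative, and the exact fields searched depend on the endpoint's index configuration:

```python
# A minimal Elasticsearch Query DSL body illustrating a partial-path
# search in v3.6.0. Previously, a leading wildcard (e.g. "*instrument-a*")
# was needed for this kind of text search.
partial_path_query = {
    "query": {
        "query_string": {
            # a portion of the file path is now enough to match
            "query": "instrument-a/2023",
        }
    },
    "size": 25,  # limit the number of hits returned
}
```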
Label values now also support backslashes (`\`) in addition to letters, numbers, spaces, and the following symbols: plus signs (`+`), dashes (`-`), periods (`.`), and underscores (`_`).
For more information, see Label Formatting.
Processing a single file can no longer run a pipeline more than once, create duplicate TDP workflows, or send duplicate events to downstream systems.
Amazon Athena SQL queries no longer return any stale or incorrect data when two workflows that are working on the same IDS file run and complete actions within 20 seconds of one another.
Good Manufacturing / Laboratory / Documentation / Machine Learning Practices (GxP) help make sure products such as drugs, medical devices, or active pharmaceutical ingredients are safe, consistent, and high quality. Establishing a universal framework for managing data across R&D and manufacturing operations provides the backbone for these compliance efforts.
The following are new functionalities and enhancements introduced for GxP compliance in TDP v3.6.0.
There is no new GxP compliance functionality in this release.
The following entities and events unrelated to user actions that create, modify, or delete electronic records are now removed from the Part 11 Audit Trail for improved usability:
- Auth Token
- Database Credentials
- Filter Field
- Service User
- User Setting
- GIT Integration
- Task Script Profile
- Task Script Build
- Feature Flag

This change does not remove any of the affected entities or events from the system. The entities and events that were removed from the Audit Trail are still logged and remain available upon request. These additional logs will also be made available through an upcoming Activity Logs feature in TDP v3.7.0.
Also, Pipeline entities no longer require a change reason entry for Reprocess or Submit files for process actions when the Change Reason enabled in Audit Trail setting is activated.
The following entities are now part of the Audit Trail for GxP compliance purposes:
- Hubs (for Tetra Hub v2s)
- Pluggable Connector
File entities also now record download events in the Audit Trail.
For more information, see Entities and Logged Actions.
By using the TDP system administration features, customers can manage organizations, users, and roles as well as access logs, metrics, alerts and more.
The following are new functionalities and enhancements introduced for TDP system administration in TDP v3.6.0.
The new Organization Certificates feature provides organization administrators the ability to upload their own self-signed Secure Sockets Layer (SSL) certificates to the TDP. After upload, Pluggable Connectors can now trust these self-signed SSL certificates when making requests to HTTPS endpoints that use those certificates for encryption.
For more information, see Manage Self-Signed SSL Certificates for Pluggable Connectors.
All new AWS Key Management Service (AWS KMS) keys now automatically rotate their key material every year (approximately 365 days from their creation). For more information, see Rotating AWS KMS keys in the AWS Documentation.
For existing AWS KMS keys that were created before TDP v3.6.0, customers must activate automatic key rotation manually in the TDP by using a ts-admin role. For instructions, see AWS KMS Key Rotation.
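The rotation schedule itself is simple to reason about; this small helper assumes the approximate 365-day interval described above:

```python
from datetime import datetime, timedelta

def next_rotation_date(key_created: datetime) -> datetime:
    """Approximate the next automatic rotation for a new AWS KMS key:
    roughly 365 days after the key material was created."""
    return key_created + timedelta(days=365)

created = datetime(2023, 11, 2)
print(next_rotation_date(created))  # 2024-11-01 00:00:00
```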
The new Commands page provides organization administrators the ability to view, search, and check the status of the commands run within their organization. This functionality was previously available through the TetraScience Command Service API endpoints only.
For more information, see Command Service.
The new List command actions (`commands/actions`) endpoint returns a list of all distinct actions (command types) in the system.
The following search fields were also added to the Search commands API endpoint:
- `targetId` now supports searching for multiple values
- `sortBy` sorts response results
- Date range options, such as `updatedAtAfter`, are also now supported; all date ranges are inclusive
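For example, a query string using the new Search commands fields might be assembled as follows; the value formats and the sortable field name are assumptions based on the list above:

```python
from urllib.parse import urlencode

# Hypothetical query parameters for the Search commands endpoint,
# showing the v3.6.0 additions. Exact formats may differ from the docs.
params = [
    ("targetId", "hub-1"),   # targetId now accepts multiple values
    ("targetId", "hub-2"),
    ("sortBy", "createdAt"),  # assumed sortable field name
    ("updatedAtAfter", "2023-11-01T00:00:00Z"),  # inclusive date range
]
query = urlencode(params)
print(query)
```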
Shared settings names now support underscores (`_`). Previously, shared settings names supported dashes (`-`), periods (`.`), and alphanumeric characters only.
For more information, see Add a Shared Setting.
The TDP now provides better visibility into file errors and options for fixing those errors by introducing the following improvements:
- A new top-level Bulk Actions option in the left navigation menu provides quicker access to the following:
- An easier way to view related system logs from AWS CloudWatch for errors
- On the Health Monitoring page, customers can now select between specific error codes, selected files, or only failed files when creating a Reconciliation Job (for more information, see File Failures)
When a Connector is configured to use a shared setting or secret, the Usage count is now included on the Shared Settings page.
For more information, see Access the Shared Settings Page.
TetraScience continually works to improve TDP performance and scalability. The following are performance and scale improvements for TDP v3.6.0.
Each available `EnvironmentSize` setting for the TDP includes the following performance enhancements for TDP v3.6.0:
- File registration rate increased 2x-4x
- Concurrent workflows and workflow creation rates increased 2x
- Search API (`SearchEql`) request rate increased 4x
For more information, see Deployment Size Options for TDP v3.6.x.
TetraScience continually monitors and tests the TDP codebase to identify potential security issues. Various security updates were applied to the following areas:
- Operating systems
- Third-party libraries
There are multiple TDP deployment options available to customers, each with its own set of system requirements.
The following are new functionalities and enhancements introduced for TDP installation and deployment in TDP v3.6.0.
There is no new functionality for installation and deployment in this release.
There are no new enhancements for installation and deployment in this release.
The following customer-reported bugs are now fixed.
- When editing a user-defined Agent, customers can no longer create a Source Type value that includes unsupported, uppercase letters.
- Customers can now change the DNS used by an L7 proxy on a Connector’s Data Management page. (Issue #1786)
- The System Messages section of the Agent Management Console now accurately displays Disk Usage warnings and errors consistently. (Issue #2753)
- The Tetra Hub v1 (previously Tetra Data Hub) installation script now detects if any existing installations of Docker or the AWS Command Line Interface (AWS CLI) are compatible. (Issue #1940 and #2373)
- The Create an agent (`POST /v1/agents`) and Update an agent (`PUT /v1/agents/<agentId>`) API endpoints now return a `400` error code if the `"api"` integration type and a non-empty `datahubId` parameter are specified together. Previously, these endpoints returned a `200` status code and displayed an Agent Not Available message in the TDP UI when this happened.
- The Change agent connector (`PUT /v1/agents/<agentId>/connector`) API endpoint now returns a `400` error code if the `"api"` integration type and a non-empty `datahubId` parameter are specified together. The endpoint also now returns a `400` error code if the `datahubId` parameter isn't specified. Previously, this endpoint returned a `200` status code and displayed an Agent Not Available message in the TDP UI when either of these configurations happened.
- On the Search Files page, in the Labels & Advanced Filters dialog, the Select Field drop-down list now consistently displays all of the searchable fields within an IDS. (Issue #2643 and #2728)
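The corrected validation in the Agent endpoint fixes above amounts to a simple rule, sketched here for illustration (not the actual server implementation):

```python
from typing import Optional

def validate_agent_connection(integration_type: str,
                              datahub_id: Optional[str]) -> int:
    """Mirror of the corrected server-side check: an Agent using the
    "api" integration type must not also carry a non-empty datahubId.
    Returns an HTTP-style status code. Illustrative only."""
    if integration_type == "api" and datahub_id:
        return 400  # rejected in v3.6.0; previously returned 200
    return 200

print(validate_agent_connection("api", "hub-123"))  # 400
```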
The following features have been deprecated for this release or are now on a deprecation path.
- The Amazon Simple Storage Service (Amazon S3) metadata `ts_processed_file_type` is now deprecated. Current and previous versions of Agents were incorrectly specifying `file` for all uploads, so future Agent versions will no longer populate this metadata key. The TDP now correctly calculates the value of the `file.type` property based on the actual file path and by ignoring the `ts_processed_file_type` Amazon S3 metadata.
For more information about TDP deprecations, see Tetra Product Deprecation Notices.
The following are known and possible issues for the TDP v3.6.0 release.
- When installing a Tetra Hub v2 on a host server that already has an AWS Systems Manager registration key, the Amazon ECS container agent startup fails. An AccessDenied error is then logged in the agent’s Amazon CloudWatch Logs. In TDP v3.6.0, the Hub v2 installer automatically detects the issue and provides instructions to fix it.
- The Tetra Hub v2 installation script doesn’t detect an existing Amazon Elastic Compute Cloud (Amazon EC2) instance role on a host server if there is one. If there is an existing AWS Identity and Access Management (IAM) role, the Hub’s Amazon ECS service will attempt to use it. The Hub’s Amazon ECS instance registration process fails when this happens. A fix for this issue is currently in development and testing for a future TDP v3.6.x patch release. As a workaround, customers can detach the Amazon EC2 IAM role from the Amazon EC2 instance, and then rerun the Hub installation script. For more information, see Why Did the Amazon ECS Instance Registration Process Fail During Hub v2 Installation?
- When installing or rebooting a Tetra Hub v2, the Hub’s Health status incorrectly displays as CRITICAL for a short time in the TDP UI. After the TDP receives the Hub’s initial metrics and proxy status, the Hub’s status displays as Online. No action is needed, and no alarms or notifications are generated.
- Files uploaded to the TDP by Agents that use a Tetra Hub v2 proxy incorrectly appear in the system with an `'api'` value for their `integrationType`. The files also incorrectly display a hardcoded API ID value (`'6f166302-df8a-4044-ab4b-7ddd3eefb50b'`). This behavior shouldn't impact pipelines or data processing. A fix for this issue is currently in development and testing for the TDP v3.6.1 patch release. After this issue is addressed, all new uploaded file versions will have the correct metadata.
- The Integration Events tab on the Health Monitoring Dashboard might present a spinner if an Agent is configured with no file path (`filePath`) and hasn't produced any file events.
- Elasticsearch index mapping conflicts can occur when a client or private namespace creates a backwards-incompatible data type change. For example, if `doc.myField` is a string in the common IDS and an object in the non-common IDS, it will cause an index mapping conflict, because the common and non-common namespace documents share an index. When these mapping conflicts occur, the files aren't searchable through the TDP UI or API endpoints. As a workaround, customers can either create distinct, non-overlapping version numbers for their non-common IDSs or update the names of those IDSs.
- File reprocessing jobs can sometimes show less scanned items than expected when either a health check or out-of-memory (OOM) error occurs, but not indicate any errors in the UI. These errors are still logged in Amazon CloudWatch Logs. A fix for this issue is in development and testing.
- File reprocessing jobs can sometimes incorrectly show that a job finished with failures when the job actually retried those failures and then successfully reprocessed them. A fix for this issue is in development and testing.
- On the Pipeline Manager page, pipeline trigger conditions that customers set with a text option must match all of the characters that are entered in the text field. This includes trailing spaces, if there are any.
- File edit and update operations are not supported on metadata and label names (keys) that include special characters. Metadata, tag, and label values can include special characters, but it’s recommended that customers use the approved special characters only. For more information, see Attributes.
- The File Details page sometimes displays an Unknown status for workflows that are either in a Pending or Running status. Output files that are generated by intermediate files within a task script sometimes show an Unknown status, too.
- File events aren’t created for temporary (TMP) files, so they’re not searchable. This behavior can also result in an Unknown state for Workflow and Pipeline views on the File Details page.
- When customers search for labels that include @ symbols in the TDP UI’s search bar, not all results are always returned.
- When customers search for some unicode character combinations in the TDP UI’s Search bar, not all results are always returned.
- If customers modify an existing collection of search queries by adding a new filter condition from one of the Options modals (Basic, Attributes, Data (IDS) Filters, or RAW EQL), but they don't select the Apply button, the previous, existing query is deleted. To modify the filters for an existing collection, customers must select the Apply button in the Options modal before updating the collection. For more information, see How to Save Collections and Shortcuts.
- The latest Connector versions incorrectly log the following errors in Amazon CloudWatch Logs:
Error loading organization certificates. Initialization will continue, but untrusted SSL connections will fail.
Client is not initialized - certificate array will be empty
These organization certificate errors have no impact and shouldn’t be logged as errors. A fix for this issue is currently in development and testing, and is scheduled for an upcoming release. There is no workaround to prevent Connectors from producing these log messages. To filter out these errors when viewing logs, customers can apply the following CloudWatch Logs Insights query filters when querying log groups. (Issue #2818)
CloudWatch Logs Insights Query Example for Filtering Organization Certificate Errors
```
fields @timestamp, @message, @logStream, @log
| filter message != 'Error loading organization certificates. Initialization will continue, but untrusted SSL connections will fail.'
| filter message != 'Client is not initialized - certificate array will be empty'
| sort @timestamp desc
| limit 20
```
- If a reconciliation job, bulk edit of labels job, or bulk pipeline processing job is canceled, then the job’s ToDo, Failed, and Completed counts can sometimes display incorrectly.
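The Elasticsearch index mapping conflict described in the known issues above arises when one field name carries incompatible types in documents that share an index. A rough, top-level-only illustration:

```python
def find_type_conflicts(common_doc: dict, custom_doc: dict) -> list:
    """Report fields whose Python types differ between a common-IDS
    document and a non-common-IDS document destined for the same index.
    A simplified, top-level-only sketch of the conflict, not the
    actual Elasticsearch mapping logic."""
    conflicts = []
    for field in common_doc.keys() & custom_doc.keys():
        if type(common_doc[field]) is not type(custom_doc[field]):
            conflicts.append(field)
    return conflicts

common = {"myField": "a string value"}
custom = {"myField": {"nested": True}}  # object vs. string -> conflict
print(find_type_conflicts(common, custom))  # ['myField']
```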
During the upgrade, there might be a brief downtime when users won't be able to access the TDP user interface and APIs. There will also be a one- to three-hour data migration process running in the background. This data migration might cause the File Failures metric on the TDP Health Monitoring Dashboard to be inaccurate until the data migration is complete. No other functionality will be affected.
After the upgrade, the TetraScience team verifies that the platform infrastructure is working as expected through a combination of manual and automated tests. If any failures are detected, the issues are immediately addressed, or the release can be rolled back. Customers can also verify that TDP search functionality continues to return expected results, and that their workflows continue to run as expected.
For more information about the release schedule, including the GxP release schedule and timelines, see the Product Release Schedule.
For more details about the timing of the upgrade, customers should contact their CSM.
TetraScience is committed to creating quality software. Software is developed and tested by using the ISO 9001-certified TetraScience Quality Management system. This system ensures the quality and reliability of TetraScience software while maintaining data confidentiality and integrity.
To view other TDP release notes, see Tetra Data Platform (TDP) Release Notes.