TDP v3.2.0 Release Notes
Release Date: 03 March 2022
Quality Report
TetraScience is committed to creating quality software. Software is developed and tested using our TetraScience Quality System. The Quality Report for version 3.2.0 is referenced here: DI-5828.
What's New?
TetraScience has released its next version of the Tetra Data Platform (TDP) version 3.2.0. As its major focus, this release targets quality improvements (including UI, Tetra Agent, and API enhancements) made throughout the entire platform with the goal of delivering a delightful customer experience:
- Greater flexibility to manage scientific data (create and configure pipelines from the API, integrate TDP within your business processes)
- Easier and more intuitive Search feature (gain meaningful results through relevance-ranked Search, use familiar web searching query syntax, and create and save search shortcuts for quick access)
- Increased platform control (manage user access across multiple organizations and ensure data integrity using self-service methods of resolving issues across multiple systems)
For details on specific feature and functionality improvements, review these TDP version 3.2.0 release notes:
- Enhancements
- Bug fixes
- Deprecated features
- Known issues
Enhancements
Enhancements are new features and functionality that have been added to the software. Click the arrows next to each section to review the enhancements made for this version of the TDP software:
Connectors, Integrations, and Data Hub
- Tetra Agent improvements are available in the individual Tetra Agent release notes at the end of this topic.
- There were several Tetra Data Hub fixes made to improve the installation experience.
Health Monitoring
- IT Administrators can now fix existing data problems (from previous errors or failures) and prevent new ones using the new Files Reprocessing feature from the Health Monitoring page. This helps to ensure data integrity and consistency across storage, including Amazon S3, Elasticsearch (ES), and the TDP FileInfo Service. IT Admins can explore the discrepancies and fix each one individually, or in bulk.
- The overall TDP Health Monitoring Dashboard has been enhanced to show an improved end-to-end snapshot of component health for the entire TDP ecosystem. A new Latest Status column has been added (located between the Name and Health Description columns) to the Dashboard tab on the Health Monitoring page. This column contains the View History link for each item.
Pipelines
- APIs have been added for pipelines and workflows to automate creation and management of pipelines.
- Pipelines have been improved to protect against recursions; the same file cannot trigger the same pipeline again during the same pipeline execution (workflow).
- Documentation details (“Readme” files) for specific Protocols and Task Scripts are now available from within TDP. Note: This does not replace the TetraScience product documentation.
- The maximum size for pipeline storage has been increased from 20 GB to 180 GB.
- You can now easily edit variables that contain large multi-line strings.
- Additionally, several minor enhancements were made to improve pipeline usability and performance, such as: adding the pipeline name to the Edit Pipeline page, and adding a Duration field to the Workflow Processing page that displays the total processing time.
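The new pipeline and workflow APIs above can be driven programmatically. The sketch below assembles a create-pipeline payload; the endpoint contract, payload fields, and protocol slug are illustrative assumptions, not taken from the TDP API reference, so consult the actual API documentation before use.

```python
import json

# Hypothetical sketch: field names below are assumptions mirroring pipeline
# attributes mentioned in these notes (name, trigger condition, protocol).
def build_pipeline_payload(name, trigger_field, trigger_operator,
                           trigger_value, protocol_slug):
    """Assemble a JSON payload for a hypothetical create-pipeline request."""
    if len(name) >= 256:
        # Known issue PIPE-736: workflow processing fails for long pipeline
        # names; the workaround is to keep names under 256 characters.
        raise ValueError("pipeline name must contain fewer than 256 characters")
    return json.dumps({
        "name": name,
        "trigger": {
            "field": trigger_field,        # e.g. "File Path"
            "operator": trigger_operator,  # e.g. "contains"
            "value": trigger_value,
        },
        "protocol": protocol_slug,
    })

payload = build_pipeline_payload(
    "empower-raw-to-ids", "File Path", "contains", "test.json", "example-protocol"
)
print(json.loads(payload)["name"])
```

The payload would then be sent to the pipeline creation endpoint with a standard HTTP client and your organization's API credentials.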
Search
- The ability to Search files in the Tetra Lake has been greatly improved when using the Tetra Data Platform (TDP) web-based user interface. Without having to know EQL (Event Query Language), you can easily apply filters to create complex searches and fine-tune queries. The Search feature on TDP now works like a common website search. Each field you enter is analyzed as both keyword (default) and text (field.text). The default behavior for Search now sorts results by relevance or score instead of sorting by ingestion date in the Tetra Lake. Additionally, you can still organize the result set by Name, Source Type, or Last Modified by clicking the corresponding column label.
- The easy to use, and more-intuitive Search enables you to quickly find and access frequently used files and folders using saved collections and shortcuts. Using Search, you can now save a query or grouping of results as a collection (in the List view), and save file path searches as shortcuts (in the Browse view). Additionally, corresponding Shortcut APIs have been developed to provide an alternative method of using the TDP.
- The Search File Details page has been redesigned and is now the default page. These additional UI improvements include:
- File details (namespace, slug, and version) have been added to the search results.
- The IDS option from the Platform Metadata File Category on the Search File > Options > Attributes tab was changed to IDS Type to reflect a more accurate filter descriptor.
- Removed has been renamed to Removed Sources, and a new red trash icon (indicating Removed Sources) has been added to the Browse view on the Search page.
- The Elasticsearch preview has been revised by removing these unnecessary UI items for RAW files and PROCESSED files: labels.nameError, labels.valueError, labels.warning, labels.labelId, outputFiles, and inputFiles.
- Navigation within the Search feature has been improved:
- To avoid unnecessary scrolling, the ability to expand and collapse Source Types and Pipelines was added to the side panel of the Search page. Additionally, you can now search within the filter or facet values. For example, you can click Source Type and then start typing “Empower”.
- You can now resize the Folder names column on the Search page in Browse mode to view folder names that appear truncated.
- In addition to clicking the ^ to close the Options screen, you can now click anywhere outside of the screen to close it and retain any settings.
- The Manual Upload folder has been renamed to Manual Uploads and re-positioned to display at the top of the list of folders in Browse mode. (Was released in v3.1.6 as DI-5877)
- You can now click a Preview icon to preview an uploaded file for these added valid files and file types:
- Images with file type: .png, .jpeg, .gif, or .bmp (including 360-degree images)
- .pdf, .csv, .xlsx, and .docx file types
- Video with file type: .mp4 or .webm
- Audio with file type .mp3
- If any file (included in the valid file types) is greater than 50 MB, then a warning message displays indicating that the file size is too large to display in a preview.
- The File-Link Integrations API is now automated and works with deleted integrations.
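The keyword-versus-text behavior described above can be illustrated with Elasticsearch-style query DSL expressed as plain Python dicts. This is a minimal sketch of the general Elasticsearch pattern; the field name source_type is a hypothetical example, not the actual TDP index schema.

```python
# Sketch of the keyword vs. text distinction: a field indexed both ways can be
# queried for an exact value (keyword, the default) or for analyzed,
# relevance-ranked matching (the "field.text" variant).
def exact_match_query(field, value):
    # keyword fields require an exact, unanalyzed match
    return {"term": {field: value}}

def full_text_query(field, value):
    # the analyzed ".text" variant supports relevance-scored matching
    return {"match": {f"{field}.text": value}}

print(exact_match_query("source_type", "Empower"))
print(full_text_query("source_type", "empower 3"))
```

Relevance-ranked results, as now used by default in TDP Search, come from the analyzed match query, while filters and facets rely on exact keyword terms.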
Other Platform Components
- With multi-organization login support, Administrators can manage multiple projects and organizations from a single instance. A user can now be added to multiple organizations using the same email.
- The organization that the user is currently logged into displays at the top of the Tetra Data Platform (TDP) page. The user can switch between organizations that they belong to, and determine which organization will be their default organization when they log in to the TDP.
Bug Fixes
Click the arrows next to each section to review the bug fixes made for this version of the TDP software:
Connectors, Integrations, and Data Hub
- DI-5186 - The correct GDC status is now reported.
- DI-5330 - The Data Hub installer script can now install AWS CLI because the Python versions now match.
Health Monitoring
- DI-5042 - The Data Hub no longer triggers an incorrect CPU and memory metrics alert on the screen.
- DI-5079 - After a period with only failed workflows and no new files processing, the pipeline status now reflects the correct unhealthy status.
- DI-5112 - All files on the Health Monitoring page now display their Source values.
- DI-5113 - Filtering now works correctly when filtering by status for Data Hubs or connectors.
- DI-5116 - When re-authorizing a Box account, the cloud connectors now display the correct status.
- DI-5144 - Health Monitoring history now displays correctly for items in critical or unhealthy status.
- DI-5220 - A re-enabled (disabled) pipeline now shows the correct status of Enabled in the Pipelines tab on the Health Monitoring page.
- DI-5226 - The pipeline workflows now update the correct status on the Health Monitoring page.
- DI-5237 - After disabling and enabling a connector from the Agents page, the re-enabled connector now shows the correct status of healthy in the Data Hubs tab on the Health Monitoring page.
- DI-5275 - After deleting a pipeline, it no longer displays on the Health Monitoring page.
Metadata, Tags, and Labels
- PCLR-15 - Tag changes are now recorded properly in the Audit log.
- PLCR-23/PLCR-24 - Metadata is now recorded properly in the Audit log.
- PLCR-30 - Label changes are now recorded properly in the Audit log.
Pipelines
- PIPE-202 - Auto-refresh duration for the Pipeline Dashboard has been shortened and a manual refresh button is also available.
- PIPE-283 - The Shared Setting Usage page now reflects pipelines, integrations, and data hubs that use the shared setting.
- PIPE-345 - Workflows now appear on the Pipeline Processing page. Previously, workflows were not recorded on this page.
- PIPE-365/366 - Pipeline scanning has been fixed.
- PIPE-373 - The discrepancy between the execution times shown on the Pipeline Processing page and the Workflow page has been fixed with the addition of the Duration field.
- PIPE-396 - Parentheses, apostrophes, and these additional characters are now permitted in the pipeline description:
- ASCII printable characters (decimal codes from 32 to 126, inclusive)
- Newline characters (CR/LF...decimal codes of 10 and 13)
- PIPE-431 - Files larger than 5 GB can now be accommodated.
- PIPE-437 - For the files that have been processed by a pipeline, a link now displays that will redirect to the corresponding pipeline.
- PIPE-450 - Both task scripts and data hubs now use the IAM boundary policy.
- PIPE-452 - The continuous spinning wheel was erroneously displaying while selecting an in-progress pipeline. This issue has been resolved by removing the spinning wheel. If a workflow is in process, it now shows the input file, historical workflows, and a dash for the output file.
- PIPE-543 - The secret name previously did not appear in the workflow logs; this information now displays in the logs.
- PIPE-603 - Pipeline reprocessing now uses the latest version of a RAW file.
- PIPE-604 - Workflows no longer fail due to container shutdown issues.
- PIPE-630/PIPE-346 - Namespace errors no longer occur when creating pipelines.
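The character rules restored by PIPE-396 above can be expressed as a simple check. This helper is an illustrative sketch only; TDP performs its own server-side validation.

```python
# PIPE-396: pipeline descriptions may contain printable ASCII characters
# (decimal codes 32-126, inclusive) plus the newline characters CR and LF
# (decimal codes 13 and 10). This mirrors that rule for illustration.
def is_valid_description(text):
    return all(32 <= ord(ch) <= 126 or ord(ch) in (10, 13) for ch in text)

assert is_valid_description("Parses (raw) Empower files; see 'notes'.\r\n")
assert not is_valid_description("smart \u201cquotes\u201d")  # non-ASCII rejected
```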
Search
- DI-4245 - Unreadable or duplicate Search API error response messages no longer display for incorrect request payloads.
- DI-4373/DI-4842/DI-4853 - Opened folders in Browse mode now close and collapse properly when clicked (was released in v3.1.6 as DI-5854).
- DI-4423 - An error message no longer displays when adding attributes to a file.
- DI-4607 - All selection lists on the Search page are now sorted alphabetically.
- DI-4648 - You can no longer manually upload files with a null (empty) Source Type value.
- DI-4660 - The correct source type validation message now displays if an invalid source type is entered when configuring the Source Type for the Path Settings.
- DI-4664 - The TetraScience icon no longer overlaps the Files Counter number at the top of the column.
- DI-4780 - When filtering files by Source Type using Options, the Attributes tab now clearly indicates the selected Source Type on the side bar of the pane.
- DI-4782 - The Search files side bar no longer disappears when no files match the search criteria; it now remains on the pane.
- DI-4830 - The Bulk Action > Delete Selected method to delete files from the Search page now displays correctly.
- DI-5108 - The quick reference link at the top of the Search page has been updated and now links to the appropriate documentation.
- DI-5146 - Folder names are no longer truncated at the bottom of the Search page in Browse mode.
- DI-5169/DI-5776 - During initial search results, Search no longer loads or returns data.results when searching for IDS files; however, the JSON preview returns all fields, including data.results.
- DI-5333 - The CONFIG, AUDIT, and TMP folders no longer display as folders in Browse view on the Search page. (Was released in v3.1.6 as DI-5878)
- DI-5343 - Refreshing the Search page no longer alters the previous query or Search mode.
- DI-5767 - When searching using the RAW EQL option, the correct endpoint now displays.
- DI-6552 - After processing large files (900 MB), data now displays correctly in Athena tables.
Other Platform Components
- PLCR-16 - (TDP v3.1.2) - Service users now display on the Service Users page.
- PLCR-20 - Sessions are now terminated properly when logging out.
- PLCR-21 - For SSO implementations, user name changes in LDAP are now reflected in the TDP.
- PLCR-22 - TDP failed to create an Audit trail for re-enabling the system Audit trail. If an Audit trail is re-enabled, a record of it now displays in the Audit trail.
- PLCR-25 - (TDP v3.1.2) - SSO user name settings were adjusted so the SSO user name now displays on the customer development site.
- PLCR-43 - You can now properly configure the Tetra File-Log Agent in the cloud.
Deprecated Features
There are no deprecated features in the product at this time.
Known Issues
Click the arrows next to each section to review the known issue(s) for this release of the TDP software. Where available, a recommended workaround is provided:
Health Monitoring
- DI-6132: When uploading a new file version with a new label, the previous file version's label remains and is added to the new file version, resulting in two different labels.
Workaround: First, upload the file without modifying any attributes. Then, after the new version of the file has been uploaded, add the new label. For details on how to add a label using the UI, see the Applying Metadata, Tags, and Labels topic.
- DI-6140: When uploading a new file version, the new version overwrites the file's extension with that of the previously uploaded file.
Workaround: The new file version must have the same name as the previous file.
- DI-6805: Pipeline trigger and scan conditions are inconsistent.
Workaround: When creating a pipeline trigger to search for part of a file path, use contains instead of has path or is.
This is because the pipeline's trigger condition treats has path and is differently from the scan condition:
  - The trigger condition requires a full match, whereas the scan condition finds partial paths due to the current implementation of how it retrieves results from Elasticsearch (ES). For example, using the file /user1/fileabc/test.json, the inconsistent behavior occurs when you enter has path or is for a partial match.
  - If you set the trigger condition to File Path has path test.json or File Path is test.json, and then upload the file, it will not trigger the pipeline (which is the correct behavior). However, if you scan for unprocessed files for this pipeline, the scan will find the file (which is the inconsistent behavior).
  - If you want /user1/fileabc/test.json to trigger the pipeline through a partial match, set the trigger condition to File Path contains test.json. This matches the scan results.
  - If you only want a fully qualified path, enter File Path has path /user1/fileabc/test.json. This also matches the scan results.
- DI-6822/DI-7013: The indicated number of files satisfying the File Browse filter differs from the number of files that display in the current folder (and subfolders) that match the filter. Additionally, the left facet Search displays old values and statistics. There is currently no context awareness between the filters or dependencies.
(No available workaround at this time)
- DI-7116: IDS type schema field names overlap each other when you open several schema sub-branches.
Workaround: Close some of the neighboring sub-branches to clearly read their IDS type schema field names.
- DI-7189: Not all of the related files display on the Workflow Details page or File Info page due to a file display limit currently set to 50.
Workaround: To review all of the related output files, navigate to the Pipeline Processing page.
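The trigger-versus-scan mismatch in DI-6805 above can be modeled in a few lines. This is an illustrative sketch of the documented behavior, not the actual TDP implementation.

```python
# Sketch of the DI-6805 inconsistency: the trigger condition requires a full
# match for "has path"/"is", while the Elasticsearch-backed scan also finds
# partial paths. Operator names are taken from the examples in this note.
def trigger_matches(condition, operator, file_path):
    """Trigger-side matching: 'has path' and 'is' require a full match."""
    if operator == "contains":
        return condition in file_path
    if operator in ("has path", "is"):
        return condition == file_path
    raise ValueError(f"unknown operator: {operator}")

def scan_matches(condition, operator, file_path):
    """Scan-side matching: partial paths match regardless of operator."""
    return condition in file_path

path = "/user1/fileabc/test.json"
# "has path test.json" does not fire the trigger, but the scan still finds it:
print(trigger_matches("test.json", "has path", path))  # False
print(scan_matches("test.json", "has path", path))     # True
# "contains test.json" keeps trigger and scan consistent:
print(trigger_matches("test.json", "contains", path))  # True
```

Using contains (or a fully qualified path with has path) keeps the trigger and scan results aligned, which is exactly the workaround recommended above.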
Pipelines
- PIPE-596: The state of pipelines (Active/Disabled) on the Health Monitoring page does not match the state on the Pipeline Design page.
(No available workaround at this time)
- PIPE-619: The process list of files with Pipeline [v1/pipeline/process/{pipelineId}] did not fail on a deleted file when it should have.
Workaround: Before you submit a list of files to process with a pipeline, review the list to make sure that it does not contain any deleted files.
- PIPE-642: The output files list on the File Details page displays outputs from other pipelines.
(No available workaround at this time)
- PIPE-736: [Noted in v3.2.0 and v3.1.6] Workflow processing fails if the pipeline name is greater than 256 characters.
Workaround: Ensure that the pipeline name contains fewer than 256 characters.
Platform
- PLCR-106: When a label is deleted or changed, and a metadata change occurs with a source type 'Unknown', then the audit trail at the label level does not include a before value for the labels.
Workaround: To ensure that label deletions or modifications are logged properly in the Audit log, you must modify or delete labels using a separate update from metadata or tag updates.
Miscellaneous
For Athena, document file size processing limits are:
- The recommended file size limit is approximately 1 GB for file processing. (Note that in testing, files that are approximately 2.4 GB have been processed successfully.)
For Empower files, file limits are:
- Approximately 2.4 GB for an uncompressed Empower RAW JSON file.
TetraScience is conducting tests to determine and establish upper limits for known use cases, and will provide details once testing is completed.
Other Release Notes
Select this link to see release notes for other components.