How to Search Files in the Tetra Data Lake

📘

Tetra Data Platform (TDP) Versions

The following procedures apply to TDP versions 3.2.0 and higher. For earlier TDP versions, see (Version 3.1.x) Searching the Tetra Data Lake.

You can search for data in the Tetra Data Lake in the following ways.

TDP User Interface

  • The Search Files page in the Tetra Data Platform (TDP) user interface includes the following:
    • A List view with a search bar (you can save searches as Collections, which can be saved for use across an organization).
    • A Label & Advanced Filters dialog where you can configure a custom query using a query builder or enter a RAW EQL query.
    • A Browse view that provides a traditional tree-style view of your files as you would see in Windows Explorer or macOS Finder apps. (Shortcuts can be saved for frequently visited locations in this view.)
  • The SQL Search page helps you build and run SQL statements.

📘

NOTE

A new Basic Search page that's designed for scientific users is also available starting in TDP version 3.6.0. The Basic Search page is currently in beta release and may require changes in future TDP releases. This experience is a non-breaking change and is activated for customers on request only. For more information, see Basic Search (Beta Release).

Amazon Athena Queries

For more information, see Use SQL and Athena to Query Data.

Elasticsearch Query DSL Queries

For more information, see Search by Using Elasticsearch Query DSL.

TetraScience API

For more information, see the TetraScience API Documentation.

You can apply filters to create complex searches and fine-tune queries. The search feature in the TDP behaves similarly to a common website search. Each field that you enter is analyzed both as a keyword (the default) and as text (field.text). By default, Search sorts results by relevance (score).
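The difference between the two analyses can be sketched in Python with a deliberately simplified model. A keyword match compares the whole value, while a text match compares individual words. The helper names and the tokenizer are hypothetical illustrations, not the TDP's Elasticsearch analyzers. Matching is case-insensitive here because this topic states that filters are case-insensitive.

```python
import re


def _tokens(value: str) -> list:
    # Simplified tokenizer (an assumption, not the TDP analyzer):
    # lowercase the value and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", value.lower())


def keyword_match(stored: str, query: str) -> bool:
    """Keyword-style match (the default): the entire value must match."""
    return stored.lower() == query.lower()


def text_match(stored: str, query: str) -> bool:
    """Text-style match (field.text): every query word must appear in the value."""
    stored_tokens = set(_tokens(stored))
    query_tokens = _tokens(query)
    return bool(query_tokens) and all(token in stored_tokens for token in query_tokens)


source_type = "Humidity Sensor 42"  # hypothetical field value
print(keyword_match(source_type, "humidity sensor 42"))  # True
print(keyword_match(source_type, "humidity sensor"))     # False
print(text_match(source_type, "sensor humidity"))        # True
```

Word order doesn't matter for the text match in this model, which is why "sensor humidity" still matches.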

📘

NOTE

The TDP Search feature uses a search engine to help you look for specific datasets and corresponding files. To explore or analyze tabular data across one or more datasets, see Use SQL and Athena to Query Data.

What's Indexed for Search?

All of the data in Intermediate Data Schema (IDS) files is indexed and available through search.

The first 1 MB of the following RAW, unprocessed file types is also indexed for search:

| Media (MIME) Type | Possible File Extensions |
| --- | --- |
| text/plain | txt, text, conf, def, list, log, in, ini |
| application/json | json, map |
| text/csv | csv |
| application/xml | xml, xsl, xsd, rng |
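As a rough sketch of the rule above, the following Python helper estimates how much of a RAW file's content is indexed. It assumes that the file type is determined from the extension (the TDP may instead use the detected MIME type) and that 1 MB means 1,048,576 bytes. The helper name is hypothetical.

```python
import os

# Extension lists taken from the table above.
INDEXED_RAW_TYPES = {
    "text/plain": ["txt", "text", "conf", "def", "list", "log", "in", "ini"],
    "application/json": ["json", "map"],
    "text/csv": ["csv"],
    "application/xml": ["xml", "xsl", "xsd", "rng"],
}
EXTENSION_TO_MIME = {
    ext: mime for mime, exts in INDEXED_RAW_TYPES.items() for ext in exts
}

# Assumption: "1 MB" is treated as 1,048,576 bytes.
RAW_INDEX_LIMIT_BYTES = 1024 * 1024


def indexed_content_bytes(filename: str, size_bytes: int) -> int:
    """Return how many bytes of a RAW file's content are indexed (0 if its type isn't indexed)."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext not in EXTENSION_TO_MIME:
        return 0
    return min(size_bytes, RAW_INDEX_LIMIT_BYTES)


print(indexed_content_bytes("run_042.log", 5_000_000))  # 1048576
print(indexed_content_bytes("plate.xlsx", 200_000))     # 0
print(indexed_content_bytes("results.json", 2_048))     # 2048
```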

📘

NOTE

You can also search for RAW, unprocessed files by filename, file metadata (such as creation time) and attributes. The default 1 MB search indexing settings for RAW files can be adjusted on request.

How is Search Used?

You can use the TDP search feature to do the following:

  • View and sort search results by name, source type, or upload date
  • Browse files in specific folders and subfolders
  • Filter search results by entering text in the Search box
  • Save a query or grouping of results as a collection using the List view
  • Save file path locations as shortcuts using the Browse view
  • View files from specific file categories, sources, or pipelines
  • Upload, download, preview, and delete files
  • Open the file page to view its details
  • View JSON files
  • Add or edit attributes (metadata, tags, or labels)

Access the Search Files Page

To access the Search Files page, do the following:

  1. Sign in to the TDP.
  2. In the left navigation menu, choose Search Files. The Search Files page opens.

Search Files Page Actions

You can do any of the following on the Search Files page:

  • (In List view) Create, save, and manage search queries (grouping of results) as a collection to display at the top of the Search page
  • (In Browse view) Create, save, and manage file path locations as a shortcut to display at the top of the My Home page
  • Use quick filters where you can do the following:
    • List files by category (RAW, PROCESSED, or IDS), source, or pipeline.
    • Browse files by your organization's folders in the Tetra Lake.
  • Easily search files by entering any text in a Search box and view which filters have been applied.
  • Conduct advanced searching and file uploading using additional filters and features.
  • Review the file search results in a display area sorted by relevance (by default).

The following table provides a list of the Search Files page items and their descriptions:

| Search Item | Description |
| --- | --- |
| All Files button | Click to display all of the available files. All files display by default. |
| Save button | From List view, click to create and save a search query as a collection. From Browse view, click to save a file path location as a shortcut to reuse. For details, see How to Save Collections and Shortcuts. |
| List button | Click to display the files in a list format. |
| Browse button | Click to browse files within your organization's folders and subfolders. |
| Search box with Search button | Enter any term or field that you want to search for, and then click Search. The Search feature filters and searches on terms similarly to a popular website's search. To search for and match an exact phrase, enclose the text in "double quotes". Possible search examples: `MyOrgTestFile traceId:bad94687-5cf1-4a55-9454 category:IDS`, `labels.value:name NOT labels.value:nameone`, `(_exists:metadata.name:country) empowr`, and `fileId: abcdef-1589*`. For basic examples, see Search Files Page: Search Examples. For more examples and their results, see Search Query Examples. |
| Highlight | Highlights matching terms in yellow. Turning this on might slow down your query. |
| Label & Advanced Filters | Select to use advanced search methods. You can search by using basic filters, attributes, Data (IDS) filters, and IDS schema fields, or search by RAW. |
| Upload File | Click to upload a file. |
| File Category | Indicates the type of files to show: RAW, PROCESSED, or IDS. |
| Source Type | When you click List, the available file sources display based on File Category (RAW, PROCESSED, or IDS), Source Type, or Pipeline. You can expand or collapse these source types to view the files. When you click an item in the Source Type list, that selection is added to the Search string as an AND item. To remove the item from the Search string, click Clear. |
| My Home | Shows the files in your folder. This is available when you click Browse. |
| Tetra Lake | When you click Browse, you can search for files based on your organization's folders and subfolders. |
| Name | Displays the list of file names. You can also select the box next to Name to delete the selected files in bulk, and click a file in the list to toggle the display of its summary details. |
| Source Name | Indicates the source of the file (for example, log-file-watcher). |
| Upload Date | Indicates the date and time when the file was last modified. Click Upload Date to sort the files chronologically, from earliest to latest or vice versa. |
| Show Match Details | Click to toggle details about where matched terms appear in the file. |

Perform a Search on the Search Files Page

To perform a search on the Search Files page, do the following:

  1. From the Search Files page, choose List. The file name, source type, and the date and time that each file was last modified are displayed.
  2. In the Search box, enter terms and fields that you want to search for. When running a search, keep in mind the following:
    • The Search feature filters and searches on terms similar to a common website search. You can enter both full-text queries (search all of the text associated with a file) and filtered queries.
    • Filters are case-insensitive and AND is the default Boolean operator.
    • To search and match for an exact phrase, enclose the text with "double quotes" (for example "lab123 experiment5").
    • Make sure that you don’t use wildcard searches.

📘

Fuzzy Search

A fuzzy search uses a fuzzy matching program to return results based on likely relevance, even when the search terms and their spellings don't exactly match. If you add a tilde (~) to the end of a search term, the term can contain a typo of up to two characters and still return relevant results. However, don't add a tilde (~) to the end of a filter keyword or value, or to a term that contains any of the following wildcard characters: *, ?, or !. If you do, the query fails.

IMPORTANT: Fuzzy searches don't always return all possible related files. Results are sometimes incomplete when a large number of indexed files have similar field values.

  3. Choose Search. Files that match the search criteria that you entered display as results in the file list. By default, results are sorted by relevance instead of by upload date. To organize the results chronologically, sort by Upload Date. For example searches, see Search Files Page: Search Examples.
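The search rules in the steps above and in the Fuzzy Search note can be sketched as a small Python query builder. It's a hypothetical helper for composing Search box strings, not a TDP API. It joins parts with spaces (which the TDP treats as AND), quotes exact phrases, and refuses to add a tilde to filter values or to terms that contain *, ?, or !.

```python
WILDCARD_CHARS = set("*?!")  # wildcard characters listed in the Fuzzy Search note


def fuzzy(term: str) -> str:
    """Append a tilde (~) to a free-text term, following the Fuzzy Search rules."""
    if ":" in term:
        raise ValueError(f"don't add ~ to a filter keyword or value: {term!r}")
    if WILDCARD_CHARS & set(term):
        raise ValueError(f"don't add ~ to a term that contains *, ?, or !: {term!r}")
    return term + "~"


def phrase(text: str) -> str:
    """Enclose text in double quotes so that it matches as an exact phrase."""
    return '"' + text.replace('"', '\\"') + '"'


def build_query(terms=(), phrases=(), fuzzy_terms=(), filters=None) -> str:
    """Join search parts with spaces; AND is the default Boolean operator between them."""
    parts = list(terms)
    parts += [phrase(p) for p in phrases]
    parts += [fuzzy(t) for t in fuzzy_terms]
    for field, value in (filters or {}).items():
        # Quote filter values that contain spaces so they stay a single value.
        parts.append(f"{field}:{phrase(value) if ' ' in value else value}")
    return " ".join(parts)


print(build_query(terms=["MyOrgTestFile"], filters={"category": "IDS"}))
# MyOrgTestFile category:IDS
print(build_query(phrases=["lab123 experiment5"], fuzzy_terms=["experimnet"]))
# "lab123 experiment5" experimnet~
```

Raising an error early mirrors the note's warning that the query fails if a tilde is misplaced, so the mistake surfaces before the search runs.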

🚧

IMPORTANT

The following feature improvements are available only for data that's processed after the TDP v3.6.0 upgrade. To apply these enhancements to historical data, customers must reindex the data by reconciling it, or contact their CSM for support. For more information, see Reprocess Files.

Improved Context Search Feature

The improved Context Search feature returns results in the TDP Search bar for search terms and filters that don't require an exact match, based on content in both the primary (RAW) and schematized (IDS) versions of files. This functionality allows for more powerful contextual search without relying on metadata, tags, or labels. When you search for content that's found in an IDS file through the Search bar, the results show information from that IDS file, its related source RAW file, and any other related IDS files.

Improved Broad Search Feature

The improved Broad Search feature lets you return results by entering only a portion of a file path, rather than the entire file path, either in the TDP Search bar or in the query_string of a TetraScience /searchEql endpoint request. For example, if you search for part of a file path, such as lab123 experiment5, then a file with the following path is also returned in the search results: /lab123/instrumentB/user1_experiment5_20231212.dat.
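For API users, a minimal sketch of a request body for the /searchEql endpoint follows. It assumes that the endpoint accepts a standard Elasticsearch query DSL body with a query_string query, so check the TetraScience API Documentation for the endpoint's full path, authentication, and accepted fields. The path_contains_terms helper is a simplified, hypothetical model of why a partial path can match. It isn't the TDP's actual tokenizer.

```python
import json
import re


def search_eql_body(query_text: str) -> dict:
    """Wrap a search string in an Elasticsearch query_string query (assumed body shape)."""
    return {
        "query": {
            "query_string": {
                "query": query_text,
                # Elasticsearch's query_string defaults to OR; AND mirrors the
                # TDP Search bar default described in this topic.
                "default_operator": "AND",
            }
        }
    }


def _tokenize(value: str) -> list:
    # Simplified model: lowercase, then split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", value.lower())


def path_contains_terms(path: str, query_text: str) -> bool:
    """Return True if every query word appears as a token somewhere in the path."""
    path_tokens = set(_tokenize(path))
    return all(term in path_tokens for term in _tokenize(query_text))


print(json.dumps(search_eql_body("lab123 experiment5"), indent=2))
print(path_contains_terms("/lab123/instrumentB/user1_experiment5_20231212.dat",
                          "lab123 experiment5"))  # True
```

In this model, the path splits into tokens such as lab123, instrumentb, user1, and experiment5, so both query words are found even though neither is the full path.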

Perform Additional Filtering

To perform additional filtering on the Search Files page, select a source or pipeline from the left filter pane.

When running filtered queries, keep in mind the following:

  • To avoid unnecessary scrolling, you can expand and collapse the Source Types and Pipelines from the side panel of the Search page.
  • You can search within the filter or facet values. For example, you can select Source Type and enter “humidity sensor”.
  • You can also create and save a search query as a collection. For more information, see How to Save Collections and Shortcuts.
  • To further filter your search results, you can select filters from the Label & Advanced Filters tabs.
  • To clear the existing search criteria you entered in the Search box, click Clear.

View a File Summary

To view a summary of the file details, click the file from the list of files.

This table describes the list of File Summary items:

| Field | Description |
| --- | --- |
| Related Files | Lists Input Files (files from which the current file was derived; for example, for an IDS file, the RAW file would be the input file) and Output Files (files that the current file produced; for example, the IDS file typically produces a JSON file). |
| Date Created | Lists the date and time when the file was created. |
| Integration Type | Lists the integration (for example, datapipeline) that was used to ingest the file into the Data Lake. |
| Protocol | Lists the steps and configurations used to process data for the pipeline, if any. The protocol consists of two files: protocol.json and script.js. The protocol is the "heart" of the pipeline. |
| Task Script | Lists the task script related to this file, if any. Task scripts contain the code for the business logic needed to process the data. |
| Schema | Lists the schema (structure of the data) related to this file, if any. If a schema exists, you can click View Schema to open the Data Schema page. |
| Namespace | Lists the namespace for the schema. |
| Version | Lists the version of the schema (for example, v3.0.0). |
| Metadata, Labels, Tags | Displays relevant metadata, labels, or tags, if any. |

At the upper right corner of the File Summary panel, the following icons and the More option display:

  • Open File Page
  • Download File
  • View JSON Details
  • File Preview

View the File Details Page

To view additional file details, choose Open File Page for the selected file. The File Details page appears and displays two tabs:

  • File info—shows additional file details
  • File Journey—shows the event history for a file after it's been uploaded to the Tetra Data Lake, including pipeline processing events

📘

NOTE

For TDP v3.6.x, the File Journey tab displays Tetra File-Log Agent file events only. Incorporating events from other integration types is planned for a future TDP release. To view file events generated by the Tetra File-Log Agent outside of the TDP, see Monitor Events.

Link to a File in the TDP

To link to a specific file in the TDP from outside of the platform, copy the URL of the file's File Details page. Then, use that URL to create a hyperlink.

📘

NOTE

To access a file in the TDP through an external link, users must still have the permissions required to access the file.

File Info

The following information is provided on the File info tab on the File Details page for each file in the Tetra Data Lake.

| Section | Description |
| --- | --- |
| FILE VERSIONS | Lists all of the file versions in chronological order, with the most recent version at the top of the list. You can click a version to display its details in the File Details section; hover over a version to display its full name, the date and time when it was uploaded, and its full ID; and copy the file ID to your clipboard. |
| FILE ACTIONS | You can click Download to download the file to your computer; click View File Info Details to open a preview of the JSON file details; click Add New Version to upload a new version of the file; click Add Attributes to add or edit the file's attributes (such as metadata, tags, and labels); and click Remove to remove the file and its subsequent versions. Remove is available only for the most recent file version. |
| File Info | Displays the following file details. VERSION: the file version number. FILE NAME: the file's name. FILE ID: the file's ID number from the Amazon Simple Storage Service (Amazon S3) bucket. DATE CREATED: the date and time when the file was uploaded. FILE PATH: the location of the file in the Amazon S3 bucket (S3 object key), in the format {orgSlug}/{sourceId}/{category}/{filePath\*}. The filePath\* variable is determined by the TDP component that uploaded the file; for more information, see the documentation for the component that you're using. ES INDEX TIME: the date and time when the file was indexed in Amazon Elasticsearch. ATHENA ADD TIME: the date and time when the file was added to Amazon Athena. SOURCE TYPE: where the file came from. SIZE: the compressed file size (the actual file size is larger when you download it). |
| Attributes | Displays any associated file attributes, such as metadata, tags, and labels. |
| INPUT FILE | Shows the files from which the current file was derived. For example, for an IDS file, the RAW file would be the input file. |
| OUTPUT FILES | Shows the files that the current file produced. For example, the IDS file typically produces a JSON file. This section also indicates the pipeline that processed the file, the file name, the date and time that the file was created, the IDS file name, and the TMP file. |
| View all Files and Workflows | Shows the files and workflows that are related to the current file. |
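If you work with FILE PATH values in scripts, the documented {orgSlug}/{sourceId}/{category}/{filePath\*} format can be split as in the following Python sketch. The helper and the example key are hypothetical. The sketch assumes that the key has no leading slash, and it keeps any slashes inside filePath\* intact.

```python
from typing import NamedTuple


class LakeKey(NamedTuple):
    org_slug: str
    source_id: str
    category: str
    file_path: str


def parse_file_path(key: str) -> LakeKey:
    """Split an S3 object key of the form {orgSlug}/{sourceId}/{category}/{filePath*}."""
    # maxsplit=3 leaves everything after the category, slashes included, in file_path.
    parts = key.split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"unexpected FILE PATH format: {key!r}")
    return LakeKey(*parts)


key = parse_file_path("acme/src-123/RAW/lab123/instrumentB/user1_experiment5_20231212.dat")
print(key.category)   # RAW
print(key.file_path)  # lab123/instrumentB/user1_experiment5_20231212.dat
```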

File Journey

📘

NOTE

For TDP v3.6.x, the File Journey tab displays Tetra File-Log Agent file events only. Incorporating events from other integration types is planned for a future TDP release.

The File Journey tab on the File Details page shows the event history for a file after it's been uploaded to the Tetra Data Lake, including pipeline processing events.

File Journey Event Types

The File Journey tab on the File Details page reports the following event types.

| Category | Event Type | Description |
| --- | --- | --- |
| Connector Events | Connector file detected | Shows when the system has detected a file for upload through a configured connector |
| Connector Events | Connector file processing | Shows when a file uploaded through a configured connector is processing |
| Connector Events | Connector file success | Shows when a file is uploaded to the TDP through a configured connector successfully |
| Data Lake Events | File Uploaded | Shows when a file is uploaded to the Data Lake |
| Data Lake Events | Registered in TDP | Shows when a file is registered in the TDP |
| Data Lake Events | Indexed for search | Shows when a file is indexed in Elasticsearch |
| Data Lake Events | Indexed for SQL | Shows when a file is indexed in Amazon Athena |
| Pipeline Events | Pipeline Triggered | Shows when a pipeline is triggered |
| Pipeline Events | Workflow Failed | Shows when a triggered workflow failed |
| Pipeline Events | Workflow Completed | Shows when a triggered workflow completed |
| Pipeline Events | Workflow Cancelled | Shows when a workflow was cancelled |

View All Files and Workflows

On the File Details page, select the View All Files and Workflows button to display all of the current file's related files and workflows. The All Files and Workflows page appears.

All Files and Workflows Page Section Overview

| Section | Description |
| --- | --- |
| CREATED | Shows the date the file was created |
| KIND | Indicates if a file is an input or output file |
| TYPE | Shows the file type |
| FILE NAME | Shows the file's name |
| PIPELINE | Shows the pipeline that contains the workflow that produced the file |
| WORKFLOW | Shows the workflow ID of the workflow that produced the file |
| WORKFLOW STATUS | Indicates if the workflow that created the file was successful or not |

Download a File

To download a file to your computer, you can select either of the following:

  • The Download File icon from the File Summary page.
  • The Download Version File Action from the File Details page.

Bulk File Downloads (Downloading Several Files at Once)

Bulk download allows you to quickly retrieve files sent to the TDP. Instead of downloading files one at a time or writing a custom script that sends several requests to our API, you can select up to 100 files from search results to download to your computer or device. Typically, files are downloaded to the default location set in your web browser. To change that location, see the documentation for your web browser.

  1. On the Search Files page, select the files that you want to download. You can select up to 100 files, including files on different search result pages.

Figure: Bulk Actions

  2. Select Bulk Actions at the top of the screen.
  3. Select Download ## Files. (## is replaced by the number of files that you selected. If you select more than 100 files, which is the limit, the number is grayed out.)

Figure: Bulk Actions (with Files Selected)

  4. A message appears asking you to confirm that you want to download the files. The message displays a minimum size estimate for the download, which is the size of the stored, compressed files. The downloaded, uncompressed files are likely to be larger, so you'll probably need more storage space than the estimate indicates. Click Download Files to continue.

Figure: Download Files

  5. Your browser might prompt you to allow multiple files to be downloaded. The files are downloaded as separate files to the default download location on your computer or device. To change this location, or whether you're prompted before multiple files are downloaded, see your web browser's documentation. If the download fails, try the download again.
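The selection limit and the minimum size estimate described in these steps can be modeled as in the following Python sketch. The SelectedFile type and the function name are hypothetical, and the sketch assumes that a file selected on two different result pages is counted once.

```python
from dataclasses import dataclass
from typing import Iterable

MAX_BULK_DOWNLOAD = 100  # selection limit stated in this topic


@dataclass(frozen=True)
class SelectedFile:
    file_id: str
    stored_size_bytes: int  # compressed size as stored in the Data Lake


def minimum_size_estimate(selection: Iterable[SelectedFile]) -> int:
    """Validate a bulk-download selection and return its minimum size estimate in bytes."""
    # Files can be selected on different result pages; count each file ID once.
    unique = {f.file_id: f for f in selection}
    if not unique:
        raise ValueError("select at least one file")
    if len(unique) > MAX_BULK_DOWNLOAD:
        raise ValueError(f"{len(unique)} files selected; the limit is {MAX_BULK_DOWNLOAD}")
    # The estimate is the stored (compressed) size; uncompressed downloads are usually larger.
    return sum(f.stored_size_bytes for f in unique.values())


page_1 = [SelectedFile("file-a", 1_200), SelectedFile("file-b", 800)]
page_2 = [SelectedFile("file-b", 800), SelectedFile("file-c", 3_000)]
print(minimum_size_estimate(page_1 + page_2))  # 5000
```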

View JSON File Details

To preview the JSON file details, you can select either of the following:

  • The View JSON icon from the File Summary page.
  • The View File Info Details File Action from the File Details page.

From the JSON preview window, you can view details such as the total number of items, the source type, when the file was created, the location of the file (bucket), the source, the category, and so on.

Add and Edit Attributes

To add or edit attributes (such as metadata, tags, and labels), you can do either of the following:

  • Choose More from the File Summary page, and then select Add/Edit Attributes.
  • Select the Add Attributes File Action from the File Details page.

Follow the instructions in this topic. When you have finished adding or editing attributes, click Save.
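As described in the steps for uploading a new file or file version, labels are applied to an existing file without creating a new version, whereas metadata and tags create new file versions and trigger new workflows when modified. The following hypothetical Python helper encodes that rule so that you can check an edit before you save it.

```python
# Attribute kinds that create a new file version and trigger new workflows when modified.
VERSIONING_ATTRIBUTES = {"metadata", "tags"}
# Attribute kinds that are applied to the existing file version.
NON_VERSIONING_ATTRIBUTES = {"labels"}


def edit_creates_new_version(changed_attributes: set) -> bool:
    """Return True if saving these attribute edits would create a new file version."""
    unknown = changed_attributes - VERSIONING_ATTRIBUTES - NON_VERSIONING_ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown attribute kinds: {sorted(unknown)}")
    return bool(changed_attributes & VERSIONING_ATTRIBUTES)


print(edit_creates_new_version({"labels"}))          # False
print(edit_creates_new_version({"labels", "tags"}))  # True
```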

Upload a New File or New Version of the File

📘

NOTE

When a new file version is uploaded, the TDP copies all file metadata from the previous file version, including workflow data. The system then uses this information to relate the new file version to the workflow that created the previous file version. This process results in the new file version being displayed on the File Details page and showing that the new, uploaded file version was produced by a workflow.

To upload a new file or a new version of a file, you can select any of the following:

  • Upload File at the top right of the Search page.
  • More from the File Summary page, and then select Upload New Version.
  • The Add New Version File Action from the File Details page.
  1. For a brand new file, you must select a source type for the uploaded file. Each newly uploaded file needs to be attributed to a source type.

  2. You can add a new label, if desired. Labels are applied to an existing file without creating new versions. For details, see this topic. To add metadata or tags, click Advanced Fields. These fields create new file versions and trigger new workflows when modified. Be aware that the contents of these files may be versioned across edits.

  3. Click the file upload box to select a file to upload, or drag and drop the file into the box.

  4. To preview an uploaded file, click the Preview icon. Preview is available for these valid files and file types:

    • Images with file type: .png, .jpeg, .gif, or .bmp (including 360-degree images)
    • .pdf
    • .csv
    • .xlsx
    • .docx
    • Video with file type: .mp4 or .webm
    • Audio with file type .mp3

    If any valid file is greater than 50 MB, then a warning message displays indicating that the file size is too large to display in a preview.

  5. When complete, click Upload.

📘

File Size Limitation

The maximum file size you can upload through the TDP UI is 200 MB. To upload larger files, use the TDP API or a Tetra Agent or Connector.
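The UI size limit in this section and the preview rules in the upload steps can be checked before you upload, as in the following Python sketch. It assumes that MB means 2^20 bytes and that file types are recognized by extension. The helper name is hypothetical.

```python
import os

MB = 1024 * 1024  # assumption: MB means 2**20 bytes
MAX_UI_UPLOAD_BYTES = 200 * MB  # larger files need the TDP API or a Tetra Agent or Connector
MAX_PREVIEW_BYTES = 50 * MB     # larger files show a "too large to preview" warning
PREVIEWABLE_EXTENSIONS = {
    ".png", ".jpeg", ".gif", ".bmp",  # images
    ".pdf", ".csv", ".xlsx", ".docx",
    ".mp4", ".webm",                  # video
    ".mp3",                           # audio
}


def check_ui_upload(filename: str, size_bytes: int) -> tuple:
    """Return (can_upload_in_ui, can_preview) for a file that you plan to upload in the TDP UI."""
    can_upload = size_bytes <= MAX_UI_UPLOAD_BYTES
    ext = os.path.splitext(filename)[1].lower()
    can_preview = ext in PREVIEWABLE_EXTENSIONS and size_bytes <= MAX_PREVIEW_BYTES
    return can_upload, can_preview


print(check_ui_upload("spectra.csv", 10 * MB))   # (True, True)
print(check_ui_upload("run.mp4", 120 * MB))      # (True, False)
print(check_ui_upload("archive.raw", 300 * MB))  # (False, False)
```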

Delete a File

To delete a file, you can do any of the following:

  • Select one or more files from the list of files on the Search Files page. Then, choose Bulk Actions followed by Delete Selected.

📘

NOTE

You can select all of the files on only one Search Files results page at a time (up to 20 files). To select more files, you must open each search results page individually.

  • Select More from the File Summary page, and then choose Delete.
  • Select the Remove File Action from the File Details page. This action is only available for the most recent file version.

To confirm that you want to delete the file(s) and any subsequent versions, choose OK.

-or-

To retain the file(s) and cancel the delete action, choose Cancel.

📘

NOTE

When you delete a file version, keep in mind the following:

  • Deleting a file version is a soft delete.
  • The file version remains in the Data Lake.
  • The file version is still displayed in file details (with a URL for the ID).
  • The file version is still available through the TetraScience API (with a URL for the ID).
  • The file version isn't available through search or SQL queries.
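The soft-delete behavior in the note above can be summarized with a toy Python model. The class and method names are hypothetical. They only illustrate which lookups still see a deleted version and don't reflect how the TDP stores files.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileVersion:
    file_id: str
    file_name: str
    deleted: bool = False


@dataclass
class DataLakeModel:
    """Toy model of soft-delete behavior; not the TDP's storage implementation."""
    versions: Dict[str, FileVersion] = field(default_factory=dict)

    def add(self, version: FileVersion) -> None:
        self.versions[version.file_id] = version

    def soft_delete(self, file_id: str) -> None:
        # The version stays in the Data Lake; it's only flagged as deleted.
        self.versions[file_id].deleted = True

    def get_by_id(self, file_id: str) -> FileVersion:
        # File details and the API can still resolve a deleted version by its ID.
        return self.versions[file_id]

    def search(self, text: str) -> List[FileVersion]:
        # Search (and SQL) results exclude deleted versions.
        return [v for v in self.versions.values() if not v.deleted and text in v.file_name]


lake = DataLakeModel()
lake.add(FileVersion("id-1", "run1.csv"))
lake.add(FileVersion("id-2", "run2.csv"))
lake.soft_delete("id-1")
print([v.file_id for v in lake.search("run")])  # ['id-2']
print(lake.get_by_id("id-1").deleted)           # True
```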

Browse Files in Folders

To browse files in folders, do the following:

  1. From the Search page, click Browse. The Tetra Lake folders display as the source instead of the Source Type and Pipeline.

Figure: Tetra Lake folders

  2. Select a folder to browse your organization's Tetra Lake folder hierarchy. You can continue to select subfolders until the files that you're searching for display in the results section of the page.

Your current file path location displays at the top of the page. You can also save file paths as shortcuts that display at the top of the My Home page. To quickly return to your home directory (and your shortcuts), click My Home at the top of the folder list. For more details, see How to Save Collections and Shortcuts. Any files that you removed display in the Removed Sources section at the end of the folder list, indicated by a red trash icon.

  3. To further filter your search results, you can select filters from the Label & Advanced Filters tabs, or manually create a search query.

🚧

IMPORTANT

Do not use the Edit Labels on <#> Searched Files action in Browse view. It will process all of your organization’s files that are in the Data Lake, not just the searched files. A fix for this issue is in development and testing and planned for TDP v4.0.0. List view on the Search Files page is unaffected by this defect. For more information, see Edit Labels in Bulk.