EDSL Quick Start - Step 5: Searching and Viewing Processed Files



This topic is part of the Empower Data Science Link (EDSL) Quick Start guide. If you need more details about searching and viewing files, see the For More Information section at the bottom of this page.


This topic explains how to search for and view raw and processed files.



Step 1, Step 2, Step 3, and Step 4 of this Quick Start guide should be complete.

You’ve set up your connector, your pipeline, and the Empower agent, and you’ve started the agent and monitored the processing.

Now, let’s learn how to view the results and end our quick start with an introduction to the IDS.

You can view the results in a few ways. You can search for the file in the TDP.
You can also use the API, or use SQL to search for file information that is stored in a database table.

Once you find your files, referencing the Intermediate Data Schema (IDS) can help you understand the results better.

IDS, which was designed by TetraScience in collaboration with instrument manufacturers, scientists, and informatics teams from Life Sciences companies, is a schema that is applied to raw instrument data or generated report files. This schema is used to map vendor-specific information and formats (like the name of a field) to vendor-agnostic information and formats. The IDS standardizes the naming, data type (whether the field is a string, an integer or a date), data range (for example, something needs to be a positive number) and data hierarchy.
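
The mapping idea described above can be illustrated with a minimal sketch. It assumes a flat vendor record and uses invented field names (InjVol, SampleName, AcqDate) and invented target names; none of these come from a real IDS, and the actual schemas are considerably richer.

```python
from datetime import datetime

# Hypothetical mapping: vendor field name -> (vendor-agnostic name, converter).
# These names are illustrative only and are not taken from a real IDS.
FIELD_MAP = {
    "InjVol": ("injection_volume", float),
    "SampleName": ("sample_name", str),
    "AcqDate": ("acquired_at", datetime.fromisoformat),
}


def harmonize(vendor_record: dict) -> dict:
    """Rename vendor fields and coerce each value to its standardized type."""
    harmonized = {}
    for vendor_key, value in vendor_record.items():
        if vendor_key not in FIELD_MAP:
            continue  # skip fields this sketch does not know about
        name, convert = FIELD_MAP[vendor_key]
        harmonized[name] = convert(value)
    # Example of a data-range rule: an injection volume must be positive.
    volume = harmonized.get("injection_volume")
    if volume is not None and volume <= 0:
        raise ValueError("injection_volume must be a positive number")
    return harmonized


record = harmonize(
    {"InjVol": "10.0", "SampleName": "Std-1", "AcqDate": "2023-01-15T09:30:00"}
)
print(record)
```

The sketch covers three of the things the IDS standardizes (naming, data type, and data range); data hierarchy is left out to keep the example short.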

By doing this, the IDS harmonizes the different data sets in the Life Sciences industry, such as instrument data, CRO assay data, and software data. This allows Life Sciences companies to consume the data in their applications, build searches and aggregations, and feed the data into visualization and analysis software seamlessly, because the IDS-generated JSON files are predictable, consistent, and vendor-agnostic.

The IDS captures as much information from the RAW files as is possible and available, such as:

  • time, the time that the data set is related to
  • system(s), the equipment used to produce the result, including its software and firmware
  • user(s), the person who performed the experiment
  • sample(s), the sample used in the experiment
  • method(s), the experimental recipe and usually input parameters
  • run(s), a particular execution of the experiment
  • experiment(s), information about the experiment, such as its ID and name
  • result(s), the measurement results
  • related_file(s), pointer(s) to related files, for example the raw experimental data sets in the vendor-specific, and often proprietary, format
  • datacubes, multi-dimensional data such as chromatograms, images, and plate readings
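
To make the list above more concrete, here is a rough sketch of what a document with these sections might look like once loaded as JSON. The top-level key names follow the list; every nested field and value is invented for illustration, and each real IDS type defines its own exact fields (the expected.json file in the schema viewer is the authoritative example).

```python
import json

# Hypothetical, heavily simplified document. Top-level keys follow the list
# above; all nested fields and values are invented for this sketch.
ids_like_document = {
    "time": {"measured": "2023-01-15T09:30:00Z"},
    "systems": [{"name": "LC system A", "software": "example 1.0"}],
    "users": [{"name": "analyst-1"}],
    "samples": [{"name": "Std-1"}],
    "methods": [{"name": "gradient-method"}],
    "runs": [{"id": "run-001"}],
    "experiments": [{"id": "exp-001", "name": "assay"}],
    "results": [{"peak_area": 1234.5}],
    "related_files": [{"path": "raw/example.dat"}],
    "datacubes": [
        {
            "name": "chromatogram",
            "dimensions": [{"name": "time", "scale": [0.0, 0.5, 1.0]}],
            "measures": [{"name": "absorbance", "value": [0.01, 0.20, 0.05]}],
        }
    ],
}

# Because the layout is predictable, downstream code can walk it generically.
for section, content in ids_like_document.items():
    print(f"{section}: {type(content).__name__}")

# The document serializes cleanly to JSON text.
serialized = json.dumps(ids_like_document, indent=2)
```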

TDP has a schema viewer that you can use to view the IDS.

The instructions below cover viewing file information and using the IDS schema viewer. As always, if you want to dig deeper, there are links to other documentation as well.

Search for the RAW files in the TDP

To search for RAW files in the TDP, do the following.

  1. Click the main menu, then select the Search Files option.

Main Menu button

  2. Search for the files for the project you are working on. For a comprehensive discussion of how search works, see the https://developers.tetrascience.com/docs/search-ui page.

View the IDS Schema

  1. Select Data Schemas, then View IDS Types from the main menu.

Main menu options

  2. The View IDS Types page appears. It has three parts: a) a list of IDS Types, b) a Schema Viewer, and c) a Details section.
  3. To find the schema that you are interested in, either use the scrollbar to scroll through the list of IDS types or type a term in the search filter. For example, if you type "empower" you will see IDS types with the name "empower" included.

IDS Types page

  4. To view more information about the schema, use the drop-down menu in the Details section of the window to view files. Particularly useful files to review are the README.md and expected.json.

View Table Data Using SQL

To view the data using SQL, complete the following steps.

  1. In the TDP, click the main menu and select SQL Search from the options.
  2. Find the table that you are interested in. For now, let's take a look at lcuv_empower_datacubes_data. Right-click the menu next to the name of the table.
  3. Choose "Select First 100 Rows".
  4. The SQL query begins to run. Scroll to the top of the page so that you can see the SQL query results.
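
For readers who prefer to see the query itself, the "Select First 100 Rows" option presumably runs something close to the statement below. This is an assumption: the exact text that the TDP generates, and its SQL dialect, may differ. The sketch runs the statement against an in-memory SQLite table as a stand-in so that it can be tried anywhere; the two columns are invented and do not reflect the real table layout.

```python
import sqlite3

TABLE = "lcuv_empower_datacubes_data"

# Assumed equivalent of "Select First 100 Rows"; the TDP's own query may differ.
QUERY = f"SELECT * FROM {TABLE} LIMIT 100"

# Stand-in database so the sketch is runnable; the columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (row_index INTEGER, value REAL)")
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?)",
    [(i, i * 0.5) for i in range(250)],
)

rows = conn.execute(QUERY).fetchall()
print(f"{len(rows)} rows returned")
conn.close()
```

Without an ORDER BY clause, which 100 rows come back is not guaranteed, so a query like this is best treated as a quick preview rather than a stable sample.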

For More Information

If you want more information on the topics addressed here, take a look at the following documentation.