This topic is part of the Empower Data Science Link (EDSL) Quick Start guide. If you need more details about searching and viewing files, see the For More Information section at the bottom of this page.
Step 1 of this Quick Start Guide should be complete.
A Tetra Data Pipeline (usually referred to simply as a "pipeline") is a way to configure a set of actions that run automatically each time new data is ingested into the data lake. The EDSL product has one pipeline, dedicated to one of the major EDSL requirements: converting RAW data to IDS format. A pipeline consists of four parts:
- trigger conditions
- protocol
- notification details
- finalization details and settings
When an event occurs, such as a file being ingested into a specific location in the data lake, active data pipelines determine whether that event matches their trigger conditions. If there’s a match, the protocol, which consists of processing steps and configuration information, starts the execution of a workflow. When processing is finished, output files are indexed according to a predefined schema and stored in the data lake. Notifications are also sent to the email addresses specified in the notification details. Files can then be searched and filtered in the TDP or in another tool such as Tableau or TIBCO Spotfire. Files can also be sent to a data target such as an external ELN or LIMS.
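The flow described above can be pictured with a small Python model. This is only a conceptual sketch, not TDP code: the `FileEvent` and `Pipeline` classes, their field names, the `handle_event` function, the example email address, and the `.ids.json` suffix are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FileEvent:
    """Hypothetical stand-in for a 'file ingested' event."""
    path: str
    category: str  # for example "RAW" or "IDS"


@dataclass
class Pipeline:
    """Hypothetical model of the four parts of a pipeline."""
    name: str
    trigger: Callable[[FileEvent], bool]   # trigger conditions
    protocol: Callable[[FileEvent], str]   # processing steps; returns an output path
    notify_emails: List[str] = field(default_factory=list)  # notification details
    enabled: bool = True                   # one of the finalization settings


def handle_event(event: FileEvent, pipelines: List[Pipeline]) -> List[str]:
    """Run every enabled pipeline whose trigger matches the event."""
    outputs = []
    for pipeline in pipelines:
        if pipeline.enabled and pipeline.trigger(event):
            outputs.append(pipeline.protocol(event))
    return outputs


raw_to_ids = Pipeline(
    name="Empower RAW file to IDS",
    trigger=lambda e: e.category == "RAW",
    protocol=lambda e: e.path + ".ids.json",  # placeholder for the real conversion
    notify_emails=["lab-alerts@example.com"],
)

# A RAW file matches the trigger, so the protocol runs.
print(handle_event(FileEvent("demo/injection1", "RAW"), [raw_to_ids]))
# An IDS file does not match, so nothing runs.
print(handle_event(FileEvent("demo/injection1.ids.json", "IDS"), [raw_to_ids]))
```

In this model a disabled pipeline is skipped even when its trigger matches, which mirrors the statement that only active pipelines evaluate incoming events.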
The following instructions provide a quick path through the steps needed to set up a pipeline. They are not intended to replace the more detailed pipeline documentation that is linked at the bottom of this page.
If you have not done so already, log into the TetraScience Data Platform (TDP) and complete the following steps.
- Open a supported browser and go to the Tetra Data Platform site.
- Enter your username and password, then click Sign In.
- Once signed in, click the menu button in the upper left corner of the page and select Pipelines, then Pipeline Design. The Manage Pipelines page appears.
To define trigger conditions, do the following.
- On the Manage Pipelines page, click the New Pipeline button. The New Pipeline page appears.
- Select the trigger source type from the drop-down menu. We suggest that you choose File Category, select IS, then select RAW. This will allow you to see the processing of RAW instrument files from the demo database when the Agent sends them to the data lake. (This will happen in Step 4 of the Quick Start guide instructions.)
- Click the Next button.
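The suggested trigger (File Category, IS, RAW) reads as a simple field/operator/value test. The following sketch shows that reading in Python. It is an assumption about how such a condition could be modeled, not the platform's implementation: the `TriggerCondition` class and the metadata dictionary keys are hypothetical, and only the `IS` operator named in the step above is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerCondition:
    """Hypothetical representation of one trigger condition."""
    field: str     # for example "File Category"
    operator: str  # only "IS" is modeled in this sketch
    value: str     # for example "RAW"

    def matches(self, file_metadata: dict) -> bool:
        """Return True when the file's metadata satisfies the condition."""
        if self.operator != "IS":
            raise ValueError(f"Unsupported operator in this sketch: {self.operator}")
        return file_metadata.get(self.field) == self.value


condition = TriggerCondition("File Category", "IS", "RAW")

print(condition.matches({"File Category": "RAW"}))  # True
print(condition.matches({"File Category": "IDS"}))  # False
```

With this condition, the RAW instrument files sent by the Agent would match, while the IDS files produced by the pipeline would not.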
For EDSL, there is one pipeline available: Empower RAW file to IDS. This pipeline converts the Empower RAW files to IDS format.
- In the Select Protocol section of the Managing Pipeline page, scroll down the list and select a protocol. You can also search by entering text in the Search field.
- Select Empower RAW file to IDS.
- Select the latest version of the pipeline.
- Click the Select this Protocol button.
- Click the Next button.
After you have defined your trigger and selected the protocol, set notification options. You can determine when to send notifications and who you want to send them to.
- In the Select Protocol section of the Managing Pipeline page, decide whether you want to send an email when the pipeline completes successfully. If so, slide the Send on successful pipelines slider to the right.
- If you want to get an email when the pipeline fails, slide the Send on failed pipelines slider to the right.
- Add one or more email addresses that should receive notifications.
Consider either sending the emails to an account set up for this purpose or applying a filter to your mailbox, particularly if you want to be alerted for every successful run. Depending on the number of files to be processed, you can easily receive hundreds of emails.
- Click Next.
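To get a feel for how the two sliders affect email volume, the notification choices can be modeled as below. This is an illustrative sketch only: the `NotificationSettings` class, its attribute names, the recipient address, and the file counts are hypothetical and are not taken from the TDP.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NotificationSettings:
    """Hypothetical model of the two sliders and the recipient list."""
    send_on_success: bool = False
    send_on_failure: bool = False
    recipients: List[str] = field(default_factory=list)

    def recipients_for(self, succeeded: bool) -> List[str]:
        """Return who is emailed for a single pipeline run."""
        wanted = self.send_on_success if succeeded else self.send_on_failure
        return list(self.recipients) if wanted else []


def count_emails(settings: NotificationSettings, results: List[bool]) -> int:
    """Count the emails sent for a batch of runs (True means success)."""
    return sum(len(settings.recipients_for(ok)) for ok in results)


# Made-up batch: 497 successful runs and 3 failed runs.
results = [True] * 497 + [False] * 3

failures_only = NotificationSettings(
    send_on_failure=True, recipients=["pipeline-alerts@example.com"]
)
everything = NotificationSettings(
    send_on_success=True, send_on_failure=True,
    recipients=["pipeline-alerts@example.com"],
)

print(count_emails(failures_only, results))  # 3
print(count_emails(everything, results))     # 500
```

The difference between the two totals is the reason for routing success notifications to a dedicated account or a filtered folder.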
After you have defined your trigger, selected the protocol, and set notifications, the last step is to provide details about the pipeline, such as its name, description, whether it should be active (enabled), and how many standby instances you want to include (if any).
To finalize the details, complete the following steps.
- In the Finalization Details section of the Managing Pipeline page, enter the name you want to give to the pipeline.
- Enter the pipeline description.
- Choose whether you want the pipeline to be available for processing. This quick start assumes that you want the pipeline to start running as soon as files that meet the trigger conditions are ingested into the data lake, so move the Enabled slider to the right. Otherwise, leave it as is (slid to the left).
- When complete, click the Create Pipeline button.
- If the pipeline has been enabled, it will start when the trigger conditions are met.
Note that there are other options that can be set here. For more information and a deeper dive into what was addressed on this page, see the following topics: