Set Up and Edit Pipelines
The Pipeline Manager page displays all of your pipelines in one place so that you can view details about them, set them up, and edit them.
For more information about pipelines, see Tetra Data Pipelines.
NOTE
Only organization administrators can set up, edit, deactivate, and activate pipelines.
Access the Pipeline Manager Page
To open the Pipeline Manager page, do the following:
- Sign in to the TDP as a user with an Org Admin role.
- In the left navigation menu, choose Pipelines.
- Choose Pipeline Manager. The Pipeline Manager page appears.
Pipeline Manager Page Information
The Pipeline Manager page shows the following information in a list view:
Column Name | Description |
---|---|
PIPELINE NAME | Shows the name of each pipeline |
PROTOCOL | Shows the name and version number of the protocol each pipeline uses. Note: Icons underneath each protocol show the first letter of the step(s) that are used in the protocol. Hover over the step icon to see more information about each step. |
LAST CONFIG UPDATE | Shows the number of days since the configuration was last updated |
Filter the List of Pipelines and Adjust Ordering
You can filter and adjust the ordering of pipelines by doing any of the following.
Search for Pipelines by Pipeline Name or ID
To search for pipelines by a specific name or ID, do the following:
- Open the Pipeline Manager page.
- In the upper left search field, labeled Search Pipeline name/id, enter the pipeline name or ID that you want to find. As you type, pipelines that match your entry appear.
Search for Pipelines by Protocol Namespace, Protocol Name, or Protocol Version
To search for pipelines by a specific protocol namespace, protocol name, or protocol version, do the following:
- Open the Pipeline Manager page. Then, select the Filters button. A list of filters appears.
- Select the drop-down next to the filter that you want to apply. A list of your available filter choices appears. For example, if you choose Protocol Name, the protocols available in your TDP instance appear.
- Select the filter choices that you want. The protocol versions that are available depend on the protocol name that was chosen.
- Choose Apply Filters.
View Active and Inactive Pipelines
To filter the list of pipelines based on whether they're active (enabled) or inactive (disabled), select one of the following options from the top of the Pipeline Manager page:
- All—shows both active and inactive pipelines
- Enabled—shows active pipelines only
- Disabled—shows inactive pipelines only
Adjust the Pipeline List Order
To reorder the list of pipelines, select the upper right ORDER BY drop-down list on the Pipeline Manager page. Then, select one of the following options based on how you want to order the list:
- Last Update, New - Old
- Last Update, Old - New
- Name, A - Z
- Name, Z - A
- Create Date, New - Old
- Create Date, Old - New
- Protocol Name, A - Z
- Protocol Name, Z - A
View More Information about a Pipeline
To see information about the steps in a pipeline protocol as well as details about the way the pipeline is set up and its workflow, you can do either of the following:
- View step summaries by hovering over the PROTOCOL step icon.
- View pipeline and workflow details by selecting a pipeline from the list.
View Step Summaries
Beneath each protocol listed on the Pipeline Manager page, there are lettered icons that show the protocol's steps. Each letter corresponds to the first letter of the step slug. Hovering over these lettered icons shows a summary of step information that includes the following:
- Step Slug (a unique name for the step)
- Description
- Task script details, including the following:
- Script slug
- Script namespace
- Script version
- Function Slug
For definitions of these terms and more information about how each of these elements works, see Self-Service Tetra Data Pipelines.
View Pipeline and Workflow Details
To view pipeline and workflow details for a specific pipeline, do the following:
- Open the Pipeline Manager page. Then, select the pipeline's name. A pane appears on the right that provides details about the pipeline and its workflow.
- To review the pipeline's details, select the Pipeline tab in the right pane.
-or-
To view the workflow details, select the Workflow tab in the right pane.
Set Up a Pipeline
To set up a Tetra Data Pipeline, you must do the following:
- Step 1: Define Trigger Conditions
- Step 2: Select and Configure the Protocol
- Step 3: Set Notifications
- Step 4: Finalize the Details
Step 1: Define Trigger Conditions
IMPORTANT
Keep in mind the following when configuring your pipeline's trigger conditions:
- Pipelines can run on the latest version of a file only. This behavior ensures that previous file versions don't overwrite the latest data. If a pipeline tries to process an outdated or deleted file version, the workflow errors out and the TDP displays the following error message on the Workflow Details page:
"message":"file is outdated or deleted so not running workflow"
- You can’t use a new attribute as a pipeline trigger until you add a file to the TDP that includes the attribute. For instructions on how to manually upload files, see Upload a New File or New Version of the File.
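The latest-version rule above can be pictured as a guard that runs before a workflow starts. The following is a minimal sketch, not TDP code: the `FileVersion` record, its fields, and the `check_runnable` function are hypothetical names used for illustration only. The error message text is the only part taken from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Error text shown on the Workflow Details page, quoted from this page.
OUTDATED_MESSAGE = "file is outdated or deleted so not running workflow"


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical record describing one version of a file."""
    file_id: str
    version: int
    deleted: bool = False


def check_runnable(candidate: FileVersion, latest_version: int) -> Optional[str]:
    """Return None if the workflow may run, otherwise the error message."""
    # Only the latest, non-deleted version of a file is eligible.
    if candidate.deleted or candidate.version != latest_version:
        return OUTDATED_MESSAGE
    return None
```

In this sketch, a return value of `None` means the file version is current and processing can continue.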
Trigger conditions indicate the criteria a file must meet for pipeline processing to begin. There are two types of trigger conditions:
- Simple trigger conditions require files to meet just one condition to trigger the pipeline. For example, you can configure data files that have a specific label to trigger a pipeline.
- Complex trigger conditions require files to meet several conditions before they trigger the pipeline. For example, you can require a file to have both a specific label and file path to trigger a pipeline. Complex trigger conditions can be combined by using standard Boolean operators (AND/OR) and can be nested.
Trigger types are divided into two categories: Platform Metadata and Custom Metadata. Platform Metadata types are available to all TDP users. Custom Metadata types are available to your organization only.
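To make the difference between simple and complex trigger conditions concrete, the following sketch evaluates a nested AND/OR condition tree against a file's metadata. It is an illustration under stated assumptions, not the TDP's implementation: the `Condition` and `Group` classes, the `matches` function, and the metadata keys (`file_category`, `label`, `file_path`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Condition:
    """One simple condition: the file's value for `field` must equal `value`."""
    field: str
    value: str


@dataclass
class Group:
    """Conditions joined by AND or OR. Groups can contain other groups."""
    operator: str  # "AND" or "OR"
    members: List[Union["Condition", "Group"]]


def matches(node: Union[Condition, Group], file_metadata: Dict[str, str]) -> bool:
    """Return True if the file metadata satisfies the condition tree."""
    if isinstance(node, Condition):
        return file_metadata.get(node.field) == node.value
    results = (matches(member, file_metadata) for member in node.members)
    if node.operator == "AND":
        return all(results)
    if node.operator == "OR":
        return any(results)
    raise ValueError(f"Unknown operator: {node.operator}")


# A complex trigger: RAW files that have either a given label or a given path.
trigger = Group("AND", [
    Condition("file_category", "RAW"),
    Group("OR", [
        Condition("label", "instrument-a"),
        Condition("file_path", "/lab/plate-reader"),
    ]),
])
print(matches(trigger, {"file_category": "RAW", "label": "instrument-a"}))  # True
```

A simple trigger condition corresponds to a single `Condition`. A complex one corresponds to a `Group`, which is why nesting falls out naturally from the recursive check.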
To define your pipeline's trigger conditions, do the following:
- Open the Pipeline Manager page. Then, select the upper right New Pipeline button. The New Pipeline page appears.
- In the Define Trigger section, select a trigger type from the File Category drop-down list. For more information about each trigger type, see the following Trigger Types table.
NOTE
If a Critical: TDP service have failed to returned required data (Schemas) error appears when you select a Source Type trigger, close the Pipeline Manager page. Then, restart the pipeline creation process. The error won't appear again.
Trigger Types
Trigger Type | Description |
---|---|
Source Type | The instrument that generated the data. Note: The Source Type drop-down provides both a static list of common source types and any custom sources that you've created. |
Source | The source that loaded the data into the Tetra Data Lake (for example, specific Agents, API uploads, or box uploads) |
Pipeline | The pipeline used |
IDS | The Intermediate Data Schema (IDS) used |
IDS Type | The type of IDS (for example, lcuv_empower) |
File Path | The Tetra Data Lake file path |
File Category | The file category, which can be RAW (sourced directly from an instrument), IDS (harmonized JSON), or PROCESSED (auxiliary data extracted from a RAW file) |
Tags | Tags available to the organization |
Custom Metadata | Custom defined metadata |
Labels | Custom defined labels |
- In the middle drop-down list, select the conditional operator (is or is not) that determines how the trigger matches the value you'll set for the trigger condition.
- In the Value field, enter the value for the trigger condition that you want or select an option from the drop-down list.
NOTE
The Pipeline is not condition verifies that the file has a pipeline value and that the value isn't the one specified. The condition is FALSE for files that haven't been processed through a pipeline.
- (Optional) To add another condition to your trigger, choose Add Field. Then, repeat steps 2-4. To configure the pipeline to run if the file meets both trigger conditions, select Matches All-AND from the top drop-down list. To configure the pipeline to run if the file meets at least one trigger condition, select Matches All-ANY from the top drop-down list.
- (Optional) To nest trigger conditions, choose Add Field Group. Then, repeat steps 2-6.
- Choose Next.
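The note about the Pipeline is not operator in the steps above is easy to misread, so here is a minimal sketch of the described behavior. The function names and the `pipeline` metadata key are hypothetical. Only the rule itself (a missing pipeline value makes the condition false) comes from this page.

```python
from typing import Mapping, Optional


def pipeline_is(file_metadata: Mapping[str, Optional[str]], expected: str) -> bool:
    """True when the file was processed by the expected pipeline."""
    return file_metadata.get("pipeline") == expected


def pipeline_is_not(file_metadata: Mapping[str, Optional[str]], excluded: str) -> bool:
    """True only when a pipeline value exists and differs from `excluded`."""
    actual = file_metadata.get("pipeline")
    # A file that never went through a pipeline has no value, so the
    # condition is False rather than True.
    return actual is not None and actual != excluded
```

Under this reading, "Pipeline is not X" isn't the logical negation of "Pipeline is X": a file with no pipeline value fails both conditions.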
Step 2: Select and Configure the Protocol
After you've defined a trigger, select and configure the protocol. For more information about protocols and how they work with pipelines, see Tetra Data Pipeline Overview and Available Tetra Data Models.
To select and configure a protocol, do the following:
- In the Select Protocol section of the Pipeline Manager page, select a protocol from the list on the left. You can also search for a specific protocol by entering text in the upper left search field.
- In the Configuration section, enter the configuration options for the protocol, if there are any.
- (Optional) To see more information about the protocol and its script, select the View Details button. A page that contains the protocol's protocol.json and script.js files appears. The protocol.json file defines the protocol. It provides a brief description of the steps run and the configurations. The script.js file shows the workflow.
- (Optional) To see more information about the protocol steps, choose README in the Steps section.
- (Optional) To configure custom memory settings for each step of the pipeline, select a memory option from the Default Memory drop-down list in the Steps section. For more information, see the Memory and Compute Settings section of this procedure.
- Choose Select this Protocol.
- Choose Next.
Step 3: Set Notifications
To configure email notifications about successful and failed pipeline executions, do the following:
- In the Set Notifications section of the Pipeline Manager page, select one of the following toggles based on the type of notifications that you want to send:
  - Send on successful pipelines: sends an email after the pipeline runs successfully.
  -or-
  - Send on failed pipelines: sends an email after the pipeline fails.
- In the Add an e-mail address field, enter the email address that you want to send the notifications to. To add more than one address, select the Add an email address field that appears below the one you just entered and enter another email address.
- Choose Next.
NOTE
For maintenance purposes, make sure that you use a group alias for notification email addresses instead of individuals' emails.
Step 4: Finalize the Details
To configure the remaining details you need to finish creating your pipeline, enter the following information into the Finalize Details section of the Pipeline Manager page:
- For PIPELINE NAME, enter a name for the pipeline.
- For PIPELINE DESCRIPTION, enter a description of the pipeline.
- Do one of the following:
  - If you want the pipeline to be available for processing files, move the ENABLED toggle to the right. It appears blue when the pipeline is active.
  -or-
  - If you don't want the pipeline to be available for processing files yet, move the ENABLED toggle to the left. It appears gray when the pipeline is inactive.
- For MAX PARALLEL WORKFLOWS, it's typically a best practice to not change the default value (0). The default 0 value indicates that the pipeline allows an unlimited number of concurrent workflows, which helps with throughput. If one of the following situations applies, contact your customer success manager (CSM) to determine the correct MAX PARALLEL WORKFLOWS setting for your use case:
  - For pipelines that must process one file at a time only, with zero parallel processing, MAX PARALLEL WORKFLOWS can be set to 1.
  - For uploading or processing a high number of files at the same time when there are upstream systems that can't handle the required rate of parallel processing, MAX PARALLEL WORKFLOWS can be set to a low number.
  - For processing a high number of files with a long-running pipeline process, MAX PARALLEL WORKFLOWS can be set to a low number.
IMPORTANT
Changing the MAX PARALLEL WORKFLOWS setting from the default 0 value severely limits pipeline throughput and must be done in collaboration with your CSM.
- For PRIORITY, select the pipeline's priority level. Increasing or decreasing this value adjusts the slot assignment prioritization for that pipeline within the organization, which raises or lowers the likelihood that the pipeline's workflows are scheduled before another pipeline's. You can assign a priority number from 1 to 10, where 10 is the highest priority and 1 is the lowest. For example, a pipeline with a priority of 1 is less likely to have a workflow scheduled before a pipeline with a priority of 10. The default setting is 5. This setting is typically used to decrease the likelihood that a pipeline runs before others, which prevents it from constraining resources.
- (Optional) To override the pipeline's default retry behavior setting, select another option from the RETRY BEHAVIOR drop-down list. The default retry setting for pipelines that use the protocol.yml protocol definition file format is Exponential retry interval increasing (default). The default retry setting for all other pipelines is Always retry 3 times. For more information, see Retry Behavior Settings.
- Choose Create.
NOTE
If the ENABLED toggle is set to active when you choose Create, then the pipeline will run as soon as the configured trigger conditions are met.
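The effect of the MAX PARALLEL WORKFLOWS setting described in this step can be sketched with a concurrency limiter. This is an illustration only and makes no claim about how the TDP schedules workflows: `run_workflows` and its parameters are hypothetical, and a short sleep stands in for real processing. The sketch follows the convention above that 0 means no limit.

```python
import asyncio
from typing import List, Optional


async def run_workflows(file_names: List[str], max_parallel_workflows: int = 0) -> int:
    """Run one simulated workflow per file and return the peak concurrency."""
    # 0 means "no limit", matching the default value described above.
    limiter: Optional[asyncio.Semaphore] = (
        asyncio.Semaphore(max_parallel_workflows) if max_parallel_workflows > 0 else None
    )
    running = 0
    peak = 0

    async def process(file_name: str) -> None:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for real processing work
        running -= 1

    async def guarded(file_name: str) -> None:
        if limiter is None:
            await process(file_name)
        else:
            async with limiter:  # wait for a free slot before starting
                await process(file_name)

    await asyncio.gather(*(guarded(name) for name in file_names))
    return peak


files = ["a.raw", "b.raw", "c.raw", "d.raw"]
print(asyncio.run(run_workflows(files, max_parallel_workflows=1)))  # 1
print(asyncio.run(run_workflows(files, max_parallel_workflows=0)))  # 4
```

With a limit of 1, the files are processed strictly one at a time, which is the first situation listed for MAX PARALLEL WORKFLOWS. With the default 0, all four simulated workflows overlap.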
Retry Behavior Settings
If a pipeline run fails for any reason, the TDP automatically retries it up to three times before the pipeline fails. You can change this default retry behavior when you create or edit a pipeline on the Pipeline Manager page by selecting one of the following retry settings.
You can also manually retry a failed pipeline by selecting the Retry button that displays next to the pipeline's workflow steps in the TDP.
NOTE
Each pipeline retry uses double the memory of the previous attempt to run the pipeline, up to 120 GB. Compute and CPU capacity will also increase based on the amount of memory used for each retry. This increase in memory and compute usage can increase the cost of processing files significantly. For more information, see the Memory and Compute Settings section of this procedure.
Preconfigured Pipeline Retry Settings
Retry Setting | Description |
---|---|
Always retry 3 times | If the pipeline fails for any reason, the system retries pipeline processing up to three more times before the pipeline fails. Note: Each pipeline retry will use double the memory of the previous attempt to run the pipeline, up to 120 GB. For more information, see the Memory and Compute table. |
No Retry | Pipeline processing isn't automatically retried for any reason. |
Retry 3 times (after OOM error only) | Pipeline processing is retried only if the failure is caused by an out-of-memory (OOM) error. The system then retries the pipeline up to three more times. Each subsequent retry must also fail because of an OOM error for the system to retry the pipeline again. Note: Each pipeline retry will use double the memory of the previous attempt to run the pipeline, up to 120 GB. For more information, see the Memory and Compute table. |
Custom Pipeline Retry Settings
NOTE
Custom pipeline retry behavior settings are available for pipelines that use the protocol.yml protocol definition file format only. For more information, see Protocol YAML Files.
Retry Setting | Description |
---|---|
Exponential retry interval increasing (default) | Sets a specific BASE RETRY DELAY for the first retry attempt, and then doubles the delay for each following attempt. |
Constant retry interval | Sets a specific, custom time period (BASE RETRY DELAY) between each retry attempt. |
Linear retry interval increasing | Sets a specific BASE RETRY DELAY for the first retry attempt, and then adds that same time period to the delay for each following attempt. |
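The three custom retry settings in the previous table differ only in how the delay grows between attempts. The following sketch computes the delay before each retry under each setting. The `retry_delays` function, the strategy names, and the default of three attempts are assumptions made for illustration. The delays are in whatever unit BASE RETRY DELAY uses, which this page doesn't specify.

```python
from typing import List


def retry_delays(strategy: str, base_retry_delay: float, attempts: int = 3) -> List[float]:
    """Return the delay before each retry attempt for the given strategy."""
    delays = []
    for attempt in range(1, attempts + 1):
        if strategy == "exponential":
            # Base delay first, then double it for each following attempt.
            delay = base_retry_delay * 2 ** (attempt - 1)
        elif strategy == "constant":
            # The same delay between every attempt.
            delay = base_retry_delay
        elif strategy == "linear":
            # Base delay first, then add the base delay again each time.
            delay = base_retry_delay * attempt
        else:
            raise ValueError(f"Unknown strategy: {strategy}")
        delays.append(delay)
    return delays


print(retry_delays("exponential", 10))  # [10, 20, 40]
print(retry_delays("constant", 10))     # [10, 10, 10]
print(retry_delays("linear", 10))       # [10, 20, 30]
```

The exponential and linear settings produce the same first two delays and diverge from the third attempt onward.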
Memory and Compute Settings
When configuring a protocol, you can override the default memory setting with a custom memory setting for each step of a pipeline. To configure a custom memory setting for a pipeline step, select the Default Memory drop-down list in the Steps section of the Pipeline Manager page.
The following table shows the available memory setting options and how much compute capacity each setting uses:
Memory Setting | Compute |
---|---|
512 MB | 0.25 vCPU |
1 GB | 0.25 vCPU |
2 GB | 0.5 vCPU |
4 GB | 0.5 vCPU |
8 GB | 1 vCPU |
8 GB | 1 vCPU |
16 GB | 2 vCPU |
30 GB | 4 vCPU |
60 GB | 8 vCPU |
120 GB | 16 vCPU |
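Because each automatic retry uses more memory than the previous attempt, it can help to estimate the resources a failing pipeline step might consume. The sketch below is an estimate under one explicit assumption: "double the memory" is modeled as moving to the next row of the table above (for example, 16 GB to 30 GB), because the listed settings aren't exact powers of two. The `memory_per_attempt` function is a hypothetical name and the TDP's actual behavior may differ.

```python
from typing import List, Tuple

# (memory in GB, vCPU) pairs copied from the table above.
MEMORY_TIERS_GB: List[Tuple[float, float]] = [
    (0.5, 0.25), (1, 0.25), (2, 0.5), (4, 0.5), (8, 1),
    (16, 2), (30, 4), (60, 8), (120, 16),
]


def memory_per_attempt(initial_gb: float, retries: int = 3) -> List[Tuple[float, float]]:
    """Return (memory GB, vCPU) for the first attempt and each retry."""
    sizes = [gb for gb, _ in MEMORY_TIERS_GB]
    index = sizes.index(initial_gb)  # raises ValueError for unlisted settings
    plan = []
    for _ in range(retries + 1):
        plan.append(MEMORY_TIERS_GB[index])
        # Assumption: each retry steps up one tier, capped at 120 GB.
        index = min(index + 1, len(MEMORY_TIERS_GB) - 1)
    return plan


print(memory_per_attempt(8))   # [(8, 1), (16, 2), (30, 4), (60, 8)]
print(memory_per_attempt(60))  # [(60, 8), (120, 16), (120, 16), (120, 16)]
```

A step that starts at 8 GB and fails three times would, under this assumption, finish its last retry at 60 GB and 8 vCPU, which shows why repeated retries can raise processing costs significantly.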
Edit a Pipeline
To edit an existing pipeline, do the following:
- Open the Pipeline Manager page. Then, select the name of the pipeline that you want to edit. A pane appears on the right.
- In the right pane, under Pipeline Actions, choose Edit Pipeline. The Edit Pipeline page appears and displays the configuration name on the edit screen.
- Select the Edit button in the section of the pipeline that you want to edit. That section's configuration page opens.
- Modify the section. Then, choose Save to save the edits you've made to the section.
- Choose Save again to save the entire pipeline.
Copy a Pipeline
To copy an existing pipeline, do the following:
- Open the Pipeline Manager page. Then, select the name of the pipeline that you want to copy. A pane appears on the right.
- In the right pane, under Pipeline Actions, choose Copy Pipeline. A Copy Pipeline dialog appears.
- For New Pipeline name, enter the new pipeline's name. Then, choose Save. The new pipeline appears on the Pipeline Manager page list with a New icon next to it.
NOTE
Copied pipelines are inactive (disabled) by default. To activate a copied pipeline, follow the instructions in the Activate or Deactivate Pipelines section of this topic.
Import a Pipeline
To import a pipeline definition from another TDP environment (for example, when moving a pipeline from development to production), do the following:
- Open the Pipeline Manager page. Then, select the upper right Import Pipeline button.
- Select the pipeline that you want to import.
IMPORTANT
When importing a pipeline to a new environment, make sure that you update the pipeline's secrets configuration so that it can run in the environment that you're importing the pipeline to. For more information, see Context API.
Download a Pipeline
To download a pipeline definition to your local machine, do the following:
- Open the Pipeline Manager page. Then, select the name of the pipeline that you want to download. A pane appears on the right.
- In the right pane, under Pipeline Actions, choose Download Pipeline. The pipeline definition downloads to your local machine as a JSON file.
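After downloading a pipeline definition, you might want to inspect it before importing it elsewhere. The following sketch lists the top-level keys of the downloaded file without assuming anything about its schema, which this page doesn't document. The `top_level_keys` function and the file name in the usage note are hypothetical, and the sketch assumes only that the file contains a JSON object.

```python
import json
from pathlib import Path
from typing import List


def top_level_keys(definition_path: str) -> List[str]:
    """Return the sorted top-level keys of a downloaded pipeline definition."""
    with Path(definition_path).open(encoding="utf-8") as handle:
        definition = json.load(handle)
    # Assumption: the definition is a JSON object, not a list or scalar.
    if not isinstance(definition, dict):
        raise ValueError("Expected the definition file to contain a JSON object")
    return sorted(definition)
```

For example, calling `top_level_keys("my-pipeline.json")` on a downloaded file shows which sections the definition contains, which can help you find settings to review before importing the pipeline into another environment.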
Activate or Deactivate Pipelines
For a pipeline to run when a file meets the defined trigger conditions, you must activate it. You can make any existing pipeline active or inactive by doing the following:
- Open the Pipeline Manager page. Then, select the name of the pipeline that you want to activate or deactivate. A pane appears on the right.
- In the right pane, under Pipeline Actions, choose Edit Pipeline. The Edit Pipeline page appears.
- In the Details section, choose the Edit button.
- Do one of the following, based on whether you want to activate or deactivate the pipeline:
  - (To activate the pipeline) Move the ENABLED toggle to the right. It appears blue when the pipeline is active.
  -or-
  - (To deactivate the pipeline) Move the ENABLED toggle to the left. It appears gray when the pipeline is inactive.
- Choose Save to save the edits you've made to the section.
- Choose Save again to save the entire pipeline.