Set Up and Edit Pipelines

The Pipeline Manager page displays all of your pipelines in one place so that you can view details about them, set them up, and edit them.

For more information about pipelines, see Tetra Data Pipeline Overview.

📘

NOTE

Only organization administrators can set up, edit, deactivate, and activate pipelines.

Access the Pipeline Manager

To open the Pipeline Manager page, do the following:

  1. Sign in to the TDP as an admin.
  2. In the left navigation pane, select the hamburger menu icon. Then, choose Pipelines and select Pipeline Manager. The Pipeline Manager page appears.

Pipeline Manager Page Information

The Pipeline Manager page shows the following information in a list view:

Column Name | Description
PIPELINE NAME | Shows the name of each pipeline
PROTOCOL | Shows the name and version number of the protocol each pipeline uses. Note: Icons underneath each protocol show the first letter of the step(s) that are used in the protocol. Hover over the step icon to see more information about each step.
LAST CONFIG UPDATE | Shows the number of days since the configuration was last updated

Filter the List of Pipelines and Adjust Ordering

You can filter and adjust the ordering of pipelines by doing any of the following.

Search for Pipelines by Pipeline Name or ID

To search for pipelines by a specific name or ID, do the following:

  1. Open the Pipeline Manager page.
  2. In the upper left search field, labeled Search Pipeline name/id, enter the pipeline name or ID that you want to find. As you type, the pipelines that match your entry appear.

Search for Pipelines by Protocol Namespace, Protocol Name, or Protocol Version

To search for pipelines by a specific protocol namespace, protocol name, or protocol version, do the following:

  1. On the Pipeline Manager page, select the Filters button. A list of filters appears.
  2. Select the drop-down list next to the filter that you want to apply. A list of your available filter choices appears. For example, if you choose Protocol Name, the protocols available in your TDP instance appear.
  3. Select the filter choices that you want. The protocol versions that are available depend on the protocol name that was chosen.
  4. Choose Apply Filters.

View Active and Inactive Pipelines

To filter the list of pipelines based on whether they're active (enabled) or inactive (disabled), select one of the following options from the top of the Pipeline Manager page:

  • All—shows both active and inactive pipelines
  • Enabled—shows active pipelines only
  • Disabled—shows inactive pipelines only

Adjust the Pipeline List Order

To reorder the list of pipelines, select the upper right ORDER BY drop-down list on the Pipeline Manager page. Then, select one of the following options based on how you want to order the list:

  • Last Update, New - Old
  • Last Update, Old - New
  • Name, A - Z
  • Name, Z - A
  • Create Date, New - Old
  • Create Date, Old - New
  • Protocol Name, A - Z
  • Protocol Name, Z - A

View More Information about a Pipeline

To see information about the steps in a pipeline protocol as well as details about the way the pipeline is set up and its workflow, you can do either of the following:

  • View step summaries by hovering over the PROTOCOL step icon.
  • View pipeline and workflow details by selecting a pipeline from the list.

View Step Summaries

Beneath each protocol listed on the Pipeline Manager page, there are lettered icons that show the protocol's steps. Each letter corresponds to the first letter of the step slug. Hovering over a lettered icon shows a summary of the step that includes the following information:

  • Step Slug (a unique name for the step)
  • Description
  • Task script details, including the following:
    • Script slug
    • Script namespace
    • Script version
  • Function Slug

For definitions of these terms and more information about how each of these elements works, see Self-Service Tetra Data Pipelines.

View Pipeline and Workflow Details

To view pipeline and workflow details for a specific pipeline, do the following:

  1. On the Pipeline Manager page, select the pipeline's name. A pane appears on the right that provides details about the pipeline and its workflow.
  2. To review the pipeline's details, select the Pipeline tab in the right pane.
    -or-
    To view the workflow details, select the Workflow tab in the right pane.

Set Up a Pipeline

To set up a Tetra Data Pipeline, you must do the following:

  • Step 1: Define Trigger Conditions
  • Step 2: Select and Configure the Protocol
  • Step 3: Set Notifications
  • Step 4: Finalize the Details

Step 1: Define Trigger Conditions

🚧

IMPORTANT

You can’t use a new attribute as a pipeline trigger until you add a file to the TDP that includes the attribute. For instructions on how to manually upload files, see Upload a New File or New Version of the File.

Trigger conditions indicate the criteria a file must meet for pipeline processing to begin. There are two types of trigger conditions:

  • Simple trigger conditions require files to meet just one condition to trigger the pipeline. For example, you can configure data files that have a specific label to trigger a pipeline.
  • Complex trigger conditions require files to meet several conditions before they trigger the pipeline. For example, you can require a file to have both a specific label and file path to trigger a pipeline. Complex trigger conditions can be combined by using standard Boolean operators (AND/OR) and can be nested.
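
The difference between simple and complex trigger conditions can be sketched in a few lines of Python. This is an illustrative model only, not the TDP's implementation or API: the `Condition` and `Group` classes, the `matches` function, and the metadata field names and values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Condition:
    """A single check, such as 'File Category is RAW' (hypothetical model)."""
    field: str
    value: str


@dataclass
class Group:
    """Conditions combined with a Boolean operator; groups can be nested."""
    operator: str  # "AND" or "OR"
    members: list


def matches(node: Union[Condition, Group], file_metadata: dict) -> bool:
    """Return True if the file metadata satisfies the trigger condition."""
    if isinstance(node, Condition):
        return file_metadata.get(node.field) == node.value
    results = [matches(member, file_metadata) for member in node.members]
    if node.operator == "AND":
        return all(results)
    if node.operator == "OR":
        return any(results)
    raise ValueError(f"Unknown operator: {node.operator}")


# Simple trigger: a single condition.
simple = Condition("file_category", "RAW")

# Complex trigger: RAW files from either of two (made-up) source types.
complex_trigger = Group("AND", [
    Condition("file_category", "RAW"),
    Group("OR", [
        Condition("source_type", "instrument-a"),
        Condition("source_type", "instrument-b"),
    ]),
])

example_file = {"file_category": "RAW", "source_type": "instrument-a"}
print(matches(simple, example_file))           # True
print(matches(complex_trigger, example_file))  # True
```

Modeling a group as an operator plus a list of members is what allows nesting: a member can be either a single condition or another group.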

To define your pipeline's trigger conditions, do the following:

  1. On the Pipeline Manager page, select the upper right New Pipeline button. The New Pipeline page appears.
  2. In the Define Trigger section, select a trigger type from the File Category drop-down list. For more information about each trigger type, see the following Trigger Types table.

📘

NOTE

Trigger types are divided into two categories: Platform Metadata and Custom Metadata. Platform Metadata types are available to all TDP users. Custom Metadata types are available to your organization only.

Trigger Types

Trigger Type | Description
Source Type | The instrument that generated the data. Note: The Source Type drop-down provides both a static list of common source types and any custom sources that you've created.
Source | The source that loaded the data into the Tetra Data Lake (for example, specific Agents, API uploads, or box uploads)
Pipeline | The pipeline used
IDS | The Intermediate Data Schema (IDS) used
IDS Type | The type of IDS (for example, lcuv_empower)
File Path | The Tetra Data Lake file path
File Category | The file category, which can be RAW (sourced directly from an instrument), IDS (harmonized JSON), or PROCESSED (auxiliary data extracted from a RAW file)
Tags | Tags available to the organization
Custom Metadata | Custom defined metadata
Labels | Custom defined labels

  3. In the middle drop-down list, select the conditional operator (is or is not) that determines how the trigger matches the value you'll set for the trigger condition.
  4. In the Value field, enter the value for the trigger condition that you want or select an option from the drop-down list.

📘

NOTE

A Pipeline is not condition verifies that the file has a pipeline value and that the value isn't the one specified. The condition is FALSE for files that haven't been processed through a pipeline.

  5. (Optional) To add another condition to your trigger, choose Add Field. Then, repeat steps 2-4. To configure the pipeline to run if the file meets both trigger conditions, select Matches All-AND from the top drop-down list. To configure the pipeline to run if the file meets at least one trigger condition, select Matches All-ANY from the top drop-down list.
  6. (Optional) To nest trigger conditions, choose Add Field Group. Then, repeat steps 2-6.
  7. Choose Next.
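
The is not behavior described in the note in this procedure can be expressed as a small check. The following sketch is a hypothetical model, not TDP code: the `evaluate` function and the pipeline names are made up for illustration. It follows the note for the Pipeline trigger type only. Whether other trigger types treat missing values the same way isn't stated here.

```python
from typing import Optional


def evaluate(operator: str, actual: Optional[str], expected: str) -> bool:
    """Evaluate an 'is' / 'is not' condition against a file's metadata value.

    Assumption taken from the note about 'Pipeline is not': a missing
    value (None) never satisfies 'is not'.
    """
    if operator == "is":
        return actual == expected
    if operator == "is not":
        return actual is not None and actual != expected
    raise ValueError(f"Unsupported operator: {operator}")


# A file processed by a different pipeline satisfies 'is not'.
print(evaluate("is not", "pipeline-b", "pipeline-a"))  # True
# A file that was never processed by any pipeline does not.
print(evaluate("is not", None, "pipeline-a"))          # False
```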

Step 2: Select and Configure the Protocol

After you've defined a trigger, select and configure the protocol. For more information about protocols and how they work with pipelines, see Tetra Data Pipeline Overview and Available Tetra Data Models.

To select and configure a protocol, do the following:

  1. In the Select Protocol section of the Pipeline Manager page, select a protocol from the list on the left. You can also search for a specific protocol by entering text in the upper left search field.
  2. In the Configuration section, enter the configuration options for the protocol, if there are any.
  3. (Optional) To see more information about the protocol and its script, select the View Details button. A page that contains the protocol's protocol.json and script.js files appears. The protocol.json file defines the protocol. It provides a brief description of the steps run and the configurations. The script.js file shows the workflow.
  4. (Optional) To see more information about the protocol steps, choose README in the Steps section.
  5. (Optional) To configure custom memory settings for each step of the pipeline, select a memory option from the Default Memory drop-down list in the Steps section. For more information, see the Memory and Compute Settings section of this procedure.
  6. Choose Select this Protocol.
  7. Choose Next.

Step 3: Set Notifications

To configure email notifications about successful and failed pipeline executions, do the following:

  1. In the Set Notifications section of the Pipeline Manager page, select one of the following toggles based on the type of notifications that you want to send:
    • Send on successful pipelines—sends an email after the pipeline runs successfully.
      -or-
    • Send on failed pipelines—sends an email after the pipeline fails.
  2. In the Add an e-mail address field, enter the email address that you want to send the notifications to. To add more than one address, select the Add an email address field that appears below the one you just entered and enter another email address.
  3. Choose Next.

📘

NOTE

For maintenance purposes, make sure that you use a group alias for notification email addresses instead of individuals' emails.

Step 4: Finalize the Details

To configure the remaining details you need to finish creating your pipeline, enter the following information into the Finalize Details section of the Pipeline Manager page:

  1. For PIPELINE NAME, enter a name for the pipeline.
  2. For PIPELINE DESCRIPTION, enter a description of the pipeline.
  3. Do one of the following:
    • If you want the pipeline to be available for processing files, move the ENABLED toggle to the right. It appears blue when the pipeline is active.
      -or-
    • If you don't want the pipeline to be available for processing files yet, move the ENABLED toggle to the left. It appears gray when the pipeline is inactive.
  4. For MAX PARALLEL WORKFLOWS, it's typically a best practice to not change the default value (0). The default 0 value indicates that the pipeline allows an unlimited number of concurrent workflows, which helps with throughput. If one of the following situations applies, contact your customer success manager (CSM) to determine the correct MAX PARALLEL WORKFLOWS setting for your use case:
    • For pipelines that must process one file at a time only, with zero parallel processing, MAX PARALLEL WORKFLOWS can be set to 1.
    • For uploading or processing a high number of files at the same time when there are upstream systems that can't handle the required rate of parallel processing, MAX PARALLEL WORKFLOWS can be set to a low number.
    • For processing a high number of files with a long-running pipeline process, MAX PARALLEL WORKFLOWS can be set to a low number.

      🚧

      IMPORTANT

      Changing the MAX PARALLEL WORKFLOWS setting from the default 0 value severely limits pipeline throughput and must be done in collaboration with your CSM.

  5. For PRIORITY, select the pipeline's priority level. Increasing or decreasing this value will adjust the slot assignment prioritization for that pipeline within the organization, which raises or lowers the likelihood that the pipeline's workflows are scheduled before another pipeline's. You can assign a priority number from 1-10, where 10 is the highest priority and 1 is the lowest. For example, a pipeline with a priority of 1 is less likely to have a workflow scheduled before a pipeline with a priority of 10. The default setting is 5. This setting is typically used to decrease the likelihood a pipeline is run before others to prevent it from constraining resources.
  6. (Optional) To override the pipeline's default retry behavior setting (Always retry 3 times), select another option from the RETRY BEHAVIOR drop-down list. For more information, see the following Retry Behavior Settings section of this topic.
  7. Choose Create.

📘

NOTE

If the ENABLED toggle is set to active when you choose Create, then the pipeline will run as soon as the configured trigger conditions are met.
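
As a way to summarize the fields in this step, the following Python sketch models them as a small data structure with range checks. It is not a TDP API: the `PipelineDetails` class and its `validate` method are hypothetical, and only the defaults and ranges stated in this procedure (0 for MAX PARALLEL WORKFLOWS, 1-10 with a default of 5 for PRIORITY, and the three retry settings) come from this topic.

```python
from dataclasses import dataclass

RETRY_BEHAVIORS = (
    "Always retry 3 times",
    "Always retry 3 times (After OOM Error Only)",
    "No Retry",
)


@dataclass
class PipelineDetails:
    """Hypothetical container that mirrors the Finalize Details fields."""
    name: str
    description: str
    enabled: bool
    max_parallel_workflows: int = 0  # 0 means unlimited concurrent workflows
    priority: int = 5                # 1 (lowest) to 10 (highest)
    retry_behavior: str = "Always retry 3 times"

    def validate(self) -> list:
        """Return a list of problems; an empty list means the values are in range."""
        problems = []
        if self.max_parallel_workflows < 0:
            problems.append("MAX PARALLEL WORKFLOWS can't be negative")
        if not 1 <= self.priority <= 10:
            problems.append("PRIORITY must be between 1 and 10")
        if self.retry_behavior not in RETRY_BEHAVIORS:
            problems.append("Unknown RETRY BEHAVIOR setting")
        return problems


details = PipelineDetails(
    name="Example pipeline",
    description="Illustration only",
    enabled=False,
    priority=11,
)
print(details.validate())  # ["PRIORITY must be between 1 and 10"]
```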

Retry Behavior Settings

If a pipeline fails for any reason, the TDP automatically retries the pipeline up to three times before the pipeline fails. You can change this default retry behavior when you create or edit a pipeline on the Pipeline Manager page by selecting one of the following retry settings.

📘

NOTE

You can also manually retry a failed pipeline by selecting the Retry button that displays next to the pipeline's workflow steps in the TDP.

Pipeline Retry Settings

Setting | Description
(Default setting) Always retry 3 times | If the pipeline fails for any reason, the system retries pipeline processing up to three more times before the pipeline fails. Note: Each pipeline retry will use double the memory of the previous attempt to run the pipeline, up to 120 GB. For more information, see the Memory and Compute table.
Always retry 3 times (After OOM Error Only) | Pipeline processing is retried only if the failure is caused by an out-of-memory (OOM) error. The system then retries the pipeline up to three more times. Each subsequent retry must also fail because of an OOM error for the system to retry the pipeline again. Note: Each pipeline retry will use double the memory of the previous attempt to run the pipeline, up to 120 GB. For more information, see the Memory and Compute table.
No Retry | Pipeline processing isn't automatically retried for any reason.

📘

NOTE

Each pipeline retry uses double the memory of the previous attempt to run the pipeline, up to 120 GB. Compute and CPU capacity will also increase based on the amount of memory used for each retry. This increase in memory and compute usage can increase the cost of processing files significantly. For more information, see the Memory and Compute Settings section of this procedure.
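
To make the retry rules and their memory cost concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the platform's logic: the function names are hypothetical, and memory is doubled literally and capped at 120 GB. How the TDP maps a doubled value to one of its listed memory settings isn't described in this topic.

```python
MAX_MEMORY_GB = 120
MAX_RETRIES = 3
RETRY_SETTINGS = (
    "Always retry 3 times",
    "Always retry 3 times (After OOM Error Only)",
    "No Retry",
)


def should_retry(setting: str, failed_with_oom: bool, retries_so_far: int) -> bool:
    """Decide whether a failed run is retried under the given setting."""
    if setting not in RETRY_SETTINGS:
        raise ValueError(f"Unknown retry setting: {setting}")
    if setting == "No Retry" or retries_so_far >= MAX_RETRIES:
        return False
    if setting == "Always retry 3 times (After OOM Error Only)":
        return failed_with_oom
    return True


def memory_per_attempt(initial_gb: float, retries: int = MAX_RETRIES) -> list:
    """Memory for the first attempt and each retry: doubled, capped at 120 GB."""
    attempts = [initial_gb]
    for _ in range(retries):
        attempts.append(min(attempts[-1] * 2, MAX_MEMORY_GB))
    return attempts


print(memory_per_attempt(8))   # [8, 16, 32, 64]
print(memory_per_attempt(60))  # [60, 120, 120, 120]
```

Under these assumptions, a step that starts at 8 GB could use eight times its original memory by the third retry, which is why retries can raise processing costs significantly.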

Memory and Compute Settings

When configuring a protocol, you can override the default memory setting with a custom memory setting for each step of a pipeline. To configure a custom memory setting for a pipeline step, select the Default Memory drop-down list in the Steps section of the Pipeline Manager page.

The following table shows the available memory setting options and how much compute capacity each setting uses:

Memory Setting | Compute
512 MB | 0.25 vCPU
1 GB | 0.25 vCPU
2 GB | 0.5 vCPU
4 GB | 0.5 vCPU
8 GB | 1 vCPU
16 GB | 2 vCPU
30 GB | 4 vCPU
60 GB | 8 vCPU
120 GB | 16 vCPU
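
When estimating which setting a step needs, the table can be treated as a simple lookup. The following Python sketch transcribes the table values. The `smallest_setting_for` helper is a hypothetical convenience for illustration and isn't part of the TDP.

```python
# Memory settings (in GB) mapped to vCPU, transcribed from the table above.
# 512 MB is written as 0.5 GB.
COMPUTE_BY_MEMORY_GB = {
    0.5: 0.25,
    1: 0.25,
    2: 0.5,
    4: 0.5,
    8: 1,
    16: 2,
    30: 4,
    60: 8,
    120: 16,
}


def smallest_setting_for(required_gb: float) -> float:
    """Return the smallest listed memory setting that covers required_gb."""
    for setting in sorted(COMPUTE_BY_MEMORY_GB):
        if setting >= required_gb:
            return setting
    raise ValueError("No memory setting is large enough")


setting = smallest_setting_for(20)
print(setting, COMPUTE_BY_MEMORY_GB[setting])  # 30 4
```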

Edit a Pipeline

To edit an existing pipeline, do the following:

  1. On the Pipeline Manager page, select the name of the pipeline that you want to edit. A pane appears on the right.
  2. In the right pane, under Pipeline Actions, choose Edit Pipeline. The Edit Pipeline page appears and displays the configuration name on the edit screen.
  3. Select the Edit button in the section of the pipeline that you want to edit. That section's configuration page opens.
  4. Modify the section. Then, choose Save to save the edits you've made to the section.
  5. Choose Save again to save the entire pipeline.

Copy a Pipeline

To copy an existing pipeline, do the following:

  1. On the Pipeline Manager page, select the name of the pipeline that you want to copy. A pane appears on the right.
  2. In the right pane, under Pipeline Actions, choose Copy Pipeline. A Copy Pipeline dialog appears.
  3. For New Pipeline name, enter the new pipeline's name. Then, choose Save. The new pipeline appears on the Pipeline Manager page list with a New icon next to it.

📘

NOTE

Copied pipelines are inactive (disabled) by default. To activate a copied pipeline, follow the instructions in the Activate or Deactivate Pipelines section of this topic.

Activate or Deactivate Pipelines

For a pipeline to run when a file meets the defined trigger conditions, you must activate it. You can make any existing pipeline active or inactive by doing the following:

  1. On the Pipeline Manager page, select the name of the pipeline that you want to activate or deactivate. A pane appears on the right.
  2. In the right pane, under Pipeline Actions, choose Edit Pipeline. The Edit Pipeline page appears.
  3. In the Details section, choose the Edit button.
  4. Do one of the following, based on whether you want to activate or deactivate the pipeline:
    • (To activate the pipeline) Move the ENABLED toggle to the right. It appears blue when the pipeline is active.
      -or-
    • (To deactivate the pipeline) Move the ENABLED toggle to the left. It appears gray when the pipeline is inactive.
  5. Choose Save to save the edits you've made to the section.
  6. Choose Save again to save the entire pipeline.