Protocol YAML Files

IMPORTANT

Starting with Tetra Data Platform (TDP) version 3.6.0 and the TetraScience Software Development Kit (SDK) 2.0 release, the protocol.yml file format replaces both the previous protocol.json and script.js file formats when creating self-service Tetra Data pipelines (SSPs).

Protocols define the business logic of your pipeline by specifying the steps and the functions within task scripts that run those steps. You can configure your own custom protocols for use in an SSP by creating a protocol.yml file.

NOTE

For instructions on how to create and deploy a custom protocol, see Create and Deploy a Protocol in the "Hello, World!" SSP Example.

protocol.yml File Example

protocolSchema: "v3"
name: "Example Protocol"
description: "This is just to show a basic protocol.yml file"

config:
  idsFileCategory:
    label: "IDS File Category"
    description: "Optional File Category"
    type: "string"
    required: false
    default: "IDS"

steps:
  - id: raw-to-ids
    task:
      namespace: common
      slug: akta-raw-to-ids
      version: v3.1.6
      function: akta-raw-to-ids
    input:
      input_file_pointer: $( workflow.inputFile )
      ids_file_category: $( config.idsFileCategory )

Expressions

Instead of directly specifying everything in a protocol, you can use expressions to programmatically control the workflow at run time.

Expressions are specified by enclosing them in the following: $( ... )

Within an expression, you have access to several contexts that contain data, plus a limited list of functions. You can combine any of these contexts and functions within a single expression.

Expression Example

input_file_pointer: $( workflow.inputFile )
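
As a minimal sketch of combining contexts within one expression, the following protocol falls back from a config value to a constant by using the coalesce function described in the Functions section. The DEFAULT_CATEGORY constant and the protocol name are hypothetical, and the sketch assumes that an unset optional config value evaluates to null.

```yaml
protocolSchema: "v3"
name: "Expression Sketch"

config:
  idsFileCategory:
    label: "IDS File Category"
    type: "string"
    required: false

constants:
  # Hypothetical constant used as a fallback value
  DEFAULT_CATEGORY: "IDS"

steps:
  - id: raw-to-ids
    task:
      namespace: common
      slug: akta-raw-to-ids
      version: v3.1.6
      function: akta-raw-to-ids
    input:
      # Reads a single value from the workflow context
      input_file_pointer: $( workflow.inputFile )
      # Combines the config and constants contexts in one expression
      ids_file_category: $( coalesce(config.idsFileCategory, constants.DEFAULT_CATEGORY) )
```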

Contexts

The following contexts are available within expressions.

workflow Context

The workflow context contains data relevant to the current workflow.

workflow Context Examples

| Value | Contents |
| --- | --- |
| workflow.id | Current workflow ID |
| workflow.inputFile | Pointer to the file that triggered the workflow |
| workflow.startedAt | ISO 8601-formatted string of the date and time when the workflow started |
| workflow.pipeline.id | ID of the pipeline that triggered the workflow |
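
The following sketch shows how these workflow values might be passed to a task script as step input. The input key names (workflow_id, started_at, pipeline_id) are hypothetical; a real task script defines the input names that it expects.

```yaml
steps:
  - id: stepOne
    task:
      namespace: common
      slug: raw-to-ids
      version: v1.0.0
      function: main
    input:
      # Pointer to the file that triggered the workflow
      input_file_pointer: $( workflow.inputFile )
      # Hypothetical input keys that carry workflow metadata
      workflow_id: $( workflow.id )
      started_at: $( workflow.startedAt )
      pipeline_id: $( workflow.pipeline.id )
```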

config Context

The config context contains data configured by the pipeline user.

config Context Example

config:
  foo:
    label: Foo
    type: string
  bar:
    label: Hey Bear
    type: object

NOTE

For this config context example, you could write $( config.foo ) or $( config.bar ), but $( config.baz ) would result in an error.
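
As a sketch of how those config values could be consumed, the step below passes both of them to a task script. The input key names are hypothetical, and the task is borrowed from the steps property example in this document.

```yaml
config:
  foo:
    label: Foo
    type: string
  bar:
    label: Hey Bear
    type: object

steps:
  - id: stepOne
    task:
      namespace: common
      slug: raw-to-ids
      version: v1.0.0
      function: main
    input:
      # Both IDs are defined under config, so these expressions resolve
      foo_value: $( config.foo )
      bar_value: $( config.bar )
```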

constants Context

The constants context contains all the constant values defined in the protocol. If a constant contains an expression, the expression is evaluated before the constant is accessed.

constants Context Example

constants:
  foo: "bar"
  baz:
    - 1
    - 2

NOTE

For this constants context example, you could write $( constants.foo ) or $( constants.baz ), but $( constants.qux ) would result in an error.
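
A short sketch of referencing those constants from a step follows. The input key names are hypothetical.

```yaml
constants:
  foo: "bar"
  baz:
    - 1
    - 2

steps:
  - id: stepOne
    task:
      namespace: common
      slug: raw-to-ids
      version: v1.0.0
      function: main
    input:
      # Resolves to the string "bar"
      foo_value: $( constants.foo )
      # Resolves to the list defined under baz
      baz_values: $( constants.baz )
```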

protocol Context

The protocol context contains the protocol's manifest fields. For example, you could write $( protocol.name ) or $( protocol.version ). If a field is not specified in the protocol.yml file, the expression returns a null value.

protocol Context Examples

For example protocol context values, see the Manifest Fields section.
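
As an illustration, the sketch below sets two manifest fields and reads three. The input key names are hypothetical; because slug is not set in this file, the last expression would be expected to return null according to the rule described in this section.

```yaml
protocolSchema: "v3"
name: "Example Protocol"
version: v1.0.0

steps:
  - id: stepOne
    task:
      namespace: common
      slug: raw-to-ids
      version: v1.0.0
      function: main
    input:
      # Set above, so these resolve to the manifest values
      protocol_name: $( protocol.name )
      protocol_version: $( protocol.version )
      # Not set above, so this is expected to be null
      protocol_slug: $( protocol.slug )
```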

steps Context

The steps context contains all of the step results. A step can be accessed in this context only after it has run or been skipped. Steps can be referenced either by their id or by their index within the steps list.

For example, if the first step has id: stepOne, then its output could be accessed by any of the following:

  • steps[0].output
  • steps['stepOne'].output
  • steps.stepOne.output
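
The three reference forms listed above can be sketched side by side in a later step. The input key names are hypothetical, and all three expressions are expected to resolve to the same value.

```yaml
steps:
  - id: stepOne
    task:
      namespace: common
      slug: raw-to-ids
      version: v1.0.0
      function: main
    input:
      input_file_pointer: $( workflow.inputFile )
  - id: stepTwo
    task:
      namespace: common
      slug: push-to-eln
      version: v2.1.0
      function: "push-it"
    input:
      # Reference by index within the steps list
      by_index: $( steps[0].output )
      # Reference by id, bracket form
      by_bracket_id: $( steps['stepOne'].output )
      # Reference by id, dot form
      by_dot_id: $( steps.stepOne.output )
```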

steps Context Examples

| Value | Contents |
| --- | --- |
| steps.<step_id>.output / steps[*].output | The output returned from the task script in the referenced step |
| steps.<step_id>.error / steps[*].error | Errors returned from the task script. Note: This value is returned only if the referenced step has continueOnError set to true. |
| steps.<step_id>.status / steps[*].status | Status of the referenced step, listed as one of the following values: success, failed, or skipped |
| steps.<step_id>.isSuccess / steps[*].isSuccess | Contains true if the step was successful, or false if the step failed or was skipped |
| steps.<step_id>.isFailed / steps[*].isFailed | Contains true if the step failed, or false if the step was successful or was skipped |
| steps.<step_id>.isSkipped / steps[*].isSkipped | Contains true if the step was skipped (using if), or false if the step was either successful or failed |

Functions

You can use the following functions within expressions in a protocol.yml file to alter data.

IMPORTANT

Any functionality beyond the functions listed in the following table should be added to a task script, not a protocol.

| Function | Description | Example |
| --- | --- | --- |
| coalesce(...args) | Returns the first value that isn't null, or null if all values are null | coalesce(null, null, "red", "blue") returns "red" |
| not(value) | Returns the negation of the value | not(true) returns false |
| isNumber(value) | Returns true if the value is a number | isNumber(5) returns true; isNumber("5") returns false |
| isString(value) | Returns true if the value is a string | isString(5) returns false; isString("5") returns true |
| isBoolean(value) | Returns true if the value is a boolean data type | isBoolean(false) returns true; isBoolean("false") returns false |
| isNull(value) | Returns true if the value is null | isNull(null) returns true; isNull(4) returns false |
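
The following sketch applies several of these functions to a config value inside step input. The elnTimeout config ID and the input key names are hypothetical. The sketch also assumes that an unset optional config value evaluates to null and that function calls can be nested, neither of which is confirmed in this document.

```yaml
config:
  elnTimeout:
    label: "ELN Timeout"
    type: number
    required: false

steps:
  - id: stepOne
    task:
      namespace: common
      slug: push-to-eln
      version: v2.1.0
      function: "push-it"
    input:
      # Uses the config value if it is set, otherwise falls back to 300
      timeout: $( coalesce(config.elnTimeout, 300) )
      # Assumed nesting: true when a timeout was configured
      timeout_was_set: $( not(isNull(config.elnTimeout)) )
      # true only when the configured value is a number
      timeout_is_number: $( isNumber(config.elnTimeout) )
```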

protocolSchema: "v3" (Required)

Type: String (must be "v3")

The protocolSchema: "v3" property is required to indicate that this protocol is in the v3 format.

Manifest Fields

The following manifest fields are optional, top-level metadata fields.

| Field Name | Type | Description |
| --- | --- | --- |
| type | String (must be protocol) | Artifact type |
| namespace | String (must be common or a string starting with client- or private-) | Protocol namespace |
| version | String (must be in the following form: vMAJOR.MINOR.PATCH, where MAJOR, MINOR, and PATCH are all numbers) | Protocol version number |
| slug | String | Protocol slug/identifier |
| name | String | Protocol name |
| description | String | Protocol description |
| catalog_keys | List of strings | Used to look up the appropriate labels so that they can be added to the labels array in the protocol's manifest.json file |
| labels | List of key-value pairs | Used to annotate files and trigger pipelines without changing any file data. For more information, see Labels. |

labels Example

labels:
  - name: LABEL_NAME
    value: LABEL_VALUE
  - name: LABEL_NAME2
    value: LABEL_VALUE2
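
As a sketch of how the scalar manifest fields might sit at the top of a protocol.yml file, consider the following. The namespace and slug values are hypothetical and were chosen only to satisfy the formats listed in the table.

```yaml
protocolSchema: "v3"

# Optional, top-level manifest fields
type: protocol
namespace: private-example      # hypothetical; must be common, client-*, or private-*
slug: example-protocol          # hypothetical identifier
version: v1.0.0                 # vMAJOR.MINOR.PATCH
name: "Example Protocol"
description: "This is just to show manifest fields"

steps:
  - id: stepOne
    task:
      namespace: common
      slug: raw-to-ids
      version: v1.0.0
      function: main
    input:
      input_file_pointer: $( workflow.inputFile )
```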

config

Type: Object

The config property is an object that maps a configuration ID to a config object. Each config ID is unique so that it can be referenced in the steps property.

IMPORTANT

Config objects can't contain expressions.

config Property Example

config:
  numberConfig:
    label: "Enter a Number"
    description: "This number will be used by the task scripts"
    type: number
    default: 9000
  secretExample:
    label: "ELN Password"
    description: "The password to push data to some ELN"
    type: secret

config.<config_id>.label (Required)

Type: String

The config label that's rendered on the TDP's Pipeline Manager page.

config.<config_id>.type (Required)

Type: String

The config type determines how the TDP UI renders the configuration element and how it is passed into the workflow.

The config.<config_id>.type value must be one of the following:

  • "number"
  • "string"
  • "text"
  • "boolean"
  • "secret"
  • "object"

config.<config_id>.required

Type: Boolean

Determines if the config is required or not. This requirement is reflected in the TDP UI and is also checked at workflow run time. If not set, this defaults to false.

config.<config_id>.description

Type: String

A short description of the config property. This description is what's rendered on the Pipeline Manager page in the TDP.

config.<config_id>.default

Type: Any (must match the config's type)

The default value for the config property. This value is used if the config property isn't configured in the TDP UI. The value's type should match the config's type.
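
The sketch below shows required and default used together, with each default matching its config type. All config IDs, labels, and values here are hypothetical.

```yaml
config:
  sampleCount:
    label: "Sample Count"
    description: "Number of samples that the task scripts should expect"
    type: number
    # Must be set before the workflow can run
    required: true
  notes:
    label: "Notes"
    type: text
    required: false
    # Text default for a text config
    default: "none"
  dryRun:
    label: "Dry Run"
    type: boolean
    # Boolean default for a boolean config
    default: false
```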

constants

Type: Object

The constants property is an object that maps a constant ID to any value. This property allows complex protocols to define common values once, instead of having them listed multiple times throughout the protocol.yml file.

Each constant ID is unique so that it can be referenced in the steps property, or from other constants. Constants can contain expressions. However, those expressions don't have access to the steps context, because no steps have run when constants are evaluated.

IMPORTANT

Circular dependencies aren't allowed and are detected by the TDP automatically.

constants Property Example

constants:
  # a number
  REMOTE_PORT: 8080
  # a string
  REMOTE_URL: "http://example.com"
  # a list
  VALID_VALUES:
    - 80
    - 443
    - $( constants.REMOTE_PORT )
  # an object
  CONFIG_OBJECT:
    url: $( constants.REMOTE_URL )
    ports: $( constants.VALID_VALUES )
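
By contrast with the valid references above, the following sketch shows the kind of circular dependency that isn't allowed. The constant names are hypothetical, and the exact error that the TDP reports is not shown in this document.

```yaml
constants:
  # Invalid: PRIMARY_URL depends on FALLBACK_URL ...
  PRIMARY_URL: $( constants.FALLBACK_URL )
  # ... and FALLBACK_URL depends back on PRIMARY_URL
  FALLBACK_URL: $( constants.PRIMARY_URL )
```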

steps (Required)

Type: List

The steps property defines the tasks that the protocol runs and their inputs. The steps property contains a list of step objects. Each step object runs a task script.

steps Property Example

steps:
  - id: stepOne
    task:
      namespace: common
      slug: raw-to-ids
      version: v1.0.0
      function: main
    input:
      input_file_pointer: $( workflow.inputFile )
  - task:
      namespace: common
      slug: push-to-eln
      version: v2.1.0
      function: "push-it"
    input:
      ids_file: $( steps.stepOne.output.ids_file )
    options:
      timeoutInSec: $( config.pushToElnTimeout )

steps[*].id

Type: String

Defines a unique identifier for the step. This allows the step object to be referenced by its id in later steps. If no id is specified for a step, then later steps must refer to it by its index.

steps[*].if

Type: Boolean or expression

If this property is set, then the step will run only if the value is true, or if the expression evaluates to true. If the steps[*].if property isn't set, then the step always runs.

NOTE

You can use the steps[*].if property to conditionally run steps based on the output of a previous step or a configuration value.
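
A minimal sketch of a conditional step driven by a configuration value follows. The pushToEln config ID is hypothetical; the tasks are borrowed from the steps property example.

```yaml
config:
  pushToEln:
    label: "Push to ELN"
    type: boolean
    default: false

steps:
  - id: stepOne
    task:
      namespace: common
      slug: raw-to-ids
      version: v1.0.0
      function: main
    input:
      input_file_pointer: $( workflow.inputFile )
  - id: pushStep
    # Runs only when the pipeline user turns on pushToEln
    if: $( config.pushToEln )
    task:
      namespace: common
      slug: push-to-eln
      version: v2.1.0
      function: "push-it"
    input:
      ids_file: $( steps.stepOne.output.ids_file )
```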

steps[*].continueOnError

Type: Boolean or expression

If this property is set and the task fails, then the protocol continues to run only if the value is true, or the expression evaluates to true. If the steps[*].continueOnError property isn't set, then the protocol always ends in failure if the task fails.

NOTE

You can use the steps[*].continueOnError property to handle errors generated by failing task scripts. For example, you can use this property to programmatically run a cleanup task after a failed task.
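
The cleanup pattern described in this note could be sketched as follows. The cleanup-task slug, its function name, and its input key are hypothetical; the isFailed and error values come from the steps context.

```yaml
steps:
  - id: pushStep
    # Lets the protocol keep running if this task fails
    continueOnError: true
    task:
      namespace: common
      slug: push-to-eln
      version: v2.1.0
      function: "push-it"
    input:
      ids_file: $( workflow.inputFile )
  - id: cleanup
    # Runs only when pushStep failed
    if: $( steps.pushStep.isFailed )
    task:
      namespace: common
      slug: cleanup-task        # hypothetical task script
      version: v1.0.0
      function: main
    input:
      # Available because pushStep sets continueOnError to true
      failure_details: $( steps.pushStep.error )
```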

steps[*].description

Type: String

A short description of the step.

steps[*].task (Required)

Type: Task object

Defines the task that the step runs.

steps[*].task.namespace (Required)

Type: String

Defines the step's namespace. The value can be any one of the following: common, client, or private.

steps[*].task.slug (Required)

Type: String

Defines the step's task slug, such as akta-raw-to-ids.

steps[*].task.version (Required)

Type: String

Defines the step's task version number, such as v3.1.6.

steps[*].task.function (Required)

Type: String

Identifies the function that the task script runs. Task scripts can define multiple functions, so this property is required and must match one of the function slugs defined by the task script.

steps[*].input

Type: Any

The input value that's evaluated and passed directly into the task script function.

NOTE

The steps[*].input property can contain expressions.
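
Because the input type is Any, a structured input with expressions at several levels might look like the following sketch. The settings and tags keys are hypothetical, and the sketch assumes that expressions nested inside objects and lists are evaluated the same way as they are in the constants property example.

```yaml
config:
  idsFileCategory:
    label: "IDS File Category"
    type: "string"
    default: "IDS"

steps:
  - id: raw-to-ids
    task:
      namespace: common
      slug: akta-raw-to-ids
      version: v3.1.6
      function: akta-raw-to-ids
    input:
      input_file_pointer: $( workflow.inputFile )
      # Hypothetical nested object that mixes literals and expressions
      settings:
        category: $( config.idsFileCategory )
        tags:
          - "static-tag"
          - $( workflow.pipeline.id )
```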

steps[*].options

Type: Object

Specifies default options for the task script runner.

steps[*].options.memoryInMB

Type: Number or expression

Determines the default memory used to run this step's task.

NOTE

The steps[*].options.memoryInMB property can be overridden on the Pipeline Manager page in the TDP UI.

steps[*].options.timeoutInSec

Type: Number or expression

Determines how long, in seconds, a task can run before it is considered failed, even if it hasn't completed.
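
Both options can be sketched together on a single step. The numeric values are illustrative only, and the pushToElnTimeout config definition is an assumption that mirrors the name used in the steps property example.

```yaml
config:
  pushToElnTimeout:
    label: "Push to ELN Timeout (seconds)"
    type: number
    default: 600

steps:
  - id: pushStep
    task:
      namespace: common
      slug: push-to-eln
      version: v2.1.0
      function: "push-it"
    input:
      ids_file: $( workflow.inputFile )
    options:
      # Default memory for this step's task, as a literal number
      memoryInMB: 512
      # Timeout taken from a config value through an expression
      timeoutInSec: $( config.pushToElnTimeout )
```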