Decorate Data by Using SSPs: Add Labels to Files
You can create self-service Tetra Data pipelines (SSPs) to add labels to files, which makes the files more discoverable through search. For example, you can use SSPs to programmatically add information about samples, experiment names, and laboratories.
This topic provides an example setup for adding labels to files by using an SSP.
Architecture
The following diagram shows an example SSP workflow for adding labels to files:
The diagram shows the following workflow:
- The sspdemo-taskscript task script from the “Hello, World!” SSP Example is updated to version v2.0.0. Its config.json file has two exposed functions:
  - print-hello-world (main.print_hello_world), which is the print_hello_world function in the main.py file.
  - decorate-input-file (main.decorate_input_file), which is the decorate_input_file function in the main.py file.
- A protocol named decorate (v1.0.0) is created. Its protocol.yml file provides the protocol name, description, and a configuration item named labels_json. It also outlines one step, decorate-input-file-step, which points to the sspdemo-taskscript task script and its exposed function, decorate-input-file. The inputs to this function are the input file that kicked off the pipeline workflow and the labels_json configuration item.
NOTE
For an example SSP folder structure, see SSP Folder Structure in the "Hello, World!" SSP Example.
Create and Deploy the Task Script
Task scripts are the building blocks of protocols, so you must build and deploy your task scripts before you can deploy a protocol that uses them.
Task scripts require the following:
- A config.json file that contains configuration information that exposes your Python functions so that protocols can use them.
- A Python file that contains Python functions (main.py in the following examples) with the code that's used in file processing.
- A requirements.txt file that either specifies any required third-party Python modules or is left empty if no modules are needed.
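Before deploying, it can help to confirm that a task script folder contains these files. The following Python sketch is a hypothetical pre-deployment check, not part of ts-sdk; the name missing_task_script_files is illustrative. It assumes each "function" value in config.json has the form <module>.<function>, as in this example, and that the module is a .py file inside the same folder.

```python
import json
import tempfile
from pathlib import Path

# Files every task script folder needs, per the list above.
REQUIRED_FILES = ("config.json", "requirements.txt")


def missing_task_script_files(folder: str) -> list:
    """Return the names of required task script files that are missing from folder."""
    root = Path(folder)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text())
        # Each "function" value looks like "<module>.<function>", e.g. "main.print_hello_world".
        modules = {entry["function"].rsplit(".", 1)[0] for entry in config.get("functions", [])}
        for module in sorted(modules):
            module_file = module.replace(".", "/") + ".py"
            if not (root / module_file).is_file():
                missing.append(module_file)
    return missing


# Usage: a folder that holds only config.json is missing requirements.txt and main.py.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "config.json").write_text(
        json.dumps({"functions": [{"slug": "decorate-input-file", "function": "main.decorate_input_file"}]})
    )
    print(missing_task_script_files(tmp))  # ['requirements.txt', 'main.py']
```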
To create and deploy a task script that decorates a file with labels, do the following.
NOTE
For more information about creating custom task scripts, see Task Script Files. For information about testing custom task scripts locally, see Create and Test Custom Task Scripts.
Create a config.json File
Create a config.json file in your code editor by using the following code snippet:
{
  "language": "python",
  "runtime": "python3.11",
  "functions": [
    {
      "slug": "print-hello-world",
      "function": "main.print_hello_world"
    },
    {
      "slug": "decorate-input-file",
      "function": "main.decorate_input_file"
    }
  ]
}
NOTE
You can choose which Python version a task script uses by specifying the "runtime" parameter in the script's config.json file. Python versions 3.7, 3.8, 3.9, 3.10, and 3.11 are currently supported. If you don't include a "runtime" parameter, the script uses Python v3.7 by default.
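The following sketch shows how the "runtime" rule in the note above might be applied when reading a config.json file. It is illustrative only: effective_runtime is a hypothetical helper, and it assumes the other supported versions use the same "python3.X" spelling as the "python3.11" value in this example.

```python
import json

# Runtimes listed as supported in the note above; spelling assumed to match "python3.11".
SUPPORTED_RUNTIMES = {"python3.7", "python3.8", "python3.9", "python3.10", "python3.11"}
# Runtime used when config.json has no "runtime" key.
DEFAULT_RUNTIME = "python3.7"


def effective_runtime(config_text: str) -> str:
    """Return the runtime a config.json selects, or raise if it isn't supported."""
    config = json.loads(config_text)
    runtime = config.get("runtime", DEFAULT_RUNTIME)
    if runtime not in SUPPORTED_RUNTIMES:
        raise ValueError(f"Unsupported runtime: {runtime!r}")
    return runtime


print(effective_runtime('{"language": "python", "runtime": "python3.11"}'))  # python3.11
print(effective_runtime('{"language": "python"}'))  # python3.7
```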
Create a main.py File
Create a main.py file in your code editor by using the following code snippet:
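from ts_sdk.task.__task_script_runner import Context


def print_hello_world(input: dict, context: Context):
    print("Hello World!")
    return "Hello World!"


def decorate_input_file(input: dict, context: Context) -> dict:
    print("Start 'decorate_input_file' function...")
    # File pointer of the file that triggered the pipeline (passed in by the protocol step)
    input_file_pointer = input["input_file_pointer"]
    # Look up the file's name without downloading the file
    file_name = context.get_file_name(input_file_pointer)
    print(f"Decorating file: {file_name}")
    # Labels supplied through the pipeline's labels_json configuration item
    labels_json = input["labels_json"]
    # Attach the labels to the file in the TDP
    context.add_labels(
        file=input_file_pointer,
        labels=labels_json,
    )
    print("'decorate_input_file' completed")
    # Return the same file pointer so that later steps can use it
    return input_file_pointer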
Context API
In the Python code provided in this example setup, the Context API is used by importing it in the main.py file (from ts_sdk.task.__task_script_runner import Context). The Context object provides the APIs that the task script needs to interact with the TDP.
This example setup uses the following Context API endpoints:
- context.get_file_name: Retrieves the filename of a file that isn't downloaded locally
- context.add_labels: Adds labels to a file
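To exercise decorate_input_file locally without the TDP, you could pass in a stand-in object that implements the two Context methods this example uses. The FakeContext class below is hypothetical: it only records labels in memory, and the real context.add_labels return value and behavior may differ. The function body mirrors the main.py example, with the ts_sdk import removed so that the sketch runs on its own.

```python
from typing import Any


class FakeContext:
    """Hypothetical in-memory stand-in for the TDP Context, for local tests only."""

    def __init__(self, file_names: dict):
        self.file_names = file_names  # maps fileKey -> file name
        self.labels = {}  # maps fileKey -> labels added during the test

    def get_file_name(self, file_pointer: dict) -> str:
        return self.file_names[file_pointer["fileKey"]]

    def add_labels(self, file: dict, labels: list) -> list:
        self.labels.setdefault(file["fileKey"], []).extend(labels)
        return labels


def decorate_input_file(input: dict, context: Any) -> dict:
    # Same steps as the main.py example, without the ts_sdk import.
    input_file_pointer = input["input_file_pointer"]
    file_name = context.get_file_name(input_file_pointer)
    print(f"Decorating file: {file_name}")
    context.add_labels(file=input_file_pointer, labels=input["labels_json"])
    return input_file_pointer


pointer = {"type": "s3file", "bucket": "datalake", "fileKey": "demo/sample.csv", "version": "1"}
ctx = FakeContext({"demo/sample.csv": "sample.csv"})
result = decorate_input_file(
    {"input_file_pointer": pointer, "labels_json": [{"name": "lab", "value": "A"}]}, ctx
)
print(ctx.labels)  # {'demo/sample.csv': [{'name': 'lab', 'value': 'A'}]}
```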
File Pointers
File pointers are dictionaries containing the file location information stored in the TDP. File pointers are used throughout task scripts as Python function inputs/outputs and as inputs/outputs to Context API functions.
In the Python code provided in this example setup, one of the inputs to the decorate_input_file function is a file pointer. After the function decorates the file, it returns the same file pointer.
File Pointer Dictionary Example
{
  "type": "s3file",
  "bucket": "datalake",
  "fileKey": "<AWS S3 path/to/file>",
  "version": "<AWS S3 file version ID>"
}
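If you write helper functions that accept file pointers, a light shape check can catch mistakes early. This sketch assumes that a file pointer always has the four string keys shown in the example above; FilePointer and is_file_pointer are illustrative names, not ts-sdk types.

```python
from typing import TypedDict

# Keys of the file pointer dictionary shown above.
FILE_POINTER_KEYS = ("type", "bucket", "fileKey", "version")


class FilePointer(TypedDict):
    """Type hint for the file pointer shape shown above."""

    type: str
    bucket: str
    fileKey: str
    version: str


def is_file_pointer(value: object) -> bool:
    """Return True if value is a dict with the four string keys of a file pointer."""
    return isinstance(value, dict) and all(
        isinstance(value.get(key), str) for key in FILE_POINTER_KEYS
    )


example: FilePointer = {"type": "s3file", "bucket": "datalake", "fileKey": "path/to/file", "version": "abc123"}
print(is_file_pointer(example))  # True
print(is_file_pointer({"type": "s3file", "bucket": "datalake"}))  # False
```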
Create a Python Package
Within the task script folder that contains the config.json and main.py files, use Python Poetry to create a Python package and the files needed to deploy it to the TDP.
Poetry Command Example to Create a Python Package
poetry init
# Add any required third-party packages, for example: poetry add <package-name>
poetry export --without-hashes --format=requirements.txt > requirements.txt
NOTE
If no packages are added, this poetry export command example produces text in requirements.txt that you must delete to create an empty requirements.txt file. A requirements.txt file is required to deploy the package to the TDP.
Deploy the Task Script
To deploy the task script, run the following command from your command line (for example, bash):
ts-sdk put task-script private-{TDP ORG} sspdemo-taskscript v2.0.0 {task-script-folder} -c {auth-folder}/auth.json
NOTE
Make sure to replace {TDP ORG} with your organization slug, {task-script-folder} with the local folder that contains your task script code, and {auth-folder} with the local folder that contains your authentication information. Also, when you create a new version of a task script and deploy it to the TDP, you must increase the version number. In this example command, the version is increased to v2.0.0.
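Because each new task script version that you deploy needs a higher version number, a small comparison helper can guard against reusing or lowering a version. This sketch assumes versions follow the vMAJOR.MINOR.PATCH pattern used on this page (for example, v2.0.0); parse_version and is_valid_upgrade are hypothetical helpers. It compares the numeric parts, so v1.10.0 correctly ranks above v1.9.0, which a plain string comparison would get wrong.

```python
import re

# Versions on this page look like v1.0.0 or v2.0.0 (assumed vMAJOR.MINOR.PATCH).
_VERSION_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_version(tag: str) -> tuple:
    """Turn a tag such as 'v2.0.0' into a tuple of integers such as (2, 0, 0)."""
    match = _VERSION_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"Expected a version like v2.0.0, got {tag!r}")
    return tuple(int(part) for part in match.groups())


def is_valid_upgrade(deployed: str, new: str) -> bool:
    """True only if new is strictly higher than the already deployed version."""
    return parse_version(new) > parse_version(deployed)


print(is_valid_upgrade("v1.0.0", "v2.0.0"))   # True
print(is_valid_upgrade("v2.0.0", "v2.0.0"))   # False: reuses the same version
print(is_valid_upgrade("v1.10.0", "v1.9.0"))  # False: 10 > 9 numerically
```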
Create and Deploy a Protocol
Protocols define the business logic of your pipeline by specifying the steps and the functions within task scripts that execute those steps. For more information about how to create a protocol, see Protocol YAML Files.
In the following example, there’s one step: decorate-input-file-step. This step uses the decorate-input-file function in the sspdemo-taskscript task script.
Create a protocol.yml File
Create a protocol.yml file in your code editor by using the following code snippet:
protocolSchema: "v3"
name: "Decorate - v3 protocol"
description: "Protocol that decorates file by adding labels."

config:
  labels_json:
    label: "Labels that can be added to file."
    description: "A json of labels that can be added to a file"
    type: "object"
    required: false

steps:
  - id: decorate-input-file-step
    task:
      # Use the same private-{TDP ORG} namespace that you deployed the task script to
      namespace: private-training-sspdemo
      slug: sspdemo-taskscript
      version: v2.0.0
    function: decorate-input-file
    input:
      input_file_pointer: $( workflow.inputFile )
      labels_json: $( config.labels_json )
NOTE
When you use a new task script version, you must reference the new version number when calling that task script in the protocol step. This example protocol.yml file refers to v2.0.0.
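A quick way to catch a stale version reference is to check every step that calls the task script. The sketch below works on the protocol as a Python dictionary, the shape a YAML parser such as PyYAML's yaml.safe_load would produce from the protocol.yml file above (the config and input sections are omitted here); steps_using_wrong_version is a hypothetical helper.

```python
def steps_using_wrong_version(protocol: dict, slug: str, version: str) -> list:
    """Return IDs of steps that call task script slug at a version other than version."""
    return [
        step["id"]
        for step in protocol.get("steps", [])
        if step["task"]["slug"] == slug and step["task"]["version"] != version
    ]


# The protocol.yml above as a loaded dictionary (config and input sections omitted).
protocol = {
    "protocolSchema": "v3",
    "name": "Decorate - v3 protocol",
    "steps": [
        {
            "id": "decorate-input-file-step",
            "task": {
                "namespace": "private-training-sspdemo",
                "slug": "sspdemo-taskscript",
                "version": "v2.0.0",
            },
            "function": "decorate-input-file",
        }
    ],
}

print(steps_using_wrong_version(protocol, "sspdemo-taskscript", "v2.0.0"))  # []
print(steps_using_wrong_version(protocol, "sspdemo-taskscript", "v3.0.0"))  # ['decorate-input-file-step']
```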
Configuration Items
The config property in protocol.yml files defines the structure of the UI configuration element that appears when you use this protocol in a pipeline created on the TDP. By using these elements, you can create sets of pipelines that are identical except for a supplied value.
For example, you can create two pipelines that have different triggers, so that each pipeline adds a different set of labels to the files it processes.
To use a configuration item, pass its ID (for example, labels_json) as a protocol step input (for example, $( config.labels_json )). Then, extract the value from the input dictionary within the Python function (for example, labels_json = input["labels_json"]).
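To illustrate how a configuration value travels from the pipeline into the Python function, the following sketch substitutes $( workflow.<name> ) and $( config.<name> ) references in a step's input. This is a simplified model for explanation only, not the TDP's actual resolution logic; resolve_step_input is a hypothetical name, and resolving a missing config value to None is an assumption (labels_json is declared with required: false).

```python
import re

# Matches references such as "$( config.labels_json )" or "$( workflow.inputFile )".
_REFERENCE = re.compile(r"\$\(\s*(\w+)\.(\w+)\s*\)")


def resolve_step_input(step_input: dict, workflow: dict, config: dict) -> dict:
    """Illustrative only: replace $( scope.name ) references with their values."""
    scopes = {"workflow": workflow, "config": config}
    resolved = {}
    for key, expression in step_input.items():
        match = _REFERENCE.fullmatch(expression) if isinstance(expression, str) else None
        if match:
            scope, name = match.groups()
            # Assumption: an unset optional config item resolves to None.
            resolved[key] = scopes[scope].get(name)
        else:
            resolved[key] = expression  # literal values pass through unchanged
    return resolved


step_input = {
    "input_file_pointer": "$( workflow.inputFile )",
    "labels_json": "$( config.labels_json )",
}
workflow = {"inputFile": {"type": "s3file", "bucket": "datalake", "fileKey": "path/to/file", "version": "1"}}
config = {"labels_json": [{"name": "test_label_name1", "value": "test_value1"}]}

function_input = resolve_step_input(step_input, workflow, config)
labels_json = function_input["labels_json"]  # same lookup as in main.py
print(labels_json)  # [{'name': 'test_label_name1', 'value': 'test_value1'}]
```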
Deploy the Protocol
To deploy the protocol, run the following command from your command line (for example, bash):
ts-sdk put protocol private-{TDP ORG} decorate v1.0.0 {protocol-folder} -c {auth-folder}/auth.json
NOTE
Make sure to replace {TDP ORG} with your organization slug, {protocol-folder} with the local folder that contains your protocol code, and {auth-folder} with the local folder that contains your authentication information.
NOTE
To redeploy the same version of your code, you must include the -f flag in your deployment command. This flag forces the deployment to overwrite the previously deployed version. The following are example deployment commands for a protocol and a task script:
ts-sdk put protocol private-xyz hello-world v1.0.0 ./protocol -f -c auth.json
ts-sdk put task-script private-xyz hello-world v1.0.0 ./task-script -f -c auth.json
For more details about the available arguments, run the following command:
ts-sdk put --help
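If you script your deployments, you can assemble the same commands programmatically. This sketch only builds the argument list in the order shown in the examples above; ts_sdk_put_command is a hypothetical helper, and it restricts the artifact kind to task-script and protocol because those are the only kinds this page covers. Run ts-sdk put --help for the full set of arguments.

```python
import shlex


def ts_sdk_put_command(
    kind: str,
    namespace: str,
    slug: str,
    version: str,
    folder: str,
    auth_file: str,
    force: bool = False,
) -> list:
    """Build a `ts-sdk put` argument list in the order used in the examples above."""
    if kind not in ("task-script", "protocol"):
        raise ValueError(f"Unsupported artifact kind: {kind!r}")
    command = ["ts-sdk", "put", kind, namespace, slug, version, folder]
    if force:
        command.append("-f")  # overwrite an already deployed version
    command += ["-c", auth_file]
    return command


command = ts_sdk_put_command(
    "protocol", "private-xyz", "hello-world", "v1.0.0", "./protocol", "auth.json", force=True
)
print(shlex.join(command))  # ts-sdk put protocol private-xyz hello-world v1.0.0 ./protocol -f -c auth.json
```

You could then pass the list to subprocess.run(command, check=True), which avoids shell-quoting problems with folder paths.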
Create a Pipeline That Uses the Deployed Protocol
To use your new protocol on the TDP, create a new pipeline that uses the protocol that you deployed. Then, upload a file that matches the pipeline’s trigger conditions.
For the labels_json configuration element in your pipeline's UI configuration, you can use the following JSON code snippet to add labels:
[
  {
    "name": "test_label_name1",
    "value": "test_value1"
  },
  {
    "name": "test_label_name2",
    "value": "test_value2"
  }
]
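Because the labels are entered as free-form JSON, a malformed value is easy to miss. The following sketch checks the shape used in the example above: a JSON array of objects with non-empty string name and value fields. parse_labels is hypothetical, and the TDP may apply additional or different validation rules.

```python
import json


def parse_labels(labels_text: str) -> list:
    """Parse labels JSON and check each entry has non-empty string name and value."""
    labels = json.loads(labels_text)
    if not isinstance(labels, list):
        raise ValueError("Labels must be a JSON array of objects")
    for index, label in enumerate(labels):
        if not isinstance(label, dict):
            raise ValueError(f"Label {index} must be an object")
        for key in ("name", "value"):
            # A missing key yields None here, which also fails the check.
            if not isinstance(label.get(key), str) or not label[key]:
                raise ValueError(f"Label {index} needs a non-empty string {key!r}")
    return labels


labels = parse_labels('[{"name": "test_label_name1", "value": "test_value1"}]')
print(labels[0]["name"])  # test_label_name1
```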