Task Script Files
Task scripts are the building blocks of protocols. You must build and deploy your task scripts before you can deploy a protocol that uses them in a self-service Tetra Data pipeline (SSP).
Task scripts require the following files:
config.json
: Contains configuration information that exposes your Python functions so that protocols can use them
main.py
: Contains the code that’s used in file processing (Python is the only supported programming language)
requirements.txt
: Specifies any required third-party Python modules
NOTE
You can also create an optional task script README.md file that provides additional contextual information about the script.
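As a rough illustration of the file list above, the following sketch checks a local task script folder before packaging. The helper name check_task_script_layout and the split into required and optional files are assumptions made for this example (requirements.txt is treated as optional here because it is only needed when third-party modules are used); this is not a TDP tool.

```python
import tempfile
from pathlib import Path

# Assumption for this sketch: config.json and main.py are always needed,
# while requirements.txt (third-party modules only) and README.md may be absent.
REQUIRED_FILES = ("config.json", "main.py")
OPTIONAL_FILES = ("requirements.txt", "README.md")


def check_task_script_layout(folder):
    """Return (missing_required, present_optional) for a task script folder."""
    root = Path(folder)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    optional = [name for name in OPTIONAL_FILES if (root / name).is_file()]
    return missing, optional


# Usage: build a throwaway folder that only has config.json and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    missing, optional = check_task_script_layout(tmp)
    print("missing required files:", missing)
    print("optional files found:", optional)
```

Only the root of the folder is checked, which matches the requirement that these files sit at the same level as config.json.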
You can then use Python Poetry to create a Python package and the necessary files to deploy your task script to the Tetra Data Platform (TDP).
For instructions on how to create and deploy a custom task script, see Create and Deploy a Task Script in the “Hello, World!” SSP Example. For information about testing custom task scripts locally, see Create and Test Custom Task Scripts.
config.json Files
The config.json file contains configuration information that exposes your Python functions so that protocols can use them.
config.json File Example
{
  "language": "python",
  "runtime": "python3.11",
  "functions": [
    {
      "slug": "process_file",
      "function": "main.process_file"
    }
  ]
}
For each object in the functions array, the slug value is a name that you define to invoke the function from the protocol. It must be unique within this task script. It’s recommended that you make the slug value the same as the Python function’s name.
The function value is a reference to the Python function, including the module where it’s defined, separated by a dot (.).
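The two rules above (unique slug values, and function references written as module.function) can be checked locally before deployment. The following is a minimal sketch; validate_functions is a hypothetical helper written for this example, not part of the TDP tooling, and it only inspects the fields shown in the example file.

```python
import json


def validate_functions(config):
    """Return a list of problems found in the "functions" array of a parsed config.json."""
    errors = []
    seen_slugs = set()
    for entry in config.get("functions", []):
        slug = entry.get("slug")
        reference = entry.get("function", "")
        # Each slug must be unique within the task script.
        if slug in seen_slugs:
            errors.append(f"duplicate slug: {slug}")
        seen_slugs.add(slug)
        # The reference must name a module and a function, separated by a dot.
        module, dot, func = reference.rpartition(".")
        if not (dot and module and func):
            errors.append(f"bad function reference: {reference!r}")
    return errors


# Usage with the example config.json content shown above.
example = json.loads("""
{
  "language": "python",
  "runtime": "python3.11",
  "functions": [
    {"slug": "process_file", "function": "main.process_file"}
  ]
}
""")
print(validate_functions(example))
```

The check is purely syntactic. It does not import main.py or confirm that the referenced function exists.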
NOTE
You can choose which Python version a task script uses by specifying the "runtime" parameter in the script's config.json file. Python versions 3.7, 3.8, 3.9, 3.10, and 3.11 are currently supported. If you don't include a "runtime" parameter, the script uses Python v3.7 by default.
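The runtime rule can be expressed as a small lookup with a default. This sketch assumes that every runtime string follows the pythonX.Y pattern seen in the example ("python3.11"); only that one value appears in this page, so the other strings are inferred, and resolve_runtime is a hypothetical helper.

```python
# Assumption: runtime strings for the other supported versions follow
# the same "pythonX.Y" pattern as the "python3.11" value in the example.
SUPPORTED_RUNTIMES = (
    "python3.7",
    "python3.8",
    "python3.9",
    "python3.10",
    "python3.11",
)
DEFAULT_RUNTIME = "python3.7"  # used when "runtime" is omitted


def resolve_runtime(config):
    """Return the runtime a parsed config.json would select, or raise ValueError."""
    runtime = config.get("runtime", DEFAULT_RUNTIME)
    if runtime not in SUPPORTED_RUNTIMES:
        raise ValueError(f"unsupported runtime: {runtime!r}")
    return runtime


print(resolve_runtime({"language": "python"}))
print(resolve_runtime({"language": "python", "runtime": "python3.11"}))
```

Setting "runtime" explicitly avoids relying on the Python 3.7 default.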
main.py Files
The main.py Python file includes the code that’s used in file processing for your task script.
main.py File Example
def process_file(input: dict, context: object):
    """
    Logic:
    1. Get input file length
    2. Get offset from pipeline config
    3. Write a text file to Data Lake

    Args:
        input (dict): input dict passed from master script
        context (object): context object

    Returns:
        None
    """
    print("Starting task")
    input_data = context.read_file(input["inputFile"])
    length = len(input_data["body"])
    offset = int(input["offset"])
    context.write_file(
        content=f"length + offset is {length + offset}",
        file_name="len.txt",
        file_category="PROCESSED"
    )
    print("Task completed")
In this example main.py file, the process_file function is the entry point to the main business logic. It receives two arguments: input and context.
The input value is defined in the protocol.yml file.
The input["inputFile"] value is a reference to a file in the Tetra Data Lake.
The context value provides the Context APIs that are required for the task script to interact with the TDP.
NOTE
The example main.py file shows the following workflow:
- The file is read by using the context.read_file function.
- The offset is read from the input object.
- A new file is written to the Data Lake by using the context.write_file function.
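To see the workflow run without the platform, you can pass the function a stand-in context. The sketch below is an assumption-heavy illustration: StubContext is a hypothetical class that implements only the two calls used in the example (read_file and write_file) with the keyword arguments shown there, and it uses a plain string as the file reference. It is not the real TDP context object and does not reproduce its behavior beyond what the example relies on.

```python
class StubContext:
    """Hypothetical local stand-in for the TDP context object (not the real API)."""

    def __init__(self, files):
        self.files = files   # maps a file reference to its raw bytes
        self.written = []    # records every write_file call

    def read_file(self, file_ref):
        # Mirrors the example, which reads the content from the "body" key.
        return {"body": self.files[file_ref]}

    def write_file(self, content, file_name, file_category):
        self.written.append(
            {"content": content, "file_name": file_name, "file_category": file_category}
        )


def process_file(input: dict, context: object):
    """Same logic as the example main.py, without the print statements."""
    input_data = context.read_file(input["inputFile"])
    length = len(input_data["body"])
    offset = int(input["offset"])
    context.write_file(
        content=f"length + offset is {length + offset}",
        file_name="len.txt",
        file_category="PROCESSED",
    )


# Usage: a 5-byte file plus an offset of 10.
context = StubContext({"raw/sample.txt": b"hello"})
process_file({"inputFile": "raw/sample.txt", "offset": "10"}, context)
print(context.written[0]["content"])
```

Because the stub records calls instead of writing anywhere, the result can be asserted in a unit test. For testing against the real Context APIs, see Create and Test Custom Task Scripts.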
requirements.txt Files (for Third-Party Python Modules Only)
If you’re using third-party Python modules in your Python scripts, you must create a requirements.txt file. The requirements.txt file must be placed in the root of your task script folder (at the same level as the config.json file). This placement ensures that the required third-party Python packages are also installed when you create the Python package for your task script.
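To create a requirements.txt file, run one of the following commands in your local command line, depending on the tool that you use:
For poetry
poetry export --without-hashes --format=requirements.txt > requirements.txt
For pipenv
pipenv lock -r > requirements.txt
(Recent pipenv releases removed the lock -r option. On those versions, use pipenv requirements > requirements.txt instead.)
For pip
pip freeze > requirements.txt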
requirements.txt File Example
NOTE
For an example requirements.txt file, see the Python Packaging User Guide in the Python Packaging Authority (PyPA) documentation.