Task Script Files

Task scripts are the building blocks of protocols. You must build and deploy your task scripts before you can deploy a protocol that uses them in a self-service Tetra Data pipeline (SSP).

Task scripts require the following files:

  • config.json: Contains configuration information that exposes your Python functions so that protocols can use them
  • main.py: Contains the code that’s used in file processing (Python is the only supported programming language)
  • requirements.txt: Specifies any required third-party Python modules
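
Before packaging, it can help to confirm that the files listed above are present in the task script folder. The following is a minimal sketch, not a TDP tool: the helper name missing_task_script_files is hypothetical, and the check only looks at file names, not file contents.

```python
from pathlib import Path

# File names taken from the list above. README.md is optional, so it is
# not checked here.
REQUIRED_FILES = ("config.json", "main.py", "requirements.txt")


def missing_task_script_files(folder):
    """Return the required file names that are absent from `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Check the current working directory and report the result.
missing = missing_task_script_files(".")
print("Missing files:", ", ".join(missing) if missing else "none")
```

Run the script from the root of the task script folder, at the same level as the config.json file.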

📘

NOTE

You can also create an optional task script README.md file that provides additional contextual information about the script.

You can then use Python Poetry to create a Python package and the necessary files to deploy your task script to the Tetra Data Platform (TDP).

For instructions on how to create and deploy a custom task script, see Create and Deploy a Task Script in the “Hello, World!” SSP Example. For information about testing custom task scripts locally, see Create and Test Custom Task Scripts.

config.json Files

The config.json file contains configuration information that exposes your Python functions so that protocols can use them.

config.json File Example

{
  "language": "python",
  "runtime": "python3.11",
  "functions": [
    {
      "slug": "process_file",         
      "function": "main.process_file" 
    }
  ]
}

For each object in the functions array, the slug value is a name that you define to invoke the function from the protocol. It must be unique within this task script. It’s recommended that you make the slug value the same as the Python function’s name.

The function value is a reference to the Python function, including the module where it’s defined, separated by a dot (.).
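
To illustrate how a dotted reference such as main.process_file maps to a module name and a function name, here is a minimal sketch. It is an assumption about the general technique only, not a description of how the TDP loads task scripts, and resolve_function is a hypothetical helper. The demonstration uses json.dumps from the standard library so that it runs outside a task script folder.

```python
import importlib


def resolve_function(reference):
    """Split 'module.function' on the last dot and return the callable."""
    module_name, _, function_name = reference.rpartition(".")
    if not module_name or not function_name:
        raise ValueError(f"Expected 'module.function', got {reference!r}")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Standard-library stand-in; inside a task script folder the reference
# would be "main.process_file".
dumps = resolve_function("json.dumps")
print(dumps({"slug": "process_file"}))
```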

📘

NOTE

You can choose which Python version a task script uses by specifying the "runtime" parameter in the script's config.json file. Python versions 3.7, 3.8, 3.9, 3.10, and 3.11 are currently supported. If you don't include a "runtime" parameter, the script uses Python v3.7 by default.
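
The slug and runtime rules described in this section can be checked locally before deployment. The sketch below is hypothetical and not part of the TDP tooling. It assumes that every runtime string follows the "python3.11" pattern shown in the example file, which the source confirms only for version 3.11.

```python
import json

# Assumption: runtime strings follow the "python3.11" pattern from the example.
SUPPORTED_RUNTIMES = {
    "python3.7", "python3.8", "python3.9", "python3.10", "python3.11",
}
DEFAULT_RUNTIME = "python3.7"  # used when "runtime" is omitted


def check_config(config):
    """Return (runtime, problems) for a parsed config.json dictionary."""
    problems = []
    runtime = config.get("runtime", DEFAULT_RUNTIME)
    if runtime not in SUPPORTED_RUNTIMES:
        problems.append(f"unsupported runtime: {runtime}")
    # Slugs must be unique within one task script.
    slugs = [fn["slug"] for fn in config.get("functions", []) if "slug" in fn]
    for slug in sorted({s for s in slugs if slugs.count(s) > 1}):
        problems.append(f"duplicate slug: {slug}")
    return runtime, problems


example = json.loads(
    '{"language": "python", "functions": '
    '[{"slug": "process_file", "function": "main.process_file"}]}'
)
print(check_config(example))  # prints ('python3.7', [])
```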

main.py Files

The main.py Python file includes the code that’s used in file processing for your task script.

main.py File Example

def process_file(input: dict, context: object):
    """
    Logic:
    1. Get input file length
    2. Get offset from pipeline config
    3. Write a text file to Data Lake

    Args:
        input (dict): input dict passed from master script
        context (object): context object

    Returns:
        None
    """
    print("Starting task")
    
    input_data = context.read_file(input["inputFile"])
    length = len(input_data["body"])
    offset = int(input["offset"])                      
    context.write_file(                                
        content=f"length + offset is {length + offset}",
        file_name="len.txt",
        file_category="PROCESSED"
    )
    
    print("Task completed")

In this example main.py file, the process_file function is the entry point to the main business logic. It receives two arguments: input and context.

The input value is defined in the protocol.yml file.

The input["inputFile"] value is a reference to a file in the Tetra Data Lake.

The context value provides the Context APIs that are required for the task script to interact with the TDP.

📘

NOTE

The example main.py file shows the following workflow:

  1. The file is read by using the context.read_file function.
  2. The offset is read from the input object.
  3. A new file is written to the Data Lake by using the context.write_file function.
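
The workflow above can be exercised locally with a stand-in for the context object. The following sketch is an illustration only: FakeContext is a hypothetical class that mimics just the two calls used in the example, and the real Context API may accept and return different structures. In particular, the string file reference "raw/sample.txt" is an assumption made for this sketch.

```python
class FakeContext:
    """Hypothetical stand-in for the TDP context object (not the real API)."""

    def __init__(self, files):
        self.files = files   # maps a file reference to its bytes
        self.written = []    # records every write_file call

    def read_file(self, file_ref):
        # The example reads the file content from the "body" key.
        return {"body": self.files[file_ref]}

    def write_file(self, content, file_name, file_category):
        self.written.append(
            {"content": content, "file_name": file_name, "file_category": file_category}
        )


def process_file(input: dict, context: object):
    """Condensed copy of the example main.py function."""
    input_data = context.read_file(input["inputFile"])
    length = len(input_data["body"])
    offset = int(input["offset"])
    context.write_file(
        content=f"length + offset is {length + offset}",
        file_name="len.txt",
        file_category="PROCESSED",
    )


# Five bytes of input plus an offset of 3.
context = FakeContext({"raw/sample.txt": b"hello"})
process_file({"inputFile": "raw/sample.txt", "offset": "3"}, context)
print(context.written[0]["content"])  # prints "length + offset is 8"
```

Recording the write_file calls in a list makes it possible to assert on the output without access to the Data Lake.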

requirements.txt Files (for Third-Party Python Modules Only)

If you’re using third-party Python modules in your Python scripts, you must create a requirements.txt file. The requirements.txt file must be placed in the root of your task script folder (at the same level as the config.json file). This placement ensures that the required third-party Python packages are also installed when you create the Python package for your task script.

To create a requirements.txt file, run one of the following commands in your local command line, depending on the package manager that you use:

For poetry

poetry export --without-hashes --format=requirements.txt > requirements.txt

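For pipenv (in newer pipenv releases, the following command is replaced by pipenv requirements > requirements.txt)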

pipenv lock -r > requirements.txt

For pip

pip freeze > requirements.txt

requirements.txt File Example

📘

NOTE

For an example requirements.txt file, see the Python Packaging User Guide in the Python Packaging Authority (PyPA) documentation.