Create IDSs Programmatically

You can create your own Intermediate Data Schemas (IDSs) programmatically, and in a way that keeps data structures consistent across your organization's schemas, by using the Tetra Data Core Schemas Python library (package name: ts-ids-core).

Tetra Data Core Schemas are built on top of Pydantic, a Python package designed for data quality and verification. Accompanied with a growing set of domain-specific components (ts-ids-components), the package provides built-in rules that make sure the IDSs you create align with TetraScience data model best practices and ultimately allow you to create production-ready Self-Service Tetra Data Pipelines (SSPs) faster.

The ts-ids-core and ts-ids-components packages are privately hosted Python packages owned by TetraScience and published under the CC BY-NC-ND 4.0 license. To access the packages, contact your customer success manager (CSM).

Use Cases

You can use Tetra Data Core Schemas to do any of the following:

  • Create a Custom IDS: Define your own programmatic IDS by inheriting from one of the top-level IDS classes in the ts_ids_core.schema.ids_schema module:

    • IdsSchema: Contains the required metadata fields for each IDS (@idsNamespace, @idsType, @idsVersion, $id, and $schema) and provides the minimum required validation to make an IDS compliant with the Tetra Data Platform (TDP).
    • TetraDataSchema: Inherits from IdsSchema, providing all the functionality of the IdsSchema schema class, while also enforcing internal TetraScience modeling conventions.
  • Use Domain-Specific Components: ts-ids-components is home to domain-specific common components for use in programmatic IDSs. These components can be used in any IDS as a way to build upon existing data models. When multiple IDSs use the same common components, accessing data becomes more consistent by enabling reuse of search queries and application code.

  • Modify an Existing IDS: Migrate from an existing JSON Schema to a programmatic IDS definition by using ts-ids-core and the provided code generation tool. This tool is exposed as the CLI script import-schema.

Architecture

The following diagram shows an example workflow of how Tetra Data Core Schemas work with SSPs:

Tetra Data Core Schemas and SSP workflow

The diagram shows the following workflow:

  1. An IDS is programmatically created by using ts-ids-core and ts-ids-components, which is then applied to raw instrument data or report files and used to map vendor-specific information (like the name of a field) to vendor-agnostic information.
  2. Task scripts, which are the building blocks of protocols, contain the code for the business logic needed to process data. These task scripts are written in Python.
  3. Protocols are written in a YAML file (protocol.yml), which specifies configuration elements and outlines the execution order of task script functions.
  4. Task scripts and protocols are deployed to the TDP by using the TetraScience Software Development Kit (SDK) 2.0.
  5. Once these artifacts are on the TDP, you can create an SSP to automatically process ingested files by doing the following:
    • Specifying trigger conditions
    • Providing configuration values
    • Leveraging the protocol that you’ve created by using either commonly available task scripts or ones you’ve created yourself

Installation

Prerequisites

  • Familiarity with Git, Python, Pydantic, and SSPs

Request the Packages from TetraScience

TetraScience provides access to an authenticated private package feed (ts-pypi-external) to integrate into your own internal package source. For access, contact your CSM.

Documentation

For detailed instructions on how to use the Tetra Data Core Schemas Python library, run the following command on the ts-ids-core package: ts-ids docs