You can use the following API endpoints to interact with files in the Tetra Data Lake.
Endpoints
NOTE
You can use API endpoints individually or sequentially. For example, you can use the Search files via Elasticsearch Query Language endpoint to retrieve information about Intermediate Data Schema (IDS) files that match specific criteria. Then, you can use the Get File Information endpoint to retrieve information about the RAW file that each IDS file was generated from.
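The search-then-lookup sequence described in this note could be sketched as follows. This is a minimal illustration, not the platform's client: the `client` object, its `searchFiles` and `getFileInfo` methods, and the `getRawFileId` accessor are hypothetical stand-ins for however you call the two endpoints and read the RAW file reference from a search hit. The only response shape assumed is the standard OpenSearch `hits.hits` array.

```javascript
// Sketch only: `client` and `getRawFileId` are hypothetical and supplied by the caller.
async function findRawFilesForIds(client, query, getRawFileId) {
  // Step 1: search for IDS files that match the query.
  const searchResponse = await client.searchFiles(query);
  const hits = (searchResponse.hits && searchResponse.hits.hits) || [];

  // Step 2: look up the RAW file behind each hit, skipping hits without a reference.
  const results = [];
  for (const hit of hits) {
    const rawFileId = getRawFileId(hit);
    if (rawFileId === undefined || rawFileId === null) continue;
    results.push(await client.getFileInfo(rawFileId));
  }
  return results;
}
```

Passing the accessor in as a parameter keeps the sketch independent of the exact field that links an IDS file to its RAW file, which should be confirmed against the search response.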
Search
- Search files via Elasticsearch Query Language returns information about files that match the search query.
- Retrieve a File returns a specific file.
- Get File Information returns information about a specific file.
- Get File Versions returns information about all of the versions of a specific file.
- Retrieve Metadata and Tags of a File returns metadata and tags for a specific file.
- List Schemas returns information about one or more Intermediate Data Schemas (IDSs).
Upload
- Upload a File uploads a file to the Data Lake.
Delete
- Delete a File soft deletes a file.
Edit Attributes
- Add Labels (Post) adds a label to a file.
- Delete Labels (Delete) deletes labels from a file.
- Add Metadata and Tags to a File adds metadata and tags to a file.
- Update Metadata and Tags for a File replaces metadata and tags for a specific file.
File Types
The Data Lake has three file types: RAW, IDS, and PROCESSED.
RAW Files
RAW files are unprocessed files, such as the ones generated by instruments.
You can search for RAW files by filename, file metadata (such as creation time), and attributes.
IDS Files
IDS files are schemas that are applied to raw instrument data or report files. These schemas are used to map vendor-specific information (like the name of a field) to vendor-agnostic information. To learn more, see Intermediate Data Schemas.
To search for an IDS and view its visualization and associated artifacts, see View Artifact Information.
You can also list or find schemas by using the List Schemas endpoint, which returns the following information for each IDS:
- The associated JSON schema
- The associated OpenSearch mapping, along with the fields that are not indexed in OpenSearch
NOTE
If certain properties are included in nonSearchableFields, those properties won't be returned by the /searchEql API. You can get the properties included in nonSearchableFields back by using the retrieve API.
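One way to act on this note is to check a property against `nonSearchableFields` before deciding which API to call. The sketch below assumes, for illustration only, that `nonSearchableFields` is available as an array of dot-separated property paths and that a listed path also covers its nested properties. The actual response shape of the List Schemas endpoint may differ.

```javascript
// Assumption: nonSearchableFields is an array of dotted paths such as "results.spectrum".
function isSearchable(nonSearchableFields, propertyPath) {
  return !nonSearchableFields.some(
    (field) => propertyPath === field || propertyPath.startsWith(field + '.')
  );
}

// Picks the API to use for a property: search when it is indexed, retrieve otherwise.
function apiForProperty(nonSearchableFields, propertyPath) {
  return isSearchable(nonSearchableFields, propertyPath) ? 'searchEql' : 'retrieve';
}

// Hypothetical field list for illustration.
const nonSearchable = ['results.spectrum'];
console.log(apiForProperty(nonSearchable, 'results.spectrum.values')); // retrieve
console.log(apiForProperty(nonSearchable, 'results.name')); // searchEql
```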
PROCESSED Files
PROCESSED files are derived from a RAW file and generated by a Tetra Data Pipeline. For example, PROCESSED files could be the individual files extracted from a .zip archive, or image thumbnails generated from a large microscopy image.
OpenSearch File Schema
Every file is indexed in OpenSearch, regardless of its file type, and can be queried by using Query DSL.
To view an example OpenSearch File Schema, see the 200 response example in the Search files via Elasticsearch Query Language endpoint.
NOTE
To retrieve a file and use its data, make sure that you use the Retrieve a File endpoint.
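Because files are indexed in OpenSearch, a search request body is an ordinary Query DSL object. The following is a minimal sketch of building one. The field names `category` and `file.path` are assumptions made for illustration and should be checked against the 200 response example of the search endpoint.

```javascript
// Builds a Query DSL body that filters on exact values and, optionally, one wildcard.
// Field names passed in are placeholders; confirm them against the file schema.
function buildFileQuery(exactMatches, wildcard) {
  const filter = Object.entries(exactMatches).map(([field, value]) => ({
    term: { [field]: value },
  }));
  if (wildcard) {
    filter.push({ wildcard: { [wildcard.field]: { value: wildcard.pattern } } });
  }
  return { query: { bool: { filter } } };
}

// Hypothetical usage: RAW files whose path ends in .csv.
const body = buildFileQuery(
  { category: 'RAW' },
  { field: 'file.path', pattern: '*.csv' }
);
console.log(JSON.stringify(body, null, 2));
```

A `bool` query with `filter` clauses is used here because exact-match conditions do not need relevance scoring.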
OpenSearch Mapping Rules
The following dynamic mapping rules are applied to achieve flexible and consistent data indexing. These mapping rules are applied because the content in each IDS file varies based on its specific data sources.
For more information about OpenSearch mapping, see Mapping in the OpenSearch documentation.
const mapping = {
  dynamic_templates: [
    {
      // Maps every "string" value to type "keyword"
      // and creates a subfield of type "text"
      string2keyword: {
        match_mapping_type: 'string',
        mapping: {
          type: 'keyword',
          fields: {
            text: {
              type: 'text',
            },
          },
        },
      },
    },
    {
      // OpenSearch automatically detects integers as type "long"
      // and non-integer numbers as type "double".
      // This rule maps "long" to "double" so that all numeric values are "double"
      long2double: {
        match_mapping_type: 'long',
        mapping: {
          type: 'double',
        },
      },
    },
    {
      // For custom fields, try to map values to "double".
      // If a value is a string, ignore the malformed value in the main field
      // and map it to the "keyword" and "text" subfields
      customField: {
        path_match: '*.custom_field.*',
        mapping: {
          type: 'double',
          ignore_malformed: true,
          fields: {
            keyword: {
              type: 'keyword',
            },
            text: {
              type: 'text',
            },
          },
        },
      },
    },
  ],
};
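To see which rule would apply to a given field, the mapping above can be approximated with a small matcher. This is a simplified model, not OpenSearch itself: it assumes that templates are evaluated in order and the first match is used, it drops the subfield definitions, and the helper names (`detectType`, `globToRegExp`, `resolveTemplate`) and example field paths are hypothetical.

```javascript
// Same three rules as the mapping above, reduced to what the matcher needs.
const dynamicTemplates = [
  { string2keyword: { match_mapping_type: 'string', mapping: { type: 'keyword' } } },
  { long2double: { match_mapping_type: 'long', mapping: { type: 'double' } } },
  { customField: { path_match: '*.custom_field.*', mapping: { type: 'double' } } },
];

// Simplified JSON type detection. JavaScript cannot tell 1.0 from 1,
// and OpenSearch date detection for strings is not modeled here.
function detectType(value) {
  if (typeof value === 'string') return 'string';
  if (typeof value === 'boolean') return 'boolean';
  if (typeof value === 'number') return Number.isInteger(value) ? 'long' : 'double';
  return 'object';
}

// Converts a path_match glob such as "*.custom_field.*" into a RegExp.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Returns the name of the first template that matches, or null if none does.
function resolveTemplate(path, value) {
  const type = detectType(value);
  for (const template of dynamicTemplates) {
    const [name, rule] = Object.entries(template)[0];
    if (rule.match_mapping_type && rule.match_mapping_type !== type) continue;
    if (rule.path_match && !globToRegExp(rule.path_match).test(path)) continue;
    return name;
  }
  return null;
}

console.log(resolveTemplate('results.temperature', 21)); // long2double
console.log(resolveTemplate('results.name', 'sample A')); // string2keyword
console.log(resolveTemplate('results.custom_field.ph', 7.4)); // customField
```

In this simplified model, a string value under `custom_field` resolves to `string2keyword` when it is the first value seen for that field, because that rule is listed first. A `null` result means no template matched and the default dynamic mapping would apply.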