Search files in the TetraScience Data Lake using Elasticsearch's JSON-based query language (Query DSL). Put the query JSON in the request body. The query syntax is documented on the Elasticsearch website and is widely used in the community, so examples and support are easy to find online. If you have questions about how to write your queries, feel free to reach out to TetraScience via [email protected].

📘

Note 1: Pagination in ES queries

If you run into performance issues with very large responses (for example, more than 5,000 documents), paginate your query with the from and size parameters. For more notes on pagination, see the Elasticsearch docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html

{
  "from": 0,
  "size": 200,
  "query": {
    "match": {
      "data.run.project_id": "tetra-project"
    }
  }
}
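To walk through every page rather than only the first, the query above can be issued repeatedly with an increasing from value. The following Python sketch is one way to do that under stated assumptions: `search` is a hypothetical callable (not part of this API reference) that sends the body to the search endpoint and returns the parsed JSON, and the response is assumed to follow the standard Elasticsearch shape with results under hits.hits. The helper names `build_page_query` and `iter_hits` are illustrative only.

```python
from typing import Callable, Dict, Iterator


def build_page_query(project_id: str, offset: int, page_size: int = 200) -> Dict:
    """Build the paginated match query shown in the note above."""
    return {
        "from": offset,
        "size": page_size,
        "query": {"match": {"data.run.project_id": project_id}},
    }


def iter_hits(
    search: Callable[[Dict], Dict],
    project_id: str,
    page_size: int = 200,
) -> Iterator[Dict]:
    """Yield hits page by page until a short (or empty) page is returned.

    `search` is a hypothetical function supplied by the caller. It is assumed
    to POST the body to the search endpoint and return the parsed response in
    the standard Elasticsearch shape: {"hits": {"hits": [...]}}.
    """
    offset = 0
    while True:
        response = search(build_page_query(project_id, offset, page_size))
        hits = response.get("hits", {}).get("hits", [])
        yield from hits
        # A page smaller than page_size means there is nothing left to fetch.
        if len(hits) < page_size:
            break
        offset += page_size
```

Passing the request function in as an argument keeps the paging logic independent of any particular HTTP client or authentication scheme. Note that Elasticsearch by default rejects requests where from plus size exceeds 10,000; whether that limit applies to this API is not stated here, so treat deep paging as something to verify.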

📘

Note 2: Aggregations

You can also run aggregations using the same API. For example, the following request body counts the number of files in the Data Lake for each distinct assay sub_type. To read more about how to construct Elasticsearch aggregations, refer to the Elasticsearch aggregations documentation.

{
  "aggs": {
    "assay_count": {
      "terms": {
        "field": "data.assay.sub_type"
      }
    }
  }
}
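The aggregation above can also be built and read programmatically. The Python sketch below assumes the API returns the Elasticsearch response unchanged, so that the counts appear under aggregations.assay_count.buckets as a list of key and doc_count pairs (the standard shape for a terms aggregation). The helper names `build_terms_aggregation` and `bucket_counts` are hypothetical and exist only for illustration.

```python
from typing import Dict


def build_terms_aggregation(field: str, name: str = "assay_count") -> Dict:
    """Build a terms aggregation body like the one shown above."""
    return {"aggs": {name: {"terms": {"field": field}}}}


def bucket_counts(response: Dict, name: str = "assay_count") -> Dict[str, int]:
    """Map each bucket key to its document count.

    Assumes the standard Elasticsearch terms-aggregation response shape:
    {"aggregations": {name: {"buckets": [{"key": ..., "doc_count": ...}]}}}.
    Returns an empty dict if the aggregation is missing from the response.
    """
    buckets = response.get("aggregations", {}).get(name, {}).get("buckets", [])
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

For example, `build_terms_aggregation("data.assay.sub_type")` reproduces the request body above. A terms aggregation returns only the top 10 buckets by default in Elasticsearch, so a size value inside the terms object may be needed if there are more distinct sub-types than that.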

To write queries, you need to know the Elasticsearch mapping of each data model (that is, each Elasticsearch index). To retrieve the mapping, use the List schemas API.
