Search by Using Elasticsearch Query DSL
Running direct Elasticsearch Query Domain Specific Language (DSL) searches in the TDP user interface can help you test your search syntax before applying it programmatically in another system.
There are two ways to run direct Query DSL queries on the Search Files page:
- The Label & Advanced Filters menu
- The Search bar
To run searches programmatically, see Search files via Elasticsearch Query Language in the TetraScience API documentation. For more information, see Elasticsearch Query DSL Best Practices and Query DSL in the Elasticsearch documentation.
Run a Query DSL Query by Using the Label & Advanced Filters Menu
To run a direct Query DSL query by using the Label & Advanced Filters menu, do the following:
- Open the Search Files page.
- Choose Label & Advanced Filters. A dialog appears that includes filter options as different tabs.
- Select the Raw EQL tab. The Search files via Elasticsearch Query Language endpoint (/searchEql) request displays.
- Edit the query as needed for your use case. (See Elasticsearch Query DSL Best Practices.)
- (Optional) To run the query without validating the query structure, select the No validation check box.
- Choose Run EQL. The query runs and displays the Response.
IMPORTANT
To search for a specific file path, use the filePath field. To search for a part of the file path, use the enhancedSearchContext.filePathParts field. This field allows you to target specific tokens in the file path. Don’t use a wildcard prefix (*). Queries that include wildcards aren’t as effective, take longer to run, and require more computing resources. For more information, see Wildcard Searches.
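To illustrate the difference between the two fields, the following Python sketch builds one query body for each case. The helper names are hypothetical. The `term` query on filePath is an assumption made for illustration: whether it matches depends on how that field is mapped in your deployment. The `match` queries on enhancedSearchContext.filePathParts follow the pattern shown in the example later on this page.

```python
def exact_path_query(file_path: str) -> dict:
    """Build a query body that targets one complete file path.

    Assumption: filePath can be matched with a term query. Switch to another
    query type if the field mapping in your environment requires it.
    """
    return {"query": {"term": {"filePath": file_path}}}


def path_parts_query(parts: list[str]) -> dict:
    """Build a query body that requires every token to appear in the file path."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Tokens are lowercased, and no wildcard prefix is used.
                    {"match": {"enhancedSearchContext.filePathParts": part.lower()}}
                    for part in parts
                ]
            }
        }
    }
```

Either returned dictionary can be serialized with `json.dumps` and pasted into the Raw EQL tab for testing.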
Run a Query DSL Query by Using the Search Bar
NOTE
The Search bar uses the Elasticsearch query_string query, which parses the input and splits text around operators. Each textual part is then analyzed independently of the others.
To run a query using the Search bar, do the following:
- Open the Search Files page.
- In the Search bar, enter either parts of a file path (separated by spaces), or a complete file path. Don’t use wildcard searches.
- Choose Search. Files that match the search criteria you entered display as results in the file list. By default, results are sorted by relevance rather than by upload date.
NOTE
Search results that include Intermediate Data Schema (IDS) output files also return the associated RAW input files.
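For reference, a Search bar entry corresponds roughly to a query_string query body. The Python sketch below shows what such a body could look like. The helper name is hypothetical, and any extra options that TDP adds to the query (default fields, operators, and so on) are not documented here, so treat this as an approximation and not as the exact request the Search bar sends.

```python
def search_bar_query(text: str) -> dict:
    """Approximate the request body for a Search bar entry.

    Assumption: only the "query" option is set. TDP may add other options.
    """
    if "*" in text or "?" in text:
        # Wildcards are discouraged because they are slower and less effective.
        raise ValueError("wildcard characters are not recommended in Search bar queries")
    return {"query": {"query_string": {"query": text}}}
```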
Elasticsearch Query DSL Best Practices
When creating Elasticsearch Query DSL queries, keep in mind the following:
- Search terms must be specified in lowercase, unless you want to target an exact match for a term. To target an exact match for a search term, place the term in quotes (for example, "My Specifically Cased Phrase").
- For programmatic use cases, make sure that you use pagination features and target only the information you need.
- To limit the amount of data returned from the data lake, it’s a best practice to do the following:
  - Specify the following fields in your queries: size (determines the maximum number of hits to return) and _source (determines the source fields to return)
  - Specify an index parameter in the /searchEql endpoint's URL query string to determine one or more specific indexes to return results from (for more information, see Target Your Search by index)
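As a minimal sketch of these practices, the following Python helper wraps any query in a body that sets size and _source and adds an offset for paging. The helper name and default field list are hypothetical. The `from` offset is the standard Elasticsearch paging field, and its availability through /searchEql is assumed here.

```python
def paged_query(query: dict, page: int, page_size: int = 100,
                fields: tuple[str, ...] = ("filePath", "fileId")) -> dict:
    """Wrap a query so that it returns one page of hits with only selected fields."""
    return {
        "from": page * page_size,               # assumed paging offset (standard Elasticsearch)
        "size": page_size,                      # maximum number of hits to return
        "_source": {"includes": list(fields)},  # source fields to return
        "query": query,
    }
```

For example, `paged_query({"match_all": {}}, page=2, page_size=50)` requests hits 100 through 149 and returns only the two default fields for each hit.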
Target Your Search by index
By default, each /searchEql call searches all of an organization's Elasticsearch indexes, which can put a high compute load on the system. To help reduce this load and optimize your queries, it's recommended that you specify an index URL query string parameter to determine the specific indexes that you want to return results from.
The index URL query string parameter accepts the following values, which you can either pass as repeated parameters or combine into a single comma-separated string:
- raw: targets all files indexed as RAW documents
- processed: targets all files indexed as PROCESSED documents
- ids: targets all files indexed as IDS documents
- <idsName>: targets all files associated with a specific IDS
- <idsName:v.x.x>: targets all files associated with a specific IDS version (you can also use an underscore (_) instead of a colon (:) to separate the IDS name from its version number)
index Field Example: Listed Multiple Times
POST https://{{dlsvc_host}}/v1/datalake/searchEql
?index=raw
&index=lcuv-empower:v12*
index Field Example: Comma-Separated List
POST https://{{dlsvc_host}}/v1/datalake/searchEql
?index=raw,processed,lcuv-empower:v12*
NOTE
Wildcard (*) searches are supported but not recommended. For more information, see Wildcard Searches.
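The two URL forms shown above can be generated programmatically. The following Python sketch uses the standard library's `urlencode`. The helper name and the base URL passed to it are hypothetical placeholders for your own data lake service host.

```python
from urllib.parse import urlencode


def search_eql_url(base_url: str, indexes: list[str], comma_separated: bool = False) -> str:
    """Build a /searchEql URL that targets the given indexes."""
    if comma_separated:
        params = [("index", ",".join(indexes))]
    else:
        params = [("index", name) for name in indexes]
    # Keep ':', ',' and '*' unescaped so the URL reads like the examples above.
    return f"{base_url}/v1/datalake/searchEql?{urlencode(params, safe=':,*')}"
```

Calling `search_eql_url(host, ["raw", "lcuv-empower:v12*"])` produces the repeated-parameter form, and passing `comma_separated=True` produces the single-parameter form.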
Targeting More than One index
Query DSL queries that target multiple indexes will not cause an HTTP 400 request failure when individual index type failures occur. The query will still return an HTTP 200 response. If an index type failure occurs, it will appear in a _shards.failed field within the response.
For example, suppose two targeted indexes both include a field named fieldA, but in index1 it’s a boolean value and in index2 it’s a text value. If you write a valid Query DSL query that searches fieldA with a text value, the query returns an HTTP 200 response that includes information from index2, but nothing from index1. In this case, the response also contains a reference to the failure to search index1 for that field within the _shards.failed field.
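Because a partial failure still returns HTTP 200, programmatic callers may want to inspect the _shards section of every response. The following Python sketch shows one way to do that. The helper name is hypothetical, and the sample response is hand-written for illustration. The `failures` array follows the standard Elasticsearch response layout, and it is assumed here that /searchEql passes it through unchanged.

```python
def summarize_shard_failures(response: dict) -> tuple[int, list[str]]:
    """Return the failed shard count and the names of the indexes that failed."""
    shards = response.get("_shards", {})
    failed = shards.get("failed", 0)
    # Assumption: each entry in "failures" names the index it belongs to.
    indexes = sorted({f.get("index", "unknown") for f in shards.get("failures", [])})
    return failed, indexes


# Hand-written sample response, for illustration only.
sample = {
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 1,
        "failures": [{"index": "index1", "reason": {"type": "query_shard_exception"}}],
    },
    "hits": {"hits": []},
}
failed_count, failed_indexes = summarize_shard_failures(sample)
```

A nonzero `failed_count` means that some targeted indexes contributed nothing to the hits, so the results should be treated as incomplete.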
Elasticsearch Query DSL Example
NOTE
This example query includes the following variables:
- Example files that have the following filePath value: chromeleon/ChromeleonLocal/MYUSER/HPLC_Run_20230817_myuser.seq
- Query parameters that specify that the file path must match the following values: hplc, myuser, and 20230817
- Query parameters that specify that the files returned must be indexed in Elasticsearch with the following values: raw, processed, and chromeleon-raw-to-ids:v6.0.0
GET /your_index/_search
{
  "size": 100,
  "_source": {
    "includes": ["filePath", "fileId", "category", "createdAt", "labels"]
  },
  "query": {
    "bool": {
      "must": [
        { "match": { "enhancedSearchContext.filePathParts": "hplc" } },
        { "match": { "enhancedSearchContext.filePathParts": "myuser" } },
        { "match": { "enhancedSearchContext.filePathParts": "20230817" } }
      ]
    }
  }
}
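To run the same query programmatically against the /searchEql endpoint, the body can be combined with the index values listed in the note above. The Python sketch below only builds the request object and does not send it. The function name, the base URL, and the `ts-auth-token` and `x-org-slug` header names are assumptions, so confirm the required authentication headers in the TetraScience API documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, token: str, org_slug: str) -> Request:
    """Build (but do not send) a POST request for the example query."""
    tokens = ["hplc", "myuser", "20230817"]
    body = {
        "size": 100,
        "_source": {"includes": ["filePath", "fileId", "category", "createdAt", "labels"]},
        "query": {
            "bool": {
                "must": [
                    {"match": {"enhancedSearchContext.filePathParts": t}} for t in tokens
                ]
            }
        },
    }
    # Index values come from the note that precedes the example query.
    params = urlencode({"index": "raw,processed,chromeleon-raw-to-ids:v6.0.0"}, safe=":,")
    return Request(
        url=f"{base_url}/v1/datalake/searchEql?{params}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "ts-auth-token": token,   # assumed header name
            "x-org-slug": org_slug,   # assumed header name
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to send the request once the host and credentials are filled in.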