Monitor Files Health

Data consistency is crucial for data stored and accessed within the Tetra Data Platform (TDP). The files health dashboard on the Health Monitoring page helps you monitor and report on the health of your data files in the TDP.

Data is currently stored in the following locations:

  • Tetra Data Lake (Amazon Simple Storage Service (Amazon S3))
  • System Properties (FileInfo) Service
  • Search Indices (Elasticsearch)
  • Amazon Athena

As an administrator, you can use the files health dashboard to quickly identify any inconsistencies that exist across these services.

View the Files Health Dashboard

  1. Sign in to the TDP with an admin account.
  2. In the left navigation pane, choose the hamburger menu icon. Then, choose Health Monitoring. The Health Monitoring page appears.
  3. Select the Files tab. The files health dashboard appears.

Files Health Dashboard Contents

The files health dashboard has two sections:

  • Elasticsearch (ES) cluster health
  • File processing failures

ES Cluster Health

The ES CLUSTER HEALTH section at the top of the files health dashboard shows the health status of the ES cluster as reported by the Amazon CloudWatch service. This section also shows the number of files processed for each of the TDP services.

The status is indicated by one of three colors:

  • Green indicates that the ES cluster is healthy and no action is required.
  • Yellow indicates that all primary shards are allocated, but one or more replica shards are not allocated to a node.
  • Red indicates that at least one primary shard is not allocated to any node.
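To make the difference between the three statuses concrete, the following Python sketch restates the rules as code. The `classify_cluster_health` function and its shard-count inputs are hypothetical; the TDP reads the status from Amazon CloudWatch and doesn't expose a helper like this.

```python
from enum import Enum


class ClusterHealth(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def classify_cluster_health(unassigned_primary: int, unassigned_replica: int) -> ClusterHealth:
    """Return a cluster health color from unassigned shard counts.

    Hypothetical helper that mirrors the status rules; it is not TDP code.
    """
    if unassigned_primary < 0 or unassigned_replica < 0:
        raise ValueError("shard counts must be non-negative")
    # At least one primary shard is unallocated: red.
    if unassigned_primary > 0:
        return ClusterHealth.RED
    # Every primary is allocated, but one or more replicas are not: yellow.
    if unassigned_replica > 0:
        return ClusterHealth.YELLOW
    # All primary and replica shards are allocated: green.
    return ClusterHealth.GREEN


if __name__ == "__main__":
    print(classify_cluster_health(0, 0).value)  # green
    print(classify_cluster_health(0, 2).value)  # yellow
    print(classify_cluster_health(1, 0).value)  # red
```

The primary-shard check runs first because an unallocated primary makes the cluster red regardless of replica state.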

Yellow or Red Status?

If your ES CLUSTER HEALTH status is yellow or red, contact your customer success manager (CSM), your AWS IT team, or both.

ES Cluster Storage

To review free and used storage space for the ES cluster, hover over the information icon next to the ES Cluster Health status.

The amount of free storage space on the ES cluster is indicated by one of three colors:

  • Green indicates more than 50% free storage space.
  • Yellow indicates 30% to 50% free storage space.
  • Red indicates less than 30% free storage space.
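As a minimal sketch of these thresholds, the following hypothetical `storage_color` function maps the free and used storage shown in the tooltip to a color. It assumes that exactly 50% free and exactly 30% free both count as yellow; how the dashboard treats the boundaries isn't stated here.

```python
def storage_color(free_gb: float, used_gb: float) -> str:
    """Map ES cluster free storage to a dashboard color.

    Hypothetical helper. Assumed boundaries: more than 50% free is green,
    30% to 50% free (inclusive) is yellow, and less than 30% free is red.
    """
    total_gb = free_gb + used_gb
    if free_gb < 0 or used_gb < 0 or total_gb == 0:
        raise ValueError("storage values must be non-negative and not both zero")
    free_pct = 100 * free_gb / total_gb
    if free_pct > 50:
        return "green"
    if free_pct >= 30:
        return "yellow"
    return "red"


print(storage_color(60, 40))  # green (60% free)
print(storage_color(40, 60))  # yellow (40% free)
print(storage_color(25, 75))  # red (25% free)
```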

Number of Files Processed for Each TDP Service

The bottom of the ES Cluster Health section shows an overview of the number of files processed for each of the TDP services. Data in the overview table refreshes every four hours.

The criticality of any file inconsistency is indicated by the following colors:

  • Green indicates no errors.
  • Orange indicates one to nine errors.
  • Red indicates 10 or more errors.
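The error-count ranges can be sketched the same way. The `criticality_color` function below is hypothetical and only illustrates the ranges listed above.

```python
def criticality_color(error_count: int) -> str:
    """Map a file-inconsistency error count to a color (hypothetical helper)."""
    if error_count < 0:
        raise ValueError("error_count must be non-negative")
    if error_count == 0:
        return "green"   # no errors
    if error_count <= 9:
        return "orange"  # one to nine errors
    return "red"         # 10 or more errors


for count in (0, 1, 9, 10):
    print(count, criticality_color(count))
```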

If any file discrepancies exist, you can select the Files Reprocessing button to perform a system file cleanup and return to a consistent data state. For more information about how to reprocess files, see How to Reprocess Files.

NOTE

Totals in the Number of Files Processed section don't include files processed within the last four hours.
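The four-hour exclusion behaves like a cutoff filter. This sketch assumes the window is measured back from the time the statistics were calculated (the Last Update value) and that a file processed exactly four hours earlier is counted; both points are assumptions, and `count_reported_files` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

# Files processed within this window before the calculation time are excluded.
EXCLUSION_WINDOW = timedelta(hours=4)


def count_reported_files(processed_at: Iterable[datetime], calculated_at: datetime) -> int:
    """Count files old enough to appear in the dashboard totals.

    Hypothetical helper. Assumes the window is measured back from the time the
    statistics were calculated and that a file processed exactly four hours
    earlier is counted.
    """
    cutoff = calculated_at - EXCLUSION_WINDOW
    return sum(1 for ts in processed_at if ts <= cutoff)


calculated_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
processed = [
    calculated_at - timedelta(hours=5),     # counted
    calculated_at - timedelta(hours=4),     # counted (boundary)
    calculated_at - timedelta(minutes=30),  # excluded: too recent
]
print(count_reported_files(processed, calculated_at))  # 2
```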

Processed Files Information

The Number of Files Processed for Each TDP Service table provides the following information:

  • Category: Data Lake file categories. These categories include TOTAL, RAW, IDS, PROCESSED, and TMP (files or artifacts from a pipeline). By default, the TOTAL category is selected and its corresponding files are displayed.
  • Last Update: The date and time when the statistics were last calculated.
  • Files Uploaded: The number of files uploaded to the category. Only the latest version of each file is included, and files marked for deletion are excluded.
  • DL to FileInfo: The number of files uploaded from the Tetra Data Lake (Amazon S3) to the FileInfo Service is listed in green. The percentage of file discrepancies is listed in red. The number of expected files is also listed.
  • FileInfo to ES: The number of files uploaded from the FileInfo Service to the ES indices is listed in green. The percentage of file discrepancies is listed in red. The number of expected files is also listed.
  • FileInfo to Athena: The number of files uploaded from the FileInfo Service to Amazon Athena is listed in green. The percentage of file discrepancies is listed in red. The number of expected files is also listed.
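One way to read the discrepancy percentage is as the share of expected files that haven't reached the target service. The following sketch uses that interpretation; the formula is an assumption rather than the TDP's documented calculation, and `discrepancy_percent` is a hypothetical name.

```python
def discrepancy_percent(expected: int, actual: int) -> float:
    """Percentage of expected files missing from a target service.

    Hypothetical helper; the formula (missing / expected * 100) is an
    assumed interpretation of the dashboard's discrepancy percentage.
    """
    if expected < 0 or actual < 0:
        raise ValueError("file counts must be non-negative")
    if expected == 0:
        return 0.0
    missing = max(expected - actual, 0)
    return 100 * missing / expected


# Example: 1,000 files expected in FileInfo, 990 present.
print(discrepancy_percent(1000, 990))  # 1.0
```

Under this reading, a stage with no missing files reports 0%, and the same function applies to the DL to FileInfo, FileInfo to ES, and FileInfo to Athena columns.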

File Processing Failures

The File Processing Failures section shows file event failures only. You can view general information about each file event failure in the list provided, or select a specific file to see the event's error details.

You can search for specific files or filter the list by doing the following:

  • To search for a specific file, enter a file name or file ID in the search field.
  • To filter the list by error code type, select an error code type from the Filter Error Codes drop-down list.
  • To filter the list by component type, select one of the options listed next to Component Type at the top of the list: All, FileInfo, Athena, or Elasticsearch.
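The search and filter options act as successive narrowing steps. The sketch below models that on an in-memory list; the `FileFailure` record, the case-insensitive substring match, and the sample error codes are all hypothetical and don't reflect the TDP's internal data model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FileFailure:
    """Hypothetical record for one row of the File Processing Failures list."""
    file_name: str
    file_id: str
    error_code: str
    component: str  # "FileInfo", "Athena", or "Elasticsearch"


def filter_failures(
    failures: Iterable[FileFailure],
    search: str = "",
    error_code: Optional[str] = None,
    component: str = "All",
) -> List[FileFailure]:
    """Apply search, error code, and component filters (assumed semantics)."""
    term = search.strip().lower()
    result = []
    for f in failures:
        # Search matches either the file name or the file ID (case-insensitive).
        if term and term not in f.file_name.lower() and term not in f.file_id.lower():
            continue
        if error_code is not None and f.error_code != error_code:
            continue
        if component != "All" and f.component != component:
            continue
        result.append(f)
    return result


failures = [
    FileFailure("run-01.csv", "a1b2c3", "TIMEOUT", "Athena"),        # made-up data
    FileFailure("run-02.csv", "d4e5f6", "PARSE_ERROR", "Elasticsearch"),
    FileFailure("plate.xlsx", "0a9b8c", "TIMEOUT", "FileInfo"),
]
print([f.file_name for f in filter_failures(failures, search="run")])
print([f.file_name for f in filter_failures(failures, error_code="TIMEOUT", component="Athena")])
```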

File Processing Failures Information

The File Processing Failures list provides the following information:

  • CATEGORY: Data Lake file categories. These categories include TOTAL, RAW, IDS, PROCESSED, and TMP (files or artifacts from a pipeline).
  • FILE NAME: Name of the file. To see the file's path, hover over the file name. To copy the file, select the copy file icon next to the file name.
  • FILE ID: Unique identifier (primary key) of the file in the TDP. To see the entire file ID, hover over the ID. To copy the file ID, select the copy icon next to the ID.
  • FAILURE TIME: Date and time of the failed event.
  • COMPONENT: TDP component that is the source of the file discrepancy (FileInfo, Athena, or Elasticsearch).

File Processing Failure Details

To see the details of a specific file failure, select the file name from the File Processing Failures list. Additional rows appear that provide more information about the failure, the error message it returned, and the event's associated logs.

To see the logs for a specific event, select the arrow icon in the RAW EVENT column for that event.