Health Monitoring App

The Health Monitoring App (previously known as the Superset Monitoring App) provides a comprehensive dashboard in the Tetra Data Platform (TDP) user interface to help you gain an end-to-end understanding of data downtime. The dashboard provides a set of metrics for ingestion failures and latency for on-premises Tetra File-Log Agents, Tetra Empower Agents, Tetra Chromeleon Agents, and Tetra Data Pipelines.

This guide explains how to activate and use the app to monitor the health and performance of your data integrations.

Prerequisites

For Customer-Hosted TDP Environments Only

  • The TDP's Transport Layer Security (TLS) certificate must validate the following endpoint: *.data-apps.tdp-hostname.com
  • The Domain Name System (DNS) zone for tdp-hostname.com must have a CNAME record routing *.data-apps.tdp-hostname.com to tdp-hostname.com
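
The two prerequisites above can be spot-checked from a workstation. The following is a minimal sketch, not an official TetraScience tool: the function names are hypothetical, and it assumes the data apps are served over HTTPS on port 443. It reads the DNS names from the certificate served at a host and tests whether a wildcard name such as *.data-apps.tdp-hostname.com covers that host.

```python
import socket
import ssl


def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name such as *.data-apps.example.com covers hostname.

    A leading "*" stands for exactly one non-empty DNS label.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def certificate_dns_names(host: str, port: int = 443, timeout: float = 5.0) -> list[str]:
    """Fetch the subjectAltName DNS entries of the certificate served at host.

    Assumption: the endpoint is reachable over TLS on the given port.
    The handshake itself fails if the certificate does not validate for host.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def endpoint_is_covered(host: str) -> bool:
    """Check whether any DNS name on the served certificate covers host."""
    return any(wildcard_matches(name, host) for name in certificate_dns_names(host))
```

For example, calling endpoint_is_covered with a hypothetical subdomain such as example.data-apps.tdp-hostname.com exercises both the CNAME record (the name must resolve) and the certificate (it must cover the name).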

Activate the Health Monitoring App

To activate the Health Monitoring App in your TDP environment, contact your customer success manager (CSM) or account executive.

Activation Considerations

When activating the Health Monitoring App, keep in mind the following:

  • Immediately after activation, the app may return a 500 Internal Server Error while it establishes a new user database. To resolve the issue, log out of the TDP, close your web browser, and then reopen the browser and sign back in to the TDP.
  • The app might also display a Workgroup Not Found error the first time you open it. To resolve the issue, you must rotate your organization-level SQL Credentials. Rotating generates a new set of credentials and causes the previous set to stop working.

Access the Health Monitoring App Dashboard

To access the Health Monitoring App dashboard, do the following:

  1. Sign in to the TDP.
  2. In the left navigation menu, choose Health Monitoring. The Health Monitoring App dashboard loads, displaying an overview of your TDP integrations. If the app is activated in your account, NEW UI | WELCOME TO OUR NEW HEALTH MONITORING EXPERIENCE displays at the top of the page.

NOTE: If you don't see the Health Monitoring App dashboard, verify that your TDP administrator has activated the app for your account.

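The Health Monitoring App dashboard is organized into five tabs, each focusing on a different aspect of your TDP integrations:

  • FLA: Metrics for Tetra File-Log Agents
  • Empower: Metrics for Tetra Empower Agents
  • Chromeleon: Metrics for Tetra Chromeleon Agents
  • Pipelines: Metrics for Tetra Data Pipelines
  • Agent Connectivity: Connection status of your supported agents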

Each tab provides a set of visualizations, filters, and tables to help you monitor and troubleshoot your TDP integrations.

FLA Tab

The FLA tab provides metrics specific to Tetra File-Log Agents.

FLA Tab Metrics

The FLA tab displays the following metrics:

Performance Metric | Metric Type | Description
Path Scan Failures | KPI | Number of path scans that failed out of all started path scans
Path Scan Failures Over Time | Chart | Number of path scans that failed out of all path scans that started over a specific time range
Top 10 Paths by Path Scan Errors | Table | The 10 scan paths that had the highest File Scan error rate
File Ingestion Failure Percent | KPI | Percent of files that failed to appear in File Search from all started path scans
File Ingestion Failure Over Time | Chart | Percentage of scanned files that failed to appear in search over a specific time range
Top 10 File Upload Errors | Table | The 10 scan paths that had the highest File Upload error rate
Average Path Scan Duration | KPI | Average time it took for all succeeded and failed path scans to complete
Path Scan Duration Over Time | Chart | Average time it took for all succeeded and failed path scans to complete over a specific time range
Top 10 Paths by Longest Path Scan Duration | Table | The 10 scan paths that took the longest to complete each scan
Average File Ingestion Duration | KPI | Average time it took for files to be scanned and then become available through search in the TDP
File Ingestion Duration Over Time | Chart | Average time it took for scanned files to appear in search over a specific time range
Top 10 Files by Longest File Ingestion Duration | Table | The 10 files that took the longest to become available through search in the TDP after being scanned
File Ingestion Journey | Chart | The current number of files in each stage of the file ingestion journey
Path Status (Latest Scan) | Table | Shows the scan time and status for the latest scan on each scan path
File Status | Table | Shows key events in the file ingestion journey for each ingested file
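
To make the two file ingestion KPIs concrete, here is a minimal sketch of how a failure percentage and an average duration of this kind can be computed. The ScannedFile record and both function names are hypothetical, and the app's actual calculation may differ. The sketch assumes a file counts as failed when it has no search-availability timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScannedFile:
    """Hypothetical record of one scanned file (not the app's real schema)."""
    path: str
    scanned_at: float                # seconds since epoch
    searchable_at: Optional[float]   # None if the file never appeared in search


def ingestion_failure_percent(files: list[ScannedFile]) -> float:
    """Percent of scanned files that never became searchable."""
    if not files:
        return 0.0
    failed = sum(1 for f in files if f.searchable_at is None)
    return 100.0 * failed / len(files)


def average_ingestion_duration(files: list[ScannedFile]) -> Optional[float]:
    """Average seconds between scan and search availability, ignoring failed files."""
    durations = [f.searchable_at - f.scanned_at
                 for f in files if f.searchable_at is not None]
    if not durations:
        return None
    return sum(durations) / len(durations)
```

Failed files are excluded from the duration average because they have no end timestamp; including them would require an assumed cutoff time.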

Available FLA Tab Filters

You can filter the FLA tab data using the following options:

  • Time Range: Select a predefined time range or set a custom date range
  • FLA Agent Name: Filter by specific FLA name
  • FLA Agent Id: Filter by specific FLA ID
  • Path: Filter by specific scan path
  • File id: Filter by specific file ID

Empower Tab

The Empower tab provides metrics specific to Tetra Empower Agents.

Empower Tab Metrics

The Empower tab displays the following metrics:

Performance Metric | Metric Type | Description
Injection Upload Failures For Today | KPI | Daily count of Empower injections that were generated but failed to upload
Injection Upload Failures Over Time | Table | List of failed Empower injection upload attempts within a specified time range, including the project path for each attempt
Connection Failures For Today | KPI | Daily count of failed Empower (Server) Connection Status Checks, if checks are enabled in the Empower Agent
Connection Failures Over Time | Table | List of failed Empower (Server) Connection attempts within a specified time range
Injection Generation Failures For Today | KPI | Daily count of Empower injections that failed to generate in the Empower Agent
Injection Generation Failures Over Time | Table | List of Empower injections that failed to generate within a specified time range, including the project path for each attempt
Project Scan Failures For Today | KPI | Daily count of Empower Projects that failed to scan
Project Scan Failures Over Time | Table | List of Empower Projects that failed to scan within a specified time range
Project Archive Failures For Today | KPI | Daily count of Empower Projects that failed to archive
Project Archive Failures Over Time | Table | List of Empower Projects that failed to archive within a specified time range
Project Archived For Today | KPI | Daily count of Empower Projects successfully archived
Project Archived Over Time | Table | List of Empower Projects that were successfully archived within a specified time range
Hourly Injection Generation Throughput | Table | Number of injections generated each hour
Hourly Injection Generation Latency | Table | Latency calculated on an hourly basis for all high-priority Empower projects
Average daily latency (in seconds) per project | Table | Average daily latency between injection upload and platform registration, for each Empower project, calculated in seconds
Latency per injection (in seconds) per project | Table | Latency between injection upload and platform registration, in seconds, grouped by Empower project
Average Daily Latency (in seconds) Compared to Recent Trend | Table | Latency between injection upload and platform registration
Empower Files per day | KPI | Number of Empower files processed per day
Active Projects per day | KPI | Number of Empower project paths observed on each day
Empower Injections per day | KPI | Number of Empower injections processed per day
Empower Latency - Scan vs Gen | Table | Empower latency broken out into generation, scan, and pipeline latencies
Injection Summary | Table | The number of failed, successful, and pending injections
Project Summary | Table | The number of projects split up by their priority level
Top Injection Uploads by Project Path | Table | The most active projects that were high priority in the selected range, sorted by number of successful injections
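
The per-project latency metrics describe the time between injection upload and platform registration, averaged per day. As an illustration only, the sketch below computes that kind of average from hypothetical (project, uploaded_at, registered_at) tuples. The input shape and function name are assumptions, and the sketch assigns each injection to the calendar day of its upload, which may not match how the app buckets days.

```python
from collections import defaultdict
from datetime import date, datetime


def average_daily_latency(
    injections: list[tuple[str, datetime, datetime]],
) -> dict[tuple[str, date], float]:
    """Average upload-to-registration latency in seconds, per project and per day.

    Each injection is a hypothetical (project, uploaded_at, registered_at) tuple.
    """
    buckets: dict[tuple[str, date], list[float]] = defaultdict(list)
    for project, uploaded_at, registered_at in injections:
        latency = (registered_at - uploaded_at).total_seconds()
        # Assumption: an injection belongs to the day on which it was uploaded.
        buckets[(project, uploaded_at.date())].append(latency)
    return {key: sum(values) / len(values) for key, values in buckets.items()}
```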

Available Empower Tab Filters

You can filter the Empower tab data using the following options:

  • Time Range: Select a predefined time range or set a custom date range
  • Empower Agent Name: Filter by specific Empower Agent name
  • Empower Agent Id: Filter by specific Empower Agent ID
  • Project Path: Filter by specific Empower project

Chromeleon Tab

The Chromeleon tab provides metrics specific to Tetra Chromeleon Agents.

Chromeleon Tab Metrics

The Chromeleon tab displays the following metrics:

Performance Metric | Metric Type | Description
Injection Upload Errors For Today | KPI | Number of Chromeleon injections that were generated but failed to upload today
Injection Upload Errors Over Time | Table | List of the most recent Chromeleon injection attempts that failed to upload over the specified time range in the filter (limit 1000)
Injection Generation Errors For Today | KPI | Number of Chromeleon injections that failed to generate in the Chromeleon Agent today
Injection Generation Failures Over Time | Table | List of the most recent Chromeleon injections that failed to generate over the specified time range in the filter (limit 1000)
Data Server Scan Errors For Today | KPI | Number of Chromeleon Agent data server scan errors today
Data Server Scan Errors Over Time | Table | List of the most recent Chromeleon Agent data server scan errors over the specified time range in the filter (limit 1000)
Data Vault Scan Errors For Today | KPI | Number of Chromeleon Agent data vault scan errors today
Data Vault Scan Errors Over Time | Table | List of the most recent Chromeleon Agent data vault scan errors over the specified time range in the filter (limit 1000)
Sequence Scan Errors For Today | KPI | Number of Chromeleon Agent sequence scan errors today
Sequence Scan Errors Over Time | Table | List of the most recent Chromeleon Agent sequence scan errors over the specified time range in the filter (limit 1000)
Sequence Report Upload Errors For Today | KPI | Number of Chromeleon Agent sequence report upload errors today
Sequence Report Upload Errors Over Time | Table | List of the most recent Chromeleon Agent sequence report upload errors over the specified time range in the filter (limit 1000)
Sequence Report Generation Errors For Today | KPI | Number of Chromeleon Agent sequence report generation errors today
Sequence Report Generation Errors Over Time | Table | List of the most recent Chromeleon Agent sequence report generation errors over the specified time range in the filter (limit 1000)
Average Electronic Report Latency For Today | KPI | Average latency for the Chromeleon Agent to generate Electronic reports today
Top Electronic Report Latencies | Table | List of the highest latencies for the Chromeleon Agent to generate Electronic reports over the specified time range in the filter (limit 1000)
Average Template Report Latency For Today | KPI | Average latency for the Chromeleon Agent to generate Template reports today
Top Template Report Latencies | Table | List of the highest latencies for the Chromeleon Agent to generate Template reports over the specified time range in the filter (limit 1000)
Average Latency Detecting New Sequences For Today | KPI | Average latency for the Chromeleon Agent to detect new sequences today
Top Latencies Detecting New Sequences | Table | List of the highest latencies for new sequence detection by the Chromeleon Agent (limit 1000)
Average Latency Detecting Changes To Existing Sequences For Today | KPI | Average latency for the Chromeleon Agent to detect changes to existing sequences today
Top Latencies Detecting Changes To Existing Sequences | Table | List of the highest latencies for existing sequence detection by the Chromeleon Agent over the specified time range (limit 1000)

Available Chromeleon Tab Filters

You can filter the Chromeleon tab data using the following options:

  • Time Range: Select a predefined time range or set a custom date range
  • Chromeleon Agent Name: Filter by specific Chromeleon Agent name
  • Chromeleon Agent Id: Filter by specific Chromeleon Agent ID
  • Data Vault Name: Filter by specific Chromeleon Data Vault name
  • Data Server Name: Filter by specific Chromeleon Data Server name
  • Report Type: Filter by specific Chromeleon Report Type

Pipelines Tab

The Pipelines tab provides metrics specific to Tetra Data Pipelines.

Pipelines Tab Metrics

The Pipelines tab displays the following metrics:

Performance Metric | Metric Type | Description
Workflow Failure Percent | KPI | Percent of pipeline workflows that failed from all of the workflows that ran
Top 10 Pipelines by Workflow Failures | Table | The 10 pipelines that had the highest workflow failure rate
Average Workflow Run Duration | KPI | Average time it took for pipelines to run each successful workflow
Top 10 Pipelines by Longest Workflow Duration | Table | The 10 pipelines that had the longest average workflow runtime duration
Workflow Status | Table | Shows the workflow status for each pipeline
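
A ranking by workflow failure rate, like the one in the Top 10 Pipelines by Workflow Failures table, can be sketched as follows. This is an illustration under stated assumptions, not the app's implementation: the (pipeline_id, status) input shape, the "failed" status string, and the function name are all hypothetical.

```python
from collections import Counter


def top_pipelines_by_failure_rate(
    workflows: list[tuple[str, str]], limit: int = 10
) -> list[tuple[str, float]]:
    """Rank pipelines by the percent of their workflows that failed.

    Each workflow is a hypothetical (pipeline_id, status) pair, and only the
    status "failed" counts as a failure.
    """
    total: Counter = Counter()
    failed: Counter = Counter()
    for pipeline_id, status in workflows:
        total[pipeline_id] += 1
        if status == "failed":
            failed[pipeline_id] += 1
    rates = [(pid, 100.0 * failed[pid] / total[pid]) for pid in total]
    # Highest failure rate first; ties are broken by pipeline ID for stable output.
    rates.sort(key=lambda item: (-item[1], item[0]))
    return rates[:limit]
```

Ranking by rate instead of raw failure count keeps a busy pipeline with a few failures from outranking a small pipeline that fails every time.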

Available Pipelines Tab Filters

You can filter the Pipelines tab data using the following options:

  • Time Range: Select a predefined time range or set a custom date range
  • Pipeline Id: Filter by specific pipeline ID

Agent Connectivity Tab

The Agent Connectivity tab provides information about the connection status of all of your supported agents (Tetra File-Log Agents and Tetra Empower Agents).

Agent Connectivity Tab Metrics

The Agent Connectivity tab displays the following metrics:

Performance Metric | Metric Type | Description
Last Agent Heartbeat | Table | Last heartbeat received from each agent
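
One way to read a last-heartbeat table is to flag agents whose most recent heartbeat is older than some threshold. The sketch below shows that idea. The 15-minute threshold, the input mapping, and the function name are assumptions for illustration; the app doesn't define a staleness threshold in this guide.

```python
from datetime import datetime, timedelta


def stale_agents(
    last_heartbeats: dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(minutes=15),  # assumed threshold, not a documented value
) -> list[str]:
    """Return the names of agents whose last heartbeat is older than max_age."""
    return sorted(
        name for name, seen in last_heartbeats.items() if now - seen > max_age
    )
```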

Available Agent Connectivity Tab Filters

You can filter the Agent Connectivity tab data using the following options:

  • Agent Name: Filter by specific agent name
  • Agent Id: Filter by specific agent ID

Limitations

The Health Monitoring App has the following limitations:

  • Metrics data isn't backfilled when the app is activated. When the new Health Monitoring dashboard first appears in your TDP environment, the available data spans one day only.
  • If the available data spans one day only (for example, when the new dashboard is first activated), charts in the new Health Monitoring dashboard appear empty. If you hover over an empty chart, it shows a single data point, which is a summary statistic for that day. After two days of data are available, the charts and tables populate as normal.
  • The File Ingestion Journey chart on the FLA tab doesn't show file search indexing events or other downstream file events that occur outside the selected Time Range filter. This behavior occurs because all File Ingestion Journey events are a subset of the original files that were scanned during the specified time range.
  • There is a maximum number of table rows available for each of the following troubleshooting metrics:
    • Last Agent Heartbeat: 100 row maximum
    • File Status: 1,000 row maximum
    • Workflow Status: 1,000 row maximum
    • Path Status (Latest Scan): 1,000 row maximum
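
The File Ingestion Journey limitation is easier to see with a small model. The sketch below is one interpretation of the behavior described above, not the app's actual query: it first selects files scanned inside the time range, then counts only those files' events that also fall inside the range. The event tuples, stage names, and function name are hypothetical.

```python
def journey_counts(
    events: list[tuple[str, str, float]],
    scanned_at: dict[str, float],
    start: float,
    end: float,
) -> dict[str, int]:
    """Count events per journey stage for files scanned within [start, end].

    events holds hypothetical (file_id, stage, timestamp) tuples, and
    scanned_at maps each file ID to its scan timestamp.
    """
    in_scope = {fid for fid, ts in scanned_at.items() if start <= ts <= end}
    counts: dict[str, int] = {}
    for fid, stage, ts in events:
        # A downstream event is dropped if it happened outside the time range,
        # even when the file itself was scanned inside the range.
        if fid in in_scope and start <= ts <= end:
            counts[stage] = counts.get(stage, 0) + 1
    return counts
```

In this model, a file scanned near the end of the selected range appears in the scanned stage but not in later stages until the range is widened to include its downstream events.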

Troubleshooting

Dashboard Not Loading

If the Health Monitoring dashboard fails to load:

  1. Refresh your browser.
  2. Clear your browser cache.
  3. Ensure you have the necessary permissions to access the dashboard.
  4. Check if your TDP version meets the minimum requirements.
  5. Contact your TDP administrator if the issue persists.

Missing or Incomplete Data

If you notice missing or incomplete data in the dashboard:

  1. Check if the time range selection includes periods before the app was activated.
  2. Verify that your agents are online and sending heartbeats.
  3. Ensure that your integrations are properly configured.
  4. For customer-hosted environments, check that the required DNS and TLS configurations are in place.
  5. Contact TetraScience support if the issue persists.

Slow Dashboard Performance

If the dashboard is loading slowly:

  1. Try reducing the selected time range.
  2. Apply filters to limit the amount of data being processed.
  3. Close other browser tabs and applications.
  4. For large environments with many agents, consider using more specific filters.
  5. Check your network connection.

Error Messages

Common error messages and their resolutions:

  • "No data available": Ensure that your integrations are running and generating data.
  • "Failed to load data": Refresh the dashboard or check your network connection.
  • "Access denied": Contact your TDP administrator to verify your permissions.
  • "Service unavailable": The app service may be temporarily down; try again later.

For additional help or to report issues, submit a support ticket.