Health Monitoring App
The Health Monitoring App (previously known as the Superset Monitoring App) provides a comprehensive dashboard in the Tetra Data Platform (TDP) user interface to help you gain an end-to-end understanding of data downtime. The dashboard provides a set of metrics for ingestion failures and latency for on-premises Tetra File-Log Agents, Tetra Empower Agents, Tetra Chromeleon Agents and Tetra Data Pipelines.
This guide explains how to activate and use the app to monitor the health and performance of your data integrations.
Prerequisites
- TDP v4.3.0 or higher
For Customer-Hosted TDP Environments Only
- The TDP's Transport Layer Security (TLS) certificate must validate the following endpoint: *.data-apps.tdp-hostname.com
- The Domain Name System (DNS) zone for tdp-hostname.com must have a CNAME record routing *.data-apps.tdp-hostname.com to tdp-hostname.com
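As a rough aid for checking the TLS prerequisite, the following Python sketch tests whether a certificate name covers a given data-apps hostname. It is an illustration only, not a TetraScience tool: the function names are hypothetical, tdp-hostname.com is the same placeholder used above, and the matching rule is the common convention that a leading * stands for exactly one DNS label. The second function uses the standard ssl and socket modules to read the DNS names from a live certificate and needs network access to your own environment.

```python
import socket
import ssl
from typing import List


def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name such as '*.example.com' covers hostname.

    A leading '*' stands in for exactly one DNS label, so '*.example.com'
    covers 'a.example.com' but not 'a.b.example.com' or 'example.com'.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard must match a non-empty first label; the rest must be equal.
        return bool(host_labels[0]) and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def certificate_dns_names(host: str, port: int = 443, timeout: float = 5.0) -> List[str]:
    """Fetch the DNS names listed in the server certificate's subjectAltName."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
```

For example, calling certificate_dns_names with your TDP hostname and passing each returned name to wildcard_covers together with a hostname under data-apps shows whether the certificate includes a matching wildcard entry. The CNAME record itself is easiest to confirm with your usual DNS lookup tool.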
Activate the Health Monitoring App
To activate the Health Monitoring App in your TDP environment, contact your customer success manager (CSM) or account executive.
Activation Considerations
When activating the Health Monitoring App, keep in mind the following:
- After being activated, the app may return a 500 internal server error as it establishes a new user database. To resolve the issue, log out of the TDP, close your web browser, and then reopen the browser and sign back in to the TDP.
- The app might also display a Workgroup Not Found error when you open it the first time. To resolve the issue, you must rotate your organization-level SQL Credentials. Keep in mind that generating a new set of credentials causes the previous set of credentials to stop working.
Access the Health Monitoring App Dashboard
To access the Health Monitoring App dashboard, do the following:
- Sign in to the TDP.
- In the left navigation menu, choose Health Monitoring. The Health Monitoring App dashboard loads, displaying an overview of your TDP integrations. If the app is activated in your account, NEW UI | WELCOME TO OUR NEW HEALTH MONITORING EXPERIENCE displays at the top of the page.
NOTE
If you don't see the Health Monitoring App dashboard, verify that your TDP administrator has activated the app for your account.
The Health Monitoring App dashboard is organized into five tabs, each focusing on different aspects of your TDP integrations:
- FLA: provides metrics specific to Tetra File-Log Agents
- Empower: provides metrics specific to Tetra Empower Agents
- Chromeleon: provides metrics specific to Tetra Chromeleon Agents
- Pipelines: provides metrics specific to Tetra Data Pipelines
- Agent Connectivity: provides Agent heartbeat and connection status
Each tab provides a set of visualizations, filters, and tables to help you monitor and troubleshoot your TDP integrations.
FLA Tab
The FLA tab provides metrics specific to Tetra File-Log Agents.
FLA Tab Metrics
The FLA tab displays the following metrics:
Performance Metric | Metric Type | Description |
---|---|---|
Path Scan Failures | KPI | Number of path scans that failed out of all path scans that started |
Path Scan Failures Over Time | Chart | Number of path scans that failed from all path scans that started over a specific time range |
Top 10 Paths by Path Scan Errors | Table | The 10 scan paths that had the highest File Scan error rate |
File Ingestion Failure Percent | KPI | Percent of files that failed to appear in File Search from all started path scans |
File Ingestion Failure Over Time | Chart | Percentage of scanned files that failed to appear in search over a specific time range |
Top 10 File Upload Errors | Table | The 10 scan paths that had the highest File Upload error rate |
Average Path Scan Duration | KPI | Average time it took for all succeeded and failed path scans to complete |
Path Scan Duration Over Time | Chart | Average time it took for all succeeded and failed path scans to complete over a specific time range |
Top 10 Paths by Longest Path Scan Duration | Table | The 10 scan paths that took the longest to complete each scan |
Average File Ingestion Duration | KPI | Average time it took for files to be scanned and then become available through search in the TDP |
File Ingestion Duration Over Time | Chart | Average time it took for scanned files to appear in search over a specific time range |
Top 10 Files by Longest File Ingestion Duration | Table | The 10 files that took the longest to become available through search in the TDP after being scanned |
File Ingestion Journey | Chart | The current number of files in each stage of the file ingestion journey |
Path Status (Latest Scan) | Table | Shows the scan time and status for the latest scan on each scan path |
File Status | Table | Shows key events in the file ingestion journey for each ingested file |
Available FLA Tab Filters
You can filter the FLA tab data using the following options:
- Time Range: Select a predefined time range or set a custom date range
- FLA Agent Name: Filter by specific FLA name
- FLA Agent Id: Filter by specific FLA id
- Path: Filter by specific scan path
- File id: Filter by file id
Empower Tab
The Empower tab provides metrics specific to Tetra Empower Agents.
Empower Tab Metrics
The Empower tab displays the following metrics:
Performance Metric | Metric Type | Description |
---|---|---|
Injection Upload Failures For Today | KPI | Daily count of Empower injections that were generated but failed to upload |
Injection Upload Failures Over Time | Table | List of failed Empower injection upload attempts within a specified time range, including the project path for each attempt |
Connection Failures For Today | KPI | Daily count of failed Empower (Server) Connection Status Checks, if checks are enabled in the Empower Agent |
Connection Failures Over Time | Table | List of Empower (Server) Connection failed attempts within a specified time range |
Injection Generation Failures For Today | KPI | Daily count of Empower injections that failed to generate in the Empower Agent |
Injection Generation Failures Over Time | Table | List of Empower injections that failed to generate within a specified time range, including the project path for each attempt |
Project Scan Failures For Today | KPI | Daily count of Empower Projects that failed to scan |
Project Scan Failures Over Time | Table | List of Empower Projects that failed to scan within a specified time range |
Project Archive Failures For Today | KPI | Daily count of Empower Projects that failed to archive |
Project Archive Failures Over Time | Table | List of Empower Projects that failed to archive within a specified time range |
Project Archived For Today | KPI | Daily count of Empower Projects successfully archived |
Project Archived Over Time | Table | List of Empower Projects that successfully archived within a specified time range |
Hourly Injection Generation Throughput | Table | Number of injections generated each hour |
Hourly Injection Generation Latency | Table | Latency calculated on an hourly basis for all high-priority Empower projects |
Average daily latency (in seconds) per project | Table | Average daily latency between injection upload and platform registration, for each Empower project, calculated in seconds |
Latency per injection (in seconds) per project | Table | Latency between injection upload and platform registration, in seconds, grouped by Empower project |
Average Daily Latency (in seconds) Compared to Recent Trend | Table | Latency between injection upload and platform registration |
Empower Files per day | KPI | Number of Empower files processed per day |
Active Projects per day | KPI | Number of Empower project paths observed on each day |
Empower Injections per day | KPI | Number of Empower injections processed per day |
Empower Latency - Scan vs Gen | Table | Empower latency broken out into generation, scan, and pipeline latencies |
Injection Summary | Table | The number of failed, successful, and pending injections |
Project Summary | Table | The number of projects split up by their priority level |
Top Injection Uploads by Project Path | Table | The most active projects that were high priority in the selected range, sorted by number of successful injections |
Available Empower Tab Filters
You can filter the Empower tab data using the following options:
- Time Range: Select a predefined time range or set a custom date range
- Empower Agent Name: Filter by specific Empower Agent name
- Empower Agent Id: Filter by specific Empower Agent id
- Project Path: Filter by specific Empower project
Chromeleon Tab
The Chromeleon tab provides metrics specific to Tetra Chromeleon Agents.
Chromeleon Tab Metrics
The Chromeleon tab displays the following metrics:
Performance Metric | Metric Type | Description |
---|---|---|
Injection Upload Errors For Today | KPI | Number of Chromeleon injection upload attempts for Today where an injection was generated but failed to upload |
Injection Upload Errors Over Time | Table | List of the most recent Chromeleon injection attempts that failed to upload over the specified time range in the filter, limit 1000 |
Injection Generation Errors For Today | KPI | Number of Chromeleon injections that failed to generate in the Chromeleon Agent for Today |
Injection Generation Failures Over Time | Table | List of the most recent Chromeleon injections that failed to generate over the specified time range in the filter, limit 1000 |
Data Server Scan Errors For Today | KPI | Number of Chromeleon Agent data server scan errors for Today |
Data Server Scan Errors Over Time | Table | List of the most recent Chromeleon Agent data server scan errors over the specified time range in the filter, limit 1000 |
Data Vault Scan Errors For Today | KPI | Number of Chromeleon Agent data vault scan errors for Today |
Data Vault Scan Errors Over Time | Table | List of the most recent Chromeleon Agent data vault scan errors over the specified time range in the filter, limit 1000 |
Sequence Scan Errors For Today | KPI | Number of Chromeleon Agent sequence scan errors for Today |
Sequence Scan Errors Over Time | Table | List of the most recent Chromeleon Agent sequence scan errors over the specified time range in the filter, limit 1000 |
Sequence Report Upload Errors For Today | KPI | Number of Chromeleon Agent sequence report upload errors for Today |
Sequence Report Upload Errors Over Time | Table | List of the most recent Chromeleon Agent sequence report upload errors over the specified time range in the filter, limit 1000 |
Sequence Report Generation Errors For Today | KPI | Number of Chromeleon Agent sequence report generation errors for Today |
Sequence Report Generation Errors Over Time | Table | List of the most recent Chromeleon Agent sequence report generation errors over the specified time range in the filter, limit 1000 |
Average Electronic Report Latency For Today | KPI | Average latency for Chromeleon Agent to generate Electronic reports for Today |
Top Electronic Report Latencies | Table | List of the highest latencies for the Chromeleon Agent to generate Electronic report over the specified time range in the filter, limit 1000 |
Average Template Report Latency For Today | KPI | Average latency for the Chromeleon Agent to generate Template reports for Today |
Top Template Report Latencies | Table | List of the highest latencies for the Chromeleon Agent to generate Template reports over the specified time range in the filter, limit 1000 |
Average Latency Detecting New Sequences For Today | KPI | Average latency to detect new sequences by the Chromeleon Agent for Today |
Top Latencies Detecting New Sequences | Table | List of the highest latencies for new sequence detection by Chromeleon Agent, limit 1000 |
Average Latency Detecting Changes To Existing Sequences For Today | KPI | Average latency to detect changes to existing sequences by the Chromeleon Agent for Today |
Top Latencies Detecting Changes To Existing Sequences | Table | List of the highest latencies for existing sequence detection by Chromeleon Agent over the specified time range, limit 1000 |
Available Chromeleon Tab Filters
You can filter the Chromeleon tab data using the following options:
- Time Range: Select a predefined time range or set a custom date range
- Chromeleon Agent Name: Filter by specific Chromeleon Agent name
- Chromeleon Agent Id: Filter by specific Chromeleon Agent id
- Data Vault Name: Filter by specific Chromeleon Data Vault name
- Data Server Name: Filter by specific Chromeleon Data Server name
- Report Type: Filter by specific Chromeleon Report Type
Pipelines Tab
The Pipelines tab provides metrics specific to Tetra Data Pipelines.
Pipelines Tab Metrics
The Pipelines tab displays the following metrics:
Performance Metric | Metric Type | Description |
---|---|---|
Workflow Failure Percent | KPI | Percent of pipeline workflows that failed from all of the workflows that ran |
Top 10 Pipelines by Workflow Failures | Table | The 10 pipelines that had the highest workflow failure rate |
Average Workflow Run Duration | KPI | Average time it took for pipelines to run each successful workflow |
Top 10 Pipelines by Longest Workflow Duration | Table | The 10 pipelines that had the longest average workflow runtime duration |
Workflow Status | Table | Shows the workflow status for each pipeline |
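To make the definitions in the table concrete, here is a minimal Python sketch of how the three aggregate metrics could be computed from a list of workflow runs. It is an assumption-laden illustration, not the app's actual queries: the WorkflowRun record, its field names, and the function names are all hypothetical, and the sample data is invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class WorkflowRun:
    # Hypothetical record shape; the app's real schema is not documented here.
    pipeline_id: str
    succeeded: bool
    duration_seconds: float


def workflow_failure_percent(runs: List[WorkflowRun]) -> float:
    """Percent of workflows that failed out of all workflows that ran."""
    if not runs:
        return 0.0
    failed = sum(1 for run in runs if not run.succeeded)
    return 100.0 * failed / len(runs)


def average_successful_duration(runs: List[WorkflowRun]) -> float:
    """Average run time, in seconds, over successful workflows only."""
    durations = [run.duration_seconds for run in runs if run.succeeded]
    return sum(durations) / len(durations) if durations else 0.0


def top_pipelines_by_failure_rate(
    runs: List[WorkflowRun], limit: int = 10
) -> List[Tuple[str, float]]:
    """Pipelines ranked by failure percent, highest first, capped at limit."""
    by_pipeline: Dict[str, List[WorkflowRun]] = defaultdict(list)
    for run in runs:
        by_pipeline[run.pipeline_id].append(run)
    rates = [(pid, workflow_failure_percent(group)) for pid, group in by_pipeline.items()]
    return sorted(rates, key=lambda item: item[1], reverse=True)[:limit]


# Invented sample data for illustration.
runs = [
    WorkflowRun("p1", True, 10.0),
    WorkflowRun("p1", False, 4.0),
    WorkflowRun("p2", True, 20.0),
    WorkflowRun("p2", True, 30.0),
]
print(workflow_failure_percent(runs))       # 25.0
print(average_successful_duration(runs))    # 20.0
print(top_pipelines_by_failure_rate(runs))  # [('p1', 50.0), ('p2', 0.0)]
```

Note that the average duration ignores failed runs, matching the table's wording for Average Workflow Run Duration, while the failure percent counts every run.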
Available Pipelines Tab Filters
You can filter the Pipelines tab data using the following options:
- Time Range: Select a predefined time range or set a custom date range
- Pipeline Id: Filter by specific pipeline id
Agent Connectivity Tab
The Agent Connectivity tab provides information about the connection status of all of your supported Agents (Tetra File-Log Agents and Tetra Empower Agents).
Agent Connectivity Tab Metrics
The Agent Connectivity tab displays the following metrics:
Performance Metric | Metric Type | Description |
---|---|---|
Last Agent Heartbeat | Table | Last heartbeat received from each agent |
Available Agent Connectivity Tab Filters
You can filter the Agent Connectivity tab data using the following options:
- Agent Name: Filter by specific agent name
- Agent Id: Filter by specific agent id
Limitations
The Health Monitoring App has the following limitations:
- Metrics data isn't backfilled when the app is activated, so only one day of data is available when the new Health Monitoring dashboard first appears in your TDP environment.
- If only one day of data is available (for example, when the new dashboard is first activated), charts in the new Health Monitoring dashboard appear empty. If you hover over an empty chart, a single data point displays, which is a summary statistic for that day. After two days of data are available, the charts and tables populate as normal.
- The File Ingestion Journey chart on the FLA tab won't show any file search indexing events or other downstream file events if those events occur outside of the Time Range filter that you set. This behavior occurs because all File Ingestion Journey events are a subset of the original files that were scanned during the specified time range.
- There is a maximum number of table rows available for each of the following troubleshooting metrics:
  - Last Agent Heartbeat: 100 row maximum
  - File Status: 1,000 row maximum
  - Workflow Status: 1,000 row maximum
  - Path Status (Latest Scan): 1,000 row maximum
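One practical consequence of the row maximums is that a table showing exactly its maximum number of rows may be hiding additional rows. The short Python sketch below encodes the limits listed above and flags that case. The dictionary and function names are hypothetical, and the rule that a full table might be truncated is an assumption rather than documented app behavior.

```python
# Row maximums taken from the limitations list above.
ROW_LIMITS = {
    "Last Agent Heartbeat": 100,
    "File Status": 1000,
    "Workflow Status": 1000,
    "Path Status (Latest Scan)": 1000,
}


def may_be_truncated(table_name: str, visible_rows: int) -> bool:
    """Return True if a table has reached its row maximum.

    Tables without a documented maximum always return False.
    """
    limit = ROW_LIMITS.get(table_name)
    return limit is not None and visible_rows >= limit
```

If a table has reached its maximum, narrowing the time range or applying a more specific filter is a reasonable way to bring the rows you need into view.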
Troubleshooting
Dashboard Not Loading
If the Health Monitoring dashboard fails to load:
- Refresh your browser.
- Clear your browser cache.
- Ensure you have the necessary permissions to access the dashboard.
- Check that your TDP version meets the minimum requirement (v4.3.0 or higher).
- Contact your TDP administrator if the issue persists.
Missing or Incomplete Data
If you notice missing or incomplete data in the dashboard:
- Check if the time range selection includes periods before the app was activated. Metrics data isn't backfilled, so those periods have no data.
- Verify that your agents are online and sending heartbeats.
- Ensure that your integrations are properly configured.
- For customer-hosted environments, check that the required DNS and TLS configurations are in place.
- Contact TetraScience support if the issue persists.
Slow Dashboard Performance
If the dashboard is loading slowly:
- Try reducing the selected time range.
- Apply filters to limit the amount of data being processed.
- Close other browser tabs and applications.
- For large environments with many agents, consider using more specific filters.
- Check your network connection.
Error Messages
Common error messages and their resolutions:
- "No data available": Ensure that your integrations are running and generating data.
- "Failed to load data": Refresh the dashboard or check your network connection.
- "Access denied": Contact your TDP administrator to verify your permissions.
- "Service unavailable": The app service may be temporarily down; try again later.
For additional help or to report issues, submit a support ticket.