Monitor the Tetra Data Platform Health
The overall TDP Health Monitoring Dashboard (named Dashboard) provides an end-to-end snapshot of component health and performance across the entire TDP ecosystem. You can also view performance and quick health visualizations for individual TDP components on their own dashboards.
To access and view the overall TDP Health Monitoring Dashboard:
- Log in to TDP using an Administrator user account.
- In the Tetra Data Platform, click the Hamburger icon in the top-left corner of the page to expand the TDP menu options (or hover over the list of icons to display the menu options).
- Select Health Monitoring from the list of menu options that appears on the left side of the page.
The Health Monitoring page opens with the Dashboard tab selected by default. If the tab is not selected, click Dashboard to view an end-to-end snapshot of the components' health for the entire TDP ecosystem.
Each dashboard is divided into two sections:
- Component health status
- Problems
Component Health Status
The top section of the dashboard indicates the health status of the TDP component. It displays a statistical visualization of how many instances of the component are in a healthy, unhealthy, or critical state, and provides a file total for the component.
Possible States | Definition | Example |
---|---|---|
Healthy | Indicates that the component is operating optimally within specified parameters. What is considered optimal differs by component. | A healthy connector makes a connection within three attempts. |
Unhealthy | Indicates that the component is not operating optimally, but has not failed. What is considered optimal differs by component. | An unhealthy datahub has a memory usage value that is greater than 80% but less than or equal to 90% for the past five contiguous minutes. |
Critical | Indicates that the component has failed or is well outside the specified parameters. As with the Healthy and Unhealthy states, the exact elements that contribute to the Critical state differ by component. | If the percentage of used disk space for a Tetra Agent is greater than 90%, then that component is in a critical state. |
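The example thresholds in the table can be read as a simple classification rule. The following is a minimal sketch only: `HealthState`, `classify_usage`, and the "every sample in the window" rule are assumptions made for illustration. The 80% and 90% cutoffs are borrowed from the datahub memory and Tetra Agent disk examples above, and the actual criteria differ by component.

```python
from enum import Enum


class HealthState(Enum):
    HEALTHY = "Healthy"
    UNHEALTHY = "Unhealthy"
    CRITICAL = "Critical"


def classify_usage(samples_pct, warn_pct=80.0, crit_pct=90.0):
    """Classify a window of usage samples given as percentages.

    Hypothetical rule: a state applies only when every sample in the
    window (for example, five one-minute samples) exceeds its cutoff.
    """
    if not samples_pct:
        raise ValueError("at least one sample is required")
    if all(s > crit_pct for s in samples_pct):
        return HealthState.CRITICAL
    if all(s > warn_pct for s in samples_pct):
        return HealthState.UNHEALTHY
    return HealthState.HEALTHY
```

For instance, `classify_usage([85, 82, 88, 81, 90])` falls in the Unhealthy band because every sample is above 80% but not every sample is above 90%.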
Problems
The bottom section of the dashboard displays a detailed list of component instances that you can filter by type.
This section is an aggregate list of all components with critical issues. To review information for components that are not in the critical state, you must view the specific dashboard for each TDP component. You can filter problems by these types: All, Agents, Data Hubs, Data Hub Connectors, and Cloud Connectors. The list of problems includes the following information for each component:
Field | Description |
---|---|
Health | Health status of the component. On the overall TDP Health Monitoring Dashboard, only Critical issues display. |
Name | Name (and representative icon) of the component instance that is currently in the critical state. To sort the list of components by name, click the arrow next to Name at the top of the column. You can sort items alphabetically, or in reverse order. To view additional details about the component, you can hover over the name. The information that displays is customized based on the component type. To copy the unique ID for the component instance, click the copy file icon. |
Latest Status | When the latest status was assigned (in date/time format), and whether the component is currently Active or has been Disabled. To review a component's status history, click the View History link below the status. |
Health Description | Explains why the component has been assigned the critical state (shown by issues and/or warnings). By default, only one issue is shown. If there is more than one issue, then a link displays (for example, +1 More) indicating there are additional issues to review. |
Link | Provides a link you can click to review the configuration details for the TDP component. |
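The filtering, sorting, and "+1 More" behavior described above can be sketched in a few lines. This is an illustrative model only, not the TDP implementation or API: the `Problem` class, its field names, and both helper functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    # Hypothetical record mirroring the columns of the Problems list.
    name: str
    component_type: str  # for example "Agents" or "Data Hubs"
    health: str
    issues: list = field(default_factory=list)


def filter_problems(problems, component_type="All", reverse=False):
    """Keep Critical rows of the chosen type, sorted by name."""
    rows = [
        p for p in problems
        if p.health == "Critical"
        and (component_type == "All" or p.component_type == component_type)
    ]
    return sorted(rows, key=lambda p: p.name.lower(), reverse=reverse)


def summarize_issues(issues):
    """Show the first issue and count the rest, as in '+1 More'."""
    if not issues:
        return ""
    extra = len(issues) - 1
    return issues[0] if extra == 0 else f"{issues[0]} (+{extra} More)"
```

Sorting is case-insensitive here by choice; the source does not state how the dashboard orders names that differ only in case.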
View History
If available, you can click View History from the Latest Status column on some dashboards to review a component's list of status changes. TDP polls the component every five minutes. If a change of status occurs during the five-minute polling interval, then an entry is added to the component's status history.
Historical data includes:
- Time: The date and time when a status change occurred
- Change: Shows the status change
- Errors: Displays any error(s) that triggered the health status of the component to change
- Warnings: Displays any warning(s) that triggered the health status of the component to change
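The polling behavior and the history fields above can be modeled roughly as follows. This is a sketch under stated assumptions, not TDP code: `HistoryEntry`, `record_poll`, and the `"old -> new"` change format are hypothetical names and conventions, and only the five-minute interval comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Polling interval stated in the text above.
POLL_INTERVAL = timedelta(minutes=5)


@dataclass
class HistoryEntry:
    # Hypothetical record with the four history fields listed above.
    time: datetime
    change: str
    errors: list
    warnings: list


def record_poll(history, previous, current, polled_at, errors=(), warnings=()):
    """Append an entry only when the status changed; return the new status."""
    if current != previous:
        history.append(
            HistoryEntry(polled_at, f"{previous} -> {current}",
                         list(errors), list(warnings))
        )
    return current
```

A poll that observes the same status as the previous poll leaves the history unchanged, which matches the statement that entries are added only when a change of status occurs.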