Monitor the Tetra Data Platform Health


The overall TDP Health Monitoring Dashboard (named Dashboard) provides an end-to-end snapshot of component health and performance across the entire TDP ecosystem. You can also view performance and quick health visualizations for individual TDP components.

To access and view the overall TDP Health Monitoring Dashboard:

Log in to TDP using an Administrator user account.
In the Tetra Data Platform, click the hamburger menu icon in the top-left corner of the page to expand the TDP menu options (or hover over the list of icons to display the menu options).

Select Health Monitoring from the list of menu options that appears on the left side of the page.

The Health Monitoring page opens with the Dashboard tab selected by default. If it is not selected, click Dashboard to view an end-to-end snapshot of the components' health for the entire TDP ecosystem.

Each dashboard is divided into two sections:

Component health status
Problems

Component Health Status

The top section of the dashboard shows the health status of the TDP component as a statistical visualization of how many instances of the component are in a healthy, unhealthy, or critical state. It also provides a file total for the component.

Possible States	Definition	Example
Healthy	Indicates that the component is operating optimally within specified parameters. What is considered optimal differs by component.	A healthy connector makes a connection within three attempts.
Unhealthy	Indicates that the component is not operating optimally, but has not failed. What is considered optimal differs by component.	An unhealthy datahub has a memory usage value that is greater than 80% but less than or equal to 90% for the past five contiguous minutes.
Critical	Indicates that the component has failed or is well outside the specified parameters. As with the Healthy and Unhealthy states, the exact elements that contribute to the Critical state differ by component.	If the percentage of used disk space for a Tetra Agent is greater than 90%, then that component is in a critical state.
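
To make the three states concrete, here is a minimal Python sketch of threshold-based classification followed by the kind of per-state count the top section visualizes. It is not TDP code. The `Health` enum, `classify_usage`, `summarize`, and the sample readings are hypothetical, and the 80% and 90% cut-offs are borrowed from the table's examples purely for illustration, since the real criteria differ by component. The "five contiguous minutes" condition from the datahub example is deliberately left out to keep the sketch short.

```python
from collections import Counter
from enum import Enum


class Health(Enum):
    HEALTHY = "Healthy"
    UNHEALTHY = "Unhealthy"
    CRITICAL = "Critical"


def classify_usage(percent: float,
                   unhealthy_above: float = 80.0,
                   critical_above: float = 90.0) -> Health:
    """Map a usage percentage to a health state.

    The default thresholds are assumptions taken from the table's examples;
    actual thresholds vary by component.
    """
    if percent > critical_above:
        return Health.CRITICAL
    if percent > unhealthy_above:
        return Health.UNHEALTHY
    return Health.HEALTHY


def summarize(states):
    """Count how many instances are in each state."""
    counts = Counter(states)
    return {state.value: counts.get(state, 0) for state in Health}


# Made-up usage percentages for four component instances.
readings = [42.0, 85.5, 90.0, 93.2]
states = [classify_usage(p) for p in readings]
summary = summarize(states)
print(summary)  # {'Healthy': 1, 'Unhealthy': 2, 'Critical': 1}
```

Note that a reading of exactly 90% is classified as Unhealthy here, matching the "less than or equal to 90%" wording in the datahub example.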

Problems

The bottom section of the dashboard displays a detailed list of component instances that you can filter by type.

This section is an aggregate list of all components with critical issues. To review information for components that are not in the critical state, you must view the specific dashboard for each TDP component. You can filter problems by these types: All, Agents, Data Hubs, Data Hub Connectors, and Cloud Connectors. The list of problems includes the following information for each component:

Field	Description
Health	Health status. On the overall TDP Health Monitoring Dashboard, only Critical issues display.
Name	Name (and representative icon) of the component instance that is currently in the critical state. To sort the list of components by name, click the arrow next to Name at the top of the column. You can sort items alphabetically, or in reverse order. To view additional details about the component, you can hover over the name. The information that displays is customized based on the component type. To copy the unique ID for the component instance, click the copy file icon.
Latest Status	When the latest status was assigned (in date/time format), and whether it is currently Active or has been Disabled. To review a component's status history, click the View History link below the status.
Health Description	Explains why the component has been assigned the critical state (shown by issues and/or warnings). By default, only one issue is shown. If there is more than one issue, then a link displays (for example, +1 More) indicating there are additional issues to review.
Link	Provides a link you can click to review the configuration details for the TDP component.
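
The behavior of the Problems list (critical items only, filterable by type, sortable by name, with a "+N More" link when several issues exist) can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's implementation or API: the `Problem` record, `filter_problems`, `health_description`, and the sample component names and issue strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    """Hypothetical record for one component instance."""
    name: str
    component_type: str  # "Agents", "Data Hubs", "Data Hub Connectors", "Cloud Connectors"
    health: str          # "Healthy", "Unhealthy", or "Critical"
    issues: list = field(default_factory=list)


def filter_problems(problems, component_type="All"):
    """Keep only Critical instances, optionally of one type, sorted by name."""
    critical = [p for p in problems if p.health == "Critical"]
    if component_type != "All":
        critical = [p for p in critical if p.component_type == component_type]
    return sorted(critical, key=lambda p: p.name.lower())


def health_description(problem):
    """Show the first issue and hint at any remaining ones."""
    if not problem.issues:
        return ""
    extra = len(problem.issues) - 1
    first = problem.issues[0]
    return f"{first} (+{extra} More)" if extra else first


# Made-up sample data.
problems = [
    Problem("lab-agent-02", "Agents", "Critical", ["Disk usage above 90%"]),
    Problem("hub-east", "Data Hubs", "Unhealthy", ["Memory usage above 80%"]),
    Problem("cloud-sync", "Cloud Connectors", "Critical",
            ["Connection failed", "Credentials expired"]),
]

all_critical = [p.name for p in filter_problems(problems)]
agents_only = [p.name for p in filter_problems(problems, "Agents")]
description = health_description(problems[2])
print(all_critical)   # ['cloud-sync', 'lab-agent-02']
print(agents_only)    # ['lab-agent-02']
print(description)    # Connection failed (+1 More)
```

The unhealthy data hub is excluded from both results, which mirrors why non-critical components must be reviewed on their own component dashboards.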

View History

If available, you can click View History from the Latest Status column on some dashboards to review a component's list of status changes. TDP polls the component every five minutes. If a change of status occurs during the five-minute polling interval, then an entry is added to the component's status history.

Historical data includes:

Time: In date/time format indicating when a status change occurred
Change: Shows the status change
Errors: Displays any error(s) that triggered the health status of the component to change
Warnings: Displays any warning(s) that triggered the health status of the component to change
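
As a rough model of how such a history accumulates, the following Python sketch appends an entry only when a poll observes a status different from the previous one. It is a simplified illustration, not how TDP is implemented: `HistoryEntry`, `record_poll`, the timestamps, and the error text are hypothetical, and only the five-minute interval and the four fields (Time, Change, Errors, Warnings) come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=5)  # polling interval stated in the text


@dataclass
class HistoryEntry:
    time: datetime   # when the status change was observed
    change: str      # for example "Healthy -> Critical"
    errors: list
    warnings: list


def record_poll(history, previous, current, polled_at, errors=(), warnings=()):
    """Append a history entry only if the status changed since the last poll."""
    if current != previous:
        history.append(HistoryEntry(polled_at, f"{previous} -> {current}",
                                    list(errors), list(warnings)))
    return current


# Simulate four polls of one made-up component.
history = []
status = "Healthy"
start = datetime(2024, 1, 1, 9, 0)
observed = ["Healthy", "Critical", "Critical", "Healthy"]
for i, new_status in enumerate(observed, start=1):
    errors = ["Connection failed"] if new_status == "Critical" else []
    status = record_poll(history, status, new_status,
                         start + i * POLL_INTERVAL, errors=errors)

for entry in history:
    print(entry.time, entry.change, entry.errors, entry.warnings)
```

Four polls produce only two entries here, because the first and third polls observe no change in status.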
