Observing system events
Verify cluster health, maintain operational awareness, and respond to system events in real time through the following core actions:
Verifying the cluster health: Maintain a real-time view of the cluster’s topology and health via the Cluster panel. Use this to ensure that coordinator, standby, and segment nodes are online and correctly configured.
Visualizing hardware performance: Use the System Metrics panel to track the physical health of your infrastructure. Use these charts to identify OS-level bottlenecks, such as CPU spikes, memory exhaustion, or network latency across specific hosts.
Validating database responsiveness: Ensure the database engine is actively processing requests. Use the Monitoring panel to review automated canary checks—synthetic SQL probes that verify connectivity and execution speed.
Auditing system logs: The Logs panel allows you to investigate the unified stream of system and database telemetry. Search through coordinator and segment logs to pinpoint the root cause of query failures or administrative changes.
Managing alerts: Use the Alerts panel to integrate with Prometheus Alertmanager and govern the incident lifecycle through real-time notifications.
Verifying the cluster health
Use the Cluster Overview panel to monitor real-time WarehousePG cluster health, verify node availability, and track critical connectivity metrics to ensure high availability.
Visualizing hardware performance
Use the System Metrics panel to track physical host metrics, identify resource bottlenecks, and correlate hardware spikes with database activity.
Validating database responsiveness
Use the Monitor panel to track proactive health indicators and automated canary check results to ensure database availability.
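To make the idea of a canary check concrete, the following is a minimal sketch of a synthetic SQL probe that measures connectivity and execution time. It is not the product's implementation: `run_canary`, `CanaryResult`, and the 500 ms threshold are hypothetical names and values chosen for illustration, and the demo uses Python's built-in `sqlite3` as a stand-in for a real connection. Against a WarehousePG cluster you would likely pass a factory that opens a connection with a PostgreSQL-compatible DB-API driver instead.

```python
import sqlite3
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CanaryResult:
    """Outcome of one probe (hypothetical structure, for illustration only)."""
    ok: bool
    latency_ms: float
    error: Optional[str] = None


def run_canary(connect: Callable[[], object],
               probe_sql: str = "SELECT 1",
               max_latency_ms: float = 500.0) -> CanaryResult:
    """Open a connection, run a trivial query, and time the round trip.

    `connect` is any zero-argument factory returning a DB-API 2.0 connection.
    The probe passes only if the query succeeds within `max_latency_ms`.
    """
    start = time.perf_counter()
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute(probe_sql)
            cur.fetchone()
        finally:
            conn.close()
    except Exception as exc:
        # Any connection or execution failure counts as a failed probe.
        latency_ms = (time.perf_counter() - start) * 1000.0
        return CanaryResult(False, latency_ms, str(exc))
    latency_ms = (time.perf_counter() - start) * 1000.0
    return CanaryResult(latency_ms <= max_latency_ms, latency_ms)


if __name__ == "__main__":
    # Stand-in target: an in-memory SQLite database, not a WarehousePG cluster.
    result = run_canary(lambda: sqlite3.connect(":memory:"))
    print(f"ok={result.ok} latency_ms={result.latency_ms:.2f} error={result.error}")
```

Timing the whole round trip, including connection setup, means a probe can fail either because the database is unreachable or because it responds too slowly, which matches the two conditions a canary check is meant to catch.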
Auditing system logs
Use the Log panel to access, filter, and analyze system and database telemetry through integrated log viewers.
Managing alerts
Use the Alerts panel to integrate with Prometheus Alertmanager and govern the incident lifecycle through real-time notifications.