Verifying the cluster health

The Cluster panel provides a high-level summary of the WarehousePG (WHPG) cluster configuration and real-time health metrics. This panel is the primary starting point for verifying cluster availability and resource utilization.

Confirming core cluster availability

Use the top-level summary cards for an overview of your cluster's state. Focus on these three metrics to confirm basic service availability:

Check operational status: Verify that the overall status shows "Healthy." A "Degraded" status indicates that one or more segments have failed or that synchronization is lagging.
Track segment uptime: Ensure the count of "Up" segments matches your total segment count. Any "Down" segments represent a loss of data redundancy or processing power.
Monitor connection headroom: Compare current active connections against the maximum limit. Once connections reach that limit, new connection requests are rejected, so treat a count near the ceiling as an early warning.
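
The three checks above can be sketched as a small Python function that turns the summary-card values into a list of warnings. This is an illustration only: the `ClusterSummary` fields, the `availability_warnings` name, and the 90% headroom threshold are hypothetical and are not part of the panel or of any WarehousePG API.

```python
from dataclasses import dataclass


@dataclass
class ClusterSummary:
    """Values read from the summary cards (field names are hypothetical)."""
    status: str               # e.g. "Healthy" or "Degraded"
    segments_up: int
    segments_total: int
    active_connections: int
    max_connections: int


def availability_warnings(summary, headroom_threshold=0.9):
    """Return human-readable warnings; an empty list means all checks passed."""
    warnings = []

    # Check 1: overall operational status.
    if summary.status != "Healthy":
        warnings.append(f"cluster status is {summary.status}")

    # Check 2: every segment should be up.
    down = summary.segments_total - summary.segments_up
    if down > 0:
        warnings.append(f"{down} of {summary.segments_total} segments are down")

    # Check 3: connection headroom (threshold is an assumed value).
    if summary.max_connections > 0:
        usage = summary.active_connections / summary.max_connections
        if usage >= headroom_threshold:
            warnings.append(f"connections at {usage:.0%} of the limit")

    return warnings
```

Calling `availability_warnings(ClusterSummary("Healthy", 16, 16, 50, 200))` returns an empty list, while a degraded cluster with missing segments and 190 of 200 connections in use yields one warning per failed check.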

Validating coordinator and standby sync

The coordinator is the entry point for all queries. Use this section to ensure the control plane is resilient:

Verify the coordinator state: Confirm the primary coordinator host is up. If it is down, application traffic cannot reach the database.
Monitor the replication mode: Check that the standby coordinator is "Synchronized." If the mode shows as "Not Synced", a failover event could result in data loss or extended downtime.
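
One way to read the two checks above is as a simple severity ladder. The following Python sketch assumes the panel's values have already been copied into a boolean and a string; the function name and the severity labels are hypothetical.

```python
def control_plane_risk(coordinator_up: bool, replication_mode: str) -> str:
    """Map coordinator and standby state to a severity label (labels are illustrative)."""
    if not coordinator_up:
        # No entry point for queries: application traffic cannot reach the database.
        return "critical"
    if replication_mode != "Synchronized":
        # The cluster still serves traffic, but a failover now risks data loss.
        return "warning"
    return "ok"
```

The coordinator check comes first because an unreachable coordinator is an outage in progress, whereas an unsynchronized standby is a risk that only materializes on failover.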

Auditing segment and mirror configuration

WarehousePG relies on a distributed architecture in which primary segments store data and process queries, and mirror segments provide redundancy for failover. Use the segment table to perform these actions:

Identify failed primary segments: Search the table for any primary segments with a "Down" status. In a healthy cluster, every primary should be "Up."
Verify failover readiness: Ensure all mirror segments are "Up" and in their correct roles. If a mirror is down, the associated primary segment is running without a safety net.
Localize network issues: Review the Hostname and Port columns to determine if a specific physical host is responsible for multiple segment failures.
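
The audit above can be sketched in Python over rows copied from the segment table. The `Segment` fields are assumptions modeled on the role, preferred role, status, hostname, and port attributes found in Greenplum-lineage segment catalogs; the panel's actual column names may differ, and `audit_segments` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One row of the segment table (field names are assumed, not the panel's)."""
    content_id: int      # a primary and its mirror share the same content ID
    role: str            # current role: "primary" or "mirror"
    preferred_role: str  # role the segment is meant to hold
    status: str          # "Up" or "Down"
    hostname: str
    port: int


def audit_segments(segments):
    """Summarize failed primaries, unprotected primaries, role swaps, and suspect hosts."""
    down_primaries = sorted(
        s.content_id for s in segments
        if s.role == "primary" and s.status == "Down"
    )

    # Primaries that are up but whose mirror is down have no failover target.
    down_mirror_ids = {
        s.content_id for s in segments
        if s.role == "mirror" and s.status == "Down"
    }
    unprotected_primaries = sorted(
        s.content_id for s in segments
        if s.role == "primary" and s.status == "Up"
        and s.content_id in down_mirror_ids
    )

    # Segments acting outside their preferred role suggest an earlier failover.
    out_of_role = sorted({s.content_id for s in segments if s.role != s.preferred_role})

    # A host with more than one down segment is a likely common cause.
    down_by_host = defaultdict(list)
    for s in segments:
        if s.status == "Down":
            down_by_host[s.hostname].append(s.port)
    suspect_hosts = {
        host: sorted(ports) for host, ports in down_by_host.items() if len(ports) > 1
    }

    return {
        "down_primaries": down_primaries,
        "unprotected_primaries": unprotected_primaries,
        "out_of_role": out_of_role,
        "suspect_hosts": suspect_hosts,
    }
```

Grouping failures by hostname rather than by segment is the design choice that matters here: several down segments on one host point to a host or network fault, while failures scattered across hosts suggest independent problems.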

Analyzing database utilization

Identify which specific databases are consuming cluster resources to prevent one tenant from impacting others:

Compare database sizes: Identify rapidly growing databases that might require storage expansion or vacuuming to reclaim space.
Track session distribution: Monitor connection counts per database to identify unauthorized access or application connection leaks.
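
Both comparisons can be sketched in Python, assuming you record database sizes and session counts from the panel at two points in time. The function names, the dictionary inputs, and the idea of a per-database session budget are all hypothetical and only illustrate the reasoning.

```python
def growth_report(previous_bytes, current_bytes):
    """Return (database, fractional growth) pairs, fastest-growing first."""
    report = []
    for name, size in current_bytes.items():
        before = previous_bytes.get(name)
        if before:  # skip new or previously empty databases to avoid dividing by zero
            report.append((name, (size - before) / before))
    return sorted(report, key=lambda item: item[1], reverse=True)


def over_session_budget(sessions, budgets):
    """Return databases whose session count exceeds the expected budget.

    A database with no budget entry is treated as expecting zero sessions,
    so any connection to it is reported as unexpected.
    """
    return {
        name: count
        for name, count in sessions.items()
        if count > budgets.get(name, 0)
    }
```

Databases at the top of the growth report are candidates for storage planning or vacuuming, and any database that exceeds its budget is worth checking for connection leaks or unexpected clients.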
