Analyzing data distribution

The Data Analysis panel is the primary interface for auditing database structures and diagnosing storage-related performance issues. Use these actions to maintain schema health and ensure data is distributed efficiently across the cluster.

Auditing table and object inventory

Use the Tables tab to monitor the scale of your relational data and verify that storage settings are optimized for analytical workloads.

  • Validate compression efficiency: Check the Compression and Level columns. If a large table has a low compression ratio, consider changing the algorithm or level (for example, switching to Zstandard, zstd) to reclaim disk space.
  • Check metadata freshness: Monitor the Last Analyze timestamp. If a table has not been analyzed recently, the query planner might use stale statistics, leading to inefficient execution plans.
  • Review external data sources: Use the External Tables tab to see which tables reference data stored outside the database, such as in Amazon S3 or HDFS.
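If you collect the same table metadata in your own tooling, the compression and freshness checks above can be scripted. The following Python sketch is illustrative only: the `TableInfo` record, its `uncompressed_bytes` field (the panel shows the algorithm and level, not necessarily sizes), and the 10 GiB, 2:1, and seven-day thresholds are all assumptions, not values defined by the panel.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

@dataclass
class TableInfo:
    """Hypothetical record built from your own table metadata."""
    name: str
    size_bytes: int                    # on-disk (compressed) size
    uncompressed_bytes: int            # estimated size before compression
    last_analyze: Optional[datetime]   # None if the table was never analyzed

# Illustrative thresholds; tune them for your environment.
LARGE_TABLE_BYTES = 10 * 1024**3       # 10 GiB
MIN_COMPRESSION_RATIO = 2.0            # flag anything below 2:1
STALE_AFTER = timedelta(days=7)

def compression_ratio(t: TableInfo) -> float:
    """Uncompressed size divided by on-disk size; higher is better."""
    return t.uncompressed_bytes / t.size_bytes if t.size_bytes else 0.0

def audit(tables: List[TableInfo], now: Optional[datetime] = None) -> List[Tuple[str, str]]:
    """Return (table, finding) pairs for low compression and stale statistics."""
    now = now or datetime.now(timezone.utc)
    findings = []
    for t in tables:
        # Only large tables are worth recompressing.
        if t.size_bytes >= LARGE_TABLE_BYTES and compression_ratio(t) < MIN_COMPRESSION_RATIO:
            findings.append((t.name, "low compression"))
        # Never-analyzed or old statistics may mislead the query planner.
        if t.last_analyze is None or now - t.last_analyze > STALE_AFTER:
            findings.append((t.name, "stale statistics"))
    return findings
```

Timestamps passed to `audit` should be timezone-aware so they compare cleanly with the UTC default.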

Optimizing performance through indexing and partitioning

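Use the Indexes and Partitions tabs to confirm that your data structures support fast query execution and simple data lifecycle management. The Missing Stats, Bloat, and Data Skew tabs help you find maintenance and distribution problems that slow queries down.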

  • Update stale statistics: Open the Missing Stats tab to identify tables that have not been analyzed in more than seven days, then run ANALYZE on them.
Tip

Statistics help the query planner choose efficient execution plans. Run ANALYZE manually after any operation that modifies more than 10% of a table's data.
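The two rules above (more than seven days since the last ANALYZE, or more than 10% of rows modified) can be combined into a single check. This Python sketch assumes you track `rows_modified` yourself, for example in your ETL job; that counter and the helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the guidance above.
MODIFIED_FRACTION_THRESHOLD = 0.10   # re-analyze after more than 10% of rows change
MAX_STATS_AGE = timedelta(days=7)    # Missing Stats cutoff

def needs_analyze(row_count: int, rows_modified: int,
                  last_analyze: Optional[datetime], now: datetime) -> bool:
    """Return True if the table's planner statistics are likely stale."""
    if last_analyze is None or now - last_analyze > MAX_STATS_AGE:
        return True
    if row_count == 0:
        # Any change to an empty table alters its statistics.
        return rows_modified > 0
    return rows_modified / row_count > MODIFIED_FRACTION_THRESHOLD

def analyze_statement(table: str) -> str:
    """Build a plain ANALYZE statement; quote identifiers properly in real code."""
    return f"ANALYZE {table};"
```

Exactly 10% modified does not trigger the check, matching the "more than 10%" wording.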

  • Reclaim wasted disk space: Review the Bloat tab to find tables with a high dead tuple count. If bloat exceeds 20%, consider running VACUUM manually so that the space held by dead tuples can be reused.
  • Resolve data distribution hot spots: Use the Data Skew tab to find tables whose data is unevenly spread across segments. A high skew % means one or a few segments do more work than the others; because a query completes only when its slowest segment finishes, this slows queries across the entire cluster.
  • Address table skew: If a large table shows significant skew, investigate its distribution key. Consider using ALTER TABLE ... SET DISTRIBUTED BY (column) to choose a column with higher cardinality (more unique values) and fewer NULLs.
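The bloat, skew, and distribution-key guidance above can be approximated offline. This Python sketch makes several assumptions: bloat is measured as dead tuples over all tuples, skew is the coefficient of variation of per-segment row counts (the panel's exact skew % formula may differ), and CRC32 stands in for the database's own distribution hash. The helper names are hypothetical; only the `ALTER TABLE ... SET DISTRIBUTED BY` syntax comes from the text.

```python
import statistics
import zlib
from typing import Dict, List, Optional, Sequence

BLOAT_THRESHOLD_PCT = 20.0  # from the Bloat guidance above

def bloat_pct(live_tuples: int, dead_tuples: int) -> float:
    """Dead tuples as a percentage of all tuples (one simple bloat measure)."""
    total = live_tuples + dead_tuples
    return 100.0 * dead_tuples / total if total else 0.0

def needs_vacuum(live_tuples: int, dead_tuples: int) -> bool:
    return bloat_pct(live_tuples, dead_tuples) > BLOAT_THRESHOLD_PCT

def skew_pct(rows_per_segment: Sequence[int]) -> float:
    """Coefficient of variation of per-segment row counts, as a percentage."""
    mean = statistics.mean(rows_per_segment)
    return 100.0 * statistics.pstdev(rows_per_segment) / mean if mean else 0.0

def simulate_segments(values: Sequence[Optional[object]], n_segments: int) -> List[int]:
    """Hash each key to a segment; every NULL key lands on the same segment."""
    counts = [0] * n_segments
    for v in values:
        counts[zlib.crc32(repr(v).encode()) % n_segments] += 1
    return counts

def choose_distribution_key(sample: Dict[str, Sequence[Optional[object]]]) -> str:
    """Prefer the column with the most distinct values, then the fewest NULLs."""
    def score(col: str):
        vals = sample[col]
        distinct = len({v for v in vals if v is not None})
        null_frac = sum(v is None for v in vals) / len(vals)
        return (-distinct, null_frac)
    return min(sample, key=score)

def redistribute_sql(table: str, column: str) -> str:
    return f"ALTER TABLE {table} SET DISTRIBUTED BY ({column});"
```

Running `simulate_segments` on a low-cardinality column such as a two-value region code shows why it skews: at most two segments ever receive rows, no matter how many segments the cluster has.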

Visualizing storage strategy

Use the Charts tab to get a high-level view of your database’s physical composition and identify long-term storage trends.

  • Prioritize archival candidates: Review the Top 50 Tables by Size bar chart to see which objects could be candidates for partitioning or data archiving.
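  • Audit storage formats: Check the Storage Format Distribution pie chart. If most of your data is in Heap format, consider migrating large tables that are mostly bulk-loaded and read to append-only storage, which is optimized for high-volume analytical reads. Heap remains the better fit for tables that receive frequent updates and deletes.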
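Both charts summarize simple aggregations that you can reproduce from your own size data. In this Python sketch, the input shapes (a name-to-bytes mapping and `(name, format, size_bytes)` tuples), the format labels, and the "more than half" reading of "most" are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

def top_tables_by_size(sizes: Dict[str, int], n: int = 50) -> List[Tuple[str, int]]:
    """Largest tables first, like the Top 50 Tables by Size chart."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]

def format_share(tables: List[Tuple[str, str, int]]) -> Dict[str, float]:
    """Fraction of total bytes held in each storage format."""
    totals: Counter = Counter()
    for _name, fmt, size in tables:
        totals[fmt] += size
    grand = sum(totals.values())
    return {fmt: size / grand for fmt, size in totals.items()} if grand else {}

def heap_majority(tables: List[Tuple[str, str, int]]) -> bool:
    """True if more than half of all bytes are stored as heap."""
    return format_share(tables).get("heap", 0.0) > 0.5
```

Weighting by bytes rather than by table count matches the goal of the pie chart: a few very large heap tables matter more than many small ones.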
