Slow-running queries can degrade your Redshift cluster’s performance and lead to increased costs. Identifying the most expensive queries is crucial to optimize resource usage and improve overall system efficiency.
Step-by-Step Guide
-
Connect to your Redshift cluster.
Use your preferred SQL client or the Redshift Query Editor to establish a connection with your cluster. -
Query the
stl_querysystem table for the most resource-intensive queries.
Thestl_querytable logs all completed queries, including their runtime metrics. Use the following SQL to retrieve the top 10 queries with the longest execution time over the last 24 hours:SELECT query, userid, label, starttime, endtime, DATEDIFF(seconds, starttime, endtime) AS elapsed_seconds FROM stl_query WHERE starttime >= GETDATE() - INTERVAL '1 day' ORDER BY elapsed_seconds DESC LIMIT 10; -
Retrieve the SQL text of the expensive queries.
Use thestl_querytexttable to get the text of the queries identified:SELECT text FROM stl_querytext WHERE query = <query_id> ORDER BY sequence;Replace
<query_id>with the actual query ID from the previous step to analyze the query text for possible optimizations. -
Focus your optimization efforts.
Look for joins without indexes, large data scans, or missing filters and consider rewriting or adjusting these queries for better performance.
Conclusion
Tracking and analyzing expensive queries using Redshift’s system tables is a simple yet powerful way to maintain a healthy and efficient data warehouse environment. Regular monitoring helps reduce costs and speeds up analytics workflows.
For further reading, visit the official Amazon Redshift system tables documentation: Amazon Redshift System Tables