Monday, May 18, 2026

Optimizing Redshift Performance by Configuring WLM Queues

Efficient query performance in Amazon Redshift often comes down to how well you manage workload concurrency. Redshift's Workload Management (WLM) queues enable you to control how queries share resources, helping avoid bottlenecks during peak loads. Properly configuring WLM queues ensures critical queries get the resources they need while maximizing cluster throughput.

This post will guide you through the basics of WLM queue configuration and share actionable steps to optimize query execution and concurrency for your Redshift workloads.


Automatic WLM vs. Manual WLM

Before diving into manual configuration, it's important to note that Amazon recommends Automatic WLM as the default approach for most workloads. With Automatic WLM, Redshift dynamically determines resource utilization as queries arrive and adjusts concurrency and memory allocation on the fly — without requiring manual tuning. It maximizes total throughput and handles unpredictable or changing workloads more efficiently than static manual configuration.

Manual WLM is best suited for scenarios where your workload patterns are predictable, you need to guarantee multiple workload types run simultaneously with fixed resource allocations, or you need to throttle certain query types at specific times of day. If you proceed with manual WLM, be prepared to monitor and tune it regularly as workloads evolve.


Step-by-Step Guide (Manual WLM)

1. Understand Redshift WLM Queues and Slots
WLM enables you to define queues with resource slots. Each query consumes one or more slots depending on queue configuration. Slots allocate memory and concurrency allowing multiple queries to run simultaneously without resource contention.
Learn more about WLM fundamentals here: Redshift WLM Documentation.


2. Analyze Your Query Workload
Identify query types and their resource demands. Separate lengthy ETL or analytical queries from short, high-priority reporting queries. Segmenting workloads allows for assigning appropriate queue configurations matching query resource profiles.


3. Create or Modify Queues Based on Workload Types
For example, create a "high_priority" queue with fewer concurrency slots but higher memory allocation for fast-running reports, and a "default" queue for ad-hoc or batch queries running with more concurrency but less memory per query.


4. Adjust Slots and Memory Allocations
Use the AWS Console or Redshift parameter groups to configure slots via the wlm_json_configuration parameter. An example config snippet:

-- Example JSON for WLM queue definition in parameter group:
[
  {
    "query_group": "high_priority",
    "concurrency": 5,
    "memory_percent_to_use": 30
  },
  {
    "query_group": "default",
    "concurrency": 10,
    "memory_percent_to_use": 20
  }
]

This example balances memory and concurrency to prioritize faster queries effectively.


5. Monitor and Tune Using System Views
Query views like STV_WLM_QUERY_STATE and STL_WLM_QUERY to analyze queue usage, wait times, and query runtimes. Use this insight to iteratively adjust concurrency and memory settings.


Conclusion

Optimizing Redshift WLM queues is key to balancing concurrency and performance in mixed workloads. For most users, Automatic WLM is the recommended starting point. If manual WLM better fits your use case, classifying queries, assigning queues based on priority, and tuning memory and slot allocations will ensure efficient resource use and a better experience for high-priority queries.
Regular monitoring and tuning with Redshift system views complete the feedback loop for continuous improvement.

Official Documentation for more details:
Amazon Redshift Workload Management


Wednesday, May 13, 2026

Data Engineering Books Worth Having on Your Shelf (or your tablet)

Good documentation gets you started. Good books get you deep. After years of working with cloud data platforms, SQL engines, and machine learning pipelines, a handful of titles keep coming up as genuinely useful — not just for beginners, but for practitioners who want to go beyond the basics. Here are five books that cover the tools this blog is built around.

Disclosure: the links below are Amazon affiliate links. If you purchase through them, this site earns a small commission at no extra cost to you.


Generative AI on AWS

Chris Fregly, Antje Barth & Shelbee Eigenbrode — O'Reilly

Covers how to build context-aware, multimodal reasoning applications on AWS using generative AI. Written by AWS practitioners, it bridges the gap between the hype and the actual engineering work — prompt engineering, RAG pipelines, fine-tuning, and deploying models at scale. Useful for data engineers who are being asked to build AI-adjacent infrastructure and want a grounded, AWS-native perspective.


Amazon Redshift: The Definitive Guide

Tyler Scherr & James Tunick — O'Reilly

The most thorough single reference for Redshift available in print. Goes well beyond the documentation: cluster architecture, distribution styles, sort keys, workload management, Redshift Spectrum, and data sharing. Whether you are running a provisioned cluster or Redshift Serverless, this book will save you hours of trial and error on performance and cost optimization.


Data Science on AWS

Chris Fregly & Antje Barth — O'Reilly

A hands-on guide to building end-to-end, continuous AI and ML pipelines on AWS. Covers SageMaker, data ingestion, feature engineering, model training, deployment, and monitoring — with real code throughout. The focus on continuous pipelines rather than one-off notebooks makes this especially relevant for teams trying to productionize their ML work.


Just Use Postgres!

Brandur Leach

The title says it plainly. PostgreSQL is a remarkably capable database that teams routinely underutilize before reaching for more complex infrastructure. This book is written for application developers and database practitioners who want to squeeze more out of Postgres — from indexing strategies and transactions to JSON, full-text search, and advanced concurrency patterns. A practical counter-argument to unnecessary architectural complexity.


Practical SQL, 2nd Edition

Anthony DeBarros — No Starch Press

SQL books tend to be either too shallow or too academic. This one hits the right balance. Written by a journalist and data analyst, it teaches SQL through real datasets and actual analysis problems using PostgreSQL. The focus on finding the story inside data — rather than just memorizing syntax — makes the concepts stick. Concepts transfer directly to MySQL, SQL Server, Redshift, and most other relational engines.


A very personal note on buying technical books

Technical books go out of date, but the fundamentals they cover rarely do.
A well-chosen book on Redshift architecture or SQL query design will still be useful three or more years after purchase — the core concepts around distribution, joins, and execution plans do not change with every minor release.


Monday, May 11, 2026

Quick Tip: Check Most DTU Expensive Queries in Azure SQL Database

Optimizing Azure SQL Database performance often begins with identifying the most resource-intensive queries. Understanding which queries consume the most DTU (Database Transaction Unit) helps prioritize tuning efforts and improve overall efficiency.

Azure SQL Database provides built-in tools to easily surface the queries with the highest DTU usage, making it straightforward for data engineers and DBAs to pinpoint performance bottlenecks.

Step-by-Step Guide

1. Connect to Your Azure SQL Database Using VSCode or SQL Server Management Studio (SSMS).
These tools allow you to run queries directly against your database.

2. Query the sys.dm_db_resource_stats DMV to Get Recent Resource Usage Data.
This dynamic management view shows DTU usage metrics aggregated for your database and is updated every 15 seconds. You can correlate this with queries identified in query stats views.
Example query to find top DTU-consuming queries:

SELECT TOP 10 
    qs.sql_handle,
    qs.plan_handle,
    total_worker_time/1000 AS total_cpu_ms,
    total_logical_reads, 
    execution_count,
    (total_worker_time / execution_count) / 1000 AS avg_cpu_ms,
    SUBSTRING(st.text, (qs.statement_start_offset/2) + 1, 
        ((CASE qs.statement_end_offset 
            WHEN -1 THEN DATALENGTH(st.text) 
            ELSE qs.statement_end_offset 
          END - qs.statement_start_offset)/2) + 1) AS query_text
FROM 
    sys.dm_exec_query_stats AS qs
CROSS APPLY 
    sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY 
    total_cpu_ms DESC;

This query highlights queries with the highest total CPU time, which mostly correlates with high DTU consumption.



3. Use Azure Portal's Query Performance Insight for Visual Analysis.
Navigate to your Azure SQL Database resource in the Azure Portal, select "Query Performance Insight," and explore the top resource-intensive queries by DTU consumption over a configurable time window.
Learn more: Query Performance Insight Documentation

4. Investigate Execution Plans for Identified Queries.
Analyze execution plans to understand why queries consume high DTUs and target specific improvements such as indexing, query rewriting, or parameter sniffing fixes.

Conclusion

By regularly identifying the most DTU expensive queries using DMV queries and Azure Query Performance Insight, you can effectively target optimization efforts on your Azure SQL Database. 

This proactive monitoring helps maintain smooth performance and cost efficiency without guesswork.

Additional tips:
- Automate these checks with scheduled scripts or alerts.
- Combine resource analysis with Azure SQL Intelligent Insights for AI-driven recommendations.

Friday, May 8, 2026

PlanTrace Now Supports PostgreSQL

PlanTrace Now Supports PostgreSQL PostgreSQL

The same plan analysis you know from Redshift, now for PostgreSQL — including EXPLAIN ANALYZE, buffer stats, and JSON format.

When I built PlanTrace, it was a Redshift-only tool. Paste your EXPLAIN output, get an interactive graph, cost heatmap, and tuning insights — all client-side, nothing stored. It worked well, but Redshift is only part of the picture for most teams.

PostgreSQL is where most development happens, where staging environments live, and where a lot of production workloads run. So I extended PlanTrace to support it — same interface, same zero-friction flow, automatic engine detection.


How it works

Paste your plan. PlanTrace detects whether it's PostgreSQL or Redshift automatically — no dropdown to set, no configuration. The engine badge in the top-right corner shows what was detected, and you can override it if needed.

PostgreSQL plans unlock extra data that Redshift plans don't have: actual timing, actual row counts, loop counts, buffer stats, WAL metrics. When that data is present, PlanTrace surfaces it — misestimate badges, buffer usage sections in the node detail panel, duration bubbles on each node.

Supported EXPLAIN variants

All standard PostgreSQL EXPLAIN formats are supported:

-- Estimated plan only
EXPLAIN SELECT ...;

-- Actual timing and row counts (recommended)
EXPLAIN ANALYZE SELECT ...;

-- Buffer and I/O stats
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- Full detail, JSON format
EXPLAIN (ANALYZE, BUFFERS, WAL, VERBOSE, SETTINGS, FORMAT JSON) SELECT ...;

-- Plan shape without cost numbers
EXPLAIN (COSTS OFF) SELECT ...;

Tip: EXPLAIN ANALYZE actually executes your query. To avoid side-effects on write queries, wrap it: BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;

JSON format is auto-detected and parsed identically to text format. The QUERY PLAN header, separator lines, and trailing (N rows) footer are all stripped automatically — paste the raw output as-is.

What the PostgreSQL engine detects

The insight engine runs a set of PostgreSQL-specific rules on your plan:

  • Sequential scans on large tables — with and without filter predicates
  • Sort nodes spilling to disk (external merge sort)
  • Hash Join and HashAggregate spills (Batches > 1) — with batch count and disk size
  • Nested Loop joins with large outer sides
  • High row count misestimates between planner estimates and actual rows
  • Low-reduction aggregations (near 1:1 input/output ratio)
  • Sort inside a subquery feeding a Hash Join — wasted work

Each finding is prioritized (critical / warning / info), tied to a specific node in the plan, and comes with a concrete recommendation — not generic advice.

Try it

plantrace.studyyourdata.com
Paste any PostgreSQL or Redshift EXPLAIN output and analyze it instantly. Free, browser-based, no account required. Your query plans never leave your browser.

If you run into a plan that doesn't parse correctly or a finding that seems off, I'd like to know. The rule engine is actively being improved — feedback from real plans is the best way to sharpen it.

Wednesday, May 6, 2026

PlanTrace: Stop Reading Redshift EXPLAIN Plans. Start Seeing Them

PlanTrace: Stop Reading Redshift EXPLAIN Plans. Start Seeing Them

Introducing PlanTrace — a free, browser-based tool that turns raw execution plans into interactive graphs and actionable tuning insights.

plantrace.studyyourdata.com
PlanTrace graph view showing an interactive execution plan with color-coded cost nodes, arrows, and the insights drawer open

PlanTrace graph view — color-coded cost nodes, execution flow arrows, and the tuning insights drawer, all in one screen.

If you work with Amazon Redshift, you've been there. You run EXPLAIN on a slow query, and you get back something like this:

XN Limit  (cost=2000000000000.00..2000000000125.00 rows=100 width=256)
  ->  XN Merge  (cost=2000000000000.00..2000000000125.00 rows=100 width=256)
        ->  XN Hash Join DS_DIST_BOTH  (cost=384729183746.25..903741293847.50 rows=32000000000 width=256)
              Hash Cond: ("outer".customer_id = "inner".customer_id)
              ->  XN Hash Join DS_BCAST_INNER  (cost=182938471623.00..723481920384.00 rows=28000000000 width=224)
                    ...

Now imagine 80+ operators deep. Nested joins, multiple redistributions, broadcasts stacked on top of each other, costs in the trillions. You're supposed to manually scan this and figure out what's killing performance.

Execution plans are not logs. They are structured graphs of data movement and compute decisions. They deserve to be treated that way.

That's why I built PlanTrace.


What PlanTrace does

PlanTrace takes your raw EXPLAIN output, parses it into a structured tree, and gives you three ways to reason about it — all in your browser, with no data ever leaving your machine.

The graph view

The moment you paste a plan and click Visualize, PlanTrace renders it as an interactive graph — nodes for every operator, arrows showing execution flow, and a minimap for large plans.

Each node is color-coded by cost intensity using a 6-level semaphore — from green (cheap) to red (expensive). You stop reading plans. You see them. Hotspots are obvious in seconds.

Hover over any node to see operator documentation inline, distribution movement notes (DS_BCAST_INNER, DS_DIST_BOTH, etc.), and cost range, estimated rows, and output width. Click a node to pin its detail panel — useful for digging into join conditions without losing your place in the graph.

The table view

PlanTrace table view showing sortable columns with color-coded cost pills for each operator

Table view — sortable by any column, with cost severity pills making hotspots immediately visible.

Not a graph person? The table view flattens all operators into sortable columns — cost, rows, width, distribution code. Sort by total cost to instantly surface the most expensive operators. Toggle tree-order mode to preserve the parent/child structure while keeping sortable columns.

The cost severity pills make it hard to miss the bad stuff — red means it's on fire, and in a real production query, you'll know immediately which joins to attack first.

The chart view

PlanTrace chart view showing operators plotted on a cost vs estimated rows scatter chart with quadrant labels

Chart view — each dot is a plan operator. Top-left quadrant (high cost, few rows) flags overestimated work that needs ANALYZE. Dot size encodes row width.

The chart view plots every operator on a cost × estimated-rows scatter chart (both log scale). Four quadrants tell you immediately what kind of problem you're dealing with:

  • Top-left (Overestimated Work) — high cost, few rows. Run ANALYZE — statistics are likely stale.
  • Top-right (Balanced) — high cost, many rows. Expected for big operations; focus on join strategy.
  • Bottom-left (Ideal) — low cost, few rows. This is what you're aiming for.
  • Bottom-right (Cardinality Risk) — low cost estimate, many rows. Potential runtime spill.

Dot size encodes row width, so you can also spot wide rows that will blow up memory. Hover any dot for the full operator detail.

The operator legend

PlanTrace operator legend modal showing the Join operator documentation

Every operator type has built-in documentation — click the legend to understand exactly what each operator does and when it becomes a problem.

One of the things I wanted PlanTrace to be is educational, not just diagnostic. Every operator type in the legend links to inline documentation explaining what it does, when it's fast, and when it's a problem. You don't need to have the Redshift docs open in a second tab.

The insights engine

This is the part I'm most proud of. PlanTrace runs a rule-based insight engine across your plan and surfaces prioritized, actionable recommendations sorted by severity — as many as the plan warrants. On a complex query, you might get 14 or more distinct findings.

To give you a sense of what it catches, here's what it found on a real monster query — 118 operators, 27 tables, costs reaching 2 trillion:

118Total operators
14Tuning insights
2TMax total cost
27Tables involved

The engine flagged, among other things:

  • A redistribution chain of 6 DS_DIST_BOTH joins in the same branch — at least 6 full-cluster redistributions before any result is produced
  • 14 broadcast joins (DS_BCAST_INNER), several of them on fact-sized tables with hundreds of millions of rows
  • 8 direct fact-to-fact joins without pre-aggregation — a classic source of runaway cost
  • A useless sort inside a subquery feeding directly into a hash join, which discards sort order anyway
  • Scan amplification of 10,000,000× on one table — reading 1 billion rows to return 100
  • 27 tables flagged for ALTER DISTSTYLE AUTO with ready-to-run SQL for each

Every insight is tied to an actual signal in the plan, no generic advice, no "consider your indexes" stuffs.

PDF export

Once you've analyzed a plan, click ↓ PDF in the toolbar. PlanTrace generates a formatted A4 report with an executive summary, all insights with severity color bands, an operator distribution quadrant breakdown, and the top 10 most expensive operators.

It's a ready-to-share document you can drop into a Slack thread, attach to a Jira ticket, or present to your team without having to explain what an EXPLAIN plan is.

Privacy by design

PlanTrace is entirely client-side. Your query plans never leave your browser. No backend, no analytics on your plans, no storage. Paste your plan, get your insights, close the tab — nothing is retained.

This matters because EXPLAIN output often reveals table names, column names, join structures, and data volumes that you probably don't want floating around in some SaaS vendor's logs.


How to use it

  1. Go to plantrace.studyyourdata.com
  2. Run EXPLAIN <your query>; in your Redshift client and copy the output
  3. Paste it into the left panel and click Visualize
  4. Switch between Graph / Table / Chart views
  5. Open the insights drawer for tuning recommendations
  6. Download the PDF if you want a shareable report

It's free. No account required. No install.

What it's not

PlanTrace is deliberately focused. It is not a query runner, not a monitoring dashboard, and not a replacement for runtime telemetry from views like SVL_QUERY_SUMMARY or STL_ALERT_EVENT_LOG. Those require a live connection and runtime execution data that EXPLAIN alone can't provide.

PlanTrace's job is narrower and more immediate: take a plan you already have, make it understandable, and tell you what to look at first.


PlanTrace is free to use — no account, no install, no data leaves your browser.

plantrace.studyyourdata.com →

If you work with Redshift and have a slow query you've been staring at, paste the EXPLAIN output and see what PlanTrace finds. I'd love to hear what you think — drop a comment below or find me on LinkedIn.

Monday, May 4, 2026

AWS RDS Aurora - our first approach

When starting with AWS RDS Aurora for managing relational databases in the cloud, many data engineers face the challenge of leveraging its cutting-edge features while maintaining performance and reliability. This post provides a quick, practical approach to getting started with Aurora, especially for those transitioning from traditional databases.


Step-by-Step Guide to Your First Aurora Setup

1. Choose Your Aurora Engine
Select between Aurora MySQL or Aurora PostgreSQL depending on your familiarity and project requirements. Aurora offers compatibility with both engines, delivering up to 5x the performance of standard MySQL and 3x for PostgreSQL.

2. Create Your First Aurora Cluster
Go to the AWS Management Console > RDS > Create database > Choose "Amazon Aurora" as the engine. Specify cluster settings including instance class, number of instances, and storage. For beginners, start with the free tier eligible db.t3.medium instance.
Learn more: Aurora Documentation

3. Configure Connectivity and Security
Set up VPC, subnet groups, and security groups to control access. Ensure your client IP or application servers have permission to connect via the database port (default 3306 for MySQL, 5432 for PostgreSQL). Use IAM roles for enhanced security where applicable.

4. Connect to Your Aurora Cluster
Use standard connection tools for MySQL/PostgreSQL such as MySQL Workbench or psql. The cluster endpoint handles failover automatically, so you connect to a single endpoint.
-- Example MySQL connection command
mysql -h your-cluster-endpoint.cluster-xxxxxxxxxx.us-east-1.rds.amazonaws.com -P 3306 -u admin -p

-- Example PostgreSQL connection command
psql --host=your-cluster-endpoint.cluster-xxxxxxxxxx.us-east-1.rds.amazonaws.com --port=5432 --username=admin --dbname=yourdb
  

5. Monitor Performance & Scaling
Use CloudWatch metrics and Performance Insights to track query performance and resource utilization. Aurora automatically scales storage up to 128TB without downtime, but instance scaling requires manual adjustment or use of Aurora Serverless.
More on monitoring: Aurora Monitoring

Conclusion

Starting with AWS RDS Aurora is straightforward and offers a powerful, scalable relational database solution. By selecting the right engine, configuring connectivity securely, and monitoring your cluster actively, you can leverage Aurora’s advanced capabilities with confidence. This first approach sets a solid foundation for building scalable data solutions in the cloud.


Tuesday, April 28, 2026

How to Find Expensive Queries in Amazon Redshift

Slow-running queries can degrade your Redshift cluster’s performance and lead to increased costs. Identifying the most expensive queries is crucial to optimize resource usage and improve overall system efficiency.

Step-by-Step Guide

  1. Connect to your Redshift cluster.
    Use your preferred SQL client or the Redshift Query Editor to establish a connection with your cluster.

  2. Query the stl_query system table for the most resource-intensive queries.
    The stl_query table logs all completed queries, including their runtime metrics. Use the following SQL to retrieve the top 10 queries with the longest execution time over the last 24 hours:
    SELECT query, userid, label, starttime, endtime,
           DATEDIFF(seconds, starttime, endtime) AS elapsed_seconds
    FROM stl_query
    WHERE starttime >= GETDATE() - INTERVAL '1 day'
    ORDER BY elapsed_seconds DESC
    LIMIT 10;
        

  3. Retrieve the SQL text of the expensive queries.
    Use the stl_querytext table to get the text of the queries identified:
    SELECT text
    FROM stl_querytext
    WHERE query = <query_id>
    ORDER BY sequence;
        

    Replace <query_id> with the actual query ID from the previous step to analyze the query text for possible optimizations.


  4. Focus your optimization efforts.
    Look for joins without indexes, large data scans, or missing filters and consider rewriting or adjusting these queries for better performance.

Conclusion

Tracking and analyzing expensive queries using Redshift’s system tables is a simple yet powerful way to maintain a healthy and efficient data warehouse environment. Regular monitoring helps reduce costs and speeds up analytics workflows.

For further reading, visit the official Amazon Redshift system tables documentation: Amazon Redshift System Tables

Optimizing Redshift Performance by Configuring WLM Queues

Efficient query performance in Amazon Redshift often comes down to how well you manage workload concurrency. Redshift's Workload Manage...