Study your Data: Killing Explicit Sort Steps in Redshift with the Right SORTKEY

You paste a slow Redshift query plan into PlanTrace and one of the tuning insights reads "SORTKEY optimization candidate", pointing at an explicit sort step that's adding cost you don't need to pay.

A sort step runs at query time, every time, and on a wide table with millions of rows it can dominate the total cost.

The fix is usually a SORTKEY that matches what the query is sorting on.

What an explicit sort step actually is

When your query has an ORDER BY, a window function, or a merge join that needs ordered input, Redshift has two options. If the data is already physically stored in the order it needs, it just reads it. If not, it sorts the rows on the fly.

That on the fly sort is the explicit sort step, and it shows up as an XN Sort operation with its own Sort Key line and a cost range.

Spotting it in raw EXPLAIN text means scanning indentation and matching cost numbers by eye. This is where reading the plan as a graph helps: PlanTrace renders each node with its cost broken out, so the expensive sort is obvious instead of buried, and the insight names the exact table and column behind it. The underlying mechanics are documented in the AWS guide on reviewing query plan steps.

What causes it?

The root cause is simple: the order the query needs does not match the physical order the table is stored in. A table with no SORTKEY, or with a SORTKEY on columns the query doesn't use, forces Redshift to materialize and sort the result set at runtime. The wider the rows and the larger the row count, the more expensive that becomes. In the graph you'll see the cost concentrated on the sort while the scans underneath look comparatively cheap, which is exactly the pattern the SORTKEY insight keys off of.

How to fix it

Define a SORTKEY that matches the column the query orders or joins on. When the table is already sorted that way, Redshift can skip the runtime sort and read rows in order straight from disk.

Add the column to the SORTKEY definition with ALTER TABLE, which lets you change sort keys on existing tables without recreating them and without blocking concurrent reads or writes, as noted in the Redshift sort key recommendation announcement.

-- Add a compound sort key matching your ORDER BY / join column
ALTER TABLE lineitem
ALTER COMPOUND SORTKEY (l_quantity);

-- Or let Redshift manage it automatically
ALTER TABLE lineitem ALTER SORTKEY AUTO;

After the change, sort the existing data so the new key takes effect, then re-run EXPLAIN:

VACUUM SORT ONLY lineitem;
ANALYZE lineitem;

EXPLAIN
SELECT l_quantity, sum(l_extendedprice)
FROM lineitem
GROUP BY l_quantity
ORDER BY l_quantity;

Just note that on very large tables, ALTER table and VACUUM can take significant time to complete.

Paste the new plan back into PlanTrace and compare it against the old one.

If the SORTKEY matches, the XN Sort node either drops out of the plan or its cost falls sharply, and the candidate insight no longer fires. That before and after check is the only reliable confirmation.

A SORTKEY that doesn't line up with the query's order changes nothing. Everything runs client side in your browser, so your plans are never stored or sent anywhere.

Wrapping up

An explicit sort step is Redshift telling you the data isn't stored in the order your query wants. Match the SORTKEY to the ORDER BY or join column, vacuum and analyze, then compare the plans to confirm the sort is gone. Letting PlanTrace surface the candidate and verify the result beats squinting at cost numbers in raw text. And remember that a sort key only helps queries that actually filter, join, or order on those columns, so optimize for the patterns that matter most.

Study your Data

Monday, June 15, 2026

Killing Explicit Sort Steps in Redshift with the Right SORTKEY

What an explicit sort step actually is

What causes it?

How to fix it

Wrapping up

No comments:

Post a Comment

Programmatic Tool Calling with the Claude SDK