You paste a slow Redshift query plan into PlanTrace and one of the tuning insights reads "SORTKEY optimization candidate", pointing at an explicit sort step that's adding cost you don't need to pay.
A sort step runs at query time, every time, and on a wide table with millions of rows it can dominate the total cost.
The fix is usually a SORTKEY that matches what the query is sorting on.
What an explicit sort step actually is
When your query has an ORDER BY, a window function, or a merge join that needs ordered input, Redshift has two options. If the data is already physically stored in the order it needs, it just reads it. If not, it sorts the rows on the fly.
That on the fly sort is the explicit sort step, and it shows up as an XN Sort operation with its own Sort Key line and a cost range.
Spotting it in raw EXPLAIN text means scanning indentation and matching cost numbers by eye. This is where reading the plan as a graph helps: PlanTrace renders each node with its cost broken out, so the expensive sort is obvious instead of buried, and the insight names the exact table and column behind it. The underlying mechanics are documented in the AWS guide on reviewing query plan steps.
What causes it?
The root cause is simple: the order the query needs does not match the physical order the table is stored in. A table with no SORTKEY, or with a SORTKEY on columns the query doesn't use, forces Redshift to materialize and sort the result set at runtime. The wider the rows and the larger the row count, the more expensive that becomes. In the graph you'll see the cost concentrated on the sort while the scans underneath look comparatively cheap, which is exactly the pattern the SORTKEY insight keys off of.
How to fix it
Define a SORTKEY that matches the column the query orders or joins on. When the table is already sorted that way, Redshift can skip the runtime sort and read rows in order straight from disk.
Add the column to the SORTKEY definition with ALTER TABLE, which lets you change sort keys on existing tables without recreating them and without blocking concurrent reads or writes, as noted in the Redshift sort key recommendation announcement.
-- Add a compound sort key matching your ORDER BY / join column ALTER TABLE lineitem ALTER COMPOUND SORTKEY (l_quantity); -- Or let Redshift manage it automatically ALTER TABLE lineitem ALTER SORTKEY AUTO;
After the change, sort the existing data so the new key takes effect, then re-run EXPLAIN:
VACUUM SORT ONLY lineitem; ANALYZE lineitem; EXPLAIN SELECT l_quantity, sum(l_extendedprice) FROM lineitem GROUP BY l_quantity ORDER BY l_quantity;
Just note that on very large tables, ALTER table and VACUUM can take significant time to complete.
Paste the new plan back into PlanTrace and compare it against the old one.
If the SORTKEY matches, the XN Sort node either drops out of the plan or its cost falls sharply, and the candidate insight no longer fires. That before and after check is the only reliable confirmation.
A SORTKEY that doesn't line up with the query's order changes nothing. Everything runs client side in your browser, so your plans are never stored or sent anywhere.
Wrapping up
An explicit sort step is Redshift telling you the data isn't stored in the order your query wants. Match the SORTKEY to the ORDER BY or join column, vacuum and analyze, then compare the plans to confirm the sort is gone. Letting PlanTrace surface the candidate and verify the result beats squinting at cost numbers in raw text. And remember that a sort key only helps queries that actually filter, join, or order on those columns, so optimize for the patterns that matter most.

No comments:
Post a Comment