My last post was on the wonders of Window Functions in Batch mode (which is awesome – if you’ve not read the post then go… now…). This post will focus on any of us who don’t happen to have the latest and greatest (at time of writing) and aren’t able to force batch mode through our queries.
So, what’s the big deal? Window Functions are just Window Functions? Yes?
Well actually no. There’s a couple of key quirks that it would be useful to know when you’re coding in order to know you’re not stressing the system unnecessarily. For the purposes of this post there’s one in particular that I wanted to cover that I came across recently…
Let’s dive right in… can anyone tell me what’s wrong with this statement?
select orderDate, sum(totalDue) over(order by orderDate, salesOrderID)
No? To be honest there’s nothing wrong in principle. It’s the standard way to code a window function, it gets the desired result, returns the expected 1.25 million records… so what’s the problem?
Well, I’ll ask the following very loaded question…
“Who naturally codes this instead?”
select orderDate, sum(totalDue) over(order by orderDate, salesOrderID rows between unbounded preceding and current row)
No-one, right? It’s EXACTLY the same. The default clause on a window function is exactly that… “rows between unbounded preceding and current row”, so why on Earth would we write it?
Well actually, you really, REALLY should. I honestly don’t know why this is the case, especially when this is the internal default anyway and when the results of these are identical, but SQL Server treats these two queries in very critically different ways.
From an execution plan standpoint there’s one critical note:
These look identical except that there is an additional operator in the query in which we explicitly specify the frame. What does this do to the execution?
It’s hard to see pictures on the blog, so let’s zoom in and look at the critical pieces…
Look at the reads… there’s a HUGE difference. As I said, my box is incredibly quiet and therefore under no pressure whatsoever, but imagine your servers… imagine a SQL Server with memory pressure or spilling to tempDB with that volume of reads? Performance would tank.
Whereas with the specified frame none of these reads, and therefore the associated worries will occur.
Therefore I definitely urge you to frame ALL your Window Functions where appropriate (some, such as Row_number don’t accept frames). And no, in Batch Mode this isn’t applicable… it performs just the same either way. But for your older system, you’ll have to accept that you need a few extra keystrokes I’m afraid.
This post shows how you can potentially make good performance from existing Window Functions...