This is going to be a very basic explanation of the differences seen in my previous post (about TF 9481), in which the new and old cardinality estimators provided different estimates for what looked like a very simplistic query.

For a much more in-depth explanation of this I would recommend looking at Paul White’s blog page **here**.

So anyway, here was the query that I was using in my last post:

```sql
use AdventureWorks2012
go

select *
from sales.SalesOrderHeader
where orderDate between '2005-01-01' and '2005-12-31'
and subTotal between 10000 and 20000
```

So why did this give different estimates when run with SQL 2014 vs the older CE? Well, it all comes down to the way SQL Server combines predicates on columns from the same table in a WHERE clause.

Let’s break this down and have a look (we could use DBCC SHOW_STATISTICS to see this in greater detail, but for the sake of simplicity I’m going to use a simpler method)…

Firstly we’ll ask the OLD CE for its estimates when we run the following:

```sql
select *
from sales.SalesOrderHeader
where orderDate between '2005-01-01' and '2005-12-31'
option (querytraceon 9481)

select *
from sales.SalesOrderHeader
where subTotal between 10000 and 20000
option (querytraceon 9481)
```

Okay, so what is SQL Server doing behind the scenes when we ask it to use both predicates in the WHERE clause?

Well, we know that the number of rows in the table is 31456 (you can check that without my help), so SQL Server applies the following calculation:

```sql
select (1372.08 / 31456) -- proportion of rows expected to be returned for orderDate
     * (308.504 / 31456) -- proportion of rows expected to be returned for subTotal
     * 31456             -- number of rows in the table
```

Executing this gives us approximately 13.46, which is exactly the old CE’s estimate:
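Nothing about this arithmetic depends on SQL Server, so as a quick sanity check here is the same sum in Python (the 1372.08 and 308.504 figures are the single-predicate estimates from the statistics above):

```python
# Old CE: multiply the individual selectivities together,
# i.e. treat the two predicates as fully independent.
total_rows = 31456
sel_orderdate = 1372.08 / total_rows   # selectivity of the orderDate predicate
sel_subtotal = 308.504 / total_rows    # selectivity of the subTotal predicate

old_ce_estimate = sel_orderdate * sel_subtotal * total_rows
print(round(old_ce_estimate, 4))  # ≈ 13.4566
```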

So let’s have a look at the estimated plan for the query as a whole:

And that’s how it’s calculated in the old CE.

So what happens in the new one? Well, there was always an issue with the old model: columns in the same table could well be related to each other, so the old assumption that the predicates are completely independent is simply too harsh. However, we also know that most columns aren’t all that strongly correlated, so the new CE takes the middle view: there *might* be a correlation, and therefore it applies an "exponential backoff" to the calculation.

We do this by taking the square root of the selectivity; because selectivity is always between 0 and 1, this has the effect of raising the estimate.

For example, let’s say our selectivity estimate is 0.2… the square root of 0.2 is roughly 0.447, therefore raising our selectivity estimate (hopefully you’re following this).

Also note that SQL Server will always estimate using the most selective predicate first.

We know that the subTotal column produces the fewest rows and therefore SQL Server will keep that estimate and THEN apply the orderDate estimate using the Exponential Backoff method.
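The general rule can be sketched in a few lines of Python. The assumptions here (based on how the new CE is commonly documented, e.g. in Paul White’s write-up linked above, not on anything shown in this post) are that the selectivities are sorted most-selective first, successive exponents of 1, 1/2, 1/4, 1/8 are applied, and only the four most selective predicates contribute:

```python
def backoff_selectivity(selectivities):
    """Combine AND-predicate selectivities the exponential-backoff way:
    sort ascending (most selective first), then raise each successive
    selectivity to the powers 1, 1/2, 1/4, 1/8. Assumption: predicates
    beyond the fourth are ignored (treated as selectivity 1)."""
    combined = 1.0
    for i, s in enumerate(sorted(selectivities)[:4]):
        combined *= s ** (1.0 / (2 ** i))
    return combined

total_rows = 31456
sels = [1372.08 / total_rows, 308.504 / total_rows]  # orderDate, subTotal
print(round(backoff_selectivity(sels) * total_rows, 2))  # ≈ 64.43
```

With only two predicates this reduces to exactly the formula below: keep the subTotal selectivity as-is and square-root the orderDate one.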

Anyway, we now place this into the calculation:

```sql
select (308.504 / 31456)     -- proportion of rows expected to be returned for subTotal
     * sqrt(1372.08 / 31456) -- square root of the proportion for orderDate
     * 31456                 -- number of rows in the table
```
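For completeness, here are the two methods side by side in Python (again using the row figures quoted above), showing how much the backoff lifts the estimate compared with straight multiplication:

```python
import math

total_rows = 31456
# Old CE: straight multiplication (independence assumption)
old = (1372.08 / total_rows) * (308.504 / total_rows) * total_rows
# New CE: keep the most selective predicate, square-root the other
new = (308.504 / total_rows) * math.sqrt(1372.08 / total_rows) * total_rows

print(round(old, 2), round(new, 2))  # old ≈ 13.46, new ≈ 64.43
```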

And looking at the Estimated Plan with the new CE we can see:

Therefore we can now see how the new Cardinality Estimator handles multiple AND-ed column predicates in a WHERE clause.

As a note, in this example the new CE is absolutely spot on compared to the actuals, but this isn’t always the case: it’s still an estimate based on a formula aiming for a "best fit for all situations" and will inevitably have flaws. Even so, it’s worth knowing how it’s calculated when you’re testing ahead of an upgrade to the latest version.