Automatic Update Statistics Threshold
Published: Mar 27, 2015
This is a trace flag (2371) I only learnt about recently, and it fixes something that has been a real bugbear of mine for quite some time.

Basically we all know that statistics are one of the most important things in SQL Server and therefore it’s also critical to know how they’re maintained.

Inside SQL Server, if you have Automatic Update Statistics turned on in your database, the statistics on a table are updated once roughly 20% of its rows have been modified (strictly speaking, 500 rows plus 20% of the table's row count, for any table with more than 500 rows). This flag changes that behaviour for the better.
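
If you want to see how close a table is to that threshold, you can watch the modification counter that SQL Server keeps per statistic. Here's a quick sketch using the sys.dm_db_stats_properties DMV (available from SQL Server 2008 R2 SP2 / 2012 SP1 onwards; dbo.myTable is just a placeholder name):

-- show rows, modifications since the last update, and last update time per statistic
select s.name as statName,
       sp.rows,
       sp.modification_counter,
       sp.last_updated
from sys.stats as s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
where s.object_id = object_id('dbo.myTable')
go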

For example, imagine that you have a table of 1,000,000 records. That's pretty normal these days and by many people would even be considered a small table. However, even with this table you would need 200,000 records to change before SQL Server would decide to update your statistics.

Let’s consider how bad this could be:

Imagine we have a telephone directory of 1,000,000 people. That’s small, but we’ll go with it for our example.

Anyway, in our example we have just 100 people with a surname beginning with Z.

Let’s set this up with the following piece of code:

if object_id('testStat') is not null drop table testStat
go

-- create basic table with primary clustered key
create table testStat
(
    id int identity(1, 1),
    firstName varchar(16),
    lastName varchar(24),
    constraint pk_testStat primary key clustered (id)
)
go

-- insert 1 million rows of random character data
declare @counter int = 1

begin transaction
while @counter <= 1000000
begin
    insert into testStat (firstName, lastName)
    select convert(varchar(16), replace(newid(), '-', '')),
           convert(varchar(24), replace(newid(), '-', ''))

    set @counter += 1
end
commit transaction
go

-- change the leading letter so that no surnames begin with Z
update testStat
set lastName = 'S' + right(lastName, 23)
where lastName like 'Z%'
go

-- ensure there are 100 records beginning with Z
update top (100) testStat
set lastName = 'Z' + right(lastName, 23)
go

-- create an index on lastName, therefore creating associated stats
create index testStatLastName on testStat(lastName)
go
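
If you want to confirm that the statistics object really was created alongside the index (it takes the index's name), DBCC SHOW_STATISTICS will display its header, density vector, and histogram:

dbcc show_statistics ('testStat', 'testStatLastName')
go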


Okay… so now that we have our example table, let's run a basic query to obtain the first and last names of the 100 people with surnames beginning with Z:

select firstName, lastName
from testStat
where lastName like 'Z%'

[Execution plan screenshot: an index seek on testStatLastName feeding a nested loops key lookup, estimated 104 rows]

As I would have expected, the stats told the optimizer that we would be getting roughly 100 rows (the estimate was actually 104)… therefore it wisely chose a nested loops join with a bookmark (key) lookup. This is ideal and fast.

Perfect.
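
Incidentally, if you prefer numbers to graphical plans, one quick sketch is to switch on the textual plan output, which returns the EstimateRows and actual Rows columns side by side for each operator:

set statistics profile on
go

select firstName, lastName
from testStat
where lastName like 'Z%'
go

set statistics profile off
go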

However, let's now presume that a new craze takes hold and people start changing their surnames for a giggle... leaving us with 180,000 new people with Z surnames:

update top (180000) testStat
set lastName = 'Z' + right(lastName, 23)
where lastName not like 'Z%'
go
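
If we check the modification counter at this point (the same sys.dm_db_stats_properties sketch as earlier), it sits at roughly 180,000: comfortably below the ~200,000 changes (20%) needed to trigger an automatic update:

-- modification_counter should show roughly 180,000 changes
select s.name as statName, sp.rows, sp.modification_counter
from sys.stats as s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
where s.object_id = object_id('testStat')
and s.name = 'testStatLastName'
go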


Now, looking at the estimated execution plan for our same firstName / lastName query, we see the following:

[Execution plan screenshot: exactly the same seek and key lookup plan]

Yes, it’s exactly the same!!! The optimizer still believes it’s receiving just 104 records.

The problem here is that in the real world this query would now be far better served by a simple table scan, avoiding the massive overhead of nearly 180,000 key lookups through a nested loop. But because fewer than 20% of the records were changed, the statistics were never updated, and suddenly we have cardinality issues.
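
Without the trace flag, the workaround is to wait for the 20% threshold or to refresh the statistics yourself, for example:

-- manually refresh the lastName statistics with a full scan
update statistics testStat testStatLastName with fullscan
go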

So… what does this trace flag do? Well, it scales the percentage of records that need to change before stats are updated according to the number of records in the table: the larger the table, the lower the percentage threshold.

This whole trace flag is described in detail on this MSDN page (Changes to automatic update statistics) from which I’ve taken the following graph:

[Graph from the MSDN article: the percentage threshold for automatic statistics updates falling away as table row count grows]

Therefore you can see that for the exact example I gave, with 1 million records, the threshold is lowered to roughly 2.5–3% of records (the curve is commonly approximated as SQRT(1,000 × row count), which works out at around 31,600 rows here). My changing 180,000 records would therefore have triggered an automatic statistics update. In fact we can look at this ourselves by re-running the exact same code as above, but including “dbcc traceon (2371, -1)” at the start.
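
For clarity, that's just one extra batch at the very top of the script (the -1 argument applies the flag globally, and dbcc tracestatus confirms it has taken effect):

dbcc traceon (2371, -1)
dbcc tracestatus (2371)
go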

Doing so gives us the same initial nested loop, but after making my 180,000 updates I now see the following:

[Execution plan screenshot: with updated statistics the optimizer now expects ~180,000 rows and scans instead of using the key lookup]

Which is a much more sensible way to approach the issue.

Therefore, if you have large tables and are struggling with cardinality estimates and slow-updating statistics, I would urge you to try out this flag in your environment.