Temp Tables vs Variable Tables
Published: May 19, 2014
This is possibly one of the most contentious issues I’ve come across in SQL Server. Look online and you’ll find no end of people arguing over these two types of table and weighing the benefits and pitfalls of each: where they’re stored, how they’re used, which is best, and so on.

Well, I figured it was time to give my own spin on the topic, so I’ve spent some time poking around in SQL Server to see what I can find and deduce. So here it is… my musings on a topic that tends to upset all and sundry, which means I’m likely to do the same. I’ll try not to disappoint: expect wild accusations, general stabs in the dark, and flimsy evidence at best to back it all up.

No time like the present… let’s get started.

Where are they stored?

Okay, here’s a fun place to start as I’m going for the tried and tested “it depends”.

I think this is where people tend to get hung up and arguments can break out. There’s a large camp stating that table variables are held solely in memory and therefore are faster than # tables. On the flip side we have the group who strongly dispute this, stating that they are both held in tempDB and therefore this can’t be the cause of the speed difference.

Well, in my opinion (dangerous way to start a sentence), I think they’re both kind of right.

And no… before I’m accused, I’m not sitting on the fence on this one; I’m at least applying some logic. The way I view it is that SQL Server doesn’t tend to talk directly to disk… it reads from and writes to RAM. So whichever table type you create, it will be logged in tempDB as an object, but the data you then insert into it will sit in RAM. This is the same for both table types: the activity is logged, and therefore will get written to tempDB, but the data itself remains in memory for both.

The exception to this is obviously if you have massive memory pressure and the tables spill to disk. But this can happen with both table types.

Therefore as I see it storage should be similar for both… a combination of RAM and tempDB.
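
If you want a rough feel for the “RAM” half of that claim, the buffer pool DMVs will tell you how many tempDB pages are currently cached in memory. This is only a sketch (it needs VIEW SERVER STATE permission and counts the whole of tempDB rather than just your tables), but watching the number climb as you insert into either table type makes the point…

-- rough count of tempDB data pages currently sitting in the buffer pool (RAM)
-- NB: this covers all of tempDB, not just the tables created in your session
select count(*) as tempDBPagesInMemory
from sys.dm_os_buffer_descriptors
where database_id = db_id('tempDB')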

Let’s have a look.

First, both table types can be shown residing in tempDB…

use boredDBA
go

-- clear up any rogue temp table called #myTempTable
if object_id('tempDB..#myTempTable') is not null drop table #myTempTable
go

-- shows zero user objects in the tempDB database
select *
from tempDB.sys.objects
where type = 'U'
go

-- create a temp table
create table #myTempTable
(
    id int
)
go

-- have a look in tempDB...  there it is...
select *
from tempDB.sys.objects
where type = 'U'
go

-- create a variable table
declare @variableTable table
(
    id int
)

-- have a look in tempDB...  yup - there it is alongside your other table!!!  In tempDB!!!
select *
from tempDB.sys.objects
where type = 'U'
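
Another way to see the same thing, and again this is just a quick sketch rather than anything definitive, is the session space usage DMV: both table types count as “user object” page allocations in tempDB for your session, so the counter moves whichever one you populate…

-- pages allocated in tempDB by the current session's user objects
-- (# tables and variable tables are both counted here)
select session_id,
       user_objects_alloc_page_count,
       internal_objects_alloc_page_count
from sys.dm_db_session_space_usage
where session_id = @@spid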




Why the difference in performance?

Okay, so we’re happy that both effectively reside in tempDB. So why the performance difference? Well, one hint can be gleaned from the transaction log, which shows a distinct difference between the two…

if object_id('tempDB..#table') is not null drop table #table

checkpoint

declare @table table
(
    id int identity(1, 1),
    things varchar(20)
)

select * from fn_dblog(null, null)
go

checkpoint

create table #table
(
    id int identity(1, 1),
    things varchar(20)
)

select * from fn_dblog(null, null)
go
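
If scrolling through the full fn_dblog output is a bit much, a simple count gives you one figure per table type to compare. This is just a rough sketch using the same undocumented function as above; run it in place of the select * after each checkpoint…

-- a single figure to compare: log records generated since the last checkpoint
select count(*) as logRecordsSinceCheckpoint
from fn_dblog(null, null)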




From this we can easily see that there is a decent overhead involved in creating a # table compared to a variable table. Obviously this overhead becomes proportionally smaller the larger the dataset, which may well account for some of the performance advantage a variable appears to have over # tables with small datasets. In addition, because the # table is maintained against a spid (or more than one if it’s a ## table), SQL Server also has to log enough information to allow for a rollback, whereas a rollback has no bearing on a variable table. This again increases the proportional overhead on a # table.
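
That rollback behaviour is easy to prove to yourself. Here’s a minimal sketch (the table names are purely for illustration): roll back a transaction that inserted into both, and the # table loses its row while the variable table keeps it…

if object_id('tempDB..#rollbackTest') is not null drop table #rollbackTest
go

create table #rollbackTest (id int)
declare @rollbackTest table (id int)

begin transaction
    insert into #rollbackTest values (1)
    insert into @rollbackTest values (1)
rollback transaction

-- the # table is empty again, but the variable table still has its row
select count(*) as hashTableRows from #rollbackTest
select count(*) as variableTableRows from @rollbackTest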

Based on this a variable table looks to be a no-brainer, so why is it that people tend to only use them with small datasets? Well, this comes back to my comment about “proportional” overhead. Once the table reaches a certain size the two have almost the same cost to create and populate, so surely performance should be, at worst, equal for the # table? Actually it does better than that, because the # table has additional weapons which really help when larger datasets are being used.

Key differences with larger datasets

SQL Server has one last trick up its sleeve with # tables… statistics. SQL Server will generate statistics on a # table for use with queries run against its data, which is not something it does with variable tables. SQL Server also allows multiple indexes to be created against a # table, which again increases the speed at which it can be queried. As you can imagine, with a large dataset used in anything other than a table scan, this makes a # table increasingly appealing and dramatically faster. It’s also worth noting that, because of these statistics, SQL Server can make use of parallel processing with # tables… something it will not do with variable tables.

Here’s a quick example taken from my machine of the difference in speed when a large dataset and an index are used…

if object_id('manyRows') is not null drop table manyRows
if object_id('tempDB..#manyRows') is not null drop table #manyRows
go

set nocount on
set statistics io off
set statistics time off

create table manyRows
(
    id int identity(1, 1) not null,
    miscNumber int not null
)
go

declare @counter int = 1
begin transaction
while @counter <= 1000000
begin
    insert into manyRows
    select floor(rand(checksum(newID()))*10000)

    set @counter += 1
end
commit transaction

create table #manyRows
(
    id int not null,
    miscNumber int not null,
    constraint pk_manyRows primary key clustered(id)
)
create index ix_miscNumber on #manyRows(miscNumber)

declare @manyRows table
(
    id int not null,
    miscNumber int not null,
    unique clustered(id)
)

insert into #manyRows
select *
from manyRows

insert into @manyRows
select *
from manyRows

set statistics io on
set statistics time on

print 'hashCount'

select count(*)
from #manyRows
where miscNumber between 1000 and 1100

print 'varCount'

select count(*)
from @manyRows
where miscNumber between 1000 and 1100






As you can see… even in this very simple example the # table has already started to outperform the variable thanks to its indexes and statistics. You can see them at work in the execution plans, where the index seek against the # table sits alongside the full table scan of the variable…
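
If you want to see the statistics side of this for yourself, here’s one last quick sketch: straight after running the demo above (in the same session, before the # table is dropped), tempDB’s sys.stats lists the statistics SQL Server holds for the # table, one per index, and it will auto-create more as you filter on other columns…

-- statistics held in tempDB for the # table (one per index, plus any
-- auto-created ones for columns used in predicates)
select s.name, s.auto_created, s.user_created
from tempDB.sys.stats as s
where s.object_id = object_id('tempDB..#manyRows')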





Well, hopefully this helps give you an idea of the differences between the two, and of when and how to use each.