In a previous post I went through what a Slowly Changing Dimension is and how it can be a pain to all and sundry when designing a database. Before I move on to implementation in regard to code, I just wanted to touch on a few design thoughts in regard to the tables themselves as there is more than a couple of ways to store a slowly changing dimension. There’s no right or wrong, therefore these are just a few ideas to consider depending on what you feel suits you best or you’re most comfortable with.
This is the basic setup I’ll be using (single line formatting for compactness only):
drop table if exists dbo.FactSalesByMonth, dbo.DimIndustry, dbo.DimCompany
create table DimIndustry(industryID tinyint identity, industryName varchar(100) not null)
create table DimCompany(companyID tinyint identity, companyName varchar(100) not null)
create table FactSalesByMonth(salesID int identity, industryID tinyint, companyID tinyint, dateID int, salesAmt money)
alter table DimIndustry add constraint pk_industry primary key clustered(industryID)
alter table DimCompany add constraint pk_company primary key clustered(companyID)
alter table FactSalesByMonth add constraint pk_sales primary key clustered(salesID)
alter table DimIndustry add constraint fk_salesIndustry foreign key(industryID) references DimIndustry(industryID)
alter table DimCompany add constraint fk_salesCompany foreign key(companyID) references DimCompany(companyID)
insert into DimIndustry select 'Technology'
insert into DimCompany select 'Pear Computers'
insert into FactSalesByMonth select 1, 1, 201801, 1000000
insert into FactSalesByMonth select 1, 1, 201802, 1000000
insert into FactSalesByMonth select 1, 1, 201803, 1000000
insert into FactSalesByMonth select 1, 1, 201804, 1200000
insert into FactSalesByMonth select 1, 1, 201805, 1200000
insert into FactSalesByMonth select 1, 1, 201806, 1200000
select * from DimIndustry
select * from DimCompany
select * from FactSalesByMonth
We’re going to deal with the fact that Pear Computers has changed its name over the years and we need to know what it was on a Point In Time basis.
Solely Date Based
In this method we simply add To and From dates to our table in order to show when a value was active. For example, our company table changes as follows:
drop table if exists dbo.DimCompany
create table DimCompany
companyID tinyint not null,
fromDate datetime not null,
toDate datetime not null,
companyName varchar(100) not null
insert into DimCompany select 1, '2000-01-01', '2018-03-31', 'Pear Technology Ltd'
insert into DimCompany select 1, '2018-04-01', '9999-12-31', 'Pear Computers'
Summary (main points, not exhaustive)
• Nice and simple to understand
• Easy to implement
• Easy to find latest record via toDate
• Need a new Primary Key
o Composite across CompanyID & FromDate
o Add surrogate and use that
• Difficulties with Primary/Foreign key relationship to Fact tables
o Extra columns in Fact Table to support
o Pass through Surrogate key
(Not ideal for CompanyID level aggregation)
As per the above, this is easy to implement and is a clear and clean design for the Dimension table. However, it can be a problem in terms of foreign keys and aggregation within the Fact layer which could lead to degraded performance.
For example, method 1 (using full foreign key in Fact table):
This could cause problems because now you need to aggregate over 2 columns which may require additional indexing, wider indexes, and higher reads.
Using method 2 (surrogate key) we see the following:
As the above shows, if you want to aggregate over companyID then this is less clear in the Fact table and, if there were a lot of historical values for the company in question, could make for a much less performant query as a whole.
The other option is to put companySurrogateID AND companyID into the Fact table to cater for both types of query. This is similar to the “parent” option I’ll cover in my next post.