Slowly Changing Dimension Designs: Basic Date Range

In a previous post I went through what a Slowly Changing Dimension is and how it can be a pain to all and sundry when designing a database. Before I move on to implementation in regard to code, I just wanted to touch on a few design thoughts in regard to the tables themselves as there is more than a couple of ways to store a slowly changing dimension. There’s no right or wrong, therefore these are just a few ideas to consider depending on what you feel suits you best or you’re most comfortable with.

Basic Table

This is the basic setup I’ll be using (single line formatting for compactness only):

We’re going to deal with the fact that Pear Computers has changed its name over the years and we need to know what it was on a Point In Time basis.

Solely Date Based

In this method we simply add To and From dates to our table in order to show when a value was active. For example, our company table changes as follows:

Summary (main points, not exhaustive)

• Nice and simple to understand
• Easy to implement
• Easy to find latest record via toDate

• Need a new Primary Key
o Composite across CompanyID & FromDate
o Add surrogate and use that
• Difficulties with Primary/Foreign key relationship to Fact tables
o Extra columns in Fact Table to support
o Pass through Surrogate key
(Not ideal for CompanyID level aggregation)

As per the above, this is easy to implement and is a clear and clean design for the Dimension table. However, it can be a problem in terms of foreign keys and aggregation within the Fact layer which could lead to degraded performance.

For example, method 1 (using full foreign key in Fact table):

This could cause problems because now you need to aggregate over 2 columns which may require additional indexing, wider indexes, and higher reads.

Using method 2 (surrogate key) we see the following:

As the above shows, if you want to aggregate over companyID then this is less clear in the Fact table and, if there were a lot of historical values for the company in question, could make for a much less performant query as a whole.

The other option is to put companySurrogateID AND companyID into the Fact table to cater for both types of query. This is similar to the “parent” option I’ll cover in my next post.

Leave a Comment

Your email address will not be published. Required fields are marked *