Slowly Changing Dimension Designs: History tables

In a previous post I went through what a Slowly Changing Dimension is and how it can be a pain to all and sundry when designing a database. Before I move on to the code side of the implementation, I want to touch on a few design thoughts about the tables themselves, as there are more than a couple of ways to store a slowly changing dimension. There’s no right or wrong here, so these are just a few ideas to consider, depending on what suits you best or what you’re most comfortable with.

Basic Table

This is the basic setup I’ll be using (single line formatting for compactness only):

We’re going to deal with the fact that Pear Computers has changed its name over the years and we need to know what it was on a Point In Time basis.

Date Based with History tables

In this method we add To and From dates to our table to show when a value was active, but instead of adding any additional keys we keep only current data in our dbo table and log all older records out to another table, for example one in a “History” schema. Our company table changes as follows:
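As a minimal sketch of this layout (not the exact tables from this series), the snippet below uses Python's built-in sqlite3 module as a stand-in for SQL Server. Everything in it is an assumption made for illustration: the column names, the '9999-12-31' open-ended ToDate, the half-open date ranges, the hypothetical rename_company helper, and the made-up company names and dates. Attached in-memory databases play the part of the dbo and History schemas.

```python
import sqlite3

# SQLite stands in for SQL Server; attached in-memory databases
# emulate the dbo and History schemas (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dbo")
conn.execute("ATTACH DATABASE ':memory:' AS History")

conn.executescript("""
-- Current data only: the simple single-column primary key survives.
CREATE TABLE dbo.Company (
    CompanyID   INTEGER PRIMARY KEY,
    CompanyName TEXT NOT NULL,
    FromDate    TEXT NOT NULL,
    ToDate      TEXT NOT NULL DEFAULT '9999-12-31'
);
-- Older versions: many rows per company, so the key becomes composite.
CREATE TABLE History.Company (
    CompanyID   INTEGER NOT NULL,
    CompanyName TEXT NOT NULL,
    FromDate    TEXT NOT NULL,
    ToDate      TEXT NOT NULL,
    PRIMARY KEY (CompanyID, FromDate)
);
""")


def rename_company(conn, company_id, new_name, change_date):
    """Hypothetical helper: archive the current row, then overwrite it."""
    with conn:  # one transaction, so the two tables cannot drift apart
        # Copy the outgoing version to History, closing it at change_date.
        conn.execute(
            """INSERT INTO History.Company
                   (CompanyID, CompanyName, FromDate, ToDate)
               SELECT CompanyID, CompanyName, FromDate, ?
               FROM dbo.Company
               WHERE CompanyID = ?""",
            (change_date, company_id),
        )
        # The dbo row now holds the new name, valid from change_date.
        conn.execute(
            """UPDATE dbo.Company
               SET CompanyName = ?, FromDate = ?
               WHERE CompanyID = ?""",
            (new_name, change_date, company_id),
        )


# Illustrative data only: names and dates are invented.
with conn:
    conn.execute(
        "INSERT INTO dbo.Company (CompanyID, CompanyName, FromDate) "
        "VALUES (1, 'Pear Computers', '2000-01-01')"
    )
rename_company(conn, 1, "Pear Computing", "2005-06-01")
rename_company(conn, 1, "Pear Inc", "2012-03-15")

current = conn.execute(
    "SELECT CompanyName, FromDate FROM dbo.Company"
).fetchall()
history = conn.execute(
    "SELECT CompanyName, FromDate, ToDate FROM History.Company "
    "ORDER BY FromDate"
).fetchall()
```

After the two renames, dbo.Company still holds a single row per company, while History.Company has picked up one row per superseded name. In SQL Server the same archive-then-update step could equally live in a stored procedure or a trigger; which of those suits you is a separate design choice.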

Summary (main points, not exhaustive)

Pros
• Clean separation of data between current and historic
• Current data can be accessed with far fewer reads
• Primary Key can be maintained in Current data
• Very good if the majority of your queries just want current data
• Allows for current and historical data to be indexed independently

Cons
• Multiple tables to keep in sync
   o Can make deployments more complex
   o Misaligned tables could cause code errors
• Not ideal for Point In Time queries; better suited to current data and infrequent historical lookups
• Needs a composite Primary Key for historical data, which no longer matches the key on the dbo table

Personally I like this approach simply for tracking history rather than for querying it. I think it lends itself well to auditing, but it does make for confusing queries when, for example, a report has to be coded against multiple tables. You could obviously hide the whole thing behind a view, but unless you use filters to partition your view you may find you don’t get any performance gains compared to using just one table with a date range, as per my two previous posts.
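To illustrate the view idea, here is a rough sketch that again uses Python's sqlite3 module in place of SQL Server. The table names Company and CompanyHistory stand in for the dbo and History tables, and the view name CompanyAllTime, the name_as_of helper, the sample rows and the half-open date ranges are all assumptions made for the example. It shows only how a UNION ALL view hides the two tables from a Point In Time query; it says nothing about performance or partitioned views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-ins for the dbo (current) and History tables.
CREATE TABLE Company (
    CompanyID   INTEGER PRIMARY KEY,
    CompanyName TEXT NOT NULL,
    FromDate    TEXT NOT NULL,
    ToDate      TEXT NOT NULL
);
CREATE TABLE CompanyHistory (
    CompanyID   INTEGER NOT NULL,
    CompanyName TEXT NOT NULL,
    FromDate    TEXT NOT NULL,
    ToDate      TEXT NOT NULL,
    PRIMARY KEY (CompanyID, FromDate)
);

-- Invented sample data: one current row, two historical rows.
INSERT INTO Company VALUES (1, 'Pear Inc', '2012-03-15', '9999-12-31');
INSERT INTO CompanyHistory VALUES
    (1, 'Pear Computers', '2000-01-01', '2005-06-01'),
    (1, 'Pear Computing', '2005-06-01', '2012-03-15');

-- One view over both tables, so report code queries a single object.
CREATE VIEW CompanyAllTime AS
    SELECT CompanyID, CompanyName, FromDate, ToDate FROM Company
    UNION ALL
    SELECT CompanyID, CompanyName, FromDate, ToDate FROM CompanyHistory;
""")


def name_as_of(conn, company_id, as_of):
    """Return the company name valid on as_of (ISO date), or None."""
    row = conn.execute(
        """SELECT CompanyName
           FROM CompanyAllTime
           WHERE CompanyID = ?
             AND FromDate <= ?
             AND ? < ToDate""",  # half-open range: [FromDate, ToDate)
        (company_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


name_2003 = name_as_of(conn, 1, "2003-07-01")
name_2010 = name_as_of(conn, 1, "2010-01-01")
name_2024 = name_as_of(conn, 1, "2024-01-01")
name_1990 = name_as_of(conn, 1, "1990-01-01")  # before any known row
```

Dates are stored as ISO-8601 text, so plain string comparison orders them correctly. A query for current data only can still go straight to the Company table and skip the view altogether.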
