In all databases, data mart architects and developers have inevitably been faced with the one peril we all wish was easier… Slowly Changing Dimensions.
If you’ve heard the term and know all about the frustration they can cause then please skip ahead to my next blog post about Temporal Tables which in most, but definitely not all, scenarios can be a mini saviour. However, if you aren’t aware of Slowly Changing Dimensions or aren’t sure what they are, then read on…
Basically, in an OLTP database we would likely have a heavily normalised and fast flowing system and aren’t too fussed about ease of reporting… that’s where the data warehousing comes in.
In data warehousing, whichever version you choose to look at (Kimball or Inmon), reporting is usually at the forefront and hence we denormalise the data to better support this into, not always but usually, a Star Schema with Dimension and Fact tables. (Note, this is VERY rough and not meant to be an in depth split between the types of data stores in use)
This can work just fine for a lot of companies who simply care about facts and figures, total costs grossed over groups of products etc. but it can be a real pain if people want reports at granularity in which Point In Time data is required.
Here’s a VERY simplistic example to show what I mean:
Let’s say we have two Dimensions, one holding company names, the other Industry, and one Fact table for total sales by month… they contain the following data:
All our reporting requirements need are things like:
• Company Name and Sales by Quarter
• Industry Name and Sales by Year
This is simple enough and requires no complexity. However, what if Point In Time reporting is needed? What if Pear Computers used to be called “Pear Technology Ltd” and we wish to know what the sales were against the company as a whole and for each name? Then things become more difficult as we have to track changes to Dimensions over time. Because these are Dimensions and will change infrequently (it’s not often a company changes its name, for example), these are called Slowly Changing Dimensions.
Why are they a pain? Well, I’ll discuss that more fully in upcoming posts but, for the moment, imagine what you have to do in order to meet the new reporting requirements? We could change our company table to look like this:
That solves the problem, but look at the complexity we’ve immediately added… on insert/update/delete we need to amend the from and to dates, maybe add new records etc. We will need to make different joins to resolve queries in order to ensure that the dates in the Fact table are joined to the correct dates in the Company table. We need to possibly reconsider our indexing on Dimension tables to support date based queries. Our Primary Key has now changed as companyID is no longer unique therefore do we add a new surrogate identity to the table or make the key companyID & fromDate? And this would likely be over all Dimension tables in our model which could be numerous.
As mentioned, I’ll discuss some options in upcoming posts, but this at least lets you know what a slowly changing dimension is, how and when it can occur and, most importantly, why it can be such a pain and therefore needs serious consideration and planning prior to designing any data mart solution.