Finally, I think I’m done with the topic of Slowly Changing Dimensions for the moment. This doesn’t mean this is all you need to know about them, or that I’ve covered every angle and consideration (I haven’t), but it should now be enough for most people to make an informed and considered decision as to when to use them, how to use them, and how best to populate them.
With the latter in mind, and following my previous post explaining how we use triggers to synchronously populate our historical data, this last post is about doing so asynchronously. Contrary to popular belief, there are a couple of options you can try if synchronous population really does become a bottleneck.
Triggers with CDC
Yes… Triggers. But this time ones you don’t have to wait for.
Previously I had triggers attached to my main dbo table, synchronously populating the history table. This is great at low volume, but what if they become a bottleneck? We all know that a transaction cannot commit until its attached trigger has also completed. Well… why not offload the trigger somewhere else?
If we turn CDC on and begin to track our Dimension table, what happens? Well… a SQL Agent job is created which reads the transaction log of our database and losslessly collects all changes to our specified tables into a system table (in the case of dbo.DimCompany this would, by default, be cdc.dbo_DimCompany_CT). This collection is done asynchronously, has little to no impact on server performance, and should result in no blocking and no bottleneck.
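For reference, turning CDC on for the dimension table looks something like this (a sketch using the default options; the stored procedures are the standard SQL Server ones):

```sql
-- Enable CDC at the database level (requires sysadmin).
EXEC sys.sp_cdc_enable_db;

-- Begin tracking dbo.DimCompany. This creates the capture table
-- cdc.dbo_DimCompany_CT plus the SQL Agent capture and cleanup jobs.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'DimCompany',
    @role_name     = NULL;  -- NULL = no gating role required to read the change data
```

SQL Server Agent must be running, otherwise the capture job never reads the log and the change table stays empty.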
Now, take the trigger we wrote in the last blog, modify it slightly, and place it on the CDC system table instead. We have effectively created an asynchronous audit logger which should cause us no performance problems and never slow down our main table and its transactions.
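A minimal sketch of that relocated trigger is below. The history table name and the DimCompany columns (CompanyKey, CompanyName) are placeholders standing in for whatever your real schema uses; the __$operation and __$start_lsn columns are the standard ones CDC adds to its capture tables:

```sql
-- Sketch only: an AFTER INSERT trigger on the CDC capture table,
-- pushing changes into a hypothetical dbo.DimCompany_History table.
-- __$operation codes: 1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image)
CREATE TRIGGER trg_DimCompany_CT_ToHistory
ON cdc.dbo_DimCompany_CT
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.DimCompany_History (CompanyKey, CompanyName, ValidFrom)
    SELECT i.CompanyKey,
           i.CompanyName,
           sys.fn_cdc_map_lsn_to_time(i.__$start_lsn)  -- map the LSN back to a commit time
    FROM   inserted AS i
    WHERE  i.__$operation IN (2, 4);  -- keep inserts and post-update images only
END;
```

Because the trigger fires when the capture job writes to the change table, not when your transaction commits, the main table never waits on it.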
The only downside is a slight delay before your data is processed into history, depending on how busy your log is… but I’ve yet to meet anyone who can’t cope with a couple of seconds’ delay. Oh, and the added complexity of running and monitoring CDC itself.
Service Broker
Yes, it’s good old Service Broker. This option allows processes to work asynchronously, in parallel, and with scalability.
Sadly there’s the obvious downside involving the setup, running, and monitoring of Service Broker, but if you can get beyond that then you’ll be okay.
What you need is a few procs instead of triggers, each one tailored to the historical processing of its respective table, and you’re good to go. If it gets overwhelmed, Service Broker can spin up new parallel readers to run more copies of the proc and clear your backlog nice and quickly. This is probably faster than CDC at keeping your tracked data up to date.
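Such a proc might be sketched like this. Everything named here is an assumption for illustration: the queue (dbo.DimCompanyHistoryQueue), the message shape (a small XML fragment carrying the changed key), and the DimCompany/history columns, none of which appear in the original posts:

```sql
-- Sketch of a Service Broker activation procedure, assuming a queue
-- named dbo.DimCompanyHistoryQueue whose messages carry the changed row's key.
CREATE PROCEDURE dbo.ProcessDimCompanyHistory
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @handle UNIQUEIDENTIFIER, @body XML;

    WHILE 1 = 1
    BEGIN
        SET @handle = NULL;  -- reset so a timed-out WAITFOR ends the loop

        WAITFOR (
            RECEIVE TOP (1)
                @handle = conversation_handle,
                @body   = CAST(message_body AS XML)
            FROM dbo.DimCompanyHistoryQueue
        ), TIMEOUT 5000;

        IF @handle IS NULL BREAK;  -- queue drained; let the activated reader retire

        -- Hypothetical history write: snapshot the current row for this key
        INSERT INTO dbo.DimCompany_History (CompanyKey, CompanyName, ValidFrom)
        SELECT CompanyKey, CompanyName, SYSUTCDATETIME()
        FROM   dbo.DimCompany
        WHERE  CompanyKey = @body.value('(/Change/CompanyKey)[1]', 'int');

        END CONVERSATION @handle;
    END;
END;
```

The parallelism mentioned above comes from the queue’s activation settings: `ALTER QUEUE … WITH ACTIVATION (PROCEDURE_NAME = dbo.ProcessDimCompanyHistory, MAX_QUEUE_READERS = 5, …)` lets Service Broker launch up to that many concurrent readers when the queue backs up.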
Neither of these may work well if you want a single-table solution to your slowly changing data, as they may not keep current and historical rows in step quickly enough for you, though Service Broker comes close. But for auditing out to separate history tables I would most definitely choose one of these options over the synchronous trigger method.