Thursday, March 15, 2012

Bitemporal Data - Is this the next big thing?

I know you.  You are all thinking "What the heck is that?  What's Art babbling about now?" The concept of bitemporal data deals with the temporal state of our data from two separate perspectives.  Hang in there, I've got to spew the buzz words before I can explain them.  Here is what that all means.  

Data changes over time.  But there are two separate timescales.  The first is the date/time at which a value was modified, and, as part of that, what the value was before and after that transition time.  The second timescale or perspective answers the question: For what range of dates/times was each version of the value valid?  Here's an example that I encountered once, in a previous lifetime:  Sales teams.  I had to build a reporting system for sales teams that could cope with multiple levels of reporting (I won't even deal here with the fact that the number of reporting levels was different from one division to another - that's a separate article).  The composition of the teams, the sales division each team belongs to, the sales region that owns each group of divisions, and who manages each level: all of those elements were changing periodically.

What happens when a salesperson moves from one team to another in the middle of a reporting period?  How do you account for the change if the reports have to show the sales earned while a member of each team, attributed to each team's manager and to the correct division?  Meanwhile, the individual's sales all need to be attributed to that one salesperson.  To make matters worse, the team change was made on the 15th of the month, retroactive to the 15th of the previous month, and the reporting period runs from the 1st of the month to its end.  Compounding it all, last month's reports have to be recalculated.  Lastly, what happens next year when the auditors want to know why the sales figures for the two divisions were recalculated after the fact, and want to see an audit trail and the original figures?

Honestly, when I did this, management pooh-poohed the most difficult parts of this whole scenario and produced simplified rules that let me off the hook for doing it right.  But in general, that may not happen to you if you are hit with something similar.  So, what's the solution?  Bitemporal data versioning.  The ISO/IEC (JTC1 SC32 WG3) SQL committee recently published SQL extensions to cope with bitemporal data.  MS SQL Server, Oracle, and DB2 are each partially compliant, since they have had some supporting features for a little while (not long, actually), but us poor Informix zlubs got nothing.  OK, end of complaint, which isn't the purpose of this post.

You can implement BTD (getting tired of typing it all out so BTD will have to do now) yourself without server support, though support makes things SO much easier, especially at query time.  Each table needs two pairs of date columns:
  • First_Effective_Date
  • Last_Effective_Date
  • Date_Added
  • Date_Invalidated
With these columns in place, and with Date_Added and First_Effective_Date made part of the primary key of the table, you can implement my nightmare scenario and survive the exercise.
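
Here is a minimal sketch of what such a table might look like, using Python's built-in sqlite3 module only so that it runs anywhere (this is not Informix syntax). The table name team_membership, the salesperson and team columns, the sample dates, and the '9999-12-31' "still open" sentinel are all my assumptions for illustration; only the four date column names and the primary key idea come from the text above.

```python
import sqlite3

# Assumed sentinel meaning "this period has no end yet".
OPEN = "9999-12-31"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE team_membership (          -- hypothetical table name
        salesperson          TEXT NOT NULL,
        team                 TEXT NOT NULL,
        First_Effective_Date TEXT NOT NULL, -- when the fact starts being true
        Last_Effective_Date  TEXT NOT NULL, -- when the fact stops being true
        Date_Added           TEXT NOT NULL, -- when we recorded this version
        Date_Invalidated     TEXT NOT NULL, -- when this version was superseded
        PRIMARY KEY (salesperson, First_Effective_Date, Date_Added)
    )""")

def add_version(person, team, first_eff, last_eff, added):
    """Record a new version; it stays current until someone invalidates it."""
    conn.execute("INSERT INTO team_membership VALUES (?, ?, ?, ?, ?, ?)",
                 (person, team, first_eff, last_eff, added, OPEN))

def invalidate(person, first_eff, added, when):
    """Never delete or overwrite a version: just close its recorded period."""
    conn.execute("""UPDATE team_membership SET Date_Invalidated = ?
                    WHERE salesperson = ? AND First_Effective_Date = ?
                      AND Date_Added = ?""", (when, person, first_eff, added))

# John starts on Team Jones (sample dates are made up).
add_version("John K.", "Jones", "2012-01-01", OPEN, "2012-01-01")

# On March 15th he is moved to Team Victor, retroactive to February 15th:
# retire the old version, then record what we now believe to be true.
invalidate("John K.", "2012-01-01", "2012-01-01", "2012-03-15")
add_version("John K.", "Jones", "2012-01-01", "2012-02-14", "2012-03-15")
add_version("John K.", "Victor", "2012-02-15", OPEN, "2012-03-15")

total = conn.execute("SELECT COUNT(*) FROM team_membership").fetchone()[0]
current = conn.execute(
    """SELECT team FROM team_membership WHERE Date_Invalidated = ?
       ORDER BY First_Effective_Date""", (OPEN,)).fetchall()
print(total, current)
```

Note that the retroactive change adds two rows and closes one; nothing is ever deleted, which is what keeps the old reports reproducible.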

"Wait", you say, "I get the effective dates, why the added and invalidated dates?"  Well, that implements the 'B' part of BTD.  What if on the 15th we change that salesman's team to team Victor.  Then on the 1st of next month we generate the sales report for this last month and regenerate the previous month's adjusted sales report.  Senior Management looks at the report at noon on the 2nd but accounting updated their reports at 9AM.  At 12:15 the Sales Manager calls his assistant and says: "Dude, you screwed up!  I said move John K. from Team Jones to Team Hector not Team Victor.  Go fix this and get me new reports!"

So by 9AM on the 2nd there will be three versions of the sales reports for the prior month and two versions of the reports for last month.  They all have to be reproducible for auditing purposes at any time.  That's why we need the four dates.  Yes, we could live with just one effective date and one valid date, but the queries to determine which row is the one we need to report will get VERY hinky indeed, as you would have to look ahead for future effective and valid dates that might invalidate the version you are looking at.
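
To make that concrete, here is a rough sketch of the kind of "as of" query the four dates allow, again using Python's built-in sqlite3 module rather than Informix. The table name, the sample rows and timestamps, the '9999-12-31' sentinel, and the team_as_of helper are all assumptions of mine for illustration; they simply replay the Jones / Victor / Hector story with made-up dates.

```python
import sqlite3

OPEN = "9999-12-31"  # assumed sentinel meaning "no end yet"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE team_membership (
        salesperson TEXT NOT NULL, team TEXT NOT NULL,
        First_Effective_Date TEXT NOT NULL, Last_Effective_Date TEXT NOT NULL,
        Date_Added TEXT NOT NULL, Date_Invalidated TEXT NOT NULL,
        PRIMARY KEY (salesperson, First_Effective_Date, Date_Added)
    )""")

# person, team, first effective, last effective, added, invalidated
rows = [
    # Original record: John is on Team Jones, open ended.
    ("John K.", "Jones", "2012-01-01", OPEN, "2012-01-01T00:00", "2012-03-15T10:00"),
    # The retroactive move recorded on the 15th (to the wrong team).
    ("John K.", "Jones", "2012-01-01", "2012-02-14", "2012-03-15T10:00", OPEN),
    ("John K.", "Victor", "2012-02-15", OPEN, "2012-03-15T10:00", "2012-04-02T13:00"),
    # The correction made after the Sales Manager's phone call.
    ("John K.", "Hector", "2012-02-15", OPEN, "2012-04-02T13:00", OPEN),
]
conn.executemany("INSERT INTO team_membership VALUES (?, ?, ?, ?, ?, ?)", rows)

def team_as_of(person, effective_on, known_at):
    """Which team was `person` on at `effective_on`, according to what the
    database believed at `known_at`? Both period tests are simple ranges,
    so no look-ahead for later rows is needed."""
    row = conn.execute("""
        SELECT team FROM team_membership
        WHERE salesperson = ?
          AND First_Effective_Date <= ? AND ? <= Last_Effective_Date
          AND Date_Added <= ? AND ? < Date_Invalidated
        """, (person, effective_on, effective_on, known_at, known_at)).fetchone()
    return row[0] if row else None

# Same question ("which team on March 1st?") asked at three points in time.
before_move = team_as_of("John K.", "2012-03-01", "2012-03-01T12:00")
with_mistake = team_as_of("John K.", "2012-03-01", "2012-04-02T09:00")
after_fix = team_as_of("John K.", "2012-03-01", "2012-04-03T09:00")
print(before_move, with_mistake, after_fix)  # Jones Victor Hector
```

Each of the three answers corresponds to one version of the report, and any of them can be regenerated later just by supplying the same known_at value.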

Is this all immensely complex and immensely confusing?  You bet.  Would an implementation of the BTD SQL standard in Informix ease the pain?  Probably.  Do you  REALLY care about BTD?  Well, you should because one day, when you least expect it, BTD is going to jump out and bite you on your... ankle. ;-)  

There is a LinkedIn group called Temporal Data (which includes BTD) at http://www.linkedin.com/groups?gid=3885228&trk=myg_ugrp_ovr that you can join if you want to learn more and join in the fun.  There are several experts who post and lurk there, and if you look through the posting history, you will find links to blog posts and other documents explaining the issues and solutions more thoroughly.