SQL Server 2016 CTP 3.0 is available, and some of you already played with it. There are great set of new features for BI in 2016 in all areas; SSIS, MDS, SSRS, and SSAS. However there are some features which are part of Database Engine but still relates to BI and Data Warehousing. Temporal Tables is one of them. With the new Temporal Tables feature you can create database tables that could easily store time spanned information for historical changes. You can implement Slowly Changing Dimension with temporal tables without complex ETL implementation. In this post I’ll explain what temporal table is, how you can use it to implement SCD, and what are pros and cons of this method.
Temporal Tables
Temporal tables are new type of database tables introduced in SQL Server 2016, these tables are system-versioned and keep history of changes (insert, delete, update) of everything happened on data rows. Retrieving change log from these tables are easy. These tables can simply tell you what was the data at specific point of the time in the table. These tables works with datetime2 columns to keep FROM DATE and TO DATE information of each change. This means these tables can be used for implementing changes in dimensions, yes you know what it called; Slowly Changing Dimension!
How Temporal Table Works?
Temporal table is a regular table in SQL Server with few changes in the structure; There will be always two tables when we talk about a temporal table: A main table that holds the most current up-to-date records and values, and a history table which keep only previous states of each record. History table can be automatically generated or manually configured. When you write create script for a temporal table you need to add three mandatory columns, and reference to history table.
CREATE TABLE Department ( DeptID int NOT NULL PRIMARY KEY CLUSTERED , DeptName varchar(50) NOT NULL , ManagerID INT NULL , ParentDeptID int NULL , SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL , SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL , PERIOD FOR SYSTEM_TIME (SysStartTime,SysEndTime) ) WITH (SYSTEM_VERSIONING = ON);
When any inserts, updates, or deletes happens on the data in the main table, history of that change will be stored in history table
At the time of retrieving data (querying data) from temporal table if you just query current data only data from main table will be fetched, but if you query any historical information then history table and temporal table will participate in building the result set
Slowly Changing Dimension
I just go quickly through this step as you are probably familiar with it. Slowly Changing Dimension is about values in dimension attributes that changes. Depends on the requirement we want data warehouse to act differently per each change. For example sometimes a change in name is not important, so it can be updated, and no history log is required. However a change in customer’s city might be important and we want data warehouse to keep previous city as well as the new city and the date and time of the change. Because there are different behaviours for each type of change, then there are different types of SCD (Slowly Changing Dimension). So far I’ve heard these types: 0, 1, 2, 3, 4, 6, and 7!
The most common types are type 1 and 2. Type 1 means the record will be updated with no history log. Type 2 means we keep the history with two additional column; From Date and To Date. with these two records we will set existing record as obsolete (with setting an end time) and will create a new record (with new start time). Here is an image that illustrates three different SCD types;
As you can see in type 2 all history will be kept in the same dimension table. However in temporal tables we have two tables for keeping information, and history table is separate. This architecture is more similar to SCD Type 4, where the history table is different than the actual dimension table. This is not however exactly match to what Kimball method is for implementing SCD Type 4. Here is a sample SCD Type 4 ;
Pros and Cons of SCD Typ4
Type 4 is better than type 2 in terms of performance, the actual dimension table won’t be big with changes. and even if changes are a lot (if it is a rapidly changing dimension) performance still would be good, because the history table is separate.
Type 4 however needs more complex ETL scenario because you have to take care of two tables. Fortunately temporal tables made this step easy! you can create them easily, and query them easily.
Temporal Table in Action
Creating the Table
Now is the time to go through a very quick demo and see how temporal table works in action. Let’s start it with creating a simple temporal table. Run this script to create a temporal table for Department;
CREATE TABLE Department ( DeptID int NOT NULL PRIMARY KEY CLUSTERED , DeptName varchar(50) NOT NULL , ManagerID INT NULL , ParentDeptID int NULL , SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL , SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL , PERIOD FOR SYSTEM_TIME (SysStartTime,SysEndTime) ) WITH (SYSTEM_VERSIONING = ON);
script above will create the history table automatically. However you can change the script slightly if you want to create or use your own historical table. Here you can find more information about how to use your own table as history table.
Some parts of above command needs consideration; There should be always a PERIOD FOR SYSTEM_TIME for this type of tables. and there always should be two datetime2 columns declared as GENERATED ALWAYS AS ROW START / END. The history table created in this example will be automatically named as MSSQL_TemporalHistoryFor_<object id of temporal table>. Here is a view of table created in SSMS;
You can also create the table with a predefined T-SQL in SSMS with right click on Tables and create a new System-Versioned Table.
This will generate the template T-SQL as below for you, you can then change and execute it.
USE <database, sysname, AdventureWorks> GO IF OBJECT_ID('<schema_name, sysname, dbo>.<table_name, sysname, sample_table>', 'U') IS NOT NULL BEGIN IF ((SELECT temporal_type FROM SYS.TABLES WHERE object_id = OBJECT_ID('<schema_name, sysname, dbo>.<table_name, sysname, sample_table>', 'U')) = 2) BEGIN ALTER TABLE [<schema_name, sysname, dbo>].[<table_name, sysname, sample_table>] SET (SYSTEM_VERSIONING = OFF) END DROP TABLE [<schema_name, sysname, dbo>].[<table_name, sysname, sample_table>] END GO CREATE TABLE [<schema_name, sysname, dbo>].[<table_name, sysname, sample_table>] ( <columns_in_primary_key, , c1> <column1_datatype, , int> <column1_nullability,, NOT NULL>, <column2_name, sysname, c2> <column2_datatype, , char(10)> <column2_nullability,, NULL>, <column3_name, sysname, c3> <column3_datatype, , datetime> <column3_nullability,, NULL>, <column4_name, sysname, SysStartTime> datetime2(7) GENERATED ALWAYS AS ROW START NOT NULL, <column5_name, sysname, SysEndTime> datetime2(7) GENERATED ALWAYS AS ROW END NOT NULL, PERIOD FOR SYSTEM_TIME(<column4_name, sysname, SysStartTime>,<column5_name, sysname, SysEndTime>), CONSTRAINT <constraint_name, sysname, PK_sampletable> PRIMARY KEY (<columns_in_primary_key, , c1>) ) WITH ( SYSTEM_VERSIONING = ON (HISTORY_TABLE = [<history_schema_name, sysname, dbo>].[<history_table_name, sysname, sample_table_history>]) ) GO
Modifying the Table
After creating the table then you can right click on it, and edit it. add new rows, and modify some, delete some records after few minutes and so on. I keep this example simple, but be aware that you can modify records of this table with different types of insert/update/delete commands. Here are more information about modifying system-versioned tables.
Querying the Table
If you want to query just current data you just write a select query as you do for any other tables;
SELECT [DeptID] ,[DeptName] ,[ManagerID] ,[ParentDeptID] ,[SysStartTime] ,[SysEndTime] FROM [DW].[dbo].[Department]
and you will see the result simply;
If you want to query historical log or changes, you can query it in different ways. For example you can time travel and check how was this table’s data at specific point of the time with query below;
SELECT [DeptID] ,[DeptName] ,[ManagerID] ,[ParentDeptID] ,[SysStartTime] ,[SysEndTime] FROM [DW].[dbo].[Department] FOR SYSTEM_TIME AS OF '2015-11-23 T22:01:20.7230011'
and the result;
As you can see data rows were different at that time, there were only two departments, and manager of IT department was also different person. Sales department were also available at that time, but it is not available in current data set (previous image)
You can query all changes for a period of time as below;
SELECT [DeptID] ,[DeptName] ,[ManagerID] ,[ParentDeptID] ,[SysStartTime] ,[SysEndTime] FROM [DW].[dbo].[Department] FOR SYSTEM_TIME BETWEEN '2015-01-01' AND '2015-12-31' order by deptid
and the result;
Highlighted blue is showing Sales record which has been deleted (SysEndTime set to the time of deletion). Highlighted yellow is showing IT record that has been once created, and then changed.
So as you see it is possible to query the historical data combined with current data with one single simple query. Here are some methods that you can use to query data;
Source: https://msdn.microsoft.com/en-us/library/dn935015.aspx
There are also some sample queries here in this MSDN page.
Pros & Cons of Temporal Table for SCD Implementation
Easy to implement SCD
You have seen a demo of temporal table, so you know how easy is using these tables. SCD Type 4 is not an easy to implement scenario, However temporal tables make that process much smoother.
Inferred Dimension Member is NOT supported
Implementing some of SCD types (especially type 2 and 4) along side with Inferred Dimension Member has an old trick that needs to be managed. When an Inferred Member is detected, it will be inserted into the dimension with only three values: business key, from date, and dimension key. There will be no values for other attributes. The very first appearance of Inferred Dimension Member then will require an UPDATE on all of those fields, and this should be an UPDATE regardless of SCD Type 2 or 4 for creating a new record and setting the existing record with an end date. So here is the trick that temporal table won’t support built-in.
You need to take care of this while you are doing ETL (with SSIS or any other tools). One possible solution (I haven’t tried that myself yet, I’ll probably write a blog post about it later on) is that you can turn off system-versioning of the table temporarily before updating each inferred dimension member, and turn that back on afterwards. However when you are dealing with large number of rows, this process might take longer as it requires the update process of inferred dimension members to be done in a separate process.
References to Study More?
If you are looking for references to study more about temporal tables. here are two good resources for you:
- 20 minutes Channel 9 introduction video to Temporal Tables
https://channel9.msdn.com/Shows/Data-Exposed/Temporal-in-SQL-Server-2016
- MSDN Documentation on Getting Started with Temporal Tables
https://msdn.microsoft.com/en-us/library/mt604462.aspx
- MSDN General Documentation for Temporal Tables
This is very cool. Thanks for posting, Reza.
We will definitely be taking a close look at this.
Thanks
Stuart
Hi Stuart,
It is awesome to see you here, Thanks for your kind words 🙂
Cheers,
Reza
Excellent post, Reza! This is precisely the summary that I needed.
Thanks Chris for your kind feedback.
Cheers,
Reza
Well explained Reza. If I want to store more audit information, like who made changes to rows & when, can I capture that in the history table? How do we go about designing the tables? If you can give an idea, it would be great. Thanks!
Hi Sid,
Thanks for your feedback. At the moment user information won’t be stored by it’s own. However you can create fields for username, and from the application or insert/update/delete operations send the username along and store it in table(s). you have full flexibility to design table as you want. It has to have three mandatory fields mentioned above however.
Cheers,
Reza
Appreciate your response Reza. Thanks!