The cost of storage continues to decrease, and larger volumes of data are demanded analyses and insights. These factors push us to store even more data than ever before. A popular trend nowadays is to store every change to every record. It might seem like a massive undertaking to start tracking every single record change; however, a huge amount of value can be extracted from such detailed record keeping. This post provides details on how to easily track data over time and how to query it in different ways using SQL Server temporal tables.
Before diving in, it is important to have a goal in mind. There are several use cases where tracking changes is valuable:
- Time-traveling – viewing data at specific points in time.
- Auditing – viewing all changes to data and who made them.
- Historical reporting – using history to build aggregates of past periods.
- Time-series analysis – using changes in values to trend and predict future values.
- Recovering old state – using old records to recover a corrupted change.
There are several ways to track changes in SQL Server. Before any of the recent options, developers had to build that logic into application or use triggers to track changes. This was a very tedious process and error-prone. SQL Server has eased the pain over time and released several features to make this process easier. The first was SQL Server Change Tracking. This was limited in its functionality and is a legacy feature. The most recent additions have been Change Data Capture (CDC) and Temporal Tables. Both are great options, and each have their pros and cons. For the use cases listed above, temporal tables are the better choice. They are much easier to set up and manage, and they lend themselves well to these use cases due the query support built in SQL Server.
To add change tracking to a new table, add the SYSTEM_VERSIONING table option when creating the table. In addition, two columns for the start and end timestamps must be included in the table definition as well.
CREATE TABLE StockPrice ( Id INT NOT NULL PRIMARY KEY CLUSTERED, Symbol VARCHAR(50) NOT NULL, Price DECIMAL(10,3), SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL, SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL, PERIOD FOR SYSTEM_TIME (SysStartTime,SysEndTime) ) WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.StockPriceHistory));
To add change tracking to an existing table, add columns for the start and end timestamps and alter the table and set the SYSTEM_VERSIONING option.
ALTER TABLE StockPrice ADD SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START HIDDEN CONSTRAINT DF_StockPrice_SysStartTime DEFAULT SYSUTCDATETIME(), SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN CONSTRAINT DF_StockPrice_SysEndTime DEFAULT CONVERT(DATETIME2, '9999-12-31 23:59:59.9999999'), PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime); GO ALTER TABLE SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.StockPriceHistory));
Note that if the history table does not exist, SQL Server will create the history table; however, for more control, the table can be manually created and custom indexes may be added. The major requirement is that the history table matches the schema of the temporal table.
Now that the temporal table is set up, any changes to the data will tracked in the history table. This occurs for all inserts, bulk inserts, updates, deletes and merges performed on the temporal table. When a record is inserted in a temporal table, it is only created in the temporal table with the start timestamp of the time it's added and an end of the max `DATETIME2` value. When a record is updated, the record prior to the update is copied to the history table and the end timestamp of the historical record is changed to the time of update. A similar thing happens when a record is deleted from a temporal table. The record prior to the delete is copied to the history table and the end timestamp of the historical record is updated to reflect the time of deletion.
Navigating Data Through Time
Querying the data is straightforward once a table set up as a temporal table. Either query the temporal table for the latest records or query the history table for the historical records. If both historical and present data are required, the two tables must be unioned together.
With that being said, the real power comes with the query support for temporal tables and their history counterparts. T-SQL makes it easy with the `FOR SYSTEM_TIME` expression from the table source portion of the `FROM` clause. This expression can be specified after a the first table in the `FROM` clause and after subsequest tables that are joined to the first table.
There are five options for specifying the relevant point in time in the `FOR SYSTEM_TIME` expression:
- `AS OF <date_time>` - query records at that point in time (time travel)
- `FROM <start_date_time> TO <end_date_time>` - query records with any versions active during range with exclusive end for records becoming active
- `BETWEEN <start_date_time> AND <end_date_time>` - same as above with inclusive end for records becoming active
- `CONTAINED IN (<start_date_time> , <end_date_time>)` - query records that were opened and closed within the timeframe inclusive start and end
- `ALL` - query all history and current records
Query MSFT stock price at a specific point in time:
SELECT Price FROM StockPrice FOR SYSTEM_TIME AS OF '2021-03-01 09:30:00' WHERE Symbol = 'MSFT'
Query all price data for GOOG including the most recent record:
SELECT Price, SysStartDate FROM StockPrice FOR SYSTEM_TIME AS ALL WHERE Symbol = 'MSFT'
There is wealth in data, and storage is cheap. Powerful business solutions require more and more data. SQL Server makes it easy to store and query historical data using temporal tables. All it takes is a bit more configuration and a lot of the heavy lifting is done. Start tracking more data and gain even more business insights than before.