Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
A Senior SQL Server DBA based in the North West of England
A Senior SQL Server DBA based in the North West of England

(cheeky chap)
Currently, I am interested in the impact of enabling change tracking on an environment with a heavy OLTP workload. The current scope is to determine the reduction in performance and the additional storage required by any underlying internal objects associated with change tracking. As per my earlier post about Extended Events, I will perform some HammerDB tests. As I will not be using “keying and thinking time”, this will represent a best effort workload where HammerDB drives SQL Server to its peak performance. This is likely not representative of many OLTP workloads, as not every workload attempts to drive SQL Server to its limit in an artificial manner. Still, we will measure the impact regardless.
The high-level goals were as follows:
I will also preface the above with the disclaimer that I am not an expert in Change Tracking, so some config or detail may have been missed. As always, make sure you put in the work and do your own testing on your application before enabling change tracking.
Due to the DeWitt clause, the actual Transactions Per Minute and New Orders Per Minute results will not be published; only the performance change over a baseline workload with change tracking (CT) enabled.
Here is the set of tests:
For each configuration, I ran five workloads to give me statistically valid results using a 95% confidence interval. The TPCC database was built specifying 640 warehouses and was around 90GB in size. The two VMs involved in the testing were an application server with 8 vCPUs and 24GB of RAM, and the database server was a 16-core box with 32GB of memory. The storage was NVME. I specified 1 minute of ramp-up time with a test period of 4 minutes, with a checkpoint occurring at the end. I also built in an extra minute after each run, as I was using autopilot and wanted the servers to settle down between runs.
The version of SQL Server was SQL Server 2022 (16.0.4215.2).
Once the warehouse was built and the HammerDB best practice config changes were deployed, the following CT config was setup:
ALTER DATABASE TPCC SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON); ALTER TABLE district ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON); ALTER TABLE item ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON); ALTER TABLE stock ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON); ALTER TABLE warehouse ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
Note that only some of the tables were eligible for CT as a primary key is required.
Below is the impact on the HammerDB workload with CT enabled. Both Transactions Per Minute and New Orders Per Minute were recorded:

As we can see, the impact on performance was massive once the number of Virtual Users went past 4. Again, I emphasise that this is an artificial workload and your application may not be hit as hard.
Whilst the testing was in progress, it was abundantly clear that performance was much worse, so at 32 Virtual Users, I decided to run sp_blitzfirst over a 5-minute test period to see what SQL Server was waiting on:

Looks like we have excessive PAGELATCH_** waits. As a sanity check, I turned off CT and re-ran a 32 Virtual User workload to check that this is not present on the baseline run:

As we can see above, the buffer latch waits are absent when CT was switched off.
I decided to bust open the page at the heart of the contention using DBCC PAGE:
-- I identified the resource using this query (link at the end of the post)
SELECT wt.session_id,
wt.wait_type,
er.last_wait_type AS last_wait_type,
wt.wait_duration_ms,
wt.blocking_session_id,
wt.blocking_exec_context_id,
resource_description
FROM sys.dm_os_waiting_tasks AS wt
INNER JOIN sys.dm_exec_sessions AS es
ON wt.session_id = es.session_id
INNER JOIN sys.dm_exec_requests AS er
ON wt.session_id = er.session_id
WHERE es.is_user_process = 1
AND wt.wait_type <> 'SLEEP_TASK'
ORDER BY wt.wait_duration_ms DESC;
--Looking at the page
DBCC TRACEON(3604);
DBCC PAGE(5, 1, 5631744, 3) WITH TABLERESULTS;

The contents of the page looked like change tracking’s internal tables, so I have decided to put this new contention down to the CT.
Other metrics I captured via sp_blitzfirst were as follows:
The amount of work SQL Server is getting through has tanked.
I must confess I spent very little time on this aspect once I realised how epic the actual impact of CT on the application was. Nevertheless, I ran this query courtesy of Brent Ozar:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; GO select sct1.name as CT_schema, sot1.name as CT_table, ps1.row_count as CT_rows, ps1.reserved_page_count*8./1024. as CT_reserved_MB, sct2.name as tracked_schema, sot2.name as tracked_name, ps2.row_count as tracked_rows, ps2.reserved_page_count*8./1024. as tracked_base_table_MB, change_tracking_min_valid_version(sot2.object_id) as min_valid_version FROM sys.internal_tables it JOIN sys.objects sot1 on it.object_id=sot1.object_id JOIN sys.schemas AS sct1 on sot1.schema_id=sct1.schema_id JOIN sys.dm_db_partition_stats ps1 on it.object_id = ps1. object_id and ps1.index_id in (0,1) LEFT JOIN sys.objects sot2 on it.parent_object_id=sot2.object_id LEFT JOIN sys.schemas AS sct2 on sot2.schema_id=sct2.schema_id LEFT JOIN sys.dm_db_partition_stats ps2 on sot2.object_id = ps2. object_id and ps2.index_id in (0,1) WHERE it.internal_type IN (209, 210); GO
Here is the output after 2 plus hours of load testing:

As I have not really used CT that much, I do not intuitively know if these are extreme values or not compared to the base tables. Please draw your own conclusions. My hunch says this is excessive (but HammerDB is an extreme load test).
Does cleanup impact things? Absolutely no idea, as I did not get that far. However, I have seen out in the field that the cleanup job can get blocked by an application. It is also likely that the reverse is also possible, so ensure you test things.
Here are the things I would do if I had more time:
On a heavy OLTP workload, Change Tracking impacted performance in escalating severities as the number of concurrent processes increased. At extreme levels, excessive latch contention was observed on internal objects associated with Change Tracking.
My main takeaway is that your application needs to be tested in a reasonably representative environment before enabling CT, especially on busy tables.
https://www.brentozar.com/archive/2014/06/performance-tuning-sql-server-change-tracking
Great mental reference to have. Good stuff Ed.