This is an issue I’ve only seen a few times in our system, but when it does appear the effects are very noticeable indeed.
What I witnessed was that replication suddenly slowed down on a machine and the CPU started to rise to uncomfortable and unusual levels.
Benchmark for the server:
CPU levels indicating something was very wrong:
In order to find which replication was the problem I disabled all replication jobs on the server and then ran Performance Monitor. This was then left running as each replication was turned on and off. Doing this made it quite clear as to which replication was at fault as turning on one particular replication caused the following:
This narrowed down my search quite considerably and therefore I switched all other replication jobs back on and settled down to poke around in the trouble maker.
What I found was that there was a degrading performance of sp_MSenumgenerations90 and the best I can fathom is as follows (though this may not be correct, this is just my take):
There were numerous records “stuck” in MSmerge_genHistory with genStatus = 4. These had been in the table for well over a week and that should not have been the case. This seemed to be causing sp_MSenumgenerations90 to repeatedly scan the whole table on a loop.
At this point I was tempted to delete these records as they were my clear suspects, but that seemed a little extreme, therefore I went for the option of simply updating them to something more sensible such as 0. Therefore, using the guid of the offending rows it was a simple case of running an update to “reset” the record:
set genStatus = 0
where guidsrc = '3F4178FC-885B-4235-A4BB-E8324967EED1' -- for example
and genStatus = 4
Within seconds the results were obvious as the CPU dropped instantly back to normal levels:
Hopefully this could help someone else who’s encountered a similar issue as I couldn’t find this documented anywhere which means it could well have been an anomaly on my system, although it has happened a few times now.