Yesterday, we received this one solitary error e-mail from an application server hosting a message application (which I'll call XYZPDQ to protect the guilty innocent) at 21:30:23 UTC:
ABORTING - Exception: System.Data.Entity.Infrastructure.DbUpdateException: An error occurred while updating the entries. See the inner exception for details. ---> System.Data.Entity.Core.UpdateException: An error occurred while updating the entries. See the inner exception for details. ---> System.Data.SqlClient.SqlException: The insert failed. It conflicted with an identity range check constraint in database 'XYZPDQ', replicated table 'dbo.XYZPDQReplicatedTable1', column 'XYZPDQTable1Column1'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
The statement has been terminated.
We know it came from the application server because we found the matching error message in the machine's Event Viewer event log. It did not show up in the SQL Server logs.
Background: This messaging application was originally intended to have an active-active architecture whereby users could create messages in a Web front-end at either of two geographic locations, and those two locations would resolve through merge replication. We (the DBAs) warned that SQL Server merge replication is not as reliable as the development team had been led to believe, but our warning was disregarded and, consequently, the application support team had eventually to take measures to prevent merge-replication collisions: The corresponding application pool at one site was shut down, ensuring database updates only on one side and effectively transforming this into an active-hot-standby architecture. We still had merge replication, however, and would failover the application servers roughly every month. And periodically we still encounter problems with the identity ranges between the publisher and subscriber databases.
At the time of the above error, the publisher (the data server with the distribution database and the publication) was not the site of the "active" database receiving message data, which I'll call the primary; i.e., the subscriber data server was the primary. We ran this query on the publisher/secondary to see the records in dbo.XYZPDQReplicatedTable1:
--USE XYZPDQ
SELECT COUNT(*) FROM dbo.XYZPDQReplicatedTable1
WHERE DateCreated BETWEEN '2018-12-12 00:00:00' AND '2018-12-12 21:45:00'
We also ran this query on the subscriber/primary just to see if data was being replicated correctly. Good news: it was replicating. We then ran this query on the distribution database to see the identity column ranges assigned at each site:
SELECT [publisher_id]
,[publisher_db]
,[publication]
,[article]
,[subscriber]
,[subscriber_db]
,[is_pub_range]
,[ranges_allocated]
,[range_begin]
,[range_end]
,[next_range_begin]
,[next_range_end]
,[max_used]
,[time_of_allocation]
FROM [distribution].[dbo].[MSmerge_identity_range_allocations]
WHERE article = 'XYZPDQReplicatedTable1'
ORDER BY time_of_allocation DESC
Both the publisher range size and the subscriber range size were set to 100,000, and the range threshold was set to 80% (verified by viewing the publication's Properties | Articles | XYZPDQReplicatedTable1 | Article Properties | Identity Range Management). The subscriber/primary had a range from 3463918-3563918 until an insert of 1,794 records was attempted at 21:22:15 UTC, and the last identity column value was 3563864 (already exceeding 100000*0.8 + 3463918 = 3543918). When the range is full on the publisher side, a new range is allocated by a replication trigger on the table in question; if the range is full on the subscriber side, the table has to wait until the merge agent runs again. Our merge agent runs every five minutes and should have allocated a new range at 21:25:00.000. However, it did not allocate a new range until 21:35:04, five minutes after the error was reported.
So while it's good news that merge replication didn't break, we're still troubled by a few unanswered questions. If the range threshold was already being exceeded, why didn't the merge agent allocate a new range right then (3563864 - 1794 > 3543918)? And if SQL Server didn't pay attention to that threshold, for whatever reason, why didn't it allocate a new range until the third subsequent run of the merge agent? We're also concerned that identity ranges of 100,000 were fine only six months ago but are now starting to get too crowded, despite no obvious increase in the data for these replicated databases.
In the short term, we decided to resync ("Reinitialize All Subscriptions" on the publication | "Use a new snapshot" | "Upload unsynchronized changes before reinitialization". In the long-term, we're going to use sp_adjustpublisheridentityrange in a maintenance job to force a new identity range every month or so:
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-adjustpublisheridentityrange-transact-sql
Update: There are some addition details that came to light from my notes is that this messaging application uses a Web front end and has a search utility necessitating a full-text index on a catalog of the message text. Because views are used by the front end, the full-text index and a clustered index, which is referenced by the full-text index, are created on one of the views. The underlying tables cannot be dropped/recreated at the subscriber unless this view is part of the subscription, but the full-text indexes cannot be replicated. So the resync process is further complicated here in that we have to script out the views, particularly this view with two indexes, by generating scripts not only for the view, but also for the indexes: Right-click the database in SSMS, and choose Tasks | Generate Scripts ..., select the views in the Generate and Publish Scripts wizard, and then under Advanced choose Script Full-Text Indexes and Script Indexes. Then we get something like this:
CREATE UNIQUE CLUSTERED INDEX [IDX_XYZPDQReplicatedTable1V1_ID] ON [dbo].[XYZPDQReplicatedTable1_view] (
[XYZPDQTable1Column1] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO
CREATE FULLTEXT INDEX ON [dbo].[XYZPDQReplicatedTable1_view]( [From] LANGUAGE 'English', [MsgTxt] LANGUAGE 'English', [MsgTypeName] LANGUAGE 'English', [Subject] LANGUAGE 'English') KEY INDEX [IDX_XYZPDQReplicatedTable1V1_ID] ON ([MsgTxt_Catalog], FILEGROUP [PRIMARY]) WITH (CHANGE_TRACKING = AUTO, STOPLIST = SYSTEM)
So the process is:
- Script out the views and their corresponding indexes.
- Drop the views.
- Restore replication, and set the appropriate identity column ranges
- Restore the views and their indexes
(Or, alternately, we suppose we could just mirror from the primary to the inactive side and lie about having restored merge replication, but we are honest DBAs and would never even consider such a terrible thing ....)
Update: These are some quick-reference/cheat-sheet notes for setting up merge replication.
On the publisher data server, open SSMS, and expand the Object Explorer tree to Replication | Local Publications.
- Right-click Local Publications, and choose New Publication.
- In the New Publication Wizard, select these configurations:
a. Database: ExampleDB1
b. Publication Type: Merge publication
c. Objects to publish: Select all tables, (except any that may have been used to capture deadlocks or other troubleshooting data, if applicable.)
d. Snapshot Agent settings: Create the snapshot immediately. Do not schedule the agent.
e. Snapshot Agent Security: Run under the SQL Server Agent service account. Connect to the publisher by impersonating the process account.
f. Wizard Actions: Create the publication at the end of the wizard.
g. Publication name: Choose something meaningful that allows you to easily locate and maintain the publication. - Click Finish to create the publication.
- Locate and right-click the publication under Replication | Local Publications in the SSMS Object Explorer tree,
and choose View Snapshot Agent Status. - View Snapshot Agent Status dialog indicates that a snapshot has been created, right-click the publication under
Replication | Local Publications in the SSMS Object Explorer tree, and choose New Subscriptions. - In the New Subscription Wizard, select these configurations:
a. Publication: Choose the publication you just created.
b. Merge Agent Location: Run all agents at the distributor, which is the publisher data server (the
current machine).
c. Subscribers: Add the subscriber data server, and specify ExampleDB1 as the subscription database.
d. Merge Agent Security: Run under the SQL Server Agent service account, which a Windows account.
Connect to the publisher, distributor, and subscriber by impersonating the process account.
e. Synchronization Schedule: Schedule synchronization to occur daily every five minutes.
f. Initialize Subscriptions: Initialize the subscriptions at the subscriber immediately.
g. Wizard Actions: Create the subscription at the end of the wizard. - Right-click the publication under Replication | Local Publications in the SSMS Object Explorer tree, choose Properties | Articles, right-click the ExampleTable1 table article, choose Set Properties of This Table Article, and set these properties:
a. Identity Range Management: Set both the publisher range size and the subscriber range size to 100000. - Repeat the previous step to set the Identity Range Management properties for the ExampleTable2, ExampleTable3 and ExampleTable4 table articles.
Note: ExampleTable1-ExampleTable4 are the most heavily updated tables for this messaging application and also the tables on which the crucial views are built. - Go through the other articles (except those 4), and make sure that both the publisher and subscriber Identity Range
size properties are set to 10000. (The subscriber defaults to 1000, and must usually be changed to match the
publisher value.) - Right-click the subscription under Replication | Local Publications in the SSMS Object Explorer tree, and choose
View Synchronization Status to verify that the merge agent is working. - Go back to the SSMS Query window that contains the script to create the views and the view-indexes. Execute this
entire script on the Publisher node, then:
a. Right-click in the query window containing the script, choose Connection | Change Connection,
and connect to the subscriber data server - Verify the status of merge replication and of the application
Comments