One of our support staff recently approached me to find out why a client application was timing out during its initial configuration. He knew it was connecting to the database but couldn't determine what was happening exactly. I kicked off the configuration myself, looked at Activity Monitor for the database in question, and found this query (table/column names have been changed here) with a non-SARGable LIKE operation:
DELETE FROM [Example_Table] WHERE [Example_Column] LIKE 'exstr%'
The [Example_Column] column is a non-null varchar(200) type with almost 10,000 rows. I could see there were 9 cascading deletes per row corresponding to 6 other tables with foreign keys referencing this table, and the estimated query plan had four steps:
- Clustered Index Scan (Example_Table.PK_Example_Table) COST: 0%
- Compute Scalar COST: 0%
- Clustered Index Insert (Example_Table) COST: 100%
- DELETE, COST: 0%
Looking at the tables with foreign-key constraints to the table undergoing deletes, I began to see the cause of the poor performance:
- Referencing table #1 had 7 FK constraints and 7 corresponding clustered-index seek operations (each with 112,210,564 actual rows read; the estimate was 21.9658 rows) on [Example_Table], each with a cost of 47%, and clustered-index delete operation with a cost of 44%.
- Referencing table #2 had 1 FK constraint and 1 corresponding clustered-index seek operation on [Example_Table] with a cost of 47% and a clustered-index delete operation with a cost of 37%; this table had a multi-valued primary clustered key of three columns, two of which were uniqueidentifier/GUID columns, and two other FK constraints to other tables.
- Referencing table #3 had 1 FK constraint and 1 corresponding clustered-index seek operation on [Example_Table] with a cost of 47% and a clustered-index delete operation with a cost of 37%; this table also had a multi-valued primary clustered key of three columns, all of which were uniqueidentifier/GUID columns, and two other FK constraints to other tables.
- Referencing table #4 had 1 FK constraint and 1 corresponding clustered-index seek operation on [Example_Table] with a cost of 5% and a hash-match operation with a cost of 29%; this table also had a multi-valued primary clustered key of three columns, all of which were uniqueidentifier/GUID columns, and three other FK constraints to other tables.
- Referencing table #5 had 1 FK constraint and 1 corresponding clustered-index seek operation on [Example_Table] with a cost of 16% and a table-delete operation with a cost of 78%; this table also had two other FK constraints to one other table. A second subquery in the execution plan had a table-delete operation with a cost of 90%.
- Referencing table #6 had 2 FK constraints and 2 corresponding clustered-index seek operations on [Example_Table], each with a cost of 22%, and clustered-index delete operation with a cost of 44%; this table also had one other FK constraint to another table. A second subquery in the execution plan had a table-delete operation with a cost of 70%.
The primary/secondary/tertiary/etc. cascading deletes and resulting contention and wait times were causing a connection timeout, and even when I ran the query manually it took over 20 minutes before I cancelled it. We sent our feedback to the developers (a third party, not our company) and cancelled the production installation as user-acceptance testing had failed. I also noticed another significant problem with this particular database, although it wasn't immediately evident as the direct cause of the poor performance here, and it's worth noting: At least three of the tables had clustered indexes with at least one uniqueidentifier (16-byte GUID) column. I'll go into more depth on this issue in a subsequent post.
Nice column! I had never heard of the term "Sargability" and had to look it up. As a young developer at IBM, we were told using LIKE, and other non SARGable operations was not only bad form, but pointed to an error in the data architecture. At that time we had people dedicated to properly analyzing the data and designing the database. I believe that today we are sorely lacking in the skills these data architects possessed. My last job used a mySQL database that was the most pathetic mess I had ever seen. The database was "developed", and I use that term very loosely, by the originator of the product. Very poor design, and used exactly what you recommend against, the use of a GUID value as both a primary key and a column in a clustered index. It had horrible performance, was impossible to maintain and extend as the product evolved.
Posted by: Howard Van Dam | 11/15/2018 at 06:50 PM