From time to time, I have to troubleshoot a SQL Server instance entirely from the command line. Today we got an e-mail from one of the sysadmins asking about errors on a SharePoint Web server in development. This Web server evidently has a SQL Server instance (we, the DBAs, did not install it):
I was alerted to a few potential problems with PCS this morning, and looking into it, it seems like SQL on [aDevelopmentServer] is throwing a lot of errors. Is this something to worry about? It seems like the disk has a lot of space left, so we can probably set much higher limits if that’s what needs to be done.
From Event Viewer | Windows Logs | Application, I could see that there were a slew of these errors:
Could not allocate space for object 'dbo.MSSBatchHistory'.'IX_MSSBatchHistory' in database 'WSS_Search_aDevelopmentServer' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for
existing files in the filegroup.
And also these errors:
CREATE DATABASE or ALTER DATABASE failed because the resulting cumulative database size would exceed your licensed limit of 4096 MB per database.
I tried to start Management Studio and did not find that it had been installed (again, the DBAs did not install this instance, and I think it's part of a SharePoint installation). I checked the Services and found that there was a SQL Server running, but SQL Agent was not running. I tried to connect to SQL Server with osql:
C:\Windows\system32>osql -E
[SQL Server Native Client 10.0]Named Pipes Provider: Could not open a connection to SQL Server [2].
[SQL Server Native Client 10.0]Login timeout expired
[SQL Server Native Client 10.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.
I bounced the SQL Server service from the Services menu, but I still could not connect to this data server (same error as above). I tried to connect to this through SSMS from an instance in the same domain:
===================================
Cannot connect to aDevelopmentServer.domain.int.
===================================
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible.
Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error:
40 - Could not open a connection to SQL Server) (.Net SqlClient Data Provider)
------------------------------
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&EvtSrc=MSSQLServer&EvtID=2&LinkId=20476
------------------------------
Error Number: 2
Severity: 20
State: 0
------------------------------
Program Location:
at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential
credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString
userConnectionOptions, SessionData reconnectSessionData, DbConnectionPool pool, String accessToken, Boolean applyTransientFaultHandling, SqlAuthenticationProviderManager
sqlAuthProviderManager)
at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo,
DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup, DbConnectionOptions userOptions)
at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions,
DbConnectionInternal oldConnection, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory,
TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry,
DbConnectionOptions userOptions)
at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.Open()
at Microsoft.SqlServer.Management.SqlStudio.Explorer.ObjectExplorerService.ValidateConnection(UIConnectionInfo ci, IServerType server)
at Microsoft.SqlServer.Management.UI.ConnectionDlg.Connector.ConnectionThreadUser()
===================================
The system cannot find the file specified
I went back to the machine in question and looked through the installation but couldn't find the master database files right away, so I searched on *.mdf and found the database files here:
C:\Program Files\Microsoft SQL Server\MSSQL10.SHAREPOINT\MSSQL\Binn\Templates
The last modification date for these files is 3/31/2009. From looking at the details of the sqlservr.exe file, I could see that the product version is 10.0.5538.0, which corresponds to SQL Server 2008 SP3. I tried various login commands and found success with this:
SQLCMD -S aDevelopmentServer\SHAREPOINT -E
Here, aDevelopmentServer is the machine name (changed from the actual name), and SHAREPOINT is the name that appears in parenthesis after "SQL Server" in in the services menu (i.e., "SQL Server Agent (SHAREPOINT)"). Once I could connect, I could issue commands to the instance. I executed sp_databases and found that the database in question, WSS_Search_aDevelopmentServer, was 4374720 KB or 4272 MB in size. Evidently, SharePoint imposed a limit of 4096 MB per database, and this database was already in excess of that. So I tried to shrink the database:
1> USE [WSS_Search_aDevelopmentServer]
2> GO
Changed database context to 'WSS_Search_aDevelopmentServer'.
1> DBCC SHRINKDATABASE(N'WSS_Search_aDevelopmentServer')
2> GO
This didn't do much good as a subsequent execution of sp_databases showed the database to be at 4194304 KB or 4096 MB, which is the max size. Two subsequent shrink ops only got it down to about 4095.74 MB, so I took a look under the hood to see what data structures are so large. I found this script on StackOverflow at https://stackoverflow.com/questions/2094436/how-to-find-largest-objects-in-a-sql-server-database (my thanks to the author):
SELECT
t.name AS TableName,
i.name AS IndexName,
SUM(p.rows) as RowCounts,
SUM(a.total_pages) as TotalPages,
SUM(a.used_pages) as UsedPages,
SUM(a.data_pages) as DataPages,
(SUM(a.total_pages) * 8) / 1024 as TotalSpaceMB,
(SUM(a.used_pages) * 8) / 1024 as UsedSpaceMB,
(SUM(a.data_pages) * 8) / 1024 as DataSpaceMB
FROM
sys.tables t
INNER JOIN
sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
WHERE
t.NAME NOT LIKE 'dt%' AND
i.OBJECT_ID > 255 AND
i.index_id <= 1
GROUP BY
t.NAME, i.object_id, i.index_id, i.name
ORDER BY
object_name(i.object_id)
I changed this to use the database in question and ORDER BY TotalSpaceMB DESC, saved this to a file: C:\temp\scripts\size_of_objects.sql and then ran it from another SQLCMD command line:
C:\>SQLCMD -S aDevelopmentServer\SHAREPOINT -E -i C:\temp\scripts\size_of_objects.sql
-o C:\temp\scripts\query_results.txt -e
The output showed me that the largest object was a table named MSSCrawlURLLog, which had an index named PK_MSSCrawlURLLog. The table had 5,786,640 rows and used 3670 MB. I backed up the database:
BACKUP DATABASE [WSS_Search_aDevelopmentServer] TO DISK = N'C:\temp\backups\WSS_Search.bak' WITH NOFORMAT, NOINIT,
NAME = N'WSS_Search-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
And then I looked at the columns for the table:
1> SELECT * FROM sys.columns WHERE object_id = OBJECT_ID('dbo.MSSCrawlURLLog')
2> GO
Then I tried to get a look at the size of the data:
1> SELECT TOP(100) TrackID FROM MSSCrawlURLLog
2> GO
And then I started to delete records from this table 100,000 records at a time:
1> DELETE MSSCrawlURLLog WHERE TrackID BETWEEN 0 AND 1000000
2> GO
1> DELETE MSSCrawlURLLog WHERE TrackID BETWEEN 100000 AND 200000
2> GO
1> DELETE MSSCrawlURLLog WHERE TrackID BETWEEN 200000 AND 300000
2> GO
1> DELETE MSSCrawlURLLog WHERE TrackID BETWEEN 300000 AND 400000
2> GO
1> DELETE MSSCrawlURLLog WHERE TrackID BETWEEN 400000 AND 500000
2> GO
1> DELETE MSSCrawlURLLog WHERE TrackID BETWEEN 500000 AND 600000
2> GO
1> DELETE MSSCrawlURLLog WHERE TrackID BETWEEN 700000 AND 800000
2> GO
1> DELETE MSSCrawlURLLog WHERE TrackID BETWEEN 800000 AND 900000
2> GO
1> DELETE MSSCrawlURLLog WHERE TrackID BETWEEN 900000 AND 1000000
2> GO
I could see the row count had dropped:
1> SELECT COUNT(*) FROM MSSCrawlURLLog
2> GO
-----------
1993684
(1 rows affected)
Using another script I found at https://stackoverflow.com/questions/27057000/sql-server-get-all-databases-with-mdf-and-ldf-file-location (and thanks to that author), I found the location of the MDF and LDF files:
1> SELECT db.name AS DBName, type_desc AS FileType, Physical_Name AS Location
2> FROM sys.master_files mf
3> INNER JOIN sys.databases db ON db.database_id = mf.database_id
4> WHERE db.name = 'WSS_Search_aDevelopmentServer'
5> GO
And here they were:
C:\Program Files\Microsoft Office Servers\14.0\Data\MSSQL10.SHAREPOINT\MSSQL\DATA\WSS_Search_aDevelopmentServer.mdf
C:\Program Files\Microsoft Office Servers\14.0\Data\MSSQL10.SHAREPOINT\MSSQL\DATA\WSS_Search_aDevelopmentServer_log.LDF
It turns out that I didn't have permission to view some of these directories (at least, I had to confirm that I wanted to see them to the OS), and that's probably why they didn't turn up in a Windows file search. WSS_Search_aDevelopmentServer.mdf was now 2997248 KB, so I continued deleting rows and using DBCC SHRINKDATABASE:
1> DBCC SHRINKDATABASE(N'WSS_Search_aDevelopmentServer')
2> GO
I didn't use TRUNCATE because, in my experience, that can cause the data server to become unresponsive and sometimes crash (especially if you're truncating over 5 million rows). I ran DBCC SHRINKDATABASE periodically so that the log file didn't get too large. Eventually, I got the row count down to 100,371 and the database size down to 491000 KB (480 MB).
Following up with the sysadmin, I explained that this looks like old data that only recently spilled over the allocated space limit--and also that there is no maintenance plan for this data server and that the SQL Agent had likely been disabled by SharePoint for a reason. I scheduled a follow-up for this coming week at which point I'll evaluate if this problem is likely to recur soon. If we need to take any precautions, I'll set up a Scheduled Task to run a T-SQL script to truncate certain tables and shrink the database.
Comments