How to Corrupt a Database
Microsoft goes to great lengths to ensure data integrity remains its paramount concern with SQL Server data. After all, ACID compliance doesn’t mean much if the underlying data is corrupt. Having said that, there are certainly points-of-failure in every computer system that make corruption inevitable given enough activity over enough time. This post talks about how to corrupt a database, so you’re prepared when it happens in production.
Learning how to deal with a corrupt database can be one of the most challenging tasks a DBA will ever undertake. Typically, corrupt databases aren’t a big deal outside of production – you simply restore a new masked-data-copy from production and continue on. But, what if it’s your business’ primary production database that suddenly has corruption, and you’d rather not have to restore a backup? The best way to be prepared for something like that is to have an intentionally corrupted database that you can use for learning purposes. This post shows you how to corrupt a database so you can discover what works best for you before you’re in a Critical Moment™.
The Setup
First things first; let’s create a database:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
USE master; GO IF DB_ID(N'corrupt_db') IS NOT NULL BEGIN ALTER DATABASE corrupt_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE corrupt_db; END CREATE DATABASE corrupt_db ON ( NAME = N'corrupt_db_system' , FILENAME = N'C:\data\corrupt_db_system.mdf' , SIZE = 10MB , MAXSIZE = 100MB , FILEGROWTH = 10MB ) LOG ON ( NAME = N'corrupt_db_log' , FILENAME = N'C:\data\corrupt_db_log.ldf' , SIZE = 10MB , MAXSIZE = 100MB , FILEGROWTH = 10MB ); ALTER DATABASE corrupt_db SET RECOVERY SIMPLE; ALTER DATABASE corrupt_db SET PAGE_VERIFY CHECKSUM; BACKUP DATABASE corrupt_db TO DISK = N'NUL:'; GO |
Since we need some data to corrupt, let’s create a clustered index and add some data to it. In fact, we’ll add 2,000 pages or 16MB of data.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
USE corrupt_db; CREATE TABLE dbo.corrupt_table ( rownum int NOT NULL CONSTRAINT corrupt_table_pk PRIMARY KEY CLUSTERED WITH ( DATA_COMPRESSION = NONE , FILLFACTOR = 100 , PAD_INDEX = OFF ) IDENTITY(1,1) , rowdata1 char(8000) NULL , rowdata2 char(48) NULL ) ON [PRIMARY]; ;WITH source_rows AS ( /* generate 2,000 rows */ SELECT [number] = v1.n * 1000 + v2.n * 100 + v3.n * 10 + v4.n + 1 FROM (VALUES (0), (1)) v1(n) CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v2(n) CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v3(n) CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v4(n) ) INSERT INTO dbo.corrupt_table (rowdata1, rowdata2) SELECT REPLICATE(CHAR(0x11), 8000) , REPLICATE(CHAR(0x22), 48) FROM source_rows; GO |
Let’s Corrupt It!
SQL Server includes a built-in undocumented command purpose-built for writing to database pages directly, DBCC WRITEPAGE. That command can be used with equal efficacy to corrupt a page, or to fix corruption on a page.
The syntax for DBCC WRITEPAGE
is:
1 |
DBCC WRITEPAGE(<database_id | database_name>, <file_id>, <page_id>, <offset>, <length:, <value>, <write_direct>); |
The “offset” parameter indicates the starting address on the page where we should start writing data. The “length” parameter indicates how much data we’ll be writing. The “value” parameter is a hexadecimal value to write into the page; the length of this parameter must match the “length” parameter. The “write_direct” parameter should be 0
if you want DBCC WRITEPAGE to write to a buffer in memory, or 1
if you can put the database into single-user-mode, and have DBCC WRITEPAGE write directly to the data on-disk.
If you attempt to use “write_direct” with a value of 1
, but don’t have the database in single user mode, you’ll receive the following error:
Msg 7919, Level 16, State 5, Line 91 Repair statement not processed. Database needs to be in single user mode.
This next piece of code will thoroughly corrupt the dbo.corrupt_table
by placing the database into single user mode, then running DBCC WRITEPAGE
to randomly alter a single byte on each page of the table. We’re inserting the list of pages used by the table into a temporary table since we won’t be able to directly access the corrupt_db database while using DBCC WRITEPAGE. We then iterate a cursor over the temporary table, calling DBCC WRITEPAGE once against each page. The DBCC WRITEPAGE writes a randomly generated single byte to a randomly chosen location within the page.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
DROP TABLE IF EXISTS #table_pages; DECLARE @dbid int = DB_ID(N'corrupt_db'); SELECT dpa.allocated_page_file_id , dpa.allocated_page_page_id INTO #table_pages FROM corrupt_db.sys.schemas s INNER JOIN corrupt_db.sys.objects o ON o.schema_id = s.schema_id CROSS APPLY sys.dm_db_database_page_allocations(@dbid, o.object_id, NULL, NULL, 'DETAILED') dpa WHERE o.name = N'corrupt_table' AND s.name = N'dbo' AND dpa.page_type_desc = N'DATA_PAGE'; USE master; ALTER DATABASE corrupt_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE; GO DECLARE @dbid int = DB_ID(N'corrupt_db'); DECLARE @fileid int; DECLARE @pageid int; DECLARE @offset int; DECLARE @value varbinary(1); DECLARE cur CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR SELECT dpa.allocated_page_file_id , dpa.allocated_page_page_id FROM #table_pages dpa OPEN cur; FETCH NEXT FROM cur INTO @fileid, @pageid; WHILE @@FETCH_STATUS = 0 BEGIN SET @offset = CONVERT(int, CRYPT_GEN_RANDOM(2)) % 8192; SET @value = CRYPT_GEN_RANDOM(1); DBCC WRITEPAGE (@dbid, @fileid, @pageid, @offset, 1, @value, 1) WITH NO_INFOMSGS; FETCH NEXT FROM cur INTO @fileid, @pageid; END CLOSE cur; DEALLOCATE cur; GO ALTER DATABASE corrupt_db SET MULTI_USER; GO |
What Happened?
The last statement in the code above brings the database back into multi user mode, so we can query the table:
1 2 3 |
USE corrupt_db; SELECT * FROM dbo.corrupt_table; |
The first page the query attempts to read is detected as corrupt, via the checksum calculation that is performed when each page is read into the buffer pool:
Msg 824, Level 24, State 2, Line 135 SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x00caae6e; actual: 0x80caae26). It occurred during a read of page (1:328) in database ID 15 at offset 0x00000000290000 in file 'C:\data\corrupt_db_system.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
DBCC CHECKDB, super-hero?
Next, lets run a DBCC CHECKDB
against the database, as instructed in the error message:
1 |
DBCC CHECKDB(N'corrupt_db') WITH NO_INFOMSGS, ALL_ERRORMSGS; |
Hmmm… seems there are a boat-load of errors that need fixing. As detailed in the last line of the output, repair_allow_data_loss
is the minimum level of repair that will bring the database back to a consistent state. Let’s try that, but first we need to put the database into single user mode:
1 2 3 4 5 |
USE master; ALTER DATABASE corrupt_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DBCC CHECKDB(N'corrupt_db', repair_allow_data_loss) WITH NO_INFOMSGS, ALL_ERRORMSGS; ALTER DATABASE corrupt_db SET MULTI_USER; GO |
Ok, the output reports a bunch of pages were fixed. Below is a portion of the output showing items that were repaired in the database.
Repair: The page (1:2354) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data). Repair: The page (1:2355) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data). Repair: The page (1:2356) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data). Repair: The page (1:2357) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data). Repair: The page (1:2358) has been deallocated from object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data). Msg 8945, Level 16, State 1, Line 140 Table error: Object ID 565577053, index ID 1 will be rebuilt. The error has been repaired. Msg 8928, Level 16, State 1, Line 140 Object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data): Page (1:328) could not be processed. See other errors for details. The error has been repaired. Msg 8939, Level 16, State 98, Line 140 Table error: Object ID 565577053, index ID 1, partition ID 72057594041270272, alloc unit ID 72057594046906368 (type In-row data), page (1:328). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 2057 and -4. The error has been repaired.
Now let’s look at the table, with a simple SELECT
statement:
1 2 3 |
USE corrupt_db; SELECT * FROM dbo.corrupt_table; |
Ok, so some rows still exist, but a whole bunch have been deleted by the DBCC CHECKDB
with repair_allow_data_loss
option. Since we’re randomly choosing an offset and value to write to each page, the effects of that corruption will vary each time you run the test code.
In the above example, where we lost data, you might need to choose between restoring the entire database or restoring a temporary copy of the database somewhere, so you can copy data from the temporary copy into the production copy. That may or may not work for the business, but it should definitely be a business decision.
Summary
Using DBCC WRITEPAGE
in a variety of ways to intentionally corrupt a database is an important learning exercise. Determining a plan for dealing with corruption, and executing the plan to see the outcome, is priceless experience.
Check out the rest of our tools, and let me know if you have any questions or suggestions about this post. Thanks for reading, and please share this post if you found it useful!
The DBCC WRITEPAGE, most dangerous command in SQL Server. LOL. Thank you for this information.