Compress Big Tables
Storing data in SQL Server can be expensive. Not necessarily because disk is expensive; it isn’t and is generally getting cheaper all the time. Data costs money because it consumes RAM, requires CPU resources to process, time to back up, etc, etc. Obviously, reducing costs associated with SQL Server can reap major benefits for your business. When you compress big tables with SQL Server Data Compression, you can dramatically reduce costs, and simultaneously increase performance.
Why Compress?
Data compression for tables in SQL Server has been around for a long time. Since SQL Server 2012, to be precise. The feature is enabled by defining a table using the WITH ( DATA_COMPRESSION = [PAGE | ROW | NONE] )
clause. Compressing large tables has several benefits, including:
- Reduced storage space on disk (well, duh!). Compressing data can reduce the on-disk storage requirements by 50 to 75%; for a large table that might equate to many gigabytes of regained space on your precious SAN or local SSD storage.
- Reduced memory consumption. Once table compression is enabled, the data remains compressed in memory until SQL Server needs to read that data. To read compressed data, SQL Server uncompresses it on-the-fly. This means you might be able to reduce the amount of memory used by SQL Server, and get higher utilization if your environment is virtualized. Higher utilization equals lower costs. Sounds like a Win to me.
- Increase in Page Life Expectancy. Page Life Expectancy (PLE) is how long the average data page will remain in memory. Higher PLE means higher performance, up to a point. If you have a server with 100 second PLE, and compressing data bumps PLE up to 1000, that’s a Major Win™.
- Reduced I/O congestion. Since compressed data occupies less space, reading a compressed table from disk into memory requires less I/O bandwidth than reading an uncompressed table into memory. For a table that is seeing 50% compression, it’ll be twice as fast to read it into memory. Big Win™.
- On SQL Server Express Edition, more space to store your data (SQL Server 2017+). If you’re using the Express edition of SQL Server, you’ll realize at some point there is a 10GB limitation on the size of SQL Server Databases. Compression on large tables in SQL Server Express can mean you don’t need to step up to a paid edition of SQL Server. Very Big Win™.
- Increased CPU utilization. Assuming you’re on a paid edition of SQL Server, you’re paying for CPU, you may as well use it. Compression does increase CPU utilization, so you’ll probably not want to compress tables on a system that is already CPU bound. Aside from that, you’re golden if you’ve got spare CPU capacity. May as well use it to your advantage!
The Script
The script below will compress big tables, and their indexes, if they are consuming more space than the specified cutoff. It uses the sys.sp_estimate_data_compression_savings
system stored procedure to get the expected compression ratio of each table. If the expected compression ratio is above a certain target, the table or index is compressed. The estimation prevents wasting time compressing tables that are un-compressible, such as ones containing JPG images that are already compressed.
Note, compression affects SQL Server Replication and SQL Server Partitioning in various ways. If you are using either of those technologies, you should read the Microsoft docs linked above to ensure you don’t cause problems.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 |
/* Compresses tables and indexes that are consuming more than the specified amount of space. By: Hannah Vernon, 2019-08-08 Set the following variables before running the script! - @DebugOnly - 1 will cause the script to print out the generated commands, 0 will also execute those commands - @OnlineOnly - 1 causes the script to use ONLINE=ON when performing the compress operation - @CompressionType - can be "PAGE" or "ROW" - @MinObjectSizeGB - the minimum current size of a table/index before it will be considered for compression - @MinCompressionRatio - the minimum estimated compression ratio before a table/index will be compressed, a lower number indicates a higher level of compression: 0.25 - the data must compress to 1/4 its uncompressed size 0.5 - the data must compress to half its uncompressed size 0.1 - the data must compress to 1/10th its uncompressed size 1.0 - no compression Notes: - Only Heaps, Clustered Indexes, and Non-Clustered Indexes are affected by this script. */ SET TRANSACTION ISOLATION LEVEL READ COMMITTED; SET NOCOUNT ON; DECLARE @DebugOnly bit = 1; --set to zero to actually compress tables DECLARE @MinObjectSizeGB decimal(38,2) = 0.01; DECLARE @OnlineOnly bit = 1; DECLARE @CompressionType nvarchar(60) = N'PAGE'; --can be PAGE or ROW DECLARE @MinCompressionRatio decimal(38,2) = 10; DECLARE @SchemaName sysname; DECLARE @ObjectName sysname; DECLARE @IndexID int; DECLARE @PartitionNumber int; DECLARE @MaxDOP int; DECLARE @msg nvarchar(1000); DECLARE @cmd nvarchar(max); SET @MaxDOP = ( --set our MaxDOP to half of the configured value, or 1 if the sysadmin --hasn't explicity configured the value SELECT COALESCE(CONVERT(int, c.value), 2) FROM sys.configurations c WHERE c.name = N'max degree of parallelism' ) / 2; IF OBJECT_ID(N'tempdb..#estimated_results', N'U') IS NOT NULL DROP TABLE #estimated_results; CREATE TABLE #estimated_results ( [schema_name] sysname NULL , [object_name] sysname NULL , [index_id] int NULL , [partition_number] int NULL , [size_with_current_compression_setting(KB)] bigint NULL , [size_with_requested_compression_setting(KB)] bigint NULL , [sample_size_with_current_compression_setting(KB)] bigint NULL , [sample_size_with_requested_compression_setting(KB)] bigint NULL ); DECLARE cur CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR SELECT SchemaName = s.name , ObjectName = o.name , i.index_id , p.partition_number FROM sys.schemas s INNER JOIN sys.objects o ON s.schema_id = o.schema_id INNER JOIN sys.indexes i ON o.object_id = i.object_id INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id INNER JOIN ( SELECT v.index_type FROM (VALUES (0) --heap , (1) --clustered index , (2) --non-clustered index )v(index_type) ) index_types ON i.type = index_types.index_type WHERE i.is_hypothetical = 0 AND s.name <> N'sys' AND o.is_ms_shipped = 0 AND p.data_compression_desc <> @CompressionType AND ( ( @OnlineOnly = 1 AND NOT EXISTS ( SELECT 1 FROM sys.columns c INNER JOIN sys.types ty ON c.system_type_id = ty.system_type_id WHERE c.object_id = o.object_id AND ( ty.name = N'ntext' --none of these column types can be compressed online OR ty.name = N'text' OR ty.name = N'image' ) ) AND p.filestream_filegroup_id = 0 --no filestream table can be compressed online ) OR @OnlineOnly = 0 ) AND NOT EXISTS ( --tables with sparse columns cannot be compressed. --https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-2017#how-compression-affects-other-sql-server-components SELECT 1 FROM sys.columns c WHERE c.object_id = o.object_id AND c.is_sparse = 1 ) ORDER BY s.name , o.name , i.index_id , p.partition_number; OPEN cur; FETCH NEXT FROM cur INTO @SchemaName, @ObjectName, @IndexID, @PartitionNumber; WHILE @@FETCH_STATUS = 0 BEGIN SET @msg = N'EXEC sys.sp_estimate_data_compression_savings @schema_name = N''' + @SchemaName + N''' , @object_name = N''' + @ObjectName + N''' , @index_id = ' + CONVERT(nvarchar(1000), @IndexID) + N' , @partition_number = ' + CONVERT(nvarchar(1000), @PartitionNumber) + N' , @data_compression = ' + @CompressionType + N';'; --RAISERROR (@msg, 0, 1) WITH NOWAIT; INSERT INTO #estimated_results ( object_name , schema_name , index_id , partition_number , [size_with_current_compression_setting(KB)] , [size_with_requested_compression_setting(KB)] , [sample_size_with_current_compression_setting(KB)] , [sample_size_with_requested_compression_setting(KB)] ) EXEC sys.sp_estimate_data_compression_savings @schema_name = @SchemaName , @object_name = @ObjectName , @index_id = @IndexID , @partition_number = @PartitionNumber , @data_compression = @CompressionType; FETCH NEXT FROM cur INTO @SchemaName, @ObjectName, @IndexID, @PartitionNumber; END CLOSE cur; DEALLOCATE cur; DECLARE cur CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR SELECT CASE WHEN er.index_id = 0 THEN N'ALTER TABLE ' + QUOTENAME(er.schema_name) + N'.' + QUOTENAME(er.object_name) + N' REBUILD WITH ( DATA_COMPRESSION = ' + @CompressionType + N' , ONLINE = ' + CASE WHEN @OnlineOnly = 1 THEN N'ON' ELSE N'OFF' END + N' , SORT_IN_TEMPDB = OFF , MAXDOP = ' + CONVERT(nvarchar(10), @MaxDOP) + N' ); ' ELSE N'ALTER INDEX ' + QUOTENAME(i.name) + N' ON ' + QUOTENAME(er.schema_name) + N'.' + QUOTENAME(er.object_name) + N' REBUILD WITH ( DATA_COMPRESSION = ' + @CompressionType + N' , ONLINE = ' + CASE WHEN @OnlineOnly = 1 THEN N'ON' ELSE N'OFF' END + N' , SORT_IN_TEMPDB = OFF , MAXDOP = ' + CONVERT(nvarchar(10), @MaxDOP) + N' ); ' END FROM #estimated_results er LEFT JOIN sys.schemas s ON er.schema_name = s.name LEFT JOIN sys.objects o ON s.schema_id = o.schema_id AND er.object_name = o.name LEFT JOIN sys.indexes i ON o.object_id = i.object_id AND er.index_id = i.index_id WHERE er.[size_with_current_compression_setting(KB)] >= (@MinObjectSizeGB * 1048576) --sizes are reported in KB AND (CONVERT(decimal(38, 2), er.[size_with_requested_compression_setting(KB)]) / er.[size_with_current_compression_setting(KB)]) <= @MinCompressionRatio OPEN cur; FETCH NEXT FROM cur INTO @cmd; WHILE @@FETCH_STATUS = 0 BEGIN if @DebugOnly = 0 BEGIN BEGIN TRANSACTION RAISERROR (@cmd, 0, 1) WITH NOWAIT; EXEC sys.sp_executesql @cmd; COMMIT TRANSACTION CHECKPOINT END ELSE BEGIN RAISERROR (@cmd, 0, 1) WITH NOWAIT; END FETCH NEXT FROM cur INTO @cmd; END CLOSE cur; DEALLOCATE cur; |
I’m out!
Check out the rest of our posts on SQL Server Performance, and please, let me know what you think of this post!