Data Compression in SQL 2008 – The Performance Impact
I created a Test database and decided to load data from the FactInternetSales table ( from Adventure Works ). This table has 60398 rows , so I create a table called Uncompressed and loaded the data into it 10 times. After that , I had a table with 603980 rows ,using 79.977 MB of storage space. I then duplicated this table twice – calling one of them CompressedRow and another CompressedPage. The reason for this is becuase you get 2 types of data compression – Row level and Page level.
To get an idea of how compression will affect the storage space of your table , right-click on the table, then select Storage – Manage Compression. Click Next. At the next screen select "Use same compression type for all partitions" and then select the compression type. For the CompressedRow table I selected Row and clicked Calculate at the bottom. The results for both tables are displayed in the table below.
Type | Table Size | Time to compress |
Page Level | 22.45 MB | 23 seconds |
Row Level | 45.00 MB | 14 seconds |
The page level compression in this case took about 50% longer to compress the table but yeilds an astonishing 72 % saving in disk space. Remember that when using this feature the developer will probably create the table with compression , so the above timing isn’t as important as insert and update performance. Which brings us to test no. 2 …..
Insert Performance
I backed up the uncompressed table and then truncated all 3. I then proceeded to insert the rows again into all 3 and compare the times. The results here are a bit more interesting.
Table | Insert 1 | Insert 2 | Insert 3 | Insert 4* | Insert 5* | Insert 6** |
Uncompressed | 00:13 | 00:13 | 00:11 | 00:14 | 00:13 | 00:17 |
CompressedPage | 00:10 | 00:12 | 00:12 | 00:16 | 00:15 | 00:20 |
CompressedRow | 00:13 | 00:12 | 00:11 | 00:17 | 00:13 | 00:20 |
* Insert 4 & 5 was with 603980 rows already in the table (ie. The table was not truncated )
** Insert 6 was with 1207960 rows in the table.
So what do the results say ? Well……with an empty table its very similar. Once the table loads up with data it takes slightly longer to load it. Even 1.2 million rows of data isn’t large by data warehousing standards so once I’m planning to repeat this test with a really large table. But we can already see for a table of a million rows the difference isn’t that large. What’s also interesting is that Page and Row compression yielded similar times. This is also somthing that might change with larger volumes.
Query Performance
This would be of more interest to most people in data warehousing. After running "SET STATISTICS TIME ON;", to get a more precise CPU and execution time , I performed the select off the tables as they were after the last test , with about 1.8 million rows in each table. The results are as follows ( times in milliseconds ) :
Table | Select 1 CPU Time | Select 1 Run Time | Select 2 CPU Time | Select 2 Run Time | Select 3 CPU Time | Select 3 Run Time |
Uncompressed | 4656 | 78220 | 4226 | 77746 | 4196 | 76414 |
CompressedPage | 6770 | 80764 | 6519 | 78859 | 6689 | 78337 |
CompressedRow | 6519 | 80142 | 6760 | 80711 | 6630 | 80969 |
Predictably , the select off the uncompressed data performed the best but the difference is not that great. In fact , for most people the slight increase in query time ( about a second on average ) will be acceptable. I’ve seen articles claiming that selecting off the compressed tables can even be quicker , but I was unable to replicate that – perhaps it has something to do with compressed indexes. Of course the caveat here is CPU time – as you can see selecting data from the compressed tables used on average 50% more CPU time than selecting off the uncompressed table. This might be a problem on servers with fewer CPU’s or older CPU’s. Page compression is supposed to use more CPU, and although it did provide the worst CPU time ( 6770 ) on average the results were comparable to Row compression.
Overall I’m pretty happy with the built in compression. To use or not to use – that is the first question. My answer would be to use compression if you have the CPU power and it’s more important to save disk space. Of course for some people it’s more cost effective to add storage to an old server and in that scenario you would leave compression off. The second question – which type of compression. From these results there’s not much in it and while I haven’t tested with really big tables yet ( I’l update this post when I have ) , right now I’m thinking page compression is the way to go.
BUT….. there’s something interesting that I found. Remember I created the tables with about 600k rows then compressed them ? And the Page-compressed table was a lot smaller than the row-compressed one ? Then I inserted another 1.2 million rows into them ? How big were the tables after that. Time for another table…..
Table | Initial Size ( 600k rows ) | Eventual Size ( 1.8M rows ) |
Uncompressed | 79.97 MB | 239.93 MB |
CompressedPage | 22.45 MB | 134.969 MB |
CompressedRow | 45.00 MB | 134.969 MB |
As you can see, I was saving about 100MB per table , but there was no difference between the page and row compressed tables. Why could this be ? I tried to do a DBCC SHRINKDB to see what would happen . It was exactly the same afterwards.
This is the only puzzling aspect of my test , and I await a comment from any of you out there as to why this has occured. Once again we’ll see how this scales with larger volumes. Keep watching for the update.
Data mining in Excel
Scroll to the bottom of the page for the videos….