
Archive for June, 2009

Data Compression in SQL 2008 – The Performance Impact

June 11, 2009

I’ve been playing around with the Data Compression feature in SQL 2008. I think this is one feature that will be quite useful in the Data Warehousing environment. The 64-bit question is – at what cost does it come? I decided to run a few tests.

I created a Test database and decided to load data from the FactInternetSales table (from AdventureWorks). This table has 60398 rows, so I created a table called Uncompressed and loaded the data into it 10 times. After that, I had a table with 603980 rows, using 79.977 MB of storage space. I then duplicated this table twice, calling one copy CompressedRow and the other CompressedPage. The reason for two copies is that you get 2 types of data compression – Row level and Page level.
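The setup looked roughly like this (a sketch – the database names are illustrative, and I’m assuming FactInternetSales lives in AdventureWorksDW on the same server):

    -- Build the 603980-row baseline table, then clone it twice
    USE Test;

    SELECT * INTO dbo.Uncompressed
    FROM AdventureWorksDW.dbo.FactInternetSales;    -- first 60398 rows

    DECLARE @i INT = 1;
    WHILE @i < 10
    BEGIN
        INSERT INTO dbo.Uncompressed
        SELECT * FROM AdventureWorksDW.dbo.FactInternetSales;
        SET @i = @i + 1;
    END;

    SELECT * INTO dbo.CompressedRow  FROM dbo.Uncompressed;
    SELECT * INTO dbo.CompressedPage FROM dbo.Uncompressed;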

To get an idea of how compression will affect the storage space of your table, right-click on the table in Management Studio, then select Storage – Manage Compression. Click Next. At the next screen select "Use same compression type for all partitions" and then select the compression type. For the CompressedRow table I selected Row and clicked Calculate at the bottom. The results for both tables are displayed in the table below.

Type         Table Size   Time to Compress
Page level   22.45 MB     23 seconds
Row level    45.00 MB     14 seconds
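
If you’d rather not click through the wizard, the same estimate is available straight from T-SQL via the sp_estimate_data_compression_savings procedure:

    -- Estimate the saving without actually compressing anything
    EXEC sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'CompressedPage',
        @index_id         = NULL,       -- NULL = all indexes on the table
        @partition_number = NULL,       -- NULL = all partitions
        @data_compression = 'PAGE';     -- 'ROW', 'PAGE' or 'NONE'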

The page level compression in this case took about 64% longer (23 seconds versus 14) but yields an astonishing 72% saving in disk space. Remember that when using this feature the developer will probably create the table with compression from the start, so the timing above isn’t as important as insert and update performance.
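
For reference, that’s a one-liner at creation time, and an existing table can be switched over with a rebuild (a minimal sketch – the column list is illustrative):

    -- Compression declared up front...
    CREATE TABLE dbo.FactSales
    (
        SalesOrderNumber NVARCHAR(20) NOT NULL,
        SalesAmount      MONEY        NOT NULL
    )
    WITH (DATA_COMPRESSION = PAGE);

    -- ...or applied to an existing table after the fact
    ALTER TABLE dbo.CompressedRow REBUILD WITH (DATA_COMPRESSION = ROW);

Which brings us to test no. 2…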

Insert Performance
I backed up the uncompressed table and then truncated all 3. I then proceeded to insert the rows again into all 3 tables and compared the times (shown as min:sec).
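
The timing itself was nothing fancy – something along these lines, where UncompressedBackup is an illustrative name for the backed-up copy:

    TRUNCATE TABLE dbo.CompressedPage;

    DECLARE @start DATETIME = GETDATE();

    INSERT INTO dbo.CompressedPage
    SELECT * FROM dbo.UncompressedBackup;

    SELECT DATEDIFF(SECOND, @start, GETDATE()) AS ElapsedSeconds;

The results here are a bit more interesting.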

Table            Insert 1   Insert 2   Insert 3   Insert 4*   Insert 5*   Insert 6**
Uncompressed     00:13      00:13      00:11      00:14       00:13       00:17
CompressedPage   00:10      00:12      00:12      00:16       00:15       00:20
CompressedRow    00:13      00:12      00:11      00:17       00:13       00:20

* Inserts 4 & 5 were with 603980 rows already in the table (i.e. the table was not truncated).
** Insert 6 was with 1207960 rows already in the table.

So what do the results say? Well… with an empty table the times are very similar. Once the table already has data in it, the loads take slightly longer. Even 1.2 million rows isn’t large by data warehousing standards, so I’m planning to repeat this test with a really large table. But we can already see that for a table of a million rows the difference isn’t that large. What’s also interesting is that Page and Row compression yielded similar times. This is also something that might change with larger volumes.

Query Performance

This would be of more interest to most people in data warehousing. After running "SET STATISTICS TIME ON;" to get precise CPU and execution times, I performed the select off the tables as they were after the last test, with about 1.8 million rows in each table.
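
A sketch of the measurement, with a simple full-scan aggregate standing in for the actual select:

    SET STATISTICS TIME ON;

    -- Full scan: every page must be read and, where compressed, decompressed
    SELECT COUNT(*), SUM(SalesAmount)
    FROM dbo.CompressedPage;

    SET STATISTICS TIME OFF;
    -- CPU time and elapsed time are reported in the Messages tab

The results are as follows (times in milliseconds):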

Table            Select 1 CPU   Select 1 Run   Select 2 CPU   Select 2 Run   Select 3 CPU   Select 3 Run
Uncompressed     4656           78220          4226           77746          4196           76414
CompressedPage   6770           80764          6519           78859          6689           78337
CompressedRow    6519           80142          6760           80711          6630           80969

Predictably, the select off the uncompressed data performed the best, but the difference is not that great. In fact, for most people the slight increase in query time (two to three seconds on average) will be acceptable. I’ve seen articles claiming that selecting off compressed tables can even be quicker, but I was unable to replicate that – perhaps it has something to do with compressed indexes. Of course the caveat here is CPU time – as you can see, selecting data from the compressed tables used on average about 50% more CPU time than selecting off the uncompressed table. This might be a problem on servers with fewer or older CPUs. Page compression is supposed to use more CPU, and although it did produce the worst single CPU time (6770 ms), on average the results were comparable to Row compression.

Overall I’m pretty happy with the built-in compression. To use or not to use – that is the first question. My answer would be to use compression if you have the CPU power to spare and saving disk space is the priority. Of course, for some people it’s more cost effective to add storage to an old server, and in that scenario you would leave compression off. The second question – which type of compression? From these results there’s not much in it, and while I haven’t tested with really big tables yet (I’ll update this post when I have), right now I’m thinking page compression is the way to go.

BUT… there’s something interesting that I found. Remember I created the tables with about 600k rows and then compressed them? And the Page-compressed table was a lot smaller than the Row-compressed one? Then I inserted another 1.2 million rows into them? How big were the tables after that? Time for another table…

Table            Initial Size (600k rows)   Eventual Size (1.8M rows)
Uncompressed     79.97 MB                   239.93 MB
CompressedPage   22.45 MB                   134.969 MB
CompressedRow    45.00 MB                   134.969 MB

As you can see, I was saving about 100 MB per table, but there was no difference between the page and row compressed tables. Why could this be? I ran DBCC SHRINKDATABASE to see what would happen – the sizes were exactly the same afterwards.
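
One documented behaviour that might explain it (I haven’t verified this yet): tables created with SELECT INTO are heaps, and on a heap, pages allocated by ordinary INSERTs only receive row-level compression – page compression is applied to new pages only during a bulk load or a rebuild. If that’s the cause, rebuilding the heap should shrink the page-compressed table:

    -- Current footprint
    EXEC sp_spaceused 'dbo.CompressedPage';

    -- Rebuild the heap so page compression is applied to every page
    ALTER TABLE dbo.CompressedPage REBUILD WITH (DATA_COMPRESSION = PAGE);

    EXEC sp_spaceused 'dbo.CompressedPage';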

This is the only puzzling aspect of my test, and I await a comment from any of you out there as to why this has occurred. Once again, we’ll see how this scales with larger volumes. Keep watching for the update.

 

Categories: SQL Server

Data mining in Excel

June 9, 2009

I’ve found this site with some excellent videos on how easy it is to get started with Data Mining in Excel. Honestly, it’s so easy that downloading the videos is probably the most difficult thing you’ll do when you play around with Excel Data Mining… the link

Scroll to the bottom of the page for the videos…. 

Categories: Reporting Services