OK, so SQL 2016 is here, and I’m sure we’re all already playing with the new features.
I’ve decided to start learning R, and did a fresh installation of SQL 2016 with R Services installed.
While attempting to run my “Hello World” script I encountered the following error:
“Msg 39011, Level 16, State 1, Line 1
SQL Server was unable to communicate with the LaunchPad service. Please verify the configuration of the service.”
OK, so the first thing you have to do is enable SQL Server to run external scripts. Run:

EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

However, also make sure that the new “Launchpad” service is running – I simply had to start this service to resolve the error on my VM.
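For reference, the “Hello World” test I was running was just a minimal sp_execute_external_script call along these lines (the column name is my own):

```sql
-- Minimal R "Hello World": pass a one-row result set into R and straight back out
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet;',
    @input_data_1 = N'SELECT 1 AS HelloWorld;'
WITH RESULT SETS ((HelloWorld INT));
```

If this returns a single row containing 1, both the external scripts setting and the Launchpad service are working.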
I’m sure that many of you are familiar with the Column Store index feature that was launched with SQL 2012. Using Microsoft’s in-memory technology, the Column Store index delivered a massive performance boost to data warehouse queries. However, I’ve found that many people were put off the feature because the fact table becomes read-only once you implement Column Store.

Well, you’ve probably heard the big news (I’m writing this post months later than I originally wanted to), and for those of you who haven’t: the Column Store feature in SQL 2014 is updateable. That’s right, no more read-only fact table. However, that isn’t the reason for this post – I have something else here that might interest you. Space, the final frontier….

Firstly, you actually have 2 options with SQL 2014. You can create the Non-Clustered Column Store as per 2012 (which is NOT updateable). Alternatively, you can create the new Clustered Column Store, which IS updateable. The Clustered Column Store has some other interesting features:

1) It becomes the primary storage mechanism for the table.
2) It eliminates the need for other covering indexes.
3) It dramatically reduces the storage requirements for the table – and that’s a fact.

Now it’s point 3 that I want to talk more about. In my test DW that I built for demo purposes, I have a fact table with about 20 million rows. Let’s take a look at the storage statistics of that table. This table is currently taking up about 2.3 GB of space. We could apply Page Compression to this table (which I believe has also been improved in 2014), and we would get the following result. Not bad – a reduction of the space used to less than 25% of the original size. However, we haven’t built any indexes yet to boost performance. Prior to applying the Page Compression, I created a copy of the table, called FactSales2.
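For those following along, the steps above look something like this. FactSales and FactSales2 are the table names from my demo; the column list for the non-clustered index is illustrative only:

```sql
-- Page compression on the original fact table:
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Option 1: non-clustered column store, as per SQL 2012
-- (the table becomes read-only; column list is illustrative):
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales2
ON dbo.FactSales2 (DateKey, ProductKey, StoreKey, SalesQuantity, SalesAmount);

-- Option 2: new in SQL 2014 - the clustered column store, which stays
-- updateable and becomes the table's primary storage:
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;
```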
Let’s apply the Non-Clustered Column Store index to that table to give us that performance boost, and see what happens to the storage. The storage space of the table increases, and we can see that there is a 242MB overhead for the index. Now, we could implement page compression and then the Non-Clustered Column Store index, but then your table would still be read-only. In addition, you would probably need to implement more indexes, which takes up more space.

In SQL 2014, we have a better solution: implement the Clustered Column Store index. What about the Page Compression? ….. ….. it simply isn’t needed. The Clustered Column Store delivers better compression than Page Compression, reducing the size of the table and the index together to a mere 200MB. Technically, the index is the table. This is astonishingly less than a tenth of the storage space required by the regular table with the Non-Clustered Column Store.

Is the performance the same? I’ve written a typical data warehouse reporting query with joins to the dimensions, and executed it against all 4 table types. The results are as follows:
| Type | Avg. Execution Time | Notes |
| --- | --- | --- |
| Regular table | 32 seconds | No indexes |
| Regular table with Page Compression | 23 seconds | No indexes |
| Table with Non-Clustered Column Store | 5 seconds | No compression (2.3 GB table), read-only |
| Clustered Column Store table | 5 seconds | Table is 200MB! |
The only thing left to say is that, if you had reservations about using the Non-Clustered Column Store previously, you would be mad not to use it on fact tables in SQL 2014. The benefits with regard to storage and performance are astounding. It remains to be seen whether there is any impact on insert performance, but that’s a topic for another day.
This is the big one!! SQL 2014, with the in-memory OLTP capability, has officially launched as of 1 April 2014. You’ll find everything that you need to know at the link below:
So far, I’ve been experimenting with the in-memory tables, and I’ve definitely seen some big performance improvements. I’ll do an upcoming post detailing how you create these tables.
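As a quick preview, creating an in-memory table involves adding a memory-optimized filegroup to the database and then using the MEMORY_OPTIMIZED table option. A minimal sketch – the database name, file path, and table are my own examples:

```sql
-- The database needs a memory-optimized filegroup and container first:
ALTER DATABASE TestDB ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE TestDB ADD FILE
    (NAME = 'imoltp_file', FILENAME = 'C:\Data\imoltp_file')
    TO FILEGROUP imoltp_fg;
GO

-- A simple in-memory table with a hash index on the primary key.
-- SCHEMA_AND_DATA means the data survives a restart:
CREATE TABLE dbo.ShoppingCart (
    CartId      INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId      INT NOT NULL,
    CreatedDate DATETIME2 NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```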
Before that, I want to talk about dev and test environments. Many of you will be wanting to try out SQL 2014, but perhaps don’t have the space available to spin up a VM. What you can do in this situation, if you have an MSDN subscription, is spin up a VM in Azure at no cost and try out SQL 2014 there. To do so:
1) Log into the main MSDN page (http://www.MSDN.com)
2) Click on “Cloud”
3) Under “Start your free trial”, click on sign up
4) Scroll down to the bottom of that page. You’ll see a heading called “MSDN Subscriber?”. Click on “Activate Now”
5) You are basically entitled to $150 per month of free Azure services.
Once you have sorted that out, you’ll see that SQL 2014 is already available in Azure. There are preconfigured VM templates with various options.
It takes about 5 minutes to spin up a new virtual machine, with the OS and SQL 2014 already installed!! Once that’s done, you can remote desktop into it and work on it as normal.
Bet you didn’t see that one coming! While some people are still getting to grips with the excellent SQL 2012, Microsoft has officially announced SQL 2014 at TechEd North America.
Your first stop should be Microsoft’s website, where you can sign up to be notified once the trial version is available. There are also 3 PDF documents at the bottom of that page that I highly recommend downloading.
Next, what exactly was announced in terms of features? Just some of them:
1) Hekaton! I’ve blogged about this before. It’s basically in-memory technology that gives SQL Server OLTP applications a massive performance boost.
2) Writeable Column Store indexes!! – A popular feature, and now your fact tables won’t be read-only once you create the index.
3) SSD caching – extend memory by caching on an SSD. Also a performance-boosting enhancement.
4) Extended online operations – more operations supported without taking the database offline, which extends uptime.
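On point 3, the SSD caching feature is surfacing under the name Buffer Pool Extension. Assuming the syntax holds for the final release, enabling it looks something like this (the file path and size here are my own examples):

```sql
-- Point the buffer pool extension at a file on an SSD volume:
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'F:\SSDCACHE\ExtensionFile.BPE', SIZE = 32 GB);

-- And to switch it off again:
-- ALTER SERVER CONFIGURATION SET BUFFER POOL EXTENSION OFF;
```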
I’m sure that there are a whole lot more that we’ll learn about in the upcoming months. In the meantime , I recommend that you download those whitepapers at the Microsoft site and get reading !!
As you will have heard by now, some exciting announcements were made at SQL PASS. One of those was Hekaton, the exciting new in-memory database platform for SQL Server. I’ve found some more information on Hekaton.
Before I move on to that though, remember that xVelocity ColumnStore debuted in SQL 2012, and the performance results are already impressive – I did a blog post on ColumnStore here.
But that currently only applies to data warehouse workloads. Now we have “Hekaton”, which promises to bring the xVelocity technology to OLTP workloads. What we’ve heard at PASS about Hekaton:
- Will ship with the next major version of SQL Server
- Will be integrated into SQL Server – no separate interface or programming model
- Basically allows you to place your most frequently used OLTP data tables in memory, in order to boost reads and writes (that’s a very simplified explanation of it)
- You can keep your other OLTP tables on normal storage. So a database application can be designed to run with a hybrid storage mechanism.
I would imagine that at some interval, the in-memory table data is written to disk. According to Microsoft, some customers on TAP are testing this at the moment, and are seeing anywhere between 5 and 50 times throughput gains. While I don’t like repeating such claims, we did see evidence of such speed boosts with ColumnStore, so this is very exciting.
In the meantime, start ordering more RAM for your servers….. :)
One of the most exciting features in the upcoming SQL Server 2012 is most definitely ColumnStore indexes. This feature uses technology initially developed for PowerPivot to deliver a huge performance boost to queries running off a data warehouse – and I do mean massive!
If we look at your typical data warehouse workload, you have large amounts of data, especially in the fact tables. Apart from the ETL window, you are normally querying, not updating, these tables, so it is essentially a read-only workload. This is perfect for the current implementation of ColumnStore.
The ColumnStore feature actually brings us not just the new index, where data is stored by column (and potentially compressed), but also the new vector-based query execution capability called “batch processing”. The latter is also responsible in a big way for the stunning performance gains.
So, as always, I’ve built a little demo using AdventureWorks, so that you can try this yourself. I have increased the size of the fact table to about 13 million rows by duplicating the data. I brought in only 4 dimension tables, which after testing was sufficient for demonstrating the difference in performance when running the queries.
I then created a copy of the fact table. The second fact table exists because I wanted to create the ColumnStore index there, and then compare the query execution time against running the query on the original table.
The index itself is very easy to create – you can script it or use the GUI, just as you would a normal index. For this example, I simply created the ColumnStore index on all columns of the fact table.
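The scripted version looks something like this. The table name assumes a copy of the AdventureWorks FactInternetSales table, and I’ve abbreviated the column list here – in the demo the index covered all columns:

```sql
-- Non-clustered columnstore over the copied fact table
-- (column list abbreviated for readability):
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactInternetSales2
ON dbo.FactInternetSales2
(
    ProductKey, OrderDateKey, CustomerKey, PromotionKey,
    CurrencyKey, SalesTerritoryKey, OrderQuantity,
    UnitPrice, SalesAmount, TaxAmt
);
```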
So I ran the first query, which you can see in the first image. I ran 2 DBCC commands before the query. The first, DBCC FREEPROCCACHE, flushes out already-cached query plans. The second, DBCC DROPCLEANBUFFERS, clears buffers from the buffer pool. Running these ensures that we get a more scientific analysis of query performance when testing.
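The test harness, in other words, looks like this. The query itself is illustrative of the shape I used – an aggregation over the fact table with joins to AdventureWorks dimensions:

```sql
-- Clear cached plans and clean buffer pages so each run starts cold:
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

-- A typical DW reporting query: aggregate a measure, joining to dimensions
SELECT d.CalendarYear,
       p.EnglishProductName,
       SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactInternetSales2 AS f
JOIN dbo.DimDate    AS d ON f.OrderDateKey = d.DateKey
JOIN dbo.DimProduct AS p ON f.ProductKey   = p.ProductKey
WHERE d.CalendarYear = 2007
GROUP BY d.CalendarYear, p.EnglishProductName;
```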
The results are as follows:
| Query | Execution time | CPU time |
| --- | --- | --- |
| Normal (1st run) | 1:19 | 4891 ms |
| Normal (2nd run) | 1:13 | 3890 ms |
| ColumnStore (1st run) | 0:04 | 1264 ms |
| ColumnStore (2nd run) | 0:04 | 1218 ms |
Your eyes do not deceive you. The query execution time in this example went from 79 seconds down to an amazing 4 seconds! The CPU time is roughly a quarter of that of the first query. This perhaps illustrates that CPU was not a huge bottleneck in the first query – reading the data from disk was more of a bottleneck. It also illustrates the advantage that “batch processing” brings.
So, I then modified the query, removing the entire WHERE clause. I wanted to see the effect of less filtering and a larger data set. The results:
| Query | Execution time | CPU time |
| --- | --- | --- |
| Normal (1st run) | 1:10 | 27375 ms |
| Normal (2nd run) | 1:10 | 27141 ms |
| ColumnStore (1st run) | 0:04 | 3233 ms |
| ColumnStore (2nd run) | 0:04 | 3139 ms |
The query off the “normal” table improves slightly in terms of execution time, with less filtering to do (but a lot more CPU being used), while the query off the ColumnStore table holds steady at about 4 seconds.
Now, if you set up a test scenario like this, there are a whole host of further tests that you can do. Some things I found interesting: when running queries with lots of aggregations (e.g. SUM), ColumnStore delivered particularly fast results compared to the normal table. Also very interesting: there were queries where the benefit was very little, or where the engine decided not to use the ColumnStore index at all. But overall, I was blown away by the performance gains, and really, this is one of the big features of SQL 2012 for me.
Now, currently there is one limitation to using ColumnStore – the table becomes read-only when the index is built. Is this a problem? Well, the current implementation of this feature states that it is to be used for data warehousing workloads. In a typical scenario, you would drop the index, perform the ETL, then rebuild the index. Creating the index on my table above took on average about 2 minutes, roughly the same time as creating a clustered index. I don’t really see a problem with this in most scenarios; however, you can also use partition switching to add data to a fact table with a ColumnStore index.
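That drop/load/rebuild pattern is simple enough to script into the ETL job itself. A sketch, using my demo’s (hypothetical) index and table names and an abbreviated column list:

```sql
-- Nightly ETL pattern for a table with a read-only columnstore index:

-- 1) Drop the columnstore so the table is writeable again:
DROP INDEX NCCI_FactInternetSales2 ON dbo.FactInternetSales2;

-- 2) ... run the ETL loads into the fact table here ...

-- 3) Rebuild the columnstore once the load completes:
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactInternetSales2
ON dbo.FactInternetSales2 (ProductKey, OrderDateKey, OrderQuantity, SalesAmount);
```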
Of course, you can’t use this on an OLTP database, but perhaps in future we will be allowed to do so. Now that would really be something to look forward to. In the meantime, however, I predict that many companies will enjoy performance gains by using ColumnStore on their Data Marts and Data Warehouses. What do you think?
UPDATE : For news on xVelocity on OLTP , see here.
We’re only days away from TechEd Africa. I’ve been busy with preparations (I’m doing 4 sessions this year), so I haven’t been blogging much lately. I’ll pick that up after TechEd – in fact, there’s plenty of my TechEd content that I want to share here. Of course, SQL Denali will be the focus of everything.