There seems to be some anger after the recent announcements at SQL PASS. The big announcement from an Analysis Services perspective was the ability to create whats called the BI Semantic Model ( BISM ), basically a relational model that uses the Vertipaq engine for performance.
Almost immediately people were complaining that after years of investment into the Analysis Services UDM model ( And MDX skill ) , they are not happy to throw that all away. Well you don’t have to. I think that there was a knee jerk reaction here – UDM is very much alive and well and people will still be able to write MDX. In fact , one presentation showed that in future you might be able to query the BISM with MDX – so there is that roadmap for the MDX language.
The idea behind this new model seems to be to bring in some simplicity , its aims to be less complex than UDM. UDM will still be there , and will be more powerful but more complex. Perhaps UDM will be used for Tier-1 solutions in future ?
If anything , Analysis Services as a product just got a whole lot better, with a second modeling option now available to you. This knee jerk reaction takes me back to the Performance Point announcement – Some people don’t realize that PerformancePoint is still alive and actually better since you now get it free with SharePoint Server Enterprise.
If you want to read more , here’s TK Anand’s post
He does mention that there have been some improvements in UDM for Denali.
If anything , I’m surprised that nobody picked up on the Project Crescent limitation where you can only source data from a BISM model. That was more disappointing for me , however lets be honest :
- The finished product hasn’t launched yet
- It’s an end user tool so once again simplicity is the focus.
As always , send me your thoughts.
The third tip form my Tech Ed deck that I’ll share with you concerns the definition of tables in a Sql database. Most developers focus on writing code that either just gets the job done or thats optimized in some way. However , I’ve noticed that many SQL developers don’t pay attention to the actual table definition, lazily using standard column definitions. Most of you might think that its not really a big deal – How much space do you really save by narrowing columns on a sql table definition ? The answer might surprise you.
Lets take a look at a table that I created. Its very simple , holding just a primary key , a number column and a date column.
So an alternative would be the following definition.
The Primary key stays the same. The number column changes from INT to TINYINT , since I know that the values I will insert are not greater than 256. The DATETIME field becomes a SMALLDATETIME, since I don’t need the extra precision.
Now what exactly will all of this achieve ? Surely this is being obsessive ? Well, I wrote a script to populate both tables with a million rows each. The script was a while-loop so that the data generated would be exactly the same in each table , except possibly the second table would have a slightly later datevalue ( shouldn’t affect it ). The results are very interesting. Without any compression enabled, the “Big” table used up 24.188MB , while the “Small” table came in at 17.406MB , which is about 71% the size or a 29% saving. Now I don’t know about you , but a 30% saving before compression is something I’ll take anyday. This is just a table with 2 columns , it might be more on a “wider” table. And if you implement this on a fact table which will grow to a billion rows , the savings become really meaningful.
Other things to consider :
- More memory is used when larger fields are queried
- More data is transported across the network
- If all tables in a database are larger , this will mean larger backups
- Larger backups mean longer restore times , which means longer recovery times in disasters
The NVARCHAR issue
The other issue when it comes to field definition lies with the NVARCHAR and NCHAR type. Some developers don’t know what the difference is between VARCHAR and NVARCHAR, so they just use NVARCHAR. Well NVARCHAR can handle international language code sets – so if you have a user captured field that might contain European or Asian language characters , then you would need NVARCHAR. However , for text fields where you know that the data is stored in English ( or is constrained ), you are simply wasting a lot of space by defining the field as NVARCHAR. How much space ? Try twice as much – yep , NVARCHAR uses double the space as VARCHAR – definitely something to think about.
Following up to my previous post on scalability, we now turn our attention to cursors. You can probably find lots of info on the web on cursors and where not to use them , but I had included it into the presentation for the benefit of junior developers, and I’ll just mention here the demo I did during the presentation.
So what are the problems with cursors ?
1) They are procedural. This means that programmers coming from other languages think that they are perfectly logical , without realising that SQL is set based.
2) Some types of cursors lock records in the database and prevent other users from changing them. Other types of cursors create an additional copy of all records and then work with them.
3) A cursor is a lot more code to maintain. You’ll see that set based code is normally smaller.
Lets move on to an example.
Suppose that we wanted to update a table based on values from another table. Yes, this is a simple example I know, but it will be easier to understand if you’re coming from a .NET background into T-SQL development. So you’re probably thinking about some kind of lookup at this point , and in code terms you’re thinking about implementing the following :
I’ve used tables from AdventureWorksDW, althought I modified the DimCustomers table to hold geographical information, and put them in a cursors schema. Now, the above is certainly not an example of best practice coding. When I ran it on my machine , it took about 10 minutes to complete. To get an idea of what is actually going on , run a profiler trace when the cursor is running. In the background , you’re actually getting thousands of UPDATE and SELECT statements executing on the database. This is very inefficient when compared to the set-based solution that we’ll investigate in a moment. It also means a tremendous amount if IO going on.
Why this doesn’t scale
Well , typically what happens is that the cursor runs fine in development because the volumes of data are very small. In production , the volumes grow monthly and the cursor gets slower every month. Another reason is that when the cursor is running , it will cause blocking in the production environment. Lets take a look at a set based solution. One way of writing this would be to write and UPDATE statement with a JOIN clause.
When I ran this on my machine , it took exactly 1 second to execute , but did the same job as the cursor. Now you see the power of set based code. The update statement , because there is a join , can in 1 statement update the relevant cell with the right piece of information. That is why we see such a difference. Another thing you can do is to compare execution plans.
Once you get your head around Set-Based queries , you’ll find it much easier to come up with set-based alternatives to cursors in your daily programming life.