Friday, October 13, 2017

The Data Warehouse Lab: A step-by-step guide using SSIS and SSAS 2017



I always feel some difficulty for newbies in data warehousing to grasp all theoretical aspects due to the complexity of the tools and the long-running process of data warehousing, which starts by selecting the data source and ends by visualizing the information. Therefore, I designed several examples in my course, and I realized that the examples helped students to understand the overall picture better.

Last year, I realized that if all examples can be defined as a kind of story that students can follow them and see the overall picture, it can help even more, so it was the main motivation behind writing this book.

This book aims to help students and practitioners who are new to data warehousing to start developing a new data warehouse project from scratch. It shows different phases of data warehousing projects through a simple case. So readers can experience the full data warehouse development life-cycle through a simple example step-by-step. The book is written for the novice user, so there is no requirement for previous experience of working with MS SQL Server and other tools. However, it expects readers to know basics of databases like the table, columns, etc.

The book does not aim to teach data warehousing and multi-dimensional design principle, nor play the role of a comprehensive reference book on Microsoft Business Intelligence Toolset. It only intends to help readers to gets a hands-on experience on data warehouse development quickly. It aims to give readers basic understanding and experience, so they become more confident in using reference books and online materials.

The book does not go through the installation of tools that are used in the sample project. The readers need to install the following tools in order to follow the steps, i.e., Microsoft SQL Server Database Engine, Microsoft SQL Server Integration Services (SSIS) 2017 , Microsoft SQL Server Analysis Services (SSAS) 2017, Microsoft SQL Server Management Studio (SSMS), Microsoft Excel, and Microsoft Power BI.

For me, the best time to publish the book would be in January, since I did not have time to do serious proofreading on it. However, my course is going to start soon, so I decided to publish the book, and I will correct typos and other problems later. So, please let me know if you read the book and need some help or find some typos. At least, this is the good feature that is provided by Amazon that I can update the content of the book later :-)

Here are the links to the book:
The Data Warehouse Lab: A step-by-step guide using SSIS and SSAS 2017 Kindle Edition


Friday, September 20, 2013

Pivot Query

I always love SQL, and it is really pity that I have not written any complex SQL for really long time (I remember I wrote the last complex SQL statement in 2007). Today, I needed a sort of pivot result from SQL Server. Although I knew that I can retrieve the result very easily using pivot in Excel, I wanted to do that in SQL. Sometimes, I like to suffer myself ;)
First, I realized that there is a feature added to the SQL Server 2008 that I was not aware of, called PIVOT and UNPIVOT! It is really nice feature added to T-SQL, yet it was also possible to do such queries in previous SQL servers with a little more difficulties. However, there is a problem that I discuss through definition of table:
The table has three columns, i.e. userID, movieID and rate. We want to have a pivot view which contains movie as rows, users as columns and count of rates as cells. However, in current cyntax, you have to explicitly mention each user by value, which seems not suitable. Thus, I change a SQL which ‘chandra sekhar pathivada’ mentions in 1/17/2013 in the mentioned page to retrieve the result (look at http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx). The SQL that I wrote is:

DECLARE @columns NVARCHAR(1000)

SELECT @columns = COALESCE(@columns + ',[' + cast(userID as nvarchar) + ']', '[' + cast(userID as nvarchar)+ ']')
FROM(
  Select distinct userID
  From movies     
) CLMNS

DECLARE @query nVARCHAR(4000)     
     
SET @query = '     
Select
movieID,        
'+ @columns +'    
FROM       
(       
select movieID, userID, rate       
from        
movies
) x       
PIVOT       
(count(rate) FOR userid IN('+ @columns +')) AS MyPivot'     
 print @columns
 print @query
EXECUTE(@query)


This SQL retrieves the result correctly, yet I am still not satisfied. Do you think that it might be other better way to do that? I am still thinking about it.

Tuesday, July 30, 2013

The SSAS Tutorial for beginners

It was last summer that I developed a new tutorial for SSAS. The aim of this tutorial is to make students familiar with SQL Server 2012 Analysis Services. It might be helpful for you if you are an absolute beginner; otherwise, there is nothing new that you can learn from. The tutorial is based on a fictitious scenario which is grounded on simplified version of AdventureWorks Database.
Unfortunately, I cannot share the data source of this tutorial.

Wednesday, December 28, 2011

Stanford Machine Learning course


It has been always a true challenge for me to understand Data Mining in depth.

I have always problem to have a deep understanding of data mining algorithm, since all of them were ended to the mathematical formula that I have no idea about them.

Some times ago, I found the Machine Learning course at Stanford University. This is a free course, and I strongly recommend this course to ones who like to have a deep understanding of these algorithms.

This course will give you deep knowledge and understanding about different algorithm. it also gives you clue to distinguish which approach suits you the most in applying Data mining.

Finally, it is really awesome, since you should write some linear algebra programs that implement these algorithms. Therefore, it is not just theoretical, but also it is really practical.

I finish the course successfully, and the course will be repeated in January.

Other courses which are offered free are:

These courses are offered by Stanford University and Berkley University.

Tuesday, May 3, 2011

MDX and DAX Formatter

Today, I watched a video about "Format MDX&DAX" online application. This application enables you to convert the MDX or DAX expressions into a readable expression. it would be usefull specially when you are tracing your application and using "Microsoft SQL Server Profiler".
You can see the video here.

link of application

Saturday, November 27, 2010

BIDS Helper

I participate at Data Warehouse course as the teacher assistant this term.
I face lots of interesting questions and solutions by different students' groups when they asked questions for designing the solution for their assignment.
One interesting things that I have not seen before, was Dimension Health Check facility.

You can install the BIDS Helper adds. It provides you some new facilities that make your job as Data warehouse designer or developer easier.
You can see its facilities and find its link to download here:
http://bidshelper.codeplex.com

Wednesday, August 11, 2010

Process Mining

Extracting relevant knowledge for decision makers is the key activity in all Business Intelligence areas. Different areas provide capabilities with which we can extract different knowledge from our data. For example, we can discover patterns, clusters and etc. using data mining. An example of pattern mining techniques is sequence mining, where we aim to discover the sequential order of events that happens in a data. Despite the wide use of sequence mining in different areas like biology, these algorithms have not been employed much to add values to business domains. The reason might be the complexity that exists in businesses and the way that we handle our businesses. Many businesses are running through a formal or informal agenda, called business processes. These processes define how different activities in a business process should be handled to fulfill the goal of the business. The relation between these activities can be quite complex, so the analysis of their data could be quite challenging. Let's see the situation with some example.

Imagine that we have a company consists of several informal business processes. This means that we have not modeled our processes, and people know it by heart. If we want to define the business process, we can interview different people who are involved. This is very costly, and it can be biased based on the information that people give to us. We should always consider the probability that not all people tell the way that they work; instead, many might tell the way that they should work! Although we do not have formal business process models, we have the result of execution of our business in different Information Systems. This includes different databases that record different activities, or different log files that persist different actions through time. Business Process Discovery is a sub-area of the process mining that aims to discover business process models through these information. It offers a different algorithm that enables us to discover these models from captured information.

Interesting? Yes! but it is not the end of the story!

Imagine that you have a formal business process model, and the information that records the activities that happened in your business. How can you make sure if what is happening in your business is complying with what you have defined in business process model? Is there any fraud case? Is there any employee who does not know the work but (s)he doesn't know! These sort of questions can be answered if we are able to compare our process model with the information that has been captured in the log files. This is another area of Process Mining which is called Conformance checking.

Wow! so far so good! Can we expect more from Process Mining?

Off course! business processes can capture many different perspectives. For example, they can be so basic that only describes which activity should be performed when! but they can be extended to explain who should perform each activity! Consider a company that has a basic process model. The company might not be able to define who should perform each task at beginning. Instead, the manager lets people work, and after a while (s)he wants to assign people to different activities based on the successful experiences of running the business process. I am sure you are sharp enough to realize that this information are already captured in our databases and log files! so, can we give the basic version of the process model and our log file to an algorithm and expect to receive a evolved version of the business process capturing who should do each activity? Off course! It is called Process Enhancement!

That is amazing! Can we be even more greedy to expect even more?

Sure! The good news is that we can combine Process Mining with other Mining techniques like Rule Mining and etc to expand the power of our magic! If you are interested to know more, there is a research group at the Eindhoven University of Technology that conducts this project, you can find more information on their site.

Today is the third year of this blog! I will get the data mining course next term, and I am very eager to learn and perform it in different contexts.