Saturday, November 27, 2010

BIDS Helper

This term I am working as a teaching assistant on the Data Warehouse course.
I come across many interesting questions and solutions from the student groups when they ask about designing the solution for their assignment.
One interesting thing that I had not seen before was the Dimension Health Check facility.

You can install the BIDS Helper add-in. It provides some new facilities that make your job as a data warehouse designer or developer easier.
You can see its facilities and find its download link here:

Wednesday, August 11, 2010

Process Mining

Extracting relevant knowledge for decision makers is the key activity in all Business Intelligence areas. Different areas provide capabilities with which we can extract different knowledge from our data. For example, we can discover patterns, clusters, etc. using data mining. An example of a pattern mining technique is sequence mining, where we aim to discover the sequential order of events that happen in the data. Despite the wide use of sequence mining in areas like biology, these algorithms have not been employed much to add value to business domains. The reason might be the complexity that exists in businesses and in the way that we run them. Many businesses run according to a formal or informal agenda, called business processes. These processes define how the different activities in a business should be handled to fulfill the goal of the business. The relations between these activities can be quite complex, so the analysis of their data can be quite challenging. Let's look at the situation with an example.

Imagine that we have a company that consists of several informal business processes. This means that we have not modeled our processes, and people know them by heart. If we want to define a business process, we can interview the different people who are involved. This is very costly, and the result can be biased by the information that people give us. We should always consider the possibility that not all people describe the way that they actually work; instead, many might describe the way that they should work! Although we do not have formal business process models, we have the results of executing our business in different information systems. These include databases that record different activities, and log files that persist different actions through time. Business Process Discovery is a sub-area of process mining that aims to discover business process models from this information. It offers different algorithms that enable us to discover these models from the captured information.
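
To make the idea of discovery a bit more concrete, here is a minimal sketch in Python. It is not one of the real discovery algorithms; it only counts which activity directly follows which in a log, which is a common first step before a model is built. The event log and its activity names are made up for illustration.

```python
from collections import Counter

# Hypothetical event log: one list of activity names per case, in time order.
event_log = [
    ["receive order", "check stock", "ship", "invoice"],
    ["receive order", "check stock", "invoice", "ship"],
    ["receive order", "check stock", "ship", "invoice"],
]


def directly_follows(log):
    """Count how often activity b comes directly after activity a in a trace."""
    counts = Counter()
    for trace in log:
        # Pair every activity with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


dfg = directly_follows(event_log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The pairs with high counts suggest the usual order of work; the rare pairs (here, invoicing before shipping) are the ones worth asking people about.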

Interesting? Yes! But it is not the end of the story!

Imagine that you have a formal business process model, and the information that records the activities that happened in your business. How can you make sure that what is happening in your business complies with what you have defined in the business process model? Is there any case of fraud? Is there any employee who does not know the work and is not aware of it? These sorts of questions can be answered if we are able to compare our process model with the information that has been captured in the log files. This is another area of Process Mining, which is called Conformance Checking.
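
As a rough illustration of the comparison, the sketch below treats the process model as nothing more than a set of allowed steps and reports the steps in a trace that the model does not allow. This is a strong simplification that I am assuming only for the example; real conformance checking works on richer models. The model, the traces and the activity names are all hypothetical.

```python
# Hypothetical process model: the allowed "a is directly followed by b" steps.
allowed_steps = {
    ("receive order", "check stock"),
    ("check stock", "ship"),
    ("ship", "invoice"),
}


def deviations(trace, allowed):
    """Return the steps of a trace that are not allowed by the model."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed]


conforming = ["receive order", "check stock", "ship", "invoice"]
suspicious = ["receive order", "ship", "invoice"]  # the stock check was skipped

print(deviations(conforming, allowed_steps))
print(deviations(suspicious, allowed_steps))
```

A trace with no deviations follows the model; any reported step is a place where the work was done differently from what was defined.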

Wow! So far so good! Can we expect more from Process Mining?

Of course! Business processes can capture many different perspectives. For example, they can be so basic that they only describe which activity should be performed when, but they can be extended to explain who should perform each activity. Consider a company that has a basic process model. The company might not be able to define who should perform each task at the beginning. Instead, the manager lets people work, and after a while (s)he wants to assign people to different activities based on the successful experiences of running the business process. I am sure you are sharp enough to realize that this information is already captured in our databases and log files! So, can we give the basic version of the process model and our log file to an algorithm and expect to receive an evolved version of the business process that captures who should do each activity? Of course! It is called Process Enhancement!
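
A very small sketch of this idea: if the log records who performed each activity, we can suggest the person who performed it most often. I am assuming here that every logged event counts as a successful experience, which a real analysis would have to check; the events and the names are invented.

```python
from collections import Counter, defaultdict

# Hypothetical log of (activity, performer) events.
events = [
    ("check stock", "Anna"),
    ("check stock", "Anna"),
    ("check stock", "Bo"),
    ("ship", "Bo"),
    ("ship", "Bo"),
    ("invoice", "Cleo"),
]


def usual_performer(log):
    """Map each activity to the person who performed it most often."""
    per_activity = defaultdict(Counter)
    for activity, person in log:
        per_activity[activity][person] += 1
    return {
        activity: counts.most_common(1)[0][0]
        for activity, counts in per_activity.items()
    }


assignment = usual_performer(events)
for activity, person in assignment.items():
    print(f"{activity}: {person}")
```

The result can be attached to the basic process model as a first proposal for who should do what.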

That is amazing! Can we be greedy and expect even more?

Sure! The good news is that we can combine Process Mining with other mining techniques, such as Rule Mining, to expand the power of our magic! If you are interested in knowing more, there is a research group at the Eindhoven University of Technology that conducts this project. You can find more information on their site.

Today this blog turns three years old! I will take the data mining course next term, and I am very eager to learn it and apply it in different contexts.

Friday, July 23, 2010

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence

If you have followed the Kimball articles, you may be interested in having all of them in an organized collection like a book.
Yes, Kimball has published a new book based on these articles. I can say that it is more than a collection of articles. Indeed, the authors use these articles to provide a good description of, and tips about, the whole data warehouse lifecycle.
I have only taken a quick look at it so far, and I was really fascinated by it.

Monday, July 19, 2010

IBM Cognos 8 Report Studio Cookbook

I received a free copy of this book in order to write my opinion of it.

I think this book is a very good one to follow if you are new to Cognos, or if you have worked with Microsoft SQL Server Reporting Services and want to migrate to Cognos.

Indeed, it makes learning Cognos easier by providing lots of examples, which are described with lots of pictures.

Honestly, I think that if you are a professional in one reporting tool, you don't need any book; you can just explore the tool and follow the help to overcome its difficulties.

However, if you are not a guru in any reporting tool, or if you do not have enough time to spend on Cognos when migrating from SSRS, I strongly recommend this book.

This book does not introduce any concepts or sophisticated reporting techniques. However, it provides easy-to-follow, step-by-step instructions for getting familiar with IBM Cognos 8 Report Studio.

Book Link

Friday, March 26, 2010

Solving previous problem

If we analyze the previous problem, we can see that there are three decisions that the decision maker could take:
  1. No treatment
  2. Treatment with aspirin
  3. Treatment with warfarin
So, we should draw a decision tree with these three decisions at the first branch of the tree.
There are three different uncertainties that could appear for the treatments. Two of these three also apply to no treatment.
If we draw the decision tree, it should look like the following pictures. (I provide just two pictures, because the tree for treatment with warfarin is the same as the one for aspirin, just with different values.)

If we calculate the Expected Value (EV, which clarifies the best decision that we could make) it will be clear that patients should be treated with warfarin.

A decision tree of the problem is given at the end of this document.
No treatment & CVA & affected = ‐158,000 SEK
No treatment & CVA & unaffected = ‐76,600 SEK
No treatment & No CVA = 0
Aspirin/Warfarin & side‐effect & CVA & affected = ‐(15,000+158,000) = ‐173,000 SEK
Aspirin/Warfarin & side‐effect & CVA & unaffected = ‐(15,000+76,600) = ‐91,600 SEK
Aspirin/Warfarin & side‐effect & haemorrhage = ‐(15,000+32,000) = ‐47,000 SEK
Aspirin/Warfarin & no side‐effect & CVA & affected = ‐(15,000+158,000) = ‐173,000 SEK
Aspirin/Warfarin & no side‐effect & CVA & unaffected = ‐(15,000+76,600) = ‐91,600 SEK
Aspirin/Warfarin & no side‐effect & No CVA = ‐15,000 SEK
Folding back the tree:
EMV(no treatment) =( 0.25*(‐158,000) + 0.75*(‐76,600))*0.8 = ‐77,560
EMV(aspirin treatment) =(((( 0.25*(‐173,000)+0.75*(‐91,600))*0.644)+0.356*(‐47,000))*0.048) + (((0.25*(‐173,000)+0.75*(‐91,600))*0.5)+0.5*(‐15,000))*0.952 = ‐64,692
EMV(warfarin treatment) = (((0.25*(‐173,000)+0.75*(‐91,600))*0.5)+0.5*(‐47,000))*0.072 + (((0.25*(‐173,000)+0.75*(‐91,600))*0.1)+0.9*(‐15,000))*0.928 = ‐28,639
Thus, if we merely look at the costs for the hospital, patients should be treated with warfarin.
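
For anyone who wants to check the folding back, here is a small Python sketch that recomputes the three EMVs from the probabilities and costs given in the problem. The function and variable names are my own and are only for illustration; the numbers are the ones used above.

```python
# Costs in SEK, taken from the problem statement.
P_AFFECTED = 0.25          # share of CVAs that leave the patient affected
COST_AFFECTED = 158_000    # state cost, affected patient
COST_UNAFFECTED = 76_600   # transition cost, unaffected patient
COST_HAEMORRHAGE = 32_000
COST_MEDICINE = 15_000     # aspirin or warfarin


def cva_cost(extra=0):
    """Expected cost of a CVA, plus any fixed cost already paid (extra)."""
    return (P_AFFECTED * (COST_AFFECTED + extra)
            + (1 - P_AFFECTED) * (COST_UNAFFECTED + extra))


def emv_no_treatment(p_cva=0.8):
    """EMV without treatment: a CVA with probability p_cva, otherwise 0."""
    return -(p_cva * cva_cost())


def emv_treatment(p_side, p_cva_if_side, p_cva_if_no_side):
    """EMV of a medicine, folding back the side-effect and CVA branches."""
    cva = cva_cost(COST_MEDICINE)
    side = (p_cva_if_side * cva
            + (1 - p_cva_if_side) * (COST_MEDICINE + COST_HAEMORRHAGE))
    no_side = (p_cva_if_no_side * cva
               + (1 - p_cva_if_no_side) * COST_MEDICINE)
    return -(p_side * side + (1 - p_side) * no_side)


results = {
    "no treatment": emv_no_treatment(),
    "aspirin": emv_treatment(0.048, 0.644, 0.5),
    "warfarin": emv_treatment(0.072, 0.5, 0.1),
}
best = max(results, key=results.get)  # the least negative EMV is the cheapest

for name, value in results.items():
    print(f"EMV({name}) = {value:,.0f} SEK")
print("Best decision:", best)
```

Rounded to whole SEK, the script gives the same three values as the hand calculation, and warfarin has the least negative EMV.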
This kind of analysis is called subjective analysis. If you are interested in dealing with this kind of problem, I recommend the book Making Hard Decisions: An Introduction to Decision Analysis by Robert T. Clemen.

Tuesday, March 23, 2010

Decision Tree

I took a course last quarter. It was about how to use decision trees to make correct decisions. It was really amazing, and I want to reproduce one of the questions from the final exam to show the type of question that a decision tree can help us solve. This quarter I am taking the second, advanced version of that course, called Decision and Risk Analysis, second course.

You are working as a medical advisor at a big hospital and your department specializes in cardiology. Your manager has asked you to formally analyze the medical condition of atrial fibrillation. In the analysis you will merely look at costs for the hospital and not consider other criteria. Atrial fibrillation is a common condition, which carries with it a significant risk of stroke (80%) if left untreated. Treatment with the medicines warfarin or aspirin significantly reduces this risk, but there are side-effects to both treatments. For the average patient (at moderate risk), treatment with aspirin has a slightly smaller risk of side-effects than warfarin, 4.8% as opposed to 7.2%, but aspirin reduces the risk of stroke less effectively than warfarin. The serious side-effects are either a cerebrovascular accident (CVA) during the treatment or a haemorrhage. Of the patients treated with aspirin and affected by side-effects, 64.4% get a CVA during their treatment, whereas this number is only 50% for patients treated with warfarin and affected by side-effects.

For the patients who do not suffer from immediate side-effects during the aspirin treatment, the risk of CVA is still 50%, whereas the risk of CVA is only 10% for patients treated with warfarin and not suffering from side-effects during the period of treatment. If an average patient (moderate risk) with atrial fibrillation gets a stroke (CVA), regardless of treatment or no treatment, he or she is classified as affected or unaffected. Out of all CVAs that occur, 25% leave the patient affected and 75% leave the patient unaffected. Transition costs (those that occur just once) for the treatment of a patient with CVA are estimated at 76,600 SEK (unaffected patients), whereas the state costs (those that remain with patients for their lifetime) for the treatment of a patient with CVA are estimated at 158,000 SEK (affected patients). The treatment of a haemorrhage is estimated at 32,000 SEK, and the cost of a medical treatment (either aspirin or warfarin) is 15,000 SEK.

What would be your recommendation to the hospital in the handling of moderate risk patients with atrial fibrillation and why?

But how can we solve it?

If you draw the decision tree, you will see that warfarin is the best medical treatment.