Sunday, December 30, 2007

Data Analysis Using SQL and Excel

I work as a systems analyst and designer, and also as a database designer. I have a good command of databases and SQL, and I have done some data visualization in Excel, but I never imagined how tightly the two could fit together.

I am familiar, in theory, with the normal distribution and some other simple statistical techniques, yet I struggle with statistical concepts such as regression.

When I started reading books about data mining, this was the main reason I gave it up. When I saw this book, I felt the way I did when I first saw the MDX Solutions book: it raised my hopes of overcoming the difficulty of data mining. I took a quick look at it, and it seems very well organized.

I decided to write an overview of this book to help my friend Pedro and anyone else who is deciding whether to buy it. Keep in mind, though, that this post is based on a quick skim of the book, not a complete reading.

Chapter 1: A Data Miner Looks at SQL
The author introduces the dataflow concept in this chapter.

Chapter 2: What’s In a Table? Getting Started with Data Exploration
This chapter explains how to explore SQL results with Excel charts. Skimming it gave me a new point of view.

Chapter 3: How Different Is Different?
This chapter explains the basic concepts of statistics and how statistics, SQL, and Excel work together.

Chapter 4: Where Is It All Happening? Location, Location, Location
Geography, and the processing that can be done on it with SQL and Excel, plays a primary role in discovering knowledge. This chapter shows how well that works.

Chapter 5: It’s a Matter of Time
This chapter does the same as the previous one, but for time.

Chapter 6: How Long Will Customers Last? Survival Analysis to Understand Customers and Their Value
Nothing describes this chapter better than the sentences the author opens it with: “Survival analysis estimates how long it takes for a particular event to happen. A customer starts; when will that customer stop? By assuming that the future will be similar to the past (the homogeneity assumption), the wealth of data about historical customer behavior can help us understand what will happen and when.”
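To make the idea concrete for myself, here is a minimal sketch in Python of an empirical survival curve. It is not taken from the book, which works in SQL and Excel; the function name, the input layout (one tenure per customer plus a flag saying whether the customer has stopped), and the sample data are all my own assumptions.

```python
def survival_curve(tenures, stopped, max_tenure):
    """Estimate survival[t]: the share of customers still active at tenure t.

    tenures    -- tenure of each customer, in whole periods (hypothetical input)
    stopped    -- True if that customer stopped at that tenure, False if still active
    max_tenure -- number of periods to compute
    """
    survival = [1.0]  # everyone is active at tenure 0
    for t in range(max_tenure):
        # Customers who reached tenure t are "at risk" of stopping at t.
        at_risk = sum(1 for x in tenures if x >= t)
        # Customers who actually stopped at exactly tenure t.
        stops = sum(1 for x, s in zip(tenures, stopped) if s and x == t)
        hazard = stops / at_risk if at_risk else 0.0
        # Survival to t + 1 = survival to t times the chance of not stopping at t.
        survival.append(survival[-1] * (1.0 - hazard))
    return survival


# Made-up sample: four customers, one of them still active at tenure 2.
curve = survival_curve([1, 2, 2, 3], [True, True, False, True], 4)
```

Customers who are still active count as at risk up to their current tenure but never count as stops, which is how the historical data, under the homogeneity assumption, can be used to say something about customers who have not stopped yet.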

Chapter 7: Factors Affecting Survival: The What and Why of Customer Tenure
“This chapter builds on this foundation, by introducing three extensions of basic survival analysis. These extensions solve some common problems faced when applying survival analysis in the real world. They also make it possible to understand the effects of other factors besides tenure on survival.”

Chapter 8: Customer Purchases and Other Repeated Events
This chapter discusses everything about customer behavior: when, where, and how. With one notable exception: what customers purchase.

Chapter 9: What’s in a Shopping Cart? Market Basket Analysis and Association Rules
“This chapter dives into the detail, looking at the specific products being purchased, to learn both about the customers and the products they are buying. Market basket analysis is the general name for understanding product purchase patterns at the customer level.”
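As a rough illustration of what looking at purchase patterns can mean, the sketch below counts how often two products appear in the same order. This is my own simplified example, not the author's method; the function name and the sample orders are made up.

```python
from collections import Counter
from itertools import combinations


def pair_counts(orders):
    """Count how many orders contain each pair of products.

    orders -- list of orders, each a list of product names (hypothetical data)
    """
    counts = Counter()
    for items in orders:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


# Made-up sample orders.
orders = [["milk", "bread"], ["bread", "milk", "eggs"], ["eggs"]]
counts = pair_counts(orders)
```

Pairs that occur far more often than the popularity of the individual products would suggest are the natural starting point for association rules.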

Chapter 10: Data Mining Models in SQL
“This chapter takes an alternative approach that introduces data mining concepts using databases. This perspective presents the important concepts, sidestepping the rigor of theoretical statistics to focus instead on the most important practical aspect: data.”

Chapter 11: The Best-Fit Line: Linear Regression Models

Chapter 12: Building Customer Signatures for Further Analysis
This chapter focuses on data preparation.

Tuesday, December 25, 2007

Data Mining


I would love to know more about data mining, but I have never found a simple book that discusses data mining and statistics in a practical way.


Yesterday, I found out that the library had bought "Data Analysis Using SQL and Excel", which describes data mining with SQL and Excel in a practical way.


I decided to read it in order to learn the fundamental concepts and become ready to read other books.


I hope it will be as useful as the author says in the preface.


Friday, December 21, 2007

Process Analysis Services Objects Through SSIS

There are two methods for loading data into SSAS, which are used mostly for non-standard data sources:

  1. Using Dimension Processing Data Flow Destination
  2. Using Partition Processing Data Flow Destination


You can map your source data to the dimension or partition and set the update method, which can be Add, Full, or Update.