Tuesday, October 30, 2007

Handling Late-Arriving Facts


When you want to handle this kind of data, you must set the surrogate key and the start date of the dimension data with a look up component at the first, and then you must check what data is the late-arriving one. The rows that those operation dates are lower than the start date of their dimension’s start date (Dim.StartDate>Fact.OperationDate) are late-arriving facts which their surrogate keys must be corrected with the appropriate dimension row. For specifying the correct dimension row you must send these facts to the look up component and correct the surrogate key. Finally you must add all these rows together.
The only tip that is remained is you must correct the dimension’s surrogate key with modifying the SQL of look up component in its advanced tab, for example for Order System the SQL must set as the follow:

Select *
From (Select * from [dbo].[DimProduct] as refTable)
where [refTable].ProductAlternateKey]=?
and startDate <= ? and endDate > ?

The parameters must be set to ProductAlternateKey, OrderDateAlternateKey, OrderDateAlternateKey.

Wednesday, October 17, 2007

Handling Missing Dimension Lookups

There are three methods for handling inferred members:

  • Pre-data flow checkyou must insert these members as inferred members before adding fact data to the fact table.The best situation that you can use this method is when the number of dimensions that support inferred members is one or two.In this approach you have to process the fact table, two times.

  • Post-data flow process (best solution)You must insert all fact data except for the inferred members that must store in a stage table, then you must insert inferred members, finally you should run the fact load process for the stage table.

  • In-line data flow approachyou must set the second lookup item cache mode to “No Cache” to force it to get the key from the database instead of getting from the not-updated cache.You must check that the inferred member has not added before the inserting, because it is probable that some fact data with that key are exists. For each inferred member you must first check for the record, second add it to the dimension, and finally get its surrogate key, therefore this process is suitable when the inferred members make up very small minority of source records.

Sunday, October 14, 2007

Identifying changed fact records

If your data source rows contain "created date" and "modified date", you won’t need to match the fact rows with source rows for identifying changed fact records; conversely you have to match them using one of these two methods (you must match business key and time key in both of these methods):

  1. Database joins between the source data and the fact table: The source data and fact data must be at the same server in order to perform the join efficiently.
  2. Data correlation with the data flow (using the lookup to identify fact change record): you cannot identify deleted rows in this method.

Don’t forget about historical dimension changes and inferred members and late-arriving facts, you must handle those situations if your business supports them.

If you want to update or delete a row in a table, at the same time that you are inserting rows in it; you may cause locking issues. For preventing this situation turn off the table locking the data flow destination. If the problem is remained, turn off the fast load.

Saturday, October 6, 2007

Index Optimizations

Dimension lookups and updates are row-by-row operations; therefore you must create appropriate indexes for getting the best performance. One of the tips that you can follow is creating dimension’s table clustered index on the business key instead of surrogate key. And if you need to do better you can include the current indicator flag for end date as the second column in the Type-2 dimensions. Keep it in your mind that Index optimization must be balanced between the ETL and the query usage. Although you can put the data into staging table and later establish an inner join between it and the dimension for updating the rows that are exists in the stage table. Finally, you must truncate the stage table.
Be cautious about using this method, because in most situations it is not a good idea to put surrogate key away, and index the business keys, for joining between fact-tables and dimension tables are based on surrogate keys.