Sunday, December 4, 2011

Some tips defining the Fact & Dimension Tables

The fact table is the center of the star schema. The fact table contains the business metrics (i.e., numerical measurements).
Fact tables are the largest tables (in number of rows) in the star schema design. The fact table is usually highly normalized, containing only keys and metrics.

Identify Subject Areas

Start by identifying the focus or topic of the analysis (e.g., sales, human resource, finance). A subject area must contain discrete metrics (e.g., sales contains dollars sold, units sold) and have a data source available (e.g., operational source data).

Identify Facts

Within each subject area, identify the operational transactions that depict key business events. A sample operational transaction is a customer purchasing transaction.
Examine the data created by these transactions and identify facts that are used by the business processes. Facts should be numeric, have a value, and be additive.
Confirm with the end-users that you have identified all the facts that the user wants to know about.

Identify the Data that References the Facts

Identify the major dimensions for each fact table. Start by examining the operational system's logical data model that contains the entity depicting the fact table. Identify the entities that are associated (i.e., have relationships) with the entity depicting the fact table. For example, the entity purchasing transaction has a relationship with the entity customer and with the entity product.
Use Data Warehouse Query Modelling to determine how the data will be analyzed (i.e., define the business queries). Structure the fact table's dimensions to represent the focus of the business queries.

Verify that a Fact is Really a Fact and a Dimension is Really a Dimension

Look for fact tables that contain both facts and dimensions. This often happens when fact tables are derived from operational entities. For example, a property entity in a property management operational system may contain the property address and a series of dates indicating when various events took place (e.g., the grass was mowed, the snow was shoveled). The property entity could be identified as a fact table, however this would be incorrect. The correct design would be to identify a property events fact table that contains a row for each event that occurred on the property. The property fact table would be associated with a location dimension table that contains the address and a time dimension table that contains the dates the events took place.
In many cases an entity (from a operational system) could be either a fact or a dimension. If the entity has four or more dimensions then it is probably a fact.

Identify Historical Requirements

Identify how long the detail facts need to be stored. The requirement will vary by business function and business area. In some cases the detailed information will be required for a long period of time (e.g., five to 10 years) but in many cases an aggregation and/or summarization of the data will be sufficient.
When the detail facts are no longer required on-line, they can be moved to near-line or off-line storage. Where the archived data is stored depends on how quickly the data must be restored.

Identified Required Attributes

For each fact table, identify the attributes required to create the business queries. A fact table attribute should provide information about a specific occurrence. Attributes (that are not keys) in a fact table should be summable. For example, store "unit sales" and "dollar sales" and derive "price."
Summarized data, and aggregated data should not be stored in fact tables. Use summary tables to address performance issues.

Identify the Keys

Every fact table has a primary key. A primary key uniquely identifies an occurrence of a fact.
A fact table's primary key is always a composite key. The composite key is composed of multiple foreign keys. The foreign keys document the relationships between the fact table and the dimension tables. A foreign key in a fact table is the primary key of a dimension table.
If a table has a composite key then it is a fact table. If a table does not have a composite key then it is a dimension table.
The foreign key of the fact table (i.e., primary keys of the dimension tables) can be the unique identifier used in the real world or a generated key that points to the real world unique identifier. Use the real world unique identifier if the identifier will not change during the life of the data warehouse, otherwise use a generated key.

Factless Fact Tables

A factless fact table is a fact table that does not contain any facts (i.e., metrics). Factless fact tables are used to track events that did, or did not, happen.
Use a factless fact table to track events of interest to the organization. For example, attendance at a cultural event can be tracked by creating a fact table containing the following foreign keys (i.e., links to dimension tables): event identifier, speaker/entertainer identifier, participant identifier, event type, date. This table can then be queried to find out information, such as which cultural events or event types are the most popular.
Factless fact tables can also be used to track events that did not happen. Using the example above we can also identify the events that were not attended or the event types that were the least popular.







Source: http://it.toolbox.com/blogs/enterprise-solutions/identifying-fact-tables-data-warehouse-20810

Digital Inspiration Technology Guide

Change the world with your passion