Thursday, May 23, 2013

Star vs Snowflake Schemas



First Answer: My personal opinion is to use the star by default, but if the product you are using for the business community prefers a snowflake, then I would snowflake it. The major difference between snowflake and star is that a snowflake will have multiple tables for a “dimension” and a start with a single table. For example, your company structure might be

Corporate à Region à Department à Store

In a star schema, you would collapse those into a single "store" dimension. In a snowflake, you would keep them apart with the store connecting to the fact.

Second Answer:  First of all, some definitions are in order. In a star schema, dimensions that reflect a hierarchy are flattened into a single table.  For example, a star schema Geography Dimension would have columns like country, state/province, city, state and postal code. In the source system, this hierarchy would probably be normalized with multiple tables with one-to-many relationships.

A snowflake schema does not flatten a hierarchy dimension into a single table.  It would, instead, have two or more tables with a one-to-many relationship.  This is a more normalized structure. For example, one table may have state/province and country columns and a second table would have city and postal code. The table with city and postal code would have a many-to-one relationship to the table with the state/province columns.

There are some good for reasons snowflake dimension tables. One example is a company that has many types of products. Some products have a few attributes, others have many, many. The products are very different from each other.  The thing to do here is to create a core Product dimension that has common attributes for all the products such as product type, manufacturer, brand, product group, etc. Create a separate sub-dimension table for each distinct group of products where each group shares common attributes.  The sub-product tables must contain a foreign key of the core Product dimension table. 

One of the criticisms of using snowflake dimensions is that it is difficult for some of the multidimensional front-end presentation tools to generate a query on a snowflake dimension.  However, you can create a view for each combination of the core product/sub-product dimension tables and give the view a suitably description name (Frozen Food Product, Hardware Product, etc.) and then these tools will have no problem.

1 comment: