Junk dimension 4


What is a Junk Dimension? Have you ever come across a scenario where many small dimension tables connect (using foreign key) to the fact table? In data warehouse, we have two types of tables, Fact and Dimension. Fact table contains business facts as measures and references to the dimension tables. Dimension tables have attributes which contains textual information of business facts and are used to filter and label the data.

Typically, fact tables are deeper whereas dimension tables are wider. Sometimes it happens that we have many narrow and low cardinality (low cardinality : few number of rows in the table) dimensions in the system; like flags and indicators. To keep all these information in fact table, we need to connect all these dimension tables (available in system or created from flag and indicator attributes) with fact table. Connecting all these small dimension tables converts fact table in a centipede fact with possible performance degradation by increasing the number of referenced tables with each row. Also, having many small dimensions increases the total number of dimensions in data warehouse which needs an extra amount of effort for maintenance.

In this post we are going to explore junk dimension and its usage in data warehouse designing with the help of an example. Lets start with the definition of Junk Dimension.

What is a Junk Dimension?

To avoid a centipede fact table in our data warehouse, we create a single dimension table by combining attributes from miscellaneous low cardinality dimensions. This combined dimension table covers all possible set of values which might occur in fact table and is known as junk dimension. The Junk dimension is not a Cartesian product of all miscellaneous dimensions, but it must cover all possible set of values from combined dimension tables which are present or might appear in fact table. Having a single dimension table for such type of indicator and flag dimensions or attributes would not only decrease the number of dimensions, but also require less number of tables to be referred by fact table. We refer the key of the junk dimension instead of individual low cardinality dimension tables in fact table.

Junk dimension example

To demonstrate the junk dimension, have a look on below Fact_Product table which is a dummy fact table. Fact_Product has many wide and high cardinality dimensions attached with their dimension table surrogate key. This table also has few low cardinality small dimension tables which are highlighted in red rectangle in below image.

Junk dimension - Fact_Product

Fact_Product

Below is the attribute level detail of these low cardinality dimension tables;

Junk dimension - Low cardinality dimensions

Low cardinality dimension tables

We can create a single dimension for all above small and low cardinality dimensions in such way that it covers all possible values present or values that may appear later in fact table. Junk dimension created from low cardinality dimensions would multiply the number of rows significantly which is quite better than having billions of records in fact table.

Below is the junk dimension from above miscellaneous dimensions;

Junk Dimension

Junk Dimension – Sample data

Above junk dimension has {(Dim_ProductType *  Dim_TaxInformation * Dim_ProductQuality * Dim_Expiry * Dim_StorageType * Dim_DeliveryMode) – (number of non relevant business rows)} which is (6 x 3 x 3 x 3 x 3 x 3) = 1458 – 0 (might vary as per the business requirement) = 1458.

Thanks for reading the article and please do rate and share this post if you really liked it. Also, please share your valuable inputs as comments.

Rate This
[Total: 8    Average: 4.5/5]


Gopal Krishna Ranjan

About Gopal Krishna Ranjan

I am Gopal Krishna Ranjan, having 8 years of industry experience in Software development. I have a head down experience in Database, Data Warehouse, Big Data and cloud technologies and have implemented end to end Database, Data Warehouse,  Big Data and Cloud Solutions.
I have extensively worked on SQL Server, Python, Hadoop, Hive, Spark, Azure, Machine Learning, and MSBI (SSAS, SSIS, and SSRS). I also have good experience in windows and web application development using ASP.Net and C#.


Leave a comment

Your email address will not be published. Required fields are marked *

*

4 thoughts on “Junk dimension