Junk dimension

What is a Junk Dimension? Have you ever come across a scenario where many small dimension tables connect (using foreign key) to the fact table? In data warehouse, we have two types of tables, Fact and Dimension. Fact table contains business facts as measures and references to the dimension tables. Dimension tables have attributes which contains textual information of business facts and are used to filter and label the data.

Typically, fact tables are deeper whereas dimension tables are wider. Sometimes it happens that we have many narrow and low cardinality (low cardinality : few number of rows in the table) dimensions in the system; like flags and indicators. To keep all these information in fact table, we need to connect all these dimension tables (available in system or created from flag and indicator attributes) with fact table. Connecting all these small dimension tables converts fact table in a centipede fact with possible performance degradation by increasing the number of referenced tables with each row. Also, having many small dimensions increases the total number of dimensions in data warehouse which needs an extra amount of effort for maintenance.

In this post we are going to explore junk dimension and its usage in data warehouse designing with the help of an example. Lets start with the definition of Junk Dimension.

What is a Junk Dimension?

To avoid a centipede fact table in our data warehouse, we create a single dimension table by combining attributes from miscellaneous low cardinality dimensions. This combined dimension table covers all possible set of values which might occur in fact table and is known as junk dimension. The Junk dimension is not a Cartesian product of all miscellaneous dimensions, but it must cover all possible set of values from combined dimension tables which are present or might appear in fact table. Having a single dimension table for such type of indicator and flag dimensions or attributes would not only decrease the number of dimensions, but also require less number of tables to be referred by fact table. We refer the key of the junk dimension instead of individual low cardinality dimension tables in fact table.

Junk dimension example

To demonstrate the junk dimension, have a look on below Fact_Product table which is a dummy fact table. Fact_Product has many wide and high cardinality dimensions attached with their dimension table surrogate key. This table also has few low cardinality small dimension tables which are highlighted in red rectangle in below image.

Junk dimension - Fact_Product
Fact_Product

Below is the attribute level detail of these low cardinality dimension tables;

Junk dimension - Low cardinality dimensions
Low cardinality dimension tables

We can create a single dimension for all above small and low cardinality dimensions in such way that it covers all possible values present or values that may appear later in fact table. Junk dimension created from low cardinality dimensions would multiply the number of rows significantly which is quite better than having billions of records in fact table.

Below is the junk dimension from above miscellaneous dimensions;

Junk Dimension
Junk Dimension – Sample data

Above junk dimension has {(Dim_ProductType *  Dim_TaxInformation * Dim_ProductQuality * Dim_Expiry * Dim_StorageType * Dim_DeliveryMode) – (number of non relevant business rows)} which is (6 x 3 x 3 x 3 x 3 x 3) = 1458 – 0 (might vary as per the business requirement) = 1458.

Thanks for reading the article and please do rate and share this post if you really liked it. Also, please share your valuable inputs as comments.

Rate This
[Total: 11 Average: 4.5]

5 thoughts on “Junk dimension”

    1. Hi Narasimha,

      Thanks for the reading and for your input.
      When we use a junk dimension which combines few small dimensions, we do a lookup on junk dimension from fact table considering all the columns which have been combined to create a junk dimension. Then, we pick that unique record id from the dimension table which has the exact values match for all these columns.

      Thanks,
      Gopal

  1. Pingback: Role playing dimension - SQLRelease

  2. Pingback: Degenerate dimension - SQLRelease

Leave a Comment

Your email address will not be published. Required fields are marked *


The reCAPTCHA verification period has expired. Please reload the page.

This site uses Akismet to reduce spam. Learn how your comment data is processed.