Creating a Shared Dimension in Power BI Using Power Query: Basics and Foundations of Modeling

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

For a data warehouse professional, a shared dimension concept is always clear, and a star schema design always includes such type of entity. However, in the Power BI user’s world, there are many users who are not coming from a data warehousing background. It is necessary to understand some of the concepts in order to design a good performing Power BI model. I will be writing about some of the concepts in an easy-to-understand way in some articles, here is the first one; What is a shared dimension, and why do you need that in your Power BI model? In this blog post, I will explain how it can prevent many issues, as well as the need for both directional relationship. If you like to learn more about Power BI, read the Power BI book from Rookie to Rock Star.

Sample Dataset

To follow the example in this post, download the custom Excel data source from here. In this Sample dataset, we have three tables as below

Table Inventory; which shows the inventory details in each day in each warehouse.

Table Sales; which shows the sales transactions.

Table; Manufacturing, which shows summarized information about the cost of producing each product based on date.

Design Challenge

When you load the three tables above in a model, you can see that they are not related to each other.

Many-to-many Relationship Issue

If you try to create the relationship yourself based on Product for example between the two tables Inventory and Manufacturing, you get the message pop up about the need for Many-to-Many relationship!

Don’t click on creating the relationship. A many-to-many relationship is not a perfect type of relationship to be used. The reason that the relationship can be only created as a many-to-many relationship, is that the Product column in none of these tables has unique values. In fact, there is no single product table with a list of unique values of products, so there is no single table that can be used as a source of One-to-many relationship.

When none of the two tables on both sides of a relationship, doesn’t have unique values for the relationship field, then many-to-many will be suggested. However, It is not recommended to use it. Read the rest of the blog post to learn how to fix it using a shared dimension.

Both-directional Relationship Issue

Another issue which usually happens is when you have the unique list of values, however, you still want to slice and dice based on both tables. Let’s say you want to create the relationship between Inventory table and the Manufacturing table but this time based on the Date field.

In this scenario, we have unique values of the Date field in both tables, so relationship won’t be many-to-many. However, because we want to slice and dice data of Inventory by Dates selected from the Manufacturing table, and vice versa, then the relationship needs to be both-directional.

A both-directional relationship usually happens when you want to use fields from both tables for slicing and dicing. A both-directional relationship has a significant effect on performance and is not recommended. Read the rest of this post to learn how it can be fixed using a shared dimension.

Another issue with the both-directional relationship is that you cannot apply it on all relationships in your model, because it might create a circular reference scenario!

Master List Does Not Exist!

The third issue of design with the three tables above is that there is no master list! There are some products in each table, and we do not have necessarily all products in each table. Or there are some dates in each table, and we do not have necessarily all dates in each table (Power BI will create an auto date dimension which can resolve this issue only for Date fields, but what about other fields such as Product?).
To explain the issue, I’ve created both directional relationships between all three tables to make sure that they are all filtering each other. All of the relationships are based on Date fields.

Then I created a table with the Date field from the Sales table as a slicer and three table visuals from each table. The date slicer should be able to filter all three tables based on the Date selection;

If I select a field in the date slicer, it will filter all three tables (because of both-directional relationships). However, if you look closely the two dates mentioned in the above screenshot, and some other dates in the Inventory and Manufacturing tables, doesn’t exist in the slicer. Because the date slicer is coming from the Sales table, and the Sales table doesn’t have those dates in it! we will have the same challenge with the Product slicer if we add it.

If you use a field as a slicer which doesn’t have all possible values in it, then it cannot show the right data from all tables. It is not recommended to design it this way, read the rest of the blog post to learn how shared dimension can fix this challenge.

Shared Dimension: Solution

I just mentioned three of the challenges you might have with a design like above. In reality, you don’t have just three tables, you will have much more, and you will have much more challenges with a design such as above. The best practice to design such a model is to create a shared dimension. A shared dimension is a dimension that is shared between multiple fact tables. I have explained what is the fact table and dimension table in this article previously. However, here is just a short brief:

  • A dimension table is a table that has descriptive information, such as Product. Fields from the dimension table usually will be used to slice and dice the data from the fact table.
  • A fact table is a table that has numeric and additive information, such as Sales. Fields from the fact table usually will be used as metrics and measures of our report and sliced and diced by dimension tables.

Well, now in the design above, what are our dimensions? Date and Product. What are fact tables?  Sales, Inventory, and Manufacturing. The challenge is that there is no dimension table. dimensions are fields inside the fact tables, and this creates inconsistency and design approaches. What you should do is to build the two tables separately. Because the Date and Product tables will be tables that slice and dice all of the fact tables and will be related to all fact tables, we call them shared dimensions. Shared dimension is just a dimension which is shared between multiple fact tables.

a design sketch of tables above with shared dimensions would be like this:

Creating Shared Dimension

Now that you know what is the shared dimension, and how it can be helpful, let’s see how we can add it to our design. You can build the shared dimension in many ways; using DAX calculated tables, using t-SQL (if sourced from database systems), or in Power Query. Because Power Query is applicable regardless of the data source you select, and also because the data transformations step is better to be done in Power Query rather than DAX, I am going to show you how to do it in Power Query.

Go to Edit Queries in the Power BI Desktop;

Prepare sub-tables

In the Power Query Editor window, right click on Inventory table, and create a reference from the query;

If you like to learn more about reference, read this article. Reference will create a copy of the existing query, with a reference to the existing query, which can now have extra steps in it. In the new query, right click on Product table and remove all other columns.

The Inventory table should now look like this:

Right click on the Inventory (2) table and uncheck the Enable load option for it. This is to save performance and avoid loading extra tables into the memory of Power BI Desktop. Read more about this option here.

Do the same process now for the other two tables; Manufacturing and Sales;

  • create a reference from each table
  • Only keep the Product table and remove other columns
  • uncheck the enable load in the new query

You should now have the new three tables with one Product column only in each:

Set all column names to be the same

The next step is to make sure the column names are exactly the same. because we are going to append the three tables, if we have different names, then it would create extra columns. Names should be an exact match, and remember that Power Query is a case-sensitive language; product is different from Product in the Power Query world. In our sample model, the two tables Inventory and Manufacturing have the column name as Product, but in the Sales table it is called Product Name, rename it to Product.

Append all three tables

Append all three tables together to create one table with all Product values in it. If you like to learn more about Append, read my article here.

Then in the Append command window, select Three or more tables, and all the new tables in it;

The output of the append would be one table including all Product values; you can rename this query to Product.

Because this is a table you want to be loaded into Power BI Desktop, make sure the Enable Load of this table is checked. This table is our master list including all product values. However, there are duplicate values in it which have to be removed.

Remove Duplicates

A dimension should have a unique list of values, so we need to remove duplicates for the key field here.

Before using remove duplicate, make sure to read this article about important tips you need to know before applying remove duplicate in the Power Query. In nutshell, because Power Query is case-sensitive, and because space at the end of text values, and other special characters may end with keeping duplicate values, this is how you would remove duplicates in few steps;

  • Clean transformation
  • Trim transformation
  • Transform to Upper Case
  • Remove Duplicates

full details of these steps are written here.

Now you have your Product shared dimension ready! Repeat this process for any other shared dimensions with the relevant fields. However, for the Date table, we would do it differently.

Date Dimension

Because the date table is one of the most common tables, and it has one record per day not matter is it related to sales, inventory or other tables. there are general practices of how to create a date table. Here is my explanation about creating a general purpose date dimension for Power BI using Power Query.

Best Practice Design: Star Schema and Shared Dimensions

After loading the tables above, you can create one-to-many relationship single directional from Product and Date tables to all other fact tables. This is the final design based on our example;

The above design uses two shared dimensions, and avoided all challenges mentioned;

  • It doesn’t need both-directional relationships
  • It doesn’t need many-to-many relationships
  • Product and Date tables are master tables which can be the source of any slicing and dicing

To make sure that you won’t use incorrect fields for slicing and dicing, make sure that you hide Date and Product columns in all the three fact tables as below;

This solution can now have proper reporting capabilities as below;

Summary

Nothing is worse than a bad data model design. Bad data model design cause using relationships which decrease the performance, it cause writing a lot of unnecessary DAX expressions to cover for the wrong design, and at the end of the day, it performs slow. In this example, you learned one of the basics, but the most fundamentals of designing Power BI data models. Using a shared dimension in your model will avoid both-directional and many-to-many relationships. You learn how easy is to create such a dimension. This method can be used always in your Power BI data models. I will be writing about some others basic but most important tips of modeling in the future, stay tuned.

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for eight continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He wrote some books on MS SQL BI and also is writing some others, He was also an active member on online technical forums such as MSDN and Experts-Exchange, and was a moderator of MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content and the Power BI Pro Architecture published by Apress.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.

18 Comments

  • Hi Reza, great post! But let’s suppose we have this:

    Source 1 – Column 1 (Dimension), Column 2 (Dimension), Other columns not related
    Source 2- Column 1 (Dimension), Column 2 (Dimension), Other columns not related
    And know I create my dimensions as it’s described in this post and create slicers for Dimension of Column 1 and 2. How do I make it work so my dimension of the column 1 filters dimension of column 2 or vice-versa and also filter my facts?
    Is it possible with this model?

    Thanks!

    • Hi Joao
      Your requirement is different. This example I explained here, talks about dimensions which are not dependent on each other.
      In your scenario, you are talking about dimensions which selection of one, should filter the other one.
      The implementation of such a scenario is heavily dependent on each case. But in general, if the two dimensions are going to be somehow related, I would create one dimension with those columns and columns of it, rather than two tables.
      Cheers
      Reza

  • What you call “shared dimensions” is actually a master data. So as a recap of your huge post I may say: Don’t be naive, use master data. Am I?

    • short answer yes.
      Long answer; you do not always have a master data management system. I’m showing how it is possible to create a master table even with no MDM behind the scene
      Cheers
      Reza

  • Great post, planning on using this to create slicers in projects that I’m working on. Very strong example and very easy to follow steps!

  • Great stuff as always. I am just wondering whats the difference between using reference and ”add as new query” and then do all the following stuff. This is what i do today to create my master table?

    Best regards from sweden
    /Daniel

    • Hi Daniel
      There are some differences. one of the most important one in this case is:
      let’s say query1 is the main query, and you create a new copy using “Add as a new query” and call it query2, and continue with more transformations in query2. Then after a while, you want to add an extra step in query1, which you also want to be applied automatically on query2 (which is very normal), then you can’t. You have to do that transformation manually in each query!
      however, if you do use the reference and generate query2 as a reference to query1. as soon as you change query1, query2 will automatically have that change.
      I have written a whole article about the difference of Duplicate (which add as a new query is doing that), and reference here.
      Cheers
      Reza

  • Thank you for the in depth explanation.
    What do you mean “To make sure that you won’t use incorrect fields for slicing and dicing, make sure that you hide Date and Product columns in all the three fact tables as below;”
    and how do you hide them?
    Giora

    • I mean, if you suddenly select the Product or Date field from one of the fact tables as a slicer, you will not get other tables filtered by that. You have to select it from dimensions. So better to hide those fields in the fact tables.

  • When I tried your example I was unable to create an active relationship for manufacturing or Sales to Date. Inventory had no problem. I kept getting the warning “you can’t create a direct active relationship between Manufacturing and DateDimension because that would introduce
    ambiquity between tables Product and Manufacturing. To make this relationship active, deactivate or delete one of the relationships between Product and Manufacturing first” and the same for Sales.

    • Hi Sue
      Have you checked that maybe your other relationships (between other tables) are not both-directional? sometimes that cause ambiguity

      Cheers
      Reza

  • Power BI dataflow requires Premium license to use Reference Query.
    What would be the alternative for Power BI dataflow with a Pro license?

Leave a Reply

Your email address will not be published. Required fields are marked *