For a data warehouse professional, a shared dimension concept is always clear, and a star schema design always includes such type of entity. However, in the Power BI user’s world, there are many users who are not coming from a data warehousing background. It is necessary to understand some of the concepts in order to design a good performing Power BI model. I will be writing about some of the concepts in an easy-to-understand way in some articles, here is the first one; What is a shared dimension, and why do you need that in your Power BI model? In this blog post, I will explain how it can prevent many issues, as well as the need for both directional relationship. If you like to learn more about Power BI, read the Power BI book from Rookie to Rock Star.
To follow the example in this post, download the custom Excel data source from here. In this Sample dataset, we have three tables as below
Table Inventory; which shows the inventory details in each day in each warehouse.
Table Sales; which shows the sales transactions.
Table; Manufacturing, which shows summarized information about the cost of producing each product based on date.
When you load the three tables above in a model, you can see that they are not related to each other.
Many-to-many Relationship Issue
If you try to create the relationship yourself based on Product for example between the two tables Inventory and Manufacturing, you get the message pop up about the need for Many-to-Many relationship!
Don’t click on creating the relationship. A many-to-many relationship is not a perfect type of relationship to be used. The reason that the relationship can be only created as a many-to-many relationship, is that the Product column in none of these tables has unique values. In fact, there is no single product table with a list of unique values of products, so there is no single table that can be used as a source of One-to-many relationship.
When none of the two tables on both sides of a relationship, doesn’t have unique values for the relationship field, then many-to-many will be suggested. However, It is not recommended to use it. Read the rest of the blog post to learn how to fix it using a shared dimension.
Both-directional Relationship Issue
Another issue which usually happens is when you have the unique list of values, however, you still want to slice and dice based on both tables. Let’s say you want to create the relationship between Inventory table and the Manufacturing table but this time based on the Date field.
In this scenario, we have unique values of the Date field in both tables, so relationship won’t be many-to-many. However, because we want to slice and dice data of Inventory by Dates selected from the Manufacturing table, and vice versa, then the relationship needs to be both-directional.
A both-directional relationship usually happens when you want to use fields from both tables for slicing and dicing. A both-directional relationship has a significant effect on performance and is not recommended. Read the rest of this post to learn how it can be fixed using a shared dimension.
Another issue with the both-directional relationship is that you cannot apply it on all relationships in your model, because it might create a circular reference scenario!
Master List Does Not Exist!
The third issue of design with the three tables above is that there is no master list! There are some products in each table, and we do not have necessarily all products in each table. Or there are some dates in each table, and we do not have necessarily all dates in each table (Power BI will create an auto date dimension which can resolve this issue only for Date fields, but what about other fields such as Product?).
To explain the issue, I’ve created both directional relationships between all three tables to make sure that they are all filtering each other. All of the relationships are based on Date fields.
Then I created a table with the Date field from the Sales table as a slicer and three table visuals from each table. The date slicer should be able to filter all three tables based on the Date selection;
If I select a field in the date slicer, it will filter all three tables (because of both-directional relationships). However, if you look closely the two dates mentioned in the above screenshot, and some other dates in the Inventory and Manufacturing tables, doesn’t exist in the slicer. Because the date slicer is coming from the Sales table, and the Sales table doesn’t have those dates in it! we will have the same challenge with the Product slicer if we add it.
If you use a field as a slicer which doesn’t have all possible values in it, then it cannot show the right data from all tables. It is not recommended to design it this way, read the rest of the blog post to learn how shared dimension can fix this challenge.
Shared Dimension: Solution
I just mentioned three of the challenges you might have with a design like above. In reality, you don’t have just three tables, you will have much more, and you will have much more challenges with a design such as above. The best practice to design such a model is to create a shared dimension. A shared dimension is a dimension that is shared between multiple fact tables. I have explained what is the fact table and dimension table in this article previously. However, here is just a short brief:
- A dimension table is a table that has descriptive information, such as Product. Fields from the dimension table usually will be used to slice and dice the data from the fact table.
- A fact table is a table that has numeric and additive information, such as Sales. Fields from the fact table usually will be used as metrics and measures of our report and sliced and diced by dimension tables.
Well, now in the design above, what are our dimensions? Date and Product. What are fact tables? Sales, Inventory, and Manufacturing. The challenge is that there is no dimension table. dimensions are fields inside the fact tables, and this creates inconsistency and design approaches. What you should do is to build the two tables separately. Because the Date and Product tables will be tables that slice and dice all of the fact tables and will be related to all fact tables, we call them shared dimensions. Shared dimension is just a dimension which is shared between multiple fact tables.
a design sketch of tables above with shared dimensions would be like this:
Creating Shared Dimension
Now that you know what is the shared dimension, and how it can be helpful, let’s see how we can add it to our design. You can build the shared dimension in many ways; using DAX calculated tables, using t-SQL (if sourced from database systems), or in Power Query. Because Power Query is applicable regardless of the data source you select, and also because the data transformations step is better to be done in Power Query rather than DAX, I am going to show you how to do it in Power Query.
Go to Edit Queries in the Power BI Desktop;
In the Power Query Editor window, right click on Inventory table, and create a reference from the query;
If you like to learn more about reference, read this article. Reference will create a copy of the existing query, with a reference to the existing query, which can now have extra steps in it. In the new query, right click on Product table and remove all other columns.
The Inventory table should now look like this:
Right click on the Inventory (2) table and uncheck the Enable load option for it. This is to save performance and avoid loading extra tables into the memory of Power BI Desktop. Read more about this option here.
Do the same process now for the other two tables; Manufacturing and Sales;
- create a reference from each table
- Only keep the Product table and remove other columns
- uncheck the enable load in the new query
You should now have the new three tables with one Product column only in each:
Set all column names to be the same
The next step is to make sure the column names are exactly the same. because we are going to append the three tables, if we have different names, then it would create extra columns. Names should be an exact match, and remember that Power Query is a case-sensitive language; product is different from Product in the Power Query world. In our sample model, the two tables Inventory and Manufacturing have the column name as Product, but in the Sales table it is called Product Name, rename it to Product.
Append all three tables
Then in the Append command window, select Three or more tables, and all the new tables in it;
The output of the append would be one table including all Product values; you can rename this query to Product.
Because this is a table you want to be loaded into Power BI Desktop, make sure the Enable Load of this table is checked. This table is our master list including all product values. However, there are duplicate values in it which have to be removed.
A dimension should have a unique list of values, so we need to remove duplicates for the key field here.
Before using remove duplicate, make sure to read this article about important tips you need to know before applying remove duplicate in the Power Query. In nutshell, because Power Query is case-sensitive, and because space at the end of text values, and other special characters may end with keeping duplicate values, this is how you would remove duplicates in few steps;
- Clean transformation
- Trim transformation
- Transform to Upper Case
- Remove Duplicates
full details of these steps are written here.
Now you have your Product shared dimension ready! Repeat this process for any other shared dimensions with the relevant fields. However, for the Date table, we would do it differently.
Because the date table is one of the most common tables, and it has one record per day not matter is it related to sales, inventory or other tables. there are general practices of how to create a date table. Here is my explanation about creating a general purpose date dimension for Power BI using Power Query.
Best Practice Design: Star Schema and Shared Dimensions
After loading the tables above, you can create one-to-many relationship single directional from Product and Date tables to all other fact tables. This is the final design based on our example;
The above design uses two shared dimensions, and avoided all challenges mentioned;
- It doesn’t need both-directional relationships
- It doesn’t need many-to-many relationships
- Product and Date tables are master tables which can be the source of any slicing and dicing
To make sure that you won’t use incorrect fields for slicing and dicing, make sure that you hide Date and Product columns in all the three fact tables as below;
This solution can now have proper reporting capabilities as below;
Nothing is worse than a bad data model design. Bad data model design cause using relationships which decrease the performance, it cause writing a lot of unnecessary DAX expressions to cover for the wrong design, and at the end of the day, it performs slow. In this example, you learned one of the basics, but the most fundamentals of designing Power BI data models. Using a shared dimension in your model will avoid both-directional and many-to-many relationships. You learn how easy is to create such a dimension. This method can be used always in your Power BI data models. I will be writing about some others basic but most important tips of modeling in the future, stay tuned.