The article that I wrote earlier this week about the shared dimension had a lot of interest, and I’m glad it helped many of you. So I thought better to write about the basics of modeling even more. In this article, I will be focusing on a scenario that you have all faced, however, took different approaches. Is it good to have too many dimension tables? can you combine some of those tables together to build one flatten dimension table? how much should you flatten it? should you end up with one huge table including everything? In this article, I’m answering all of these questions and explaining the scenarios of combining dimensions, as usual, I explain the model in Power BI. However, the concepts are applicable to any other tools. If you like to learn more about Power BI; read Power BI book from Rookie to Rock Star.
Video
Sample Dataset
The dataset for this model is the AdventureWorksDW2012 Excel file, which you can download from here. For this example, we need these tables: DimProduct, DimProductCategory, DimProductSubcategory, and FactInternetSales.
In the AdventureWorksDW dataset, we have three tables related to Product information; DimProduct has the details of every single product;
DimProductCategory has information about all categories of products;
DimProductSubCategory is the second level of the hierarchy, where we have subcategories under each category;
As you can see, every record in the subcategory table belongs to a category, or let’s say have a value in their ProductCategoryKey column which is then the key to find the higher level category in the DimProductCategory table. If we want to design a hierarchy of Products, this is how it would look like;
Design Challenge
Let’s say we load all of the tables above (plus the FactInternetSale table, which is our transactional sales table) into a Power BI model. This is how the relationship diagram looks like:
Note that Power BI automatically creates a couple of these relationships, but you need to create the one which is highlighted yellow yourself, connect ProductSubcategoryKey from DimProduct to ProductSubcategoryKey in the DimProductSubcategory table.
The Need for Both-Directional Relationship Sometimes
The directions of filters as you can see in the above diagram is from category to subcategory, and then to the product, and finally to the fact table. If we filter data in the fact table using a field from any of these three tables, then we get slicing and dicing working perfectly; higher levels of hierarchy filters lower levels of it easily as below;
however, what about this: we want to filter the higher level of the hierarchy using the lower level! For example, I want to know what is the product category that Mountain bikes belong to?
As you can see, the visual above isn’t helpful. Product Subcategory, cannot filter the ProductCategory table, and the reason is obvious; the relationship is single directional between these tables from the category to the subcategory.
Sometimes, depends on the requirements, the design above leads to the need for both-directional relationship, which is not recommended.
Creating a Hierarchy of all three tables
Let’s say you want to make the report design consistent, and want to create a hierarchy of all of these three fields (category, subcategory, and product) and then use it in all visuals.
*Note that it is possible to create a hierarchy of fields directly in each visual by dragging and dropping fields after each other, however, creating the hierarchy in the model makes this process more consistent, and then you can just drag the hierarchy in every visual you want.
The challenge is that you cannot create a model hierarchy between fields that are not on the same table! You may need to create calculated columns in the Product table using the RELATED DAX function to achieve this process. This is so much process to achieve something really simple.
You cannot create model hierarchies between fields that come from multiple tables.
Extra Relationships! Why?!
The next challenge of the existing model is that we have so many relationships to cover something really simple. In this model so far, we have three relationships. In the real world scenarios, you don’t have only four tables, you will have hundreds of tables, and if you follow this approach, you will have hundreds of relationships! Relationships will cost processing power when it comes to slicing and dicing. you won’t notice it for small models with a few tables, but you will see the difference when the model grows. If you don’t think much about your data model, and just add tables and relationships to it, soon you will end up with a model much more complicated than this:
I have written an article about the importance of data preparation as the very first step of data modeling, read it here.
The Model is Confusing
When you have more tables in your model, you will make it more confusing at the reporting side. There are too many tables having product information on it! Why such confusion for such a simple model?!
Where this designing concept is coming from?
So the design above has many challenges. Let’s see where the design concept is coming from? The design that you see, which has one table per attribute or function, is coming from transactional database design. In the world of transactional databases, it is important to design the database in a way that you can apply CRUD (Create, Update, Retrieve, and Delete) operations easy and fast. and one of the best ways of doing that is to separate each entity as a table; category as a table, subcategory as another table, color of the product as a table, brand as another table, and etc. in such a design, you will have probably more than ten tables storing different pieces of information about the product itself. That design is great for transactional systems, but it is not working for reporting systems. You need a different design for reporting.
Flattening Dimension Tables
In a reporting data model, your approach should be designing a star schema (which I explained here), and having a fair amount of dimension tables. Your dimension tables can include everything about that entity. For example; if you have a Product dimension, then it can include color, brand, size, product name, subcategory and category of the product, all in one table. This way of design will avoid all challenges and issues I mentioned earlier in this post, and many other challenges too.
So the solution to our design challenge is to flatten the Product table by combining DimProductCategory, and DimProductSubCategory with it. Let’s see how it is possible;
Combining or Flattening tables into one Dimension
There are different methods you can do this flattening process, you can flatten it using T-SQL if you have relational databases as a source, you can use DAX to do it, or alternatively, you can do it using Power Query, which is the method I am explaining here.
Go to Edit Queries in the Power BI Desktop;
In the Power Query Editor window, click on Product table, and from the Combine section, select Merge Queries;
In the Merge Queries window, check that the DimProduct is the first table, and then select ProductSubcategoryKey from it, then select DimProductSubcategory as the second table, and also the ProductSubcategoryKey column in there too, and then click on OK.
Merge table is an operation which flattens two tables based on matching field(s). I have written fully in details what is Merge and the difference of that with append, and also what are different join kinds in the merge operations. In this operation, the default join kind would work for us, but make sure to read the difference between all kinds of joins and their output samples here.
The Merge operation will create a new column in the DimProduct table (at the end of all columns), which then can be expanded into details coming from DimProductSubcategory. This way, we are combining product and subcategory together;
I only selected EnglishProductSubcategoryName, and ProductCategoryKey (this one is needed to merge it at the next step with DimProductCategory). here is the result;
As you can see, the two columns above are now part of the Product table.
We do the Merge Operation one more time, and this time will select DimProductCategory as the second table using the ProductCategoryKey in both tables to merge.
And then expanding it to the details of DimProductCategory, which in this case is only ProductSubcategoryName
Finally, we will have all category and subcategory details in the DimProduct as below; You can see that after the second merge, I also deleted the ProductCategoryKey column, which is not needed anymore.
Do not load intermediate tables
It is important to set the “enable load” property of the two intermediate tables (DimProductCategory and DimProductSubcategory) to unchecked now. These two tables are not needed to be loaded into the Power BI model directly. They are feeding the data into the DimProduct, and that is the only table we need in the model. To learn more about Enable Load option and the performance tips about it, read my article here.
Now if you load the data into the Power BI model, you will have a simple model such as below;
You can easily get slicing and dicing working correctly;
You can also build a model hierarchy and use it everywhere you want in the report;
How about other tables? Flattening all dimensions into one?
The method above was helpful, so you might think that let’s do that for other tables. Why not combining customer and product together as an example! Well, the flattening has a golden rule; Flatten it as long as entities have a meaningful relationship with each other! Product category and product has a meaningful relationship. every product belongs to a category. However, product and customer don’t have a relationship unless there are sales transactions made! We do already have a sales transaction table, which is the heart of this model, and flattening dimension tables into that, will make it huge, and also it will reduce our flexibility if we want to connect another table later on to product or customer. So in this scenario; we will not flatten customer and product. We will keep them as their own dimensions, with a relationship to the fact table; building a star schema.
Summary
In summary; flattening dimension tables will avoid a lot of challenges in the future in the reporting system. It is recommended to flatten attributes into one big dimension for each entity. Product dimension can have all information about the color, size, brand, product number, category, and etc. and Customer dimension can have everything about customer job title, education, age, and etc. You learned through this article, what are some of the challenges when dimensions are not flattened, and you learned how you can use Power Query Merge command to combine them together. Make sure to read my other post about Power BI modeling which is about shared dimension too as the related topic.
Nice
Very well explained. PBI’s true value lies in the data modeling capability with minimum technical hurdle.
There’s an inherent battle in data modeling between the database administrator, who, as Reza points out, designs a database for transactional efficiency (many tables into a star / snowflake scheme) and the data scientist, who, until recently, was (mainly) limited to one table with tools such as SAS, R, or Excel.
HI, can you suggest best practice approach for intermediary, or mapping tables that provide a link between a fact table and a dimension?
For example, if you have a fact table with projects that contains records for project funding. There can be multiple people associated with the project. There are also custom groups, to which people can be in multiple groups.
Where should the link between the custom group and the fact be? In an intermediary table which has GroupID and ProjectID, or push the GroupID into the Projects table (which results in duplicate project lines) and use DAX to obtain distinct Project information?
Hope that makes sense.
Cheers
Phil
Hi Phil
Have you read my other article about shared dimensions? I believe that is what you are asking about
Cheers
Reza