Now that you know about the dimension and fact tables, it is time to talk about one of the most important concepts of modeling. You might have heard about dimensional modeling or star schema, but you may have vague ideas about what it is, or alternatively, you might have used it, but not sure if it is the right way of doing it. In this article of the back to basics series, I’m going to explain what is Star Schema, and how you can build it, what are best and worst practices for it. If you want to learn more about Power BI, read Power BI book from Rookie to Rock Star.
There is no prerequisite for this article. However, I highly recommend you to read below articles beforehand;
For every BI or reporting system, you have a process of designing your tables and building them based on that design. This process is called dimensional modeling. Some others call it data warehouse design, which is the same thing. Dimensional modeling is the process of thinking and designing the data model including tables and their relationships. As you see, there is no technology involved in the process of dimensional modeling, It is all happening on your head and ends up with sketching diagrams on the paper. Dimensional modeling is not the diagram in which tables are connected to each other, it is the process of doing that.
There are multiple methods of building the diagram of tables and their relationships. Let’s check these, one by one.
Transactional Database diagram
This is a diagram only useful for transactional systems, where the operations such as insert, update, delete, and retrieve data happens often. In such systems, it is important that the model is designed in a way that minimizes the efforts of doing insert, update, or delete. We are not talking about that method here. In a reporting system, the transactional database design won’t be helpful, It will make the report slow in performance. So what are the designs for a reporting model?
The diagram of tables can be in all shapes, however, there are two big categories when it comes to design a diagram for reporting systems; Snowflake and Star Schema. Snowflake is when there are many relationships between tables, and when you have to pass through multiple relationships to get from one table to another. Here is an example:
To path from FactInternetSales tables (which is the fact table) to DimProductCategory (which is the dimension table) needs passing through three relationships. More relationships will slow down the performance, and also it will bring the need for the both-directional relationship as I explained in this article. The diagram above is a snowflake schema, in a snowflake schema, there are many intermediate tables to get from one table to another. To remind you what snowflake looks like, here is a view of that;
Dimension tables are not directly connected to the fact table. They are related to other dimensions in between.
Star Schema is when there is a direct relationship between the fact table and each dimension. Each fact table can have a connection (or relationship) to multiple dimensions so it will come up with a view like this:
You cannot unsee the Star in the layout above. The fact table is the heart of the star and all dimensions are around it as points of the star. The star schema might no have exactly five points (pentagram), It might have more or less. The important part of this design is not the number of points for the star, Is the direct relationship between the fact table and every dimension.
Star Schema means the fact table in the heart of the star, and a single relationship to each dimension around it as points of the star.
If we want to re-design the model you have seen in the snowflake example, it should look like this:
In this example, the DimProduct includes all details of Subcategory and Category in it too. And it is directly related to the fact table. To learn how to combine dimensions to build a flatten dimension, read this article.
Star Schema Can Include Multiple Fact Tables
A common misunderstanding is that you should have only one fact table in a star schema. Well, that is true somehow, but the fact is that you can have a combination of star schemas to build a data model. Look at the example below;
We have two fact tables; FactInternetSales and FactResellerSales. They have some shared dimensions in between (DimProduct and DimDate), and they also have their separate dimensions (DimCustomer for FactInternetSales, and DimReseller for FactResellerSales). The model above is a perfect star schema design. Because each fact table still needs one single relationship to a dimension. A model like above, give us the ability to filter both fact tables based on shared dimension in between;
The Importance of the Star Schema
Star Schema is the best way of designing a data model for reporting, You will get the best performance and also flexibility using such a model. I have explained in another article that why having everything in one fat fact table is not a good idea. A lot of modeling challenges and issues in Power BI implementation these days can be resolved with a good star schema design. Star Schema is like a conductor of an orchestra. An orchestra without a conductor can play music, like a Power BI implementation without a star schema. However, the music won’t be synchronized, it won’t be the best it can do, and it won’t be what the audiences are after. However, if they Power BI implementation leverages the Star Schema (like the orchestra with a good conductor), it would play its full capacity, answer all the requirements, and stun the audience.
Star Schema for a Data Model is like a Conductor for an Orchestra. The best analytics outcome would be possible with the star schema, like the best music outcome with a good conductor.
How to Design Star Schema
Mastering Star Schema design can take years of experience. There are some fundamental rules to learn at the beginning, and then there are heaps of tips and tricks along the way. There are exceptions that you need to learn how to deal with them. There are books under this subject explaining how to design star schema. It is impossible to explain it in one article. However, I can point out some of the fundamental rules, that helps you to start.
Define Dimension Tables
The design of everything in one huge table is not flexible, neither performs well. I explained in this article that what is a dimension and why not put everything in one table. Dimension tables are descriptive tables that describe happening of an action which is the fact. Fields from dimension tables are often used as slicers or axis of visuals.
Define Fact Tables
Fact tables are the heart of the star schema. You have to be careful when you work with them. Especially because these are huge deep tables that can make your data model very big. In this article, I explained what is the Fact table and tips about designing it. Fact tables are tables that explain happening of action, such as sales, production, etc. Fact table includes fields which are playing the VALUE part of visuals and calculations, called facts. Fact is a numeric value that can be aggregated.
Flatten Dimension Tables
It is important that we keep the Star Schema design which is a single relationship between fact and dimension tables. You cannot get there if you have a dimension for every single field. Having a dimension for Category, another for subcategory, another for Brand, another for Color, and then one for Product details is not good for Star Schema design. That way of modeling is good for a transactional database. You have to combine tables as much as possible (As much as the meaning is still the same entity) to achieve a flatten dimension. In this article, I explained how you can flatten dimension tables.
Do NOT Flatten Fact Table
Although flattening dimension tables are good for a star schema design, the same approach doesn’t work for the fact tables! The reason is that the result of flattening is to create a wider table and bring everything in one table. We do that for dimensions because we want them to be one dimension serves it all for one entity. And to avoid extra relationships between dimensions. Flattening the fact table will lead us into the trap of having everything in one big huge table. We will lose flexibility and also performance.
Don’t Expect an Action from Dimension Table
Dimension tables are descriptive tables and should have a single-directional one-to-many relationship to a fact table. However, most of the times, people come to me and say, I want to filter my dimension data by the fact table, so I need a both-directional relationship. The reality is that you never want to filter something by fact table. Because the fact table is just a table full of numbers! You want to filter that by something which filters the fact table. So you want to filter a dimension by another dimension at the end. If the existing fact table doesn’t give you that option, it is either of these two: The fact table doesn’t include all the facts, and you can add more facts into that, which brings the desired output. Or the current fact table should not be used for this purpose and you might need another fact table. This is a tricky part of the modeling, and I will dedicate a separate article to it later on.
No! You Don’t Have Just One Table!
When I teach in my classes about the star schema, dimensional model, fact and dimension tables, I hear this a lot: “This stuff are interesting, however, in my case, I just have one single table!”. My answer is always this: “No! You don’t have just one Table! Have a closer look!”. Then I go through the table with them. Their table is just one table, but it has descriptive fields and facts all in one table. Of course, it will work for that single table. However, analytics requirements don’t end with that single dataset. After using that for a while, end users will come up with new requirements, and saying that “OK, Nice, now we want that data to be combined with this data”, and that is when the end of an era for that single table model comes fast. If the author of the model, at the beginning thought about such thing, It would have been easily extensible with adding more fact and dimension tables. But now, it means to re-do the whole model. So, here I Say again: “No! You don’t have just one Table! Have a closer look!”
As I mentioned before, we can talk about Star Schema for pages, and it never really finishes! In this article, you learned what is dimensional modeling, and also learned that Star Schema is the best way to design your model. You also learned that Star Schema doesn’t mean one single fact table, it can include multiple fact tables. In the end, you learned about a few of fundamentals of designing a star schema. In the next posts of the Back to Basics of Modeling in Power BI, I’ll go through some more examples of designing star schema. If you have a model that you can’t think of a star schema for it, let me know in the comments below. And always remember that “You don’t have just one table! Have a closers look!”
If you haven’t read other parts of Back to Basics Series of Power BI Modeling, here they are: