Power BI Basics of Modeling: Star Schema and How to Build it

Now that you know about the dimension and fact tables, it is time to talk about one of the most important concepts of modeling. You might have heard about dimensional modeling or star schema, but you may have vague ideas about what it is, or alternatively, you might have used it, but not sure if it is the right way of doing it. In this article of the back to basics series, I’m going to explain what is Star Schema, and how you can build it, what are best and worst practices for it. If you want to learn more about Power BI, read Power BI book from Rookie to Rock Star.

Prerequisite

There is no prerequisite for this article. However, I highly recommend you to read below articles beforehand;

What is a Relationship?

What is the Cardinality of the Relationship?

What is the Direction of the Relationship?

Data preparation; First and Foremost Important task

What is a Dimension table and why say No to a single big table

Basics of Modeling in Power BI: Fact Tables

Combining Dimension Tables in Power BI using Power Query; Foundation of Modeling in Power BI

Dimensional Modeling

For every BI or reporting system, you have a process of designing your tables and building them based on that design. This process is called dimensional modeling. Some others call it data warehouse design, which is the same thing. Dimensional modeling is the process of thinking and designing the data model including tables and their relationships. As you see, there is no technology involved in the process of dimensional modeling, It is all happening on your head and ends up with sketching diagrams on the paper. Dimensional modeling is not the diagram in which tables are connected to each other, it is the process of doing that.

There are multiple methods of building the diagram of tables and their relationships. Let’s check these, one by one.

Transactional Database diagram

This is a diagram only useful for transactional systems, where the operations such as insert, update, delete, and retrieve data happens often. In such systems, it is important that the model is designed in a way that minimizes the efforts of doing insert, update, or delete. We are not talking about that method here. In a reporting system, the transactional database design won’t be helpful, It will make the report slow in performance. So what are the designs for a reporting model?

Snowflake Schema

The diagram of tables can be in all shapes, however, there are two big categories when it comes to design a diagram for reporting systems; Snowflake and Star Schema. Snowflake is when there are many relationships between tables, and when you have to pass through multiple relationships to get from one table to another. Here is an example:

To path from FactInternetSales tables (which is the fact table) to DimProductCategory (which is the dimension table) needs passing through three relationships. More relationships will slow down the performance, and also it will bring the need for the both-directional relationship as I explained in this article. The diagram above is a snowflake schema, in a snowflake schema, there are many intermediate tables to get from one table to another. To remind you what snowflake looks like, here is a view of that;

Dimension tables are not directly connected to the fact table. They are related to other dimensions in between.

Star Schema

Star Schema is when there is a direct relationship between the fact table and each dimension. Each fact table can have a connection (or relationship) to multiple dimensions so it will come up with a view like this:

You cannot unsee the Star in the layout above. The fact table is the heart of the star and all dimensions are around it as points of the star. The star schema might no have exactly five points (pentagram), It might have more or less. The important part of this design is not the number of points for the star, Is the direct relationship between the fact table and every dimension.

Star Schema means the fact table in the heart of the star, and a single relationship to each dimension around it as points of the star.

If we want to re-design the model you have seen in the snowflake example, it should look like this:

In this example, the DimProduct includes all details of Subcategory and Category in it too. And it is directly related to the fact table. To learn how to combine dimensions to build a flatten dimension, read this article.

Star Schema Can Include Multiple Fact Tables

A common misunderstanding is that you should have only one fact table in a star schema. Well, that is true somehow, but the fact is that you can have a combination of star schemas to build a data model. Look at the example below;

We have two fact tables; FactInternetSales and FactResellerSales. They have some shared dimensions in between (DimProduct and DimDate), and they also have their separate dimensions (DimCustomer for FactInternetSales, and DimReseller for FactResellerSales). The model above is a perfect star schema design. Because each fact table still needs one single relationship to a dimension. A model like above, give us the ability to filter both fact tables based on shared dimension in between;

The Importance of the Star Schema

Star Schema is the best way of designing a data model for reporting, You will get the best performance and also flexibility using such a model. I have explained in another article that why having everything in one fat fact table is not a good idea. A lot of modeling challenges and issues in Power BI implementation these days can be resolved with a good star schema design. Star Schema is like a conductor of an orchestra. An orchestra without a conductor can play music, like a Power BI implementation without a star schema. However, the music won’t be synchronized, it won’t be the best it can do, and it won’t be what the audiences are after. However, if they Power BI implementation leverages the Star Schema (like the orchestra with a good conductor), it would play its full capacity, answer all the requirements, and stun the audience.

Star Schema for a Data Model is like a Conductor for an Orchestra. The best analytics outcome would be possible with the star schema, like the best music outcome with a good conductor.

How to Design Star Schema

Mastering Star Schema design can take years of experience. There are some fundamental rules to learn at the beginning, and then there are heaps of tips and tricks along the way. There are exceptions that you need to learn how to deal with them. There are books under this subject explaining how to design star schema. It is impossible to explain it in one article. However, I can point out some of the fundamental rules, that helps you to start.

Define Dimension Tables

The design of everything in one huge table is not flexible, neither performs well. I explained in this article that what is a dimension and why not put everything in one table. Dimension tables are descriptive tables that describe happening of an action which is the fact. Fields from dimension tables are often used as slicers or axis of visuals.

Define Fact Tables

Fact tables are the heart of the star schema. You have to be careful when you work with them. Especially because these are huge deep tables that can make your data model very big. In this article, I explained what is the Fact table and tips about designing it. Fact tables are tables that explain happening of action, such as sales, production, etc. Fact table includes fields which are playing the VALUE part of visuals and calculations, called facts. Fact is a numeric value that can be aggregated.

Flatten Dimension Tables

It is important that we keep the Star Schema design which is a single relationship between fact and dimension tables. You cannot get there if you have a dimension for every single field. Having a dimension for Category, another for subcategory, another for Brand, another for Color, and then one for Product details is not good for Star Schema design. That way of modeling is good for a transactional database. You have to combine tables as much as possible (As much as the meaning is still the same entity) to achieve a flatten dimension. In this article, I explained how you can flatten dimension tables.

Do NOT Flatten Fact Table

Although flattening dimension tables are good for a star schema design, the same approach doesn’t work for the fact tables! The reason is that the result of flattening is to create a wider table and bring everything in one table. We do that for dimensions because we want them to be one dimension serves it all for one entity. And to avoid extra relationships between dimensions. Flattening the fact table will lead us into the trap of having everything in one big huge table. We will lose flexibility and also performance.

Don’t Expect an Action from Dimension Table

Dimension tables are descriptive tables and should have a single-directional one-to-many relationship to a fact table. However, most of the times, people come to me and say, I want to filter my dimension data by the fact table, so I need a both-directional relationship. The reality is that you never want to filter something by fact table. Because the fact table is just a table full of numbers! You want to filter that by something which filters the fact table. So you want to filter a dimension by another dimension at the end. If the existing fact table doesn’t give you that option, it is either of these two: The fact table doesn’t include all the facts, and you can add more facts into that, which brings the desired output. Or the current fact table should not be used for this purpose and you might need another fact table. This is a tricky part of the modeling, and I will dedicate a separate article to it later on.

No! You Don’t Have Just One Table!

When I teach in my classes about the star schema, dimensional model, fact and dimension tables, I hear this a lot: “This stuff are interesting, however, in my case, I just have one single table!”. My answer is always this: “No! You don’t have just one Table! Have a closer look!”. Then I go through the table with them. Their table is just one table, but it has descriptive fields and facts all in one table. Of course, it will work for that single table. However, analytics requirements don’t end with that single dataset. After using that for a while, end users will come up with new requirements, and saying that “OK, Nice, now we want that data to be combined with this data”, and that is when the end of an era for that single table model comes fast. If the author of the model, at the beginning thought about such thing, It would have been easily extensible with adding more fact and dimension tables. But now, it means to re-do the whole model. So, here I Say again: “No! You don’t have just one Table! Have a closer look!”

Summary

As I mentioned before, we can talk about Star Schema for pages, and it never really finishes! In this article, you learned what is dimensional modeling, and also learned that Star Schema is the best way to design your model. You also learned that Star Schema doesn’t mean one single fact table, it can include multiple fact tables. In the end, you learned about a few of fundamentals of designing a star schema. In the next posts of the Back to Basics of Modeling in Power BI, I’ll go through some more examples of designing star schema. If you have a model that you can’t think of a star schema for it, let me know in the comments below. And always remember that “You don’t have just one table! Have a closers look!”

If you haven’t read other parts of Back to Basics Series of Power BI Modeling, here they are:

What is a Relationship?

What is the Cardinality of the Relationship?

What is the Direction of the Relationship?

Data preparation; First and Foremost Important task

What is a Dimension table and why say No to a single big table

Basics of Modeling in Power BI: Fact Tables

Combining Dimension Tables in Power BI using Power Query; Foundation of Modeling in Power BI

Reza Rad

Trainer, Consultant, Mentor

Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

8 thoughts on “Power BI Basics of Modeling: Star Schema and How to Build it”

Joseph Yacura - M.B.A., M.S., M.Q.M. says:

May 21, 2019 at 4:11 pm

Great article!

Loading...

Ganesh says:

May 24, 2019 at 6:37 pm

Very well written! Thanks Reza

Loading...

Lars Nordin says:

May 31, 2019 at 6:02 pm

How would you answer the question: “What’s the average age of customers buying orders larger than $100 this year?” In your first Star Schema?

Loading...

- Reza Rad says:
  
  June 1, 2019 at 8:23 am
  
  Hi Lars.
  If the Average age of the customers is the value you want, then it should be part of a fact table, not in the dimension.
  Cheers
  Reza
  
  Loading...
  
Adolfo Socorro says:

November 26, 2019 at 7:25 am

Hi Reza, thanks for a very good article. I have a case where a star schema does not seem to do the trick. Let’s say I have two fact tables, Sales Fact and Shipment Fact. I also have a dimension called Sales Order Dim that filters both fact tables, and two other dimensions called Calendar Dim (Order Date) and Calendar Dim (Ship Date). In Power BI I am not able to build a simple table visual that has Sales Order Number, Order Date, and Ship Date (all fields from each of the dimensions). That is, I want a list of sales orders with their order and ship dates. Power BI can’t build it because there are “missing relationships.”

These are the relationships:

Sales Order Dim -> Sales Fact
Sales Order Dim -> Shipment Fact
Calendar Dim (Order Date) -> Sales Fact
Calendar Dim (Ship Date) -> Shipment Fact

The only solution I can think of is to merge the two fact tables, but this does not scale very well. Are there other techniques? Thanks!

Loading...

- Reza Rad says:
  
  November 29, 2019 at 12:20 pm
  
  Hi Adolfo
  if the purpose is just to show this:
  Sales Order Number | Ship Date | Order Date
  then you can use the Sale Order number from your Sales Order Dim
  but for the other two values, just use the Date field from the fact table itself, and set the aggregation as last or the first date. alternatively, you can create a measure for each using lastdate(‘Shipment Fact'[ShipDate]) and another measure for lastdate(‘Sales Fact'[Order Date]). then use the two measures in your table visual.
  
  Cheers
  Reza
  
  Loading...
  
Ken says:

January 17, 2020 at 5:10 am

Hi Reza,

Love the blog. Your recent post with mapping of Microsoft MVP is really cool but I have star schema question that relates specifically to Geolocation data. Thus I thought I would put this question here.

I have prospect table for Germany that has Geolocation data (Country, Province, City, Address, Zip) for each business – 30K plus records. Then I have a separate more detailed German Geolocation table with (Country, Province, City, Zipcodes, Latitude, Longitude, and point) – 16,000 records.

The issue is the only unique value is the Latitude/Longitude because Zipcodes can stretch more than one city and some cities have multiple zipcodes.

After reading your other posts, my thought is to merge the Lat/Long fields from Detailed table with the prospects list. Any thoughts or ideas are greatly appreciated.

Loading...

- Reza Rad says:
  
  January 17, 2020 at 2:28 pm
  
  Hi Ken
  Thanks for your nice words. glad you like the blog and posts.
  I confirm that your understanding is right. if you have geolocation information, it is better NOT to create a geolocation dimension for it, but to merge it to your other tables. like what you said, you can merge the geolocation information into the prospect table. there is no point of having geolocation as a separate table, because the concept of geolocation always comes with the concept of another attribute, for example: customer’s city, store’s location.
  Cheers
  Reza
  
  Loading...