Medallion Architecture in Fabric: Why? What? and How?

You must have heard the buzzword “Medallion architecture.” if you are working in the data space, and have listened to a lot of other misconceptions with it, such as Medallion architecture is different than Microsoft Fabric, or Microsoft Fabric is similar to the medallion architecture, or even things such as we don’t use medallion architecture because we use Snowflake and Microsoft Fabric and Power BI all in one solution! These and many other misconceptions come from not knowing what the medallion architecture is, why it exists, what its benefits are, and how it is built. In this article and video, I will explain all of this to you so that you can confidently implement data analytics.

Video

Sample Scenario: Why Medallion Architecture?

Imagine this scenario best to understand the “why” for medallion architecture:

Michael is one of my colleagues who works in the sales department of our organization. I am part of the Data Analytics team, where I collaborate with a team that integrates data from various sources, consolidates it in a central repository using a dimensional data warehouse structure, and makes this data available to the rest of the organization through reporting.

Michael wants to build self-service reports from the SAP system, but he doesn’t have direct access to the source data. He asks the data analytics team to give him access to the data warehouse so that he can build his dashboards. But the data warehouse team has been busy lately with tons of requests from many other teams (HR, warehousing, operations, etc). Although they want to provide access for Michael, they don’t have that data in the star-schema format yet in the data warehouse. This means Michael has to wait for a long time, sometimes months, to build his dashboard.

This wait time is unrealistic so Michael decides to build his dashboard on top of the CSV or excel file extracts he manually can produce from the SAP system, which works fine in the short term, but in the long terms comes with the challenges of updating the data, and also having the analytics build from the source system directly whereas it should have been through the channel of analytics team.

The sample scenario you read here is a very common scenario, unfortunately. The BI or analytics team is often occupied with many requests, and building a perfect data warehouse takes time; other teams don’t have the time to wait for this to be ready. They want some reports, even if it is not high-quality data, even if they do not conform to other systems.

Data Quality is not a binary concept

Data quality in every organization for every system can be variable. It is never just a 0 or 1. It can be something in between. The data that comes from one system might have a really high quality of data, whereas the data coming from another system in the same organization might lack some simple quality checks. Nevertheless, all of those are data, and they are valuable for the business to analyze.

If we always want to get the data in a 100% quality level and then start analyzing it, then we are here for a very long run, the data analytics project will take years and years to complete, the budget spent on this work will dry up soon, and business sponsors and stakeholders will question even the existence of the data analytics team, which can sometime even cause the whole analytics team to be laid off.

So, what is the answer? One of the answers is Medallion Architecture.

What is Medallion Architecture?

The analytics team is super busy with tons of work and requests from each team, and building a dimensional model data warehouse takes lots of effort and time. However, if they want to only give access to the raw data for their self-service users, with the knowledge that it is not the highest quality data, then it can still be immensely valuable for the business.

With the Medallion architecture, the raw data can be all integrated into a database (or some call it lakehouse, some call it data lake, the name doesn’t matter). This is the data that is a one-on-one copy from the source system. Because the analytics team had access to the source system, they did the data integration, and they brought it to this database. And now, without any change or filtering or cleansing on the data, they can provide access to the self-service users.

However, they need to make sure that the self-service users understand that the quality of data in this database is not the same as the quality of the data in the data warehouse. It is common to assign a label to this type of data, often called BRONZE.

The data from the bronze level can then be integrated, cleansed, and conformed into another database. This is the place where data from multiple systems match using key columns. Some of the data quality issues are resolved through the process, some filtering has been applied, and the data is much more reliable to use. This is often called SILVER.

From the silver layer, then the data gets into the shape of dimension and fact tables in a star-schema format, the data quality and reconciliation will be done and checked at the highest level. This will be the data warehouse from which the most critical day-to-day business reports are sourced. This data is called the GOLD layer.

Medallion Architecture is a multi-layered design of the data for analytics, in a way that can increase the trust in the organization in the analytics and lead to a successful analytics outcome.

Don’t we always want gold?

So, before you ask me, here I put the question in your mouth: “We always want gold data, what is the point of silver and bronze?” Here is the answer: everyone wants the gold data, who doesn’t? Who is against having the best quality of data? The problem is time and resources, though. Building a gold layer takes time. It takes a lot of resources, efforts, and time to bring the data to the highest quality, and because some reporting requirements can still work with lower quality data and don’t have that much time to wait, that is why Bronze and Silver can be helpful.

Medallion Architecture Diagram

The three layers of bronze, silver, and gold can be more or less, depending on the design. However, most organizations go with these three layers, which will be designed as follows;

Benefits of Medallion Architecture

Medallion Architecture helps to increase trust in the organization in the data analytics project. Without it, there will be years and years of waiting, and no one will see any results. If the analytics team provides low-quality data to the business and doesn’t tell them that this is Bronze data, then the business loses its trust in the project altogether.

A well-defined medallion architecture not only speeds up the process of analytics and makes data available for everyone to analyze, but also sets the expectation with the concepts of bronze, silver, and gold layers so that everyone knows how to trust the report coming from each layer.

Building Medallion Architecture in Microsoft Fabric

A medallion architecture can be built in any system, regardless of the technology. You can build it on an on-premises SQL server, or in Snowflake, in Power BI, in Microsoft Fabric, and in any other technology. All you need is three separate layers of data.

In Microsoft Fabric, this can be achieved with Fabric objects such as Database, Lakehouses, and Warehouse. However, the Lakehoues and Warehouse are used mostly for this design (because the database is mainly used for operational systems). We often use Lakehouse for the Bronze or even Silver layer. The main reason is to have the ability to store raw files directly in it (however, nothing is against using a Warehouse in Bronze or Silver layer). and in the Gold layer, because the data is in table format, then Warehouse works best. However, you can still use a Lakehouse technology for the gold layer if you want to.

Below is a sample design of the medallion architecture.

As you can see in my design, I used the following;

  • Bronze: Lakehouse, raw data. One-on-one copy of the source system, not filtered, not aggregated, not cleansed.
  • Silver: Warehouse, cleansed and formatted data, conformed from multiple systems. unnecessary data is filtered out; the data is designed in the star-schema model.
  • Gold: Power BI Semantic Model; business logics added as calculations, the data tables and their relationships are designed in a way to answer all the reporting requirements. DAX calculations are also added in here.

This design is one of the hundred designs you can build. You may choose not to use a semantic model as a gold layer, but instead have a Snowflake data warehouse. You may want to choose a different technology. This is a field where you can play whatever you like. The important rules are having the layers and the definition of data in each clearly determined and communicated with the rest of the organization.

One thing that can and will complete the medallion architecture is the usage of certification in Microsoft Fabric, where you can define labels of certified data or promoted, which can help label gold and silver. learn more about it here:

Common misconceptions about medallion architecture

Now that you have learned what the medallion architecture is and how it works, you can see that some of the following are just misconceptions.

Is medalian the same as Fabric? I cannot use Medallion since I don’t use Fabric.

Of course, not. These two are totally different things. Microsoft Fabric is a technology that you can use to build data analytics projects. In comparison, Medallion Architecture is a way of designing the architecture of the data within that technology.

Is Medalian supported by Fabric?

Yes, not only in Microsoft Fabric, but also in any other data analytics technologies, you can implement the layers of a medallion architecture.

Should I have a lakehouse or a warehouse for the Bronze layer?

It depends on many factors. As I mentioned briefly, you can use each of these in any of the layers without any problems. There are, however, pros and cons of using them in certain layers. Read my article here to understand more about the differences between a lakehouse and a warehouse. I generally prefer to keep Lakehouse in Bronze and Warehouse in Silver, but that is just my own preference.

Can I have a medallion just using Power BI components?

Of course, yes. You can have semantic models for each layer. Or what I prefer most is to have Dataflows for your bronze and silver layers, and then a semantic model for the gold.

Summary

Medallion architecture is a data architecture that plays an important role in the trustworthiness of the data and the success of data analytics in the organization. It is done by designing layers of data into three common layers of bronze, silver, and gold. In Microsoft Fabric, these can be achieved by many objects, but most commonly by using Lakehouse, Warehouse, and Power BI semantic model.

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

Leave a Reply