Power BI Datamart Vs. Dataflow Vs. Dataset

Power BI Datamart vs Dataflow vs Dataset

Datamart, Dataflow, and Dataset are all Power BI components dealing with the data. I have presented about these a lot, and one of the questions I get is: What is the difference between Dataflow, Dataset, and Datamart? So I thought to explain it in an article and help everyone in that understanding. In this post, you will learn the differences between these three components, when and where you use each, and how they work together besides other components of Power BI.

Video

What is Dataflow?

Power BI Dataflow is the data transformation component in Power BI. It is a Power Query process that runs in the cloud, independent of Power BI report and dataset, and stores the data into Azure Data Lake storage (or Dataverse). Dataflow is not only limited to Power BI; it can be created and used in other services such as Power Platform (Power Apps). Dataflow will give you both the transformation engine of Power Query plus the storage option. Dataflow will give you a re-usable ETL (Extract-Transform-Load) component.

Power BI Dataflow

To learn more about Dataflow, I recommend the below article/video;

What is Dataset?

Power BI Dataset is the object that contains the connection to the data source, data tables, the data itself, the relationship between tables, and DAX calculations. Usually, Power BI Dataset is usually hidden from the Power BI Desktop view but can be easily seen in the Power BI service. Power BI Dataset is used commonly when sharing a model between multiple visualization objects (such as multiple Power BI reports, paginated reports, and even Excel reports). The ability to use a shared Dataset will give you a re-usable modeling layer in Power BI.

Power BI Dataset

To learn more about Datasets and how they are used in Power BI, read the article below;

What is Datamart?

Power BI Datamart is a recently added component to the Power BI ecosystem. Power BI Datamart is a combination of Dataflow, an Azure SQL Database (acting like a data warehouse), and Dataset. Power BI Datamart also comes with a unified editor in the Power BI Service. Power BI Datamart is more like a container around other components of Power BI (Dataflow, Dataset, and Azure SQL Database).

Power BI Datamart

To learn more about Datamart, read the article below;

The Difference between Dataflow, and Dataset

Datamart, Dataflow, and Dataset are Power BI components that store and work with the data. Now that you know the definition let’s talk about the difference between the three components.

Dataflow is the Power Query component.

Dataflow is decoupling the Power Query logic and code from the Power BI file so that it can be used in multiple files. You will have the ability to get data from many different data sources, do the transformations using Power Query online and get the data in the shape you want, set a scheduled refresh for it, and load it into storage options (Azure Data Lake storage, or Dataverse)

Dataflow is the Power Query component.

Dataflow is not just for Power BI

Dataflow is the only component of the three above that can also be created outside of Power BI. You do not need a business intelligence or data analysis requirement to use Dataflow. Sometimes you may need Dataflow for just data integration; For example, you may want to get data from some source systems, transform it and store it into data storage. This might be for other applications to use.

Dataflow in Power BI might be used for data analysis purposes, but you can also create dataflow in Power Platform under the Power Apps portal.

Dataflow in Power Apps

Dataset is a replacement for DAX Calculations and Relationships.

Using a shared dataset, you can re-use the DAX calculations and relationships you have created for one model in the other Power BI files. If you want to re-use the DAX measures, calculated columns and tables, the hierarchies, field-level formatting, and relationships defined in your model for multiple files, then Shared Dataset does that for you. You can have multiple reports connecting to it and re-using the data model.

Shared Dataset is the replacement of your DAX and calculation

Why not DirectQuery from the source instead of Dataset?

The question that normally comes to mind is; what if you have a Data warehouse already? or even if there is no Data warehouse, let’s say, what if you consider the storage of your Dataflow as a data warehouse? Can’t you connect DirectQuery to that? Isn’t Power BI Dataset an unnecessary layer on top of that? Why do you need Power BI Dataset in those cases?

Power BI Dataset uses in-memory engine storage for the data. The in-memory storage for the data ensures the best performance in the report and visualization, as the interaction in the report would be the fastest. it also brings a powerful calculation language called DAX to help with some analytical requirements and calculations. So even if you already have a data warehouse, I would still highly recommend using a Dataset on top of that.

Dataset for a better performance

Dataflow is the ETL Layer

Dataflow is the Data Transformation layer in your Power BI implementation. The terminology for this layer is ETL (Extract, Transform, Load). This will extract data from data sources, transform the data, and load it into the CDM.

Dataflow is the ETL Layer

Dataset is the Modeling Layer

Dataset is the layer of all the calculations and modeling. It will get data from the Dataflow (or other sources) and build an in-memory data model using the Power BI (Analysis Services) engine.

Dataset is the modeling layer

Dataflow Feeds Data into the Dataset

The result of dataflow will be fed into a dataset for further modeling; a dataflow by itself is not a visualization-ready component.

Get Data from Dataflow

Dataset Feeds Data into Visualizations

Because the dataset is an in-memory model built and ready for visualization, the result that usually used directly to build a visualization;

Get data from Power BI Dataset

Dataflow Access the Data Source Directly

Unless you use a linked entity or computed entity, a Dataflow usually gets data directly from the data source.

Data sources for Dataflow

Dataset Can Access the Data from the Dataflow

Although a Dataset can directly get data from a data source, it is a best practice that a shared Dataset gets the data from Dataflows. This is to have a multi-developer implementation of Power BI.

Dataflow Developer Needs Power Query Skills

One of the reasons to use dataflows and shared datasets is to decouple the layers, so you have multiple developers building the Power BI solution at the same time. In such an environment, the skillset needed for a Dataflow developer is all about Power Query and how to build Star-Schema, etc. No DAX or Visualization skills are required for a Dataflow developer.

Dataset Developer Needs DAX and Modeling Skills

On the other hand, the Dataset developer needs to know everything about the relationships in Power BI and calculations in Power BI using DAX. Although the Dataset developer can know Power Query and visualization, it is not his/her primary skill.

Users of Dataflow are Data Modelers

Dataflow’s result can be used for data modelers. It is not a great approach to give the output of Dataflow to report visualizers. Because the Dataflow still has to be loaded into a model with proper relationships and calculations added to it.

Users of Dataset are Report Visualizers

The result of a dataset is ready for report visualizers. They can have a live connection simply to the shared Dataset and build their visualizations from it.

Dataflow solves the problem of having multiple versions of the same table in different PBIX files.

Using the Dataflow, you reduce the need to copy and paste your Power Query script into other files. You can re-use a table in multiple files.

Dataset solves the problem of having multiple versions of the same DAX code in different PBIX files.

Using a shared dataset, you can have multiple reports using the same calculations and data model without duplicating the code.

Dataflow Vs. Dataset

Dataflow and Datasets are not the replacement for each other.

Dataflow Vs. Dataset

They are two essential components of Power BI and have their own places in the Power BI architecture for a multi-developer scenario.

Dataflow and Dataset are not the replacement of each other, they are the compliment for each other.

Power BI Solution architecture with Dataflow and Dataset

Dataflow Vs. Dataset Video

What about Datamart?

You learned a lot about the difference between Dataflow and Dataset. However, what about the Datamart? Shouldn’t that be in the comparison list?

Power BI Datamart is more like a container of components rather than a single object itself. When you create a Power BI Datamart, you are creating a Dataflow, an Azure SQL Database, and a Dataset. This means that Datamart is already having all the benefits mentioned for the Dataflow and Dataset in it. It also has an extra component: storing data in the Azure SQL Database. After processing by the Dataflow, the data is loaded into the Azure SQL Database. Some call this a data warehouse, and some may even call this a data mart. But Power BI Datamart includes all of these three components: Dataflow, Azure SQL Database, and Dataset.

Power BI Datamart components

Power BI Datamart also brings the unified web UI to build and manage all these components. With the appearance of the Datamart, the Multi-developer architecture of Power BI looks more like the below;

Power BI solution architecture using Datamart

Now the question is would Datamart replace the Dataset and Dataflow? Let’s find that out.

Is Dataflow getting replaced by Datamart?

No. Certainly not. Dataflow is a component by itself. As mentioned earlier in this article, you can build and use a Dataflow without needing a BI solution. Datamart is normally useful when you are building a BI solution. You may just want to integrate some tables into storage and re-use them in other applications; in that case, the Dataflow by itself can be used.

Another use case of Dataflow by itself is that even if you have a Datamart, you may still want to implement multiple layers of Dataflow for staging and transformation. Multiple layers of Dataflow is an effective technique in data source isolation and re-using Dataflow components. I have explained that in the article below;

Multi-layered Dataflow Architecture

Is Dataset getting replaced by Datamart?

The answer to this question is also No. Although it is now easier to create a Dataset from the unified UI of the Datamart, still that won’t reduce the need for the Dataset to be a separate component of its own. There are plenty of use-cases for a Dataset as a component of its own. Imagine you are implementing an architecture in which the data transformation is done using another service (such as Azure Data Factory), and the data warehouse is in Azure Synapse. You may and can still use Power BI Datasets to build the data model and calculations on top of that without building a full Datamart.

Another use case is that even if you use a Power BI Datamart, you may still create chained Datasets on top of your existing Dataset. These chained Datasets are DirectQuery to Power BI Dataset (which in this case is part of a Datamart), but can have other data sources combined in it. Chained Datasets are a very useful method of working with self-service data analysts in an enterprise organization.

Power BI Chained Dataset

Which one should I use? Datamart, Dataflow, or Dataset?

Now is time for the million-dollar question; which one of these components should you use? Each of these comes with some benefits, as you see above. Let’s answer that through a scenario;

James is a BI developer who is building a new BI solution. His implementation includes stages such as; getting data from the source, doing the transformation, loading it into a data warehouse, writing calculations in the model, and visualization. Power BI Datamart enables him to build most of that in one unified structure but still a multi-layered approach that can be easily maintained in the future.

After some time, James realizes that the data transformation side of his implementation is heavy. He wants the data transformation to be isolated from the data source so that just in case the source changes, his solution still works with minimal changes required. So he uses more Dataflows as staging and transformation Dataflows in his architecture.

James has some other colleagues who are business analysts in other departments. They want to use the Dataset that James provided as the main source but add more data sources into that and some calculations too. They create Chanied Datasets on top of that.

The above scenario is a scenario that uses all of these three components in an architecture. Choosing which components you need will be mainly based on what re-usable components you have and where you want to use them.

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

3 thoughts on “Power BI Datamart Vs. Dataflow Vs. Dataset

  • Hello Reza, I truly appreciate this extremely useful and informative post you have provided. As always, excellent job! I do have one question. If I used Power-BI desktop to bring in the data from the source and performed my ETL using Power Query, performed my modeling and created my measures and then published the dataset into Power BI Service, couldn’t that serve as my one source of truth for users to consume the data in their reports?

    • Hi Allen
      Thanks for your kind words.
      That works, It would be like a shared dataset. those who wants to extend your model will be able to use DirectQuery to your dataset, or just live connection for building reports.
      However, for those who want to build totally different models but with your data tables, then the dataflow works better. They might just need your date table for example, or the customer table.
      Cheers
      Reza

Leave a Reply