I have presented on Power BI dataflows and datasets many times, and one of the questions I always get is: what is the difference between a dataflow and a dataset? So I thought it better to explain it in a post and help everyone understand. In this post, you will learn what the differences between these two components are, when and where to use each of them, and how they work together alongside the other components of Power BI.
What is Dataflow?
Power BI Dataflow is the data transformation component in Power BI. It is a Power Query process that runs in the cloud, independently of the Power BI report and dataset, and stores the data in the Common Data Model (CDM) inside Azure Data Lake storage.
What is Dataset?
Power BI Dataset is the object that contains the connection to the data source, the data tables, the data itself, the relationships between tables, and the DAX calculations. The Power BI dataset is usually hidden from view in Power BI Desktop, but it can easily be seen in the Power BI service.
I have a set of articles on both dataflows and shared datasets, which I highly recommend reading for more information.
The Difference between Dataflow and Dataset
Now that you know the definitions, let's talk about the differences between the two components.
Dataflow is a Replacement for Your Power Query
A dataflow decouples the Power Query logic and code from the Power BI file so that it can be used in multiple files.
Dataset is a Replacement for DAX Calculations and Relationships
Using a shared dataset, you can re-use the DAX calculations and relationships you have created for one model in other Power BI files.
Dataflow is the ETL Layer
Dataflow is the data transformation layer in your Power BI implementation. The terminology for this layer is ETL (Extract, Transform, Load). A dataflow extracts data from data sources, transforms the data, and loads it into the CDM.
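As a rough illustration of the extract, transform, load pattern, here is a minimal Python sketch. It is not how a dataflow is written (dataflows use Power Query); the function names, the sample rows, and the `cdm_store` dictionary standing in for the storage a dataflow writes to are all hypothetical.

```python
# Hypothetical illustration of the ETL pattern; not Power BI code.

def extract():
    """Pretend to read raw rows from a data source (sample data)."""
    return [
        {"product": " Bike ", "amount": "100"},
        {"product": "Helmet", "amount": "25"},
        {"product": "Bike", "amount": None},  # incomplete row
    ]


def transform(rows):
    """Clean the rows: drop incomplete ones, trim text, convert types."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue
        cleaned.append(
            {"product": row["product"].strip(), "amount": int(row["amount"])}
        )
    return cleaned


def load(rows, store, table_name):
    """Write the cleaned table into the shared store."""
    store[table_name] = rows


cdm_store = {}  # stand-in for the storage that a dataflow writes to
load(transform(extract()), cdm_store, "Sales")
print(cdm_store["Sales"])
```

The point of the sketch is only that the cleaning logic lives in one place, and whatever reads `cdm_store` afterwards gets the already-transformed table.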
Dataset is the Modeling Layer
Dataset is the layer of all the calculations and modeling. It gets data from the dataflow (or from other sources) and builds an in-memory data model using the Power BI (Analysis Services) engine.
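To make the idea of a modeling layer more concrete, here is a small Python sketch of tables, a relationship, and a calculation defined once on top of them. In Power BI these would be model relationships and DAX measures; the table contents, column names, and the `total_sales_by_category` function are assumptions made up for this illustration.

```python
# Hypothetical illustration of a modeling layer; not Power BI code.

# Tables as they might arrive from a dataflow (sample data).
sales = [
    {"product_id": 1, "amount": 100},
    {"product_id": 2, "amount": 25},
    {"product_id": 1, "amount": 50},
]
products = [
    {"product_id": 1, "category": "Bikes"},
    {"product_id": 2, "category": "Accessories"},
]

# Relationship: sales.product_id -> products.product_id (many-to-one).
category_by_product = {p["product_id"]: p["category"] for p in products}


def total_sales_by_category():
    """A calculation defined once in the model, similar in spirit to a measure."""
    totals = {}
    for row in sales:
        category = category_by_product[row["product_id"]]
        totals[category] = totals.get(category, 0) + row["amount"]
    return totals


print(total_sales_by_category())
```

Notice that nothing here cleans or reshapes the source data; the sketch assumes that work was already done upstream, which is the division of labor between the two layers.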
Dataflow Feeds Data into the Dataset
The result of a dataflow will be fed into a dataset for further modeling; a dataflow by itself is not a visualization-ready component.
Dataset Feeds Data into Visualizations
Because the dataset is an in-memory model built and ready for visualization, its result is usually used directly to build visualizations.
Dataflow Accesses the Data Source Directly
Unless you use a linked entity or computed entity, a dataflow usually gets data directly from the data source.
Dataset Can Access the Data from the Dataflow
Although a dataset can get data directly from a data source, it is a best practice for a shared dataset to get its data from dataflows. This supports a multi-developer implementation of Power BI.
Dataflow Developer Needs Power Query Skills
One of the reasons to use dataflows and shared datasets is to decouple the layers, so that you can have multiple developers building the Power BI solution at the same time. In such an environment, the skillset needed for a dataflow developer is all about Power Query, how to build a star schema, and so on. No DAX or visualization skills are required for a dataflow developer.
Dataset Developer Needs DAX and Modeling Skills
On the other hand, the dataset developer needs to know everything about relationships in Power BI and calculations in Power BI using DAX. Although the dataset developer may know Power Query and visualization, these are not his or her primary skills.
Users of Dataflow are Data Modelers
The result of a dataflow is meant for data modelers. It is not a great approach to give the output of a dataflow to report visualizers, because the dataflow still has to be loaded into a model with proper relationships and calculations added to it.
Users of Dataset are Report Visualizers
The result of a dataset is ready for report visualizers. They can simply make a live connection to the shared dataset and build their visualizations from it.
Dataflow solves the problem of having multiple versions of the same table in different PBIX files
Using a dataflow, you reduce the need to copy and paste your Power Query script into other files. You can re-use a table in multiple files.
Dataset solves the problem of having multiple versions of the same DAX code in different PBIX files
Using a shared dataset, you can have multiple reports using the same calculations and data model, without duplicating the code.
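The re-use idea can be sketched in a few lines of Python: one model object holds the data and the calculations, and two separate reports call into it instead of each keeping its own copy. This is only an analogy for a shared dataset with live-connected reports; the `SharedModel` class, the report functions, and the sample numbers are hypothetical.

```python
# Hypothetical illustration of re-using one shared model; not Power BI code.

class SharedModel:
    """Holds the data and the calculations in one place."""

    def __init__(self, sales):
        self.sales = sales

    def total_sales(self):
        return sum(row["amount"] for row in self.sales)

    def average_sale(self):
        return self.total_sales() / len(self.sales)


model = SharedModel([{"amount": 100}, {"amount": 25}, {"amount": 55}])


def summary_report(m):
    """First report: uses the shared total calculation."""
    return f"Total sales: {m.total_sales()}"


def detail_report(m):
    """Second report: uses the shared average calculation."""
    return f"Average sale: {m.average_sale():.2f}"


print(summary_report(model))
print(detail_report(model))
```

If the definition of `total_sales` changes, both reports pick up the change, which is the same benefit you get from keeping DAX calculations in a single shared dataset.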
Dataflow and Dataset are Not Replacements for Each Other
They are two essential components of Power BI and have their own places in the Power BI architecture for a multi-developer scenario.
Dataflows and datasets do not replace each other; they complement each other.
I highly recommend studying the links below to learn more about these two.