Dataflow vs. Dataset: What Are the Differences Between These Two Power BI Components?


I have presented on Power BI dataflows and datasets many times, and one question always comes up: what is the difference between a dataflow and a dataset? So I thought it better to explain it in a post and help everyone understand. In this post, you will learn what the differences between these two components are, when and where to use each of them, and how they work together alongside the other components of Power BI.

What is Dataflow?

Power BI Dataflow is the data transformation component in Power BI. It is a Power Query process that runs in the cloud, independently of any Power BI report or dataset, and stores the data in CDM (Common Data Model) format inside Azure Data Lake Storage.

What is Dataset?

Power BI Dataset is the object that contains the connection to the data source, the data tables, the data itself, the relationships between tables, and the DAX calculations. The dataset is usually hidden from view in Power BI Desktop, but it can easily be seen in the Power BI service.

I have a set of articles on both dataflows and shared datasets, which I highly recommend reading for more information:

What are the Use Cases of Dataflow for You in Power BI?

Getting Started With Dataflow in Power BI – Part 2 of Dataflow Series

What is the Common Data Model and Why Should I Care? Part 3 of Dataflow Series in Power BI

Linked Entities and Computed Entities; Dataflows in Power BI Part 4

How to Use Dataflow to Make the Refresh of Power BI Solution Faster

Move your Shared Tables to Dataflow; Build a Consistent Table in Power BI

Power BI Architecture for Multi-Developer Tenant Using Dataflows and Shared Datasets

Power BI Shared Dataset? How does it work? And why should you care?

Refresh Power BI Queries Through Power Platform Dataflows: Unlimited Times with Any Frequency

Workaround for Computed Entity in Power BI Pro: Dataflow in Power BI

The Difference between Dataflow and Dataset

Now that you know the definitions, let’s talk about the differences between the two components.

Dataflow is a Replacement for your Power Query

A dataflow decouples the Power Query logic and code from the Power BI file so that it can be used in multiple files.

Dataset is a Replacement for DAX Calculations and Relationships

Using a shared dataset, you can re-use the DAX calculations and relationships you have created for one model in other Power BI files.


Dataflow is the ETL Layer

Dataflow is the Data Transformation layer in your Power BI implementation. The terminology for this layer is ETL (Extract, Transform, Load). This will extract data from data sources, transform the data, and load it into the CDM.
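To make the three ETL steps concrete, here is a minimal sketch in Python. It is only an analogy: a real dataflow is written in Power Query, not Python, and every name in it (RAW_SALES_CSV, extract, transform, load, and the storage dictionary standing in for the data lake) is hypothetical.

```python
# Analogy only: plain Python standing in for a dataflow's Power Query steps.
import csv
import io

# Hypothetical raw source data (stands in for a database or file).
RAW_SALES_CSV = """order_id,customer,amount
1, alice ,120.50
2,BOB,80
3, alice ,
"""


def extract(raw_text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: clean names, convert types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete rows
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": float(amount),
        })
    return cleaned


def load(rows, storage, entity_name):
    """Load: store the cleaned table under an entity name."""
    storage[entity_name] = rows
    return storage


storage = {}  # hypothetical stand-in for the data lake storage
load(transform(extract(RAW_SALES_CSV)), storage, "Sales")
print(storage["Sales"])
```

The point of the sketch is the separation: nothing here knows about relationships, measures, or visuals. It only produces a clean table that something else can consume.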

Dataset is the Modeling Layer

The dataset is the layer for all the calculations and modeling. It gets data from the dataflow (or from other sources) and builds an in-memory data model using the Power BI (Analysis Services) engine.
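As a rough illustration of what "modeling" means here, the Python sketch below builds a tiny in-memory model with one relationship and one measure. It is an analogy under assumed data, not how the Analysis Services engine works internally; the tables, the customer_by_id lookup, and the functions total_sales and total_sales_by are all hypothetical.

```python
# Analogy only: a tiny in-memory "model" with a relationship and a measure.

# Tables as they might arrive from a dataflow (hypothetical data).
sales = [
    {"customer_id": 1, "amount": 120.5},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 40.0},
]
customers = [
    {"customer_id": 1, "country": "NZ"},
    {"customer_id": 2, "country": "AU"},
]

# Relationship: sales.customer_id -> customers.customer_id (many-to-one).
customer_by_id = {c["customer_id"]: c for c in customers}


def total_sales(rows):
    """Measure: a sum over the amount column of the given sales rows."""
    return sum(r["amount"] for r in rows)


def total_sales_by(column):
    """Evaluate the measure per value of a customer column,
    following the relationship from sales to customers."""
    groups = {}
    for row in sales:
        key = customer_by_id[row["customer_id"]][column]
        groups.setdefault(key, []).append(row)
    return {key: total_sales(rows) for key, rows in groups.items()}


result = total_sales_by("country")
print(result)
```

The measure is defined once and then evaluated in whatever grouping the consumer asks for, which is the role DAX measures and relationships play in a dataset.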

Dataflow Feeds Data into the Dataset

The result of a dataflow is fed into a dataset for further modeling; a dataflow by itself is not a visualization-ready component.

Dataset Feeds Data into Visualizations

Because the dataset is an in-memory model built and ready for visualization, its result is usually used directly to build visualizations.

Dataflow Accesses the Data Source Directly

Unless you use a linked entity or computed entity, a dataflow usually gets data directly from the data source.

Dataset Can Access the Data from the Dataflow

Although a dataset can get data directly from a data source, it is a best practice for a shared dataset to get its data from dataflows. This supports a multi-developer implementation of Power BI.

Dataflow Developer Needs Power Query Skills

One of the reasons to use dataflows and shared datasets is to decouple the layers, so that multiple developers can build the Power BI solution at the same time. In such an environment, the skillset needed by a dataflow developer is all about Power Query, how to build a star schema, and so on. No DAX or visualization skills are required for a dataflow developer.

Dataset Developer Needs DAX and Modeling Skills

On the other hand, the dataset developer needs to know everything about relationships in Power BI and calculations using DAX. The dataset developer may also know Power Query and visualization, but these are not his/her primary skills.

Users of Dataflow are Data Modelers

A dataflow’s result can be used by data modelers. It is not a great approach to give the output of a dataflow to report visualizers, because the dataflow still has to be loaded into a model with proper relationships and calculations added to it.

Users of Dataset are Report Visualizers

The result of a dataset is ready for report visualizers. They can simply create a live connection to the shared dataset and build their visualizations from it.

Dataflow solves the problem of having multiple versions of the same table in different PBIX files

Using a dataflow, you reduce the need to copy and paste your Power Query script into other files. You can re-use a table in multiple files.

Dataset solves the problem of having multiple versions of the same DAX code in different PBIX files

Using a shared dataset, you can have multiple reports using the same calculations and data model, without duplicating the code.
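The re-use idea can be sketched in Python as one shared model definition consumed by two reports. This is an analogy only; SharedModel, summary_report, and detail_report are hypothetical names, and a real shared dataset is consumed through a live connection rather than a function call.

```python
# Analogy only: one shared model definition, two reports that reuse it.


class SharedModel:
    """Stands in for a shared dataset: data plus centrally defined measures."""

    def __init__(self, sales_rows):
        self.sales_rows = sales_rows

    def total_sales(self):
        """Measure defined once, in one place."""
        return sum(r["amount"] for r in self.sales_rows)

    def average_sale(self):
        """A second measure that builds on the first."""
        return self.total_sales() / len(self.sales_rows)


def summary_report(model):
    """First report: uses the shared measure without redefining it."""
    return f"Total sales: {model.total_sales():.2f}"


def detail_report(model):
    """Second report: reuses the same measures for a different view."""
    return (
        f"Average sale: {model.average_sale():.2f} "
        f"(of {model.total_sales():.2f})"
    )


model = SharedModel([{"amount": 100.0}, {"amount": 50.0}])
print(summary_report(model))
print(detail_report(model))
```

If the definition of total_sales ever changes, it changes in one place and both reports pick it up, which is the problem a shared dataset solves for DAX code.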


Summary

Dataflows and datasets are not replacements for each other; they complement each other.

They are two essential components of Power BI, and each has its own place in the Power BI architecture for a multi-developer scenario.


Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an author, trainer, speaker, and consultant. He has a BSc in Computer Engineering and more than 20 years’ experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for nine consecutive years (from 2011 till now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of the Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
He has written several books on MS SQL BI and is writing others. He was also an active member of online technical forums such as MSDN and Experts-Exchange, was a moderator of the MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of the very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content, and of Power BI Pro Architecture, published by Apress.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday, and SQL user groups. He is also a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution; he is a data enthusiast.
