What are Power BI Dataflows and their Use Cases?

Dataflows are an important component in the Power BI architecture. Using them can enhance the development and maintenance of your Power BI solution significantly. However, there are many Power BI implementations that are not using this functionality, even though it is already more than three years passed since it was released. In this blog and video, you will learn what is dataflow and why you should use it.

Video

What is Dataflow?

Dataflow is the data transformation service that runs on the cloud independent of the Power BI dataset or solution. This data transformation service leverages the Power Query engine and uses the Power Query online and UI to do the data transformation.

One might say, well, the Power BI dataset, when published to the service, also runs the data transformation online. would it make it a Dataflow? the answer is No. Because the Power Query which is part of the Power BI dataset is loading data into the dataset directly. Where in Dataflow, the destination is not a dataset, it can be Azure Data Lake Storage, Dataverse (or some other storage coming later). This makes the Dataflow the independent data transformation component of Power BI.

How do dataflows function.
Dataflow is the data transformation in the cloud independent of the Power BI dataset

Dataflow is a service-only (cloud-only) object

You can not author or create Dataflows in the desktop tools such as Power BI Desktop. The only way to create a Dataflow is to do it in the cloud. In the Power BI service, you can do it in a workspace. The Dataflow created in the service can be used in the desktop tools (to connect and get data).

Creating an analytical dataflow in Power BI.
Creating Dataflows

Dataflow is not just for Power BI

Also, in the Power BI world, we call them Power BI dataflows. However, Dataflows are not just for Power BI. You can create Dataflows in Power Apps (these days called Power Platform Dataflows).

Creating a standard dataflow.
Dataflows created in Power Apps

Calling the dataflows by the platform in which you create them is not a common categorization of Dataflows. Instead, Dataflows are in the two most common categories of Standard and Analytical.

Where the data is stored for Dataflow?

Dataflow does not store the data in the Power BI dataset. As you read earlier, the Dataflow acts independently of the Power BI dataset. So there must be another storage for the dataflow objects. The storage for Dataflow can be Azure Data Lake Storage, or Dataverse (and there are other storage options that will be available later, such as SharePoint or Azure SQL database based on the release wave 2 plan in 2022).

You don’t need an extra license to get the data storage option for Dataflow. If you use the Power BI Pro license, then there will be an internal Azure Data Lake storage available for your Dataflow tables. If you use Power Apps licenses, then you will have some storage available in the Dataverse to use for your Dataflows.

Other Microsoft services, such as Power Platform or other services, can then connect to the Dataflow.

Dataflow integration with Microsoft Power Platform and Dynamics 365.
Dataflow stores data into Azure Data Lake Storage or Dataverse

Standard vs. Analytics Dataflows

A big categorization for Dataflows is the Standard vs. Analytical. This categorization is not only based on the storage option of the Dataflow but also based on some of the functionalities available in each option too. Analytical Dataflows give you more analytical power such as Computed entity and AI functions in the Dataflow. The Standard Dataflow stores the data into Dataverse only. There are a few other differences too. Here is a summary;

OperationStandardAnalytical
How to createPower Platform dataflowsPower BI dataflows
Power Platform dataflows by selecting the Analytical Entity checkbox when creating the dataflow
Storage optionsDataversePower BI provided Azure Data Lake Storage for Power BI dataflows, Dataverse provided Azure Data Lake Storage for Power Platform dataflows, or customer provided Azure Data Lake storage
Power Query transformationsYesYes
AI functionsNoYes
Computed entityNoYes
Can be used in other applicationsYes, through DataversePower BI dataflows: Only in Power BI
Power Platform dataflows or Power BI external dataflows: Yes, through Azure Data Lake Storage
Mapping to standard EntityYesYes
Incremental loadDefault incremental-load
Possible to change using the Delete rows that no longer exist in the query output checkbox at the load settings
Default full-load
Possible to set up incremental refresh by setting up the incremental refresh in the dataflow settings
Scheduled RefreshYesYes, the possibility of notifying the dataflow owners upon the failure
Standard vs. Analytical Dataflows

Dataflow is powered by Power Query Online

Dataflow is powered by Power Query Online. Every transformation is done by the Dataflow engine, and the UI provided for doing the transformation is the Power Query Editor online. There are more than 80 different data sources available to get data from using the Dataflow.

Data sources for the Dataflow

The online Power Query Editor enables you to do most of the transformations through the GUI. However, you can also use the Advanced Editor and work with the M script directly.

Power Query Editor online

Dataflow can be used in Power BI, Excel, and some other services

Depending on the type of the Dataflow, you can get data from it in Power BI Desktop (or Power BI Dataset), In Excel and some other services. This makes the Dataflow a fully-independent component on its own.

Get Data from Dataflow in the Power BI Desktop

And in Excel the same thing is available;

Get data from Dataflow in Excel

The use cases of these are what you are going to learn in the next section.

Why should you use Dataflow? Example Scenarios

Now comes the big question of why should someone use a Dataflow? what is the point of it? I found it best to explain it using an example;

Using One Power Query Table in Multiple Power BI Reports

Have you ever had the need to use one Power Query table in multiple Power BI Reports? Of course, you did. If you worked with Power BI for some time, you know that tables generated through Power Query are only part of one Power BI file. If you want to use the same table in another file, with a combination of some other tables which is not in the first file, then you would need to replicate the Power Query transformations (or copy and paste the M script) into the new *.pbix file. You may say, no I don’t, but here is an example: Date Dimension!

A table that is needed in multiple Power BI files

Date dimension is a table that you use in a *.pbix let’s say for Sales Analysis, and also in another *.pbix for Inventory reporting, and for HR data analysis *.pbix too. So what do you do in these situations? Copying the script for Date Dimension in all of these three files? What if after a year, you decided to add a transformation or a column to the date dimension? then you need to replicate this change in all *.pbix files that you have, otherwise, your code becomes inconsistent. It would have been much better if you did the transformation once, stored the output somewhere, and then re-use it. That is exactly what Dataflow can do for you!

Re-usable tables or queries across multiple Power BI files, are one of the best candidates for Dataflow.

processing the common table in the Dataflow and re-using it in multiple PBIX files

Different Data Sources with Different Schedules of Refresh

What if you have a dataset that includes two tables with totally different schedule options. For example, the Sales transactions table coming from the SQL Server database is changing every day, and you need to refresh this data every day. However, the mapping table used for some of the products and maintained by the product team is only changing every quarter. If you have both of these queries in one *.pbix file, then you have no other choice but to refresh at the maximum frequency needed which is once a day.

However, if there is a mechanism that can refresh the mapping table every quarter, apply all transformations needed, and store it in a table. Then on every day you just need to read it. Dataflow can do that for You; With one query running the data transformation script and loading it into a destination table. This can be scheduled for whatever plan you need.

Dataflow can run extract, transformation, and load (ETL) process on a different schedule for every query (or table).

You can have multiple Dataflows for different scheduled refresh settings

Centralized Data Warehouse

With the evolution of Power BI and other self-service technologies, many companies started to implement a BI system without having a data warehouse. However, if the number of BI systems increases, the need for a centralized data warehouse appears quickly. A data warehouse is a specifically designed database to store data in the format needed for reporting. In traditional BI systems, one of the phases of building a BI system, and one of the most important phases let’s say, is to create a centralized data warehouse. The ETL process will extract data from data sources, and load it into the centralized data warehouse. All reporting solutions, then use the data warehouse as the single source of truth.

Dataflow can be an important part of building a centralized data warehouse for your Power BI solution. You can build the structure you want through Power Query scripts in a Dataflow. Dataflow then runs those scripts and stores the data into output tables. Output tables of dataflow can act as a centralized data warehouse for your *.pbix files. Alternatively, you can have your own Azure Data Lake storage and configure it the way that you want, with the structure of tables that you want, and get dataflow to load data into those tables.

Dataflow can be the ETL engine, that fuels the centralized data warehouse in Azure data lake storage.

Dataflow can feed data into your Data Warehouse

Power BI Datamarts is the new component of Power BI, which can be a much better replacement for Dataflow if it is used for Data Warehouse purposes. I suggest strongly reading about it here.

Summary

In Summary, Dataflow is the data transformation engine of Power BI, which is independent of any other Power BI objects. Dataflow is powered by the Power Query engine and the Power Query Editor online. Using Dataflow, you can separate the ETL layer of the Power BI implementation from the rest of the work. Using Dataflows is highly recommended to re-use your existing tables in multiple files. Dataflow is not just for Power BI, then can be used in Excel too, and it can be created in Power Platform. Dataflows come in two categories of Standard and Analytical.

Learn more about Dataflows

Now that you know what the Dataflow is and why it is useful, Let’s see how you can use it. Read the next articles here to know more about it:

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

48 thoughts on “What are Power BI Dataflows and their Use Cases?

  • Thank you for useful article Reza.

    Can Power BI be used as ETL tool to gather, clean or transform data and then this pbix be used as Dataflow? (or rather add to dataflow data as raw as possible, apply all cleansing, transformations or relations and then create Datasets)
    The problem is quality of data and doubts on loading data not in shape as Dataflow (instead of loading everything as it is, especially coming outside from data warehouse)

    • Hi Kamil
      You can build your transformation logic in Power BI Desktop and then create a dataflow entity using that M code.
      There are different approaches to this problem. Some, transforms the data while bringing it in. Which may cause some issues considering the data quality. A better approach is to have like a staging environment, where you bring the data from the source as is, and then a data warehouse environment where you use that staging data as a source and do the transformation, etc. However, considering the amount of processing and possible space needed for these transformations and the data, you might need a Premium Power BI account.
      Cheers
      Reza

Leave a Reply

Your email address will not be published. Required fields are marked *