Datasets in Power BI can have connection types such as Import, DirectQuery or Live Connection. However, there is also one specific type of dataset which is different. This type of dataset is called Streaming Dataset. A streaming dataset is for a real-time dashboard and comes with various setups and configurations. In this video and article, we’ll talk about this type of dataset.
Video
Real-time Vs. Live or DirectQuery
The first question I get most when I start explaining the real-time dataset is; “What is the difference between real-time Vs. Live or DirectQuery?” The common belief is that the Live connection is real-time. This belief is not true when you learn the definition of these two types of connections. One of the best ways of understanding it is as below;
- In a Live Connection or DirectQuery connection report or dashboard; anytime you REFRESH the visual (which can be done either by refreshing the report itself or by clicking on a visual or slicer that triggers a query from the data source), you get the most up-to-date data. The report or dashboard can also be SCHEDULED to refresh automatically so that you have the most up-to-date data every 15 minutes (or whatever your scheduled frequency set up is).
- A real-time dashboard will update automatically as soon as the new data row appears in the dataset. This new data row can come from various methods, such as pushing through REST API of Power BI, Stream Analytics or other streaming services such as PubNub. A real-time dashboard does not need a scheduled refresh.
The definitions above still have many details in each aspect, but the main concept is what was mentioned already. One big difference you can spot in the two types of connections is that a real-time dataset won’t wait for a refresh, it would push that data directly to the dashboard, and the dashboard shows that change immediately. In a Live or DirectQuery connection, you get the most up-to-date data whenever you ask for it, and that is by refreshing the report (Although, even if you don’t refresh the report, perhaps you are looking at the data that is refreshed less than an hour ago).
A real-time dataset is mainly used for scenarios where the change has to be visible as soon as it happens. Let’s say we have a temperature sensor in a room that monitors the temperature of the room. We want to have a Power BI dashboard that shows the temperature changes in a line chart as soon as they change. A real-time streaming dataset would be needed here.
Live or DirectQuery connections are mainly needed when you want up-to-date data. If you do not have an IoT device that sends the data as soon as it appears, and your purpose is just to see the most up-to-date data in your report and dashboard whenever you look at it, then the Live or DirectQuery connections are your answers. There are differences between the Live Connection and DirectQuery, which I explained in other articles.
The common pattern seen in a reporting requirement is that the business asks for a real-time dashboard, but they are really after a Live or DirectQuery connection. The definitions and the differences above should be used as your reference when making that decision. As a Power BI architect, you should understand the requirement, read between the lines and come up with the best solution for that requirement.
For the rest of this article, you will learn how to set up and use the real-time streaming dataset in Power BI and its configurations.
Sample scenario
For the real-time streaming scenario of this article, I’ll explain the procedure below and how to implement it;
- A Microsoft Form is shared with users to get feedback on their Power BI roles and salaries.
- Power Automate is used to pass the feedback posted in the form (as soon as it arrives) to the Power BI streaming dataset.
- Power BI streaming dataset is receiving the feedback and presenting it in real-time on a dashboard.
The scenario above might not be the most common scenario for using the streaming dataset. Still, it is one of the simplest to understand because it doesn’t require much coding, and components of it all can be easily accessed without needing a special license for those tools and services.
Microsoft Form; capturing the inputs
I used a Microsoft Form to capture the input of users like the below;
The form above is what I used for the Power BI 2021 Salary Survey. You can create a form like this in Microsoft Forms by going to this URL; https://forms.office.com/.
Power BI Streaming Dataset
To capture the form’s data and send it to Power BI, we need to create a streaming dataset with the same set of fields. A streaming dataset cannot be created in the Power BI Desktop. You have to log in to the Power BI Service and create the streaming dataset from there;
There are three ways to create a streaming dataset in Power BI;
It is important to understand how the Power BI Streaming dataset works behind the scene, which I will explain later in this article. But for now, consider the three methods above as three different data streaming service providers; Azure Stream Analytics, PubNub, and Power BI Rest API. We will be using Power BI Rest API for this example.
In the next step, you need to set a name for this streaming dataset and set up the fields and their data types.
The data type for fields is a limited set of Text, Number and Datetime. The streaming dataset is different from a normal Power BI Dataset.
Once you create the fields, you can see a sample record generated underneath it in JSON format. This is the format that record data shall be sent from other applications to this dataset. You can also see a Historical data analysis option. Set this to true. This means that data is not only streamed through this dataset but is also stored. I will explain this later in detail.
When the dataset is created and ready, you will see the Push API URL and the format of sending data to it as a sample.
The Push URL is the API URL that other applications should use to pass data rows (or push them) into this dataset. We will be using Power Automate to push data to this dataset.
Power Automate to Push data into the Power BI dataset.
You can use many applications to push data into the streaming dataset, such as a C# application, PowerShell script etc. However, for simplicity, we use Power Automate here. Power Automate can be accessed using this URL; https://make.powerautomate.com/. Power Automate is a service in the Power Platform toolset for the If-Then-Else flow definition. Here, we will use it to set up a flow that If there is an entry in the form, then push that record to the streaming dataset in Power BI.
You can use the template below in Power Automate, which has the process we want. You will just need to enter your credentials and environment details in it.
The flow definition is simple. It has three steps; connecting to the Microsoft Form when a new response is submitted, reading the response, and pushing it to the Power BI dataset;
As you can see, the “Add rows to Power BI dataset” is the component that generates the data row and pushes it to the streaming dataset in Power BI. This flow will run automatically as soon as an entry to the Microsoft Form has been completed (Because the trigger is “When a new response is submitted”)
Power BI real-time dashboard
The last step is creating a Power BI Dashboard to show the real-time changes. This dashboard would be similar to any dashboards in Power BI. The only difference is that it is sourced from the streaming dataset. The dashboard can be created from a report, or it can be created as a blank, and then you can add a tile connecting to a streaming data source. Then you can select the streaming dataset for it.
There are limited charts and visuals you can use for a streaming dataset tile, which are listed below;
Testing the solution
You can now test the solution. Go to your form, enter a result, and see the result immediately appear in the Power BI dashboard (depending on what part of the result you have chosen to be shown in the dashboard).
Streaming Dataset Types
One of the most important aspects of a streaming dataset is understanding the types of datasets. The streaming dataset can have three types; Streaming, Push, and Hybrid.
Push dataset
When this dataset is created, the Power BI service automatically creates a database for it to store the data. For this type of real-time dataset, you can create reports.
Streaming dataset
With this dataset, the data is only stored in a temporary cache, which would expire. This dataset has no underlying database. You cannot create reports from this dataset; you can only create a dashboard.
Hybrid dataset
This dataset is the combined version of Push and Streaming. With this dataset, there will be a temporary cache for the data that is coming and going, and there will also be a database to store the data. You can create this by enabling the “Historic data analysis” option when creating the streaming dataset. This type of dataset is what we used in our scenario above.
Understanding the three types of datasets above is important because if you want to also store the streaming data for further report creation and data analysis, then a Hybrid dataset is what you need instead of a streaming dataset.
Streaming Services
You can use three different services as a real-time dataset for Power BI, which is listed in below;
- Power BI streaming dataset and REST API
- Azure Stream Analytics
- PubNub
Power BI Streaming dataset and REST API
The Power BI streaming dataset with Power BI REST API is what we have used in the sample scenario above. This is using only Power BI objects and services. You won’t need any additional services or licenses. However, there are some limitations. For example, you can have up to 1 million rows of data pushed every hour with the Push dataset; the request size also has some limitations.
Using the Power BI REST API does not necessarily mean the Power Automate. You can call the API using any other application. I have an article here and here explaining how to use a C# application to push data to a streaming dataset in Power BI.
Azure Stream Analytics
Azure Stream Analytics is the data streaming service of Microsoft Azure. It can be used for Power BI and many other tools and services in Microsoft toolset. You can use this service to capture data input from IoT devices, for example, and pass part of it to Power BI and another to Azure Machine Learning for data mining. I have written an article about a sample solution with Azure Stream Analytics and Power BI here;
PubNub
PubNub is a streaming service that is not only for Microsoft. It integrates with many tools and services. You can check the website for the details of how their service works.
Creating calculations on real-time datasets
A streaming dataset is not like a normal Power BI dataset. In a normal Power BI dataset, you can use Power BI Desktop to write DAX calculations and use Power Query to transform the data. A streaming dataset, however, is different. When you work with a streaming dataset, you are limited. There are some workarounds in writing calculations, which I explain here.
Writing calculations using Q&A.
The first workaround is to use Q&A to write some simple calculations, as I explained in the article below;
Live connection and DAX
The second method is to create a report with a live connection to the Power BI dataset. This type of connection won’t be real-time anymore, and it works when we have a Push or Hybrid dataset.
Summary
A real-time streaming dataset is different from a Live or DirectQuery connection. Using this dataset, you can see the change immediately after the data changes. The push data API or other streaming services can help you to achieve this. The streaming dataset can also store the data in a database for further analysis. A streaming dataset is not like a normal Power BI dataset; it comes with limitations. It is important to know when to use a real-time streaming dataset Vs. Live or DirectQuery datasets.
Hi Reza,
Thanks for the insightful post.
Is it the same to create a real time dashboard in Fabric?
Many thanks!
Best regards,
Jack
Hi Jack
This is using a different technology than the Fabric real-time analytics. Fabric realtime is a streaming technology by itself, it uses EventStream, KQL databases, but the Power BI real time streaming is using other APIs such as Power BI API, PubNub or Azure stream analytics.
I have another article on the Fabric real-time analytics which you can read here.
Reza