One of the questions I often get these days is about the benefits of Microsoft Fabric for Power BI users and developers. Questions like: “If I am not a data engineer or database developer, what are the use cases for Fabric for me?” or “What is in Fabric that I can leverage in my Power BI solution right now, something that I cannot do with Power BI alone?”
Answering these questions means looking at the direct impact of Microsoft Fabric on Power BI users. In this article, I will explain that and go through the components you can use from a Power BI point of view: features and components you can use in your existing Power BI solution to take it to the next level.
What is Microsoft Fabric?
Microsoft Fabric is a full suite of analytics as a service that runs as SaaS on one platform, under one license, and provides everything a data analytics project needs. Microsoft Fabric achieves this through multiple workloads: Data Integration, Data Engineering, Data Warehousing, Databases, Real-Time Analytics, Business Intelligence, Data Science, and more.
To learn more about Microsoft Fabric, read this article here.
Some of the challenges in Power BI implementations
Although the general feeling is that Power BI is good enough for all analytical needs, the reality is that Power BI alone is not enough to deliver a data analytics project end to end. There are limitations to what it can do and achieve (like many other tools); here are some of the shortfalls of Power BI when used just by itself in an analytical project:
- Implementing data warehousing concepts such as SCD Type 2
- Refreshing the semantic model automatically after a dataflow refresh
- Re-using the data of a table without copying it
- Import mode needs scheduled refreshes, while DirectQuery is slow
- Extending the solution for data engineers or data scientists is not easy
- Building an alert system that can be customized based on the frequency of data changes or that performs an action when something happens
Of course, there are workarounds for any of the issues mentioned above, but the good news is that you don’t need workarounds if you use Fabric. One of its immediate advantages is that there is a built-in solution for each of them.
Resolving those challenges in Fabric
Because Microsoft Fabric comes with a suite of services that all integrate in a meaningful way, you can resolve the issues mentioned previously using the following:
- Warehouse or Lakehouse for SCD Type 2
- Data Pipeline for orchestrating the refresh process
- Shortcut instead of copy
- Direct Lake mode, which doesn’t need a refresh yet keeps data nearly as fresh as DirectQuery
- Lakehouse, Warehouse, and other components that data engineers and data scientists can easily use
- A more powerful alert system using Data Activator
So now, let’s find out what you will benefit from straight away in Fabric if you are a Power BI developer.
Wins for Power BI Developers in Fabric
In this section, I focus on things you can benefit from immediately in Fabric if you are a Power BI developer. I won’t talk about data science, for example; although it is compelling, there is a learning curve, and it will take a while to get your head around it and get a meaningful outcome from it. However, I will talk about things like Data Pipeline, which you can benefit from after just a few minutes of setup and which will improve your Power BI implementation.
Things that immediately improve your Power BI implementation in Fabric, in my opinion, are:
- Data Factory (Dataflow and Data Pipeline)
- Warehouse or Lakehouse
- Shortcut
- Direct Lake
Now, let’s see what each of these is and how it can be helpful.
Dataflow: ETL as a separate layer
Dataflow isn’t a new concept in Power BI; some years have passed since my very first blog article about Dataflow in Power BI. Dataflows separate the ETL layer from the rest of the Power BI solution so that you can leverage the work done in a dataflow across more than one Power BI solution.
Dataflows in Power BI (Gen1 dataflows) export the data into CSV files stored in ADLS Gen2, which is not necessarily the best way to keep structured data. Dataflows in Microsoft Fabric (Dataflow Gen2) can export data into a Lakehouse, Warehouse, Azure SQL database, or KQL Database. These are much more structured destinations, most of which can be queried with T-SQL. If you want to learn how to upgrade your Gen1 dataflows to Gen2, read my article here.
More features will come to Dataflow Gen2 in the future, and it makes sense to move to this new edition. Dataflow uses the Power Query engine behind the scenes and is one of the components of Data Factory in Fabric.
Data Pipeline: Orchestration, Control Flow Execution
Suppose you have multiple dataflows, one for the dimension tables in the warehouse and one for the fact tables, and you want to run the latter only if the first one executes successfully. After they both complete correctly, you want to refresh the Power BI semantic model. Data Pipeline is a component in Data Factory that enables exactly this kind of scenario. In a Data Pipeline, you can set an activity to run based on the output state of a previous activity.
This is a simple-to-use tool, yet you can achieve wonders with this structure. As you can see in the screenshot above, based on the outcome of the execution of Dataflow1, I can choose to run the next activity, which can be another dataflow, a semantic model refresh, or many other activities from the list available there.
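If you prefer to see the same dependency logic in code, here is a minimal sketch of what such a pipeline does for you, using the Power BI REST API: refresh a dataflow, wait for it to finish, and only then refresh a semantic model. The IDs and token are hypothetical placeholders, the first transaction is assumed to be the most recent one, and the exact status values are worth confirming against the REST API documentation; inside Fabric, the Data Pipeline gives you all of this visually, without writing any code.

```python
import time

import requests

# Hypothetical placeholders; replace with your own IDs and token.
WORKSPACE_ID = "<workspace-id>"
DATAFLOW_ID = "<dataflow-id>"
SEMANTIC_MODEL_ID = "<semantic-model-id>"
TOKEN = "<azure-ad-access-token>"  # acquiring the token is out of scope here

BASE = "https://api.powerbi.com/v1.0/myorg"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def refresh_dataflow_and_wait() -> bool:
    """Start a dataflow refresh, then poll its transactions until it finishes."""
    requests.post(
        f"{BASE}/groups/{WORKSPACE_ID}/dataflows/{DATAFLOW_ID}/refreshes",
        headers=HEADERS,
        json={"notifyOption": "NoNotification"},
    ).raise_for_status()
    while True:
        time.sleep(30)  # polling interval; tune to your refresh duration
        latest = requests.get(
            f"{BASE}/groups/{WORKSPACE_ID}/dataflows/{DATAFLOW_ID}/transactions",
            headers=HEADERS,
        ).json()["value"][0]  # assumed: the most recent transaction comes first
        if latest["status"] != "InProgress":
            return latest["status"] == "Success"


def refresh_semantic_model() -> None:
    """Kick off a refresh of the semantic model (dataset)."""
    requests.post(
        f"{BASE}/groups/{WORKSPACE_ID}/datasets/{SEMANTIC_MODEL_ID}/refreshes",
        headers=HEADERS,
    ).raise_for_status()


# The "on success" dependency that a Data Pipeline gives you visually:
if refresh_dataflow_and_wait():
    refresh_semantic_model()
```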
Read more about Data Pipeline in my article below:
The combination of Data Pipeline and Dataflow gives you a full ETL implementation, which immediately enhances your Power BI implementation.
Warehouse or Lakehouse: Persistent Structured Dimensional Model Storage
It goes without saying that a structured database system is helpful for a Power BI solution. Although Power BI stores data in the xVelocity engine (a file format that is then loaded into memory for analysis), implementing things such as Slowly Changing Dimensions, late-arriving fact tables, or some of the more complex data warehousing requirements needs structured data warehouse storage.
Lakehouse and Warehouse both offer database storage. The data is stored as tables, and the T-SQL query language can be used to work with it. Because the data is persisted there, you can quickly implement any data warehousing requirement, such as SCD Type 2, inferred dimension members, or anything else needed.
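As an illustration, here is a minimal SCD Type 2 sketch you could run in a Fabric notebook against Lakehouse tables, using Delta Lake’s merge API. The table and column names (dim_customer, staging_customer, CustomerID, City, Segment, IsCurrent, and the date columns) are hypothetical, the spark session is the one Fabric notebooks provide out of the box, and a production implementation would also handle surrogate keys and late-arriving rows:

```python
from pyspark.sql import functions as F
from delta.tables import DeltaTable

dim = DeltaTable.forName(spark, "dim_customer")  # the SCD Type 2 dimension
staging = spark.read.table("staging_customer")   # today's source snapshot

# Step 1: expire the current row of any customer whose tracked attributes changed.
(dim.alias("d")
    .merge(staging.alias("s"), "d.CustomerID = s.CustomerID AND d.IsCurrent = true")
    .whenMatchedUpdate(
        condition="d.City <> s.City OR d.Segment <> s.Segment",  # tracked attributes
        set={"IsCurrent": "false", "EndDate": "current_date()"},
    )
    .execute())

# Step 2: insert a new 'current' row for changed or brand-new customers.
# After step 1, changed customers no longer have an active row, so an
# anti-join against the active rows finds everyone who needs a new version.
active = spark.read.table("dim_customer").where("IsCurrent = true")
new_rows = (staging.join(active, "CustomerID", "left_anti")
    .withColumn("StartDate", F.current_date())
    .withColumn("EndDate", F.lit(None).cast("date"))
    .withColumn("IsCurrent", F.lit(True)))
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")
```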
Read this article to learn more about Lakehouse and where to start with it:
Read this article to learn more about Warehouse:
There are differences between Lakehouse and Warehouse, which I explained in the article below:
Adding a Lakehouse or Warehouse to your Power BI implementation is one of your biggest wins. If there is only one recommendation from this article that I want you to follow, it is to use a Lakehouse or Warehouse in your Power BI solution.
Shortcut
Often, you want to reuse a table you have created in another model. This happens a lot with common tables such as Date, Time, Currency, Account, Customer, and most of the other dimensions. Although you can get the data using a dataflow, that still copies the data into Power BI and duplicates it instead of keeping it in one place as a single version of the truth.
Shortcut is a straightforward yet effective feature of OneLake, incorporated into the Lakehouse for exactly this purpose. Instead of copying the data, a shortcut lets you access another table as if it already lives in your Lakehouse; the shortcut is just a link to that table, not a copy.
This feature also shows how effective the Lakehouse can be for the reusability of your Power BI implementation. Read more about shortcuts in my article below:
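To illustrate how transparent this is, once a shortcut is created (for example, through the Lakehouse UI), it behaves like any other table. Here is a minimal sketch of querying it from a Fabric notebook attached to the Lakehouse; fact_sales and dim_date are hypothetical, with dim_date assumed to be a shortcut pointing at a table in another lakehouse:

```python
# 'spark' is the session that Fabric notebooks provide out of the box.
# 'dim_date' is a shortcut to a table in another lakehouse, yet querying
# it is no different from querying a local table; the data is never copied.
sales_by_year = spark.sql("""
    SELECT d.Year, SUM(f.SalesAmount) AS TotalSales
    FROM fact_sales AS f            -- a local table (hypothetical)
    INNER JOIN dim_date AS d        -- a shortcut, read in place
        ON f.DateKey = d.DateKey
    GROUP BY d.Year
    ORDER BY d.Year
""")
sales_by_year.show()
```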
Direct Lake: Cutting-Edge Technology
If you import the data into Power BI, you gain performance, but you will need to schedule refreshes of the semantic model to keep the data up to date. If you use DirectQuery, you won’t need to refresh the semantic model, but sending queries to the data source and waiting for the results to come back kills the performance of the Power BI solution.
Direct Lake is a new storage mode added to Power BI. It applies when Power BI connects to data stored in Delta Lake format in a Lakehouse, Warehouse, or database. Power BI reads the underlying Parquet files as if they were its own proprietary file format and won’t re-create the file structure; this means you can have fresh data without needing to import it, and it still performs at a swift pace similar to Import. In short, Direct Lake is like DirectQuery with the performance of Import.
Direct Lake connections are the newest type of connection in Power BI. To learn more about them, read my article here:
Although Direct Lake semantic models don’t need refreshing, there is a reframing process happening behind the scenes. To learn more about it, read my article here:
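For completeness, a reframe can be triggered the same way as an ordinary refresh. The sketch below reuses the hypothetical IDs and token from the pipeline example earlier and calls the standard refresh endpoint; for a Direct Lake semantic model, this does not copy any data, it only points the model at the latest version of the Delta tables.

```python
import requests

WORKSPACE_ID = "<workspace-id>"           # hypothetical placeholders
SEMANTIC_MODEL_ID = "<semantic-model-id>"
TOKEN = "<azure-ad-access-token>"

# The same refresh endpoint used for Import models; on a Direct Lake
# model it performs a reframe (re-pointing to the latest Delta version)
# rather than importing data.
response = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
    f"/datasets/{SEMANTIC_MODEL_ID}/refreshes",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()
print("Reframe requested:", response.status_code)
```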
Summary
Every part of Microsoft Fabric is helpful in a Power BI implementation; Data Science, Real-Time Analytics, and Data Activator are all beneficial. However, what I mentioned in this article doesn’t require expanding your Power BI requirements; it simply takes your current Power BI implementation, organizes it better, and makes it easier to maintain, using Fabric features such as Data Pipeline, Warehouse or Lakehouse, Shortcut, and Direct Lake.
I often teach Microsoft Fabric to Power BI developers, and I provide consulting to help them with their day-to-day Microsoft Power BI and Fabric challenges. If you want to learn Fabric and Power BI in a structured way, check out our upcoming training sessions here.
Here are some of the helpful links related to this article:
- What is Microsoft Fabric, and why is it a big deal?
- What is the Data Factory in Fabric?
- Using Dataflow Gen2 for your ETL
- Data Pipeline for orchestration
- Migrate Dataflow Gen1 to Gen2
- What is a Lakehouse
- What is a Warehouse
- Lakehouse VS Warehouse VS Datamart
- Shortcut instead of Copy
- Delta Lake table structure
- Direct Lake in Power BI: Fast and Up-to-date
- Reframe in Direct Lake