Power BI Architecture for Multi-Developer Tenant Using Dataflows and Shared Datasets


Using Power BI is simple; however, using it properly needs a good architecture. In a multi-developer tenant, the architecture of a Power BI implementation should allow multiple developers to work on the same solution at the same time. The architecture should also be designed so that its different layers can be decoupled from each other for better integration. In this article, I explain how dataflows and shared datasets can help in building such an architecture in Power BI. To learn more about Power BI, read the Power BI from Rookie to Rock Star book.

Challenges of a Single PBIX file for Everything

Before I start explaining the architecture, it is important to understand what the challenge is and then think about how to solve it. The default usage of Power BI involves importing data into the Power BI data model and then visualizing it. Although there are other modes and connection types, import data is the most popular option. There are, however, some challenges with a model and a PBIX file that has everything in one file. Here are some:

  • Multiple developers cannot work on one PBIX file at the same time: a multi-developer issue.
  • It would be very hard to integrate the single PBIX file with another application or dataset: a high-maintenance issue.
  • All data transformations happen inside the model, so the refresh time is slower.
  • The only way to expand the visualization is to add pages to the same file, and you end up with hundreds of pages after some time.
  • Every single change, even a small change in the visualization, means deploying the entire model.
  • Creating a separate Power BI file that references some parts of this model would not be possible; as a result, you would need a lot of duplication, which is again a high-maintenance issue.
  • and many other issues.

If you are the only Power BI developer in your organization, and you are working on a small model that won't grow into multiple Power BI files in the future, then you can keep going with everything in one file. However, I do not recommend having everything in one file if your scenario is different.

Dataflow to Decouple the Data Preparation Layer

Power Query is the data preparation tool in the world of Power BI. Based on my experience, and also what I have seen in other implementations, for a proper data model you will need to spend about 70% of your time on data preparation. That means if a Power BI implementation takes ten months, seven of those months are spent on data preparation with Power Query! You don't want that time to be wasted or duplicated in other work. You want to preserve the effort you have put into your data preparation.

Instead of doing your data preparation as part of a Power BI file, which ties that work to that file only, you can do it through Power BI dataflows. A Power BI dataflow runs Power Query independently from Power BI files. The independence of the dataflow from the Power BI file means that you can do the data preparation for an entity only once, and use it multiple times! In other words, you can use dataflows to decouple the data preparation (or ETL) layer of the Power BI system from the Power BI file. After doing the data preparation using a dataflow, Power BI files can get data from it using Get Data from Dataflow.
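To give an idea of what consuming a dataflow looks like, here is a sketch of the kind of Power Query script that Get Data from Dataflow produces in Power BI Desktop. Note that the real generated query navigates by workspace and dataflow IDs; the workspace, dataflow, and entity names below are hypothetical placeholders:

```m
let
    // Connect to the dataflows the signed-in user has access to
    Source = PowerBI.Dataflows(null),
    // Navigate to a hypothetical workspace, dataflow, and entity
    Workspace = Source{[workspaceName = "Sales Analytics"]}[Data],
    Dataflow = Workspace{[dataflowName = "Sales ETL"]}[Data],
    FactSales = Dataflow{[entity = "FactSales"]}[Data]
in
    FactSales
```

Because the transformations already ran inside the dataflow, this query is a simple load: the heavy Power Query work is done once, upstream, and every model that needs this entity reuses it.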

Here is the diagram of dataflows and Power BI files:

I have written many articles about Dataflows, which I strongly recommend you to read;

Shared Dataset for the Data Modeling Layer

Although a dataflow creates a shared data source for Power BI models, if you want to create multiple Power BI models you still need to set field-level formatting inside each Power BI file, create DAX calculations inside each file, and build hierarchies and any other modeling requirements inside each Power BI model. That is because the storage of a dataflow is a file-based structure, which is totally different from the Power BI in-memory storage engine.

Instead of doing all the modeling and calculations in each Power BI file separately, you can leverage the shared dataset concept: you create a Power BI file with a model only, and no visualization pages.
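For example, measures defined once in the shared dataset become available to every report that connects to it. The table and column names below are hypothetical, just to illustrate the idea:

```dax
-- Defined once in the shared dataset, reused by every connected report
Total Sales = SUM ( FactSales[SalesAmount] )

-- Time intelligence built on top of the base measure
Sales YTD = TOTALYTD ( [Total Sales], DimDate[Date] )
```

If the calculation logic ever changes, you change it in one place, and all reports connected to the shared dataset pick up the new definition.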

Get Data from Power BI dataset for the Data Visualization Layer

After publishing that dataset, you can use Power BI Desktop to get data from it using Get Data from Power BI dataset. This creates a live connection to the Power BI dataset hosted in the service.

Here is the diagram of the Power BI shared model;

The Architecture

As a result, here is the proposed architecture diagram:

In this architecture, we use the layers below:

  • Dataflows for the ETL or data preparation layer
  • A shared dataset for the data modeling layer
  • Get Data from Power BI dataset for the data visualization layer

Benefits of this Architecture

Every architecture plan has pros and cons; let's discuss some of the benefits of this architecture.

Decoupling data preparation

The development time you put into Power Query is preserved and can be reused in other Power BI models, because the dataflow stores the data in Azure Data Lake. Decoupling the data preparation from the modeling means that you can apply changes on either side with minimal effect on the other. Moving shared data tables to a Power Query dataflow is a recommended approach that I wrote about in another article.

Faster Refresh Time for Power BI model

You move the heavy lifting of Power Query to the dataflow, and as a result the Power Query part of the Power BI dataset refresh becomes a simple load. I have written about this previously and compared, through a scenario, how it can make the refresh time of the Power BI model faster.

Dataflow Does Not Need a Core Developer

You can use SSIS, Azure Data Factory, or other ETL tools to take care of the data preparation. However, the power of Power BI resides in its simplicity: anyone with a fair knowledge of Power Query can use dataflows too. You don't need to be a C# or SQL Server developer to work with this tool.

Multi-Developer Environment

The proposed architecture supports multiple developers working at the same time on one Power BI solution. You can have multiple ETL developers (or data engineers) working on dataflows, data modelers working on the shared dataset, and multiple report designers (or data visualizers) building reports. They don't need to wait for each other to finish before continuing; they can work independently.

Re-use of Calculations and Modeling

Having a centralized model reduces the need to write the same DAX calculations again and again. You do the modeling once and reuse it in multiple reports. You can also be sure that all reports get their data from a model that is reconciled and fully tested; in other words, a gold data model.

Minimum redundancy, Maximum Consistency, Low Maintenance

A solution implementation that reduces the need for duplicating code (using dataflows and a shared dataset) and reuses existing code (using Get Data from dataflows and datasets) will be highly consistent. Maintaining such an implementation is possible with minimal effort.

What about Enterprise Architecture?

You might already use Power BI in an enterprise solution with Azure Data Factory or SSIS as the ETL tool, SSAS as the modeling tool, and a Power BI live connection for the visualization tool. Such an architecture is similar to what I described here. If you already have that architecture in place, I recommend you proceed with it.

Summary

Implementing a Power BI solution is simple; however, maintaining it is not! If you want a low-maintenance, highly scalable, consistent, and robust Power BI implementation, then you need a proper architecture in place. In this article, I explained the benefits of a proposed architecture that leverages dataflows for the data preparation layer and a shared dataset for the data modeling layer. The proposed architecture requires minimal maintenance effort and is highly scalable, and I strongly recommend you consider using it in your implementations. Do you have any thoughts about this architecture, or any questions? Please let me know in the comments below.

Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an author, trainer, speaker, and consultant. He has a BSc in Computer Engineering and more than 20 years' experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for eight continuous years (from 2011 until now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD. He is also co-founder and co-organizer of the Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on Microsoft BI, can be found on his blog: https://radacad.com/blog.
He has written books on Microsoft SQL Server BI and is writing others. He was an active member of online technical forums such as MSDN and Experts-Exchange, was a moderator of the MSDN SQL Server forums, and is an MCP, MCSE, and MCITP in BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of the very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content, and of Power BI Pro Architecture, published by Apress.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday, and SQL user groups, and he is a Microsoft Certified Trainer.
Reza's passion is to help you find the best data solution; he is a data enthusiast.

14 Comments

  • Hey Reza, thanks for another nice article!
    One question: since you don't pay any extra cost for dataflows, how much data (in MB or GB) can you load and transform with a dataflow, given that all the data is stored in Data Lake Storage?

    • Hi Yiannis,
      It would be the maximum size you get in your Pro account. Also, consider that the 10GB size of your Pro account is not all for dataflows; part of it is used for datasets.
      Cheers
      Reza

  • Hi Reza,

    It may also be worthwhile to cover the differences in dataflow capabilities between Premium and Pro licensing.

    I see a lot of the value in dataflows being lost because computed and linked entities are not available without Premium licensing, and for a large number of small to medium organisations, the cost of entry to Premium is too high.

  • Hi Reza, thanks for this insight. Not that it would be best practice of any kind, but more of an "options and possibilities" type of question: the model.json file in a CDM folder already provides the ability to define relationships between Gen2 data lake data objects, which gets picked up automatically. To what extent will it also be able to integrate abilities that are now described as a pro for shared datasets, e.g. DAX measures? Now that I think of it, could you actually have multiple model.json files pointing to the same data objects in various CDM folders?

    • Hi,
      Thanks for mentioning that. I didn't want to talk about the details of the dataflow setup in this post, as it was about the highlights of the architecture.
      Yes, true, the relationship helps in that sense. However, it has nothing to do with the calculations in the shared dataset. You still have to do those in the PBIX file that holds the shared model.
      Cheers
      Reza

  • Great article!!! Thank you.

    Although I don't think it is truly a multi-developer environment. Only one developer can work at a single time; it is not like multiple developers can work at the same time. Maybe my definition of a multi-developer environment is different from what is mentioned in this document, or I'm totally wrong.

    Keep writing these amazing blogs…

    • Hi Parry,
      Multiple developers can work at the same time because they are working on different objects. At the time the data developer is building dataflows, the data modeler can build the shared model and calculations, and at the same time report designers can build visualizations, because they are all working in different environments; it is not the same file. The only limitation is that the shared data model has to be created in one PBIX file. I agree that having something like version control with the ability to merge code would be helpful and make this even better as a multi-developer environment. However, this architecture by itself is already a multi-developer environment.
      Cheers
      Reza

  • Hi Reza,
    Thanks for this great wrap-up. One point that should be noted: unless you have Power BI Premium, shared datasets can only be used inside a single workspace for the moment. Hopefully the XMLA endpoint also becomes available for Power BI Pro soon. That would allow you to create a workspace with the enterprise model to which others get only read-only access.

  • Thank you for the post Reza, it's very useful! But I have a question: isn't it possible yet for two or more developers to work on the same dataflow? We have several issues with that limitation; only someone who has admin rights and created the dataflow can work on it :S

    • Hi Jose,
      Dataflows are a new feature in Power BI, so many updates are coming; I'm sure of it. In the meantime, my suggestion is this: the most time-consuming part of creating dataflows is the transformation itself. Multiple developers can use Power BI Desktop and Power Query to create their transformations, then copy the script and send it to the person who has access to create the dataflow.
      Cheers
      Reza

  • This architecture will be great if/when Datasets will be available across all workspaces 😉
    Because for now they are not it is not so easy 😉
    But, Shared Datasets are planned to be implemented 😀
