Have you ever had a situation where you want to re-use part of the model in another report? Imagine two report visualizers in your team who want to create Power BI report visualizations from your data model. You have already done some modeling and calculations. How can this be done the best way without a high maintenance cost? The answer is a shared dataset in Power BI. In this article, you will learn about:
- What is a shared dataset in Power BI?
- How can the shared dataset help in Power BI development?
- Where is the place of the Shared dataset in the Power BI architecture?
- How does the shared dataset work behind the scene in the Power BI service?
- What are Certified and Promoted Datasets?
To learn more about Power BI, read the Power BI book from Rookie to Rock Star.
What is the Dataset in Power BI?
When you create a Power BI report (or let’s call it a *.PBIX file), the report has two components (if the data connection mode is import data); A report and a dataset. When you are in the environment of Power BI Desktop, you can’t see the separation that easily unless you go to the task manager and see the dataset running behind the scene under the Power BI Desktop task threads.
However, when you publish the PBIX file into the service (the Power BI website), you can easily see that there are two objects; A report and a Dataset.
- The report is the visualization layer of your Power BI implementation
- The dataset includes the data, tables, relationships, calculations, and connection to the data source.
You can schedule the refresh for the dataset and connect to on-premises sources (through a gateway) or cloud-based sources.
What is included in the dataset?
So far, you know that a dataset is a separate object from the report. However, what parts of the development are precisely in the dataset. Here are some of the components that are part of the dataset;
- The connection to the data source
- tables and their data
- calculated columns, tables, and measures
- formatting and settings of the fields (visibility, formatting, display folders, sort by column, data category, etc.)
Anything that somehow is related to the data is part of the dataset.
What is a Shared Dataset?
Now that you know about the dataset let’s talk about the Shared Dataset. A shared dataset is a dataset shared between multiple reports. You can create a new report from an existing dataset through the Power BI website. This will create a report without a dataset. In fact, the dataset of that report will be the dataset that you are creating the report from. This type of report is also called a Thin report.
A thin report is a report without a dataset. This report is usually connected live to an existing dataset.
You can also create thin reports (reports without a dataset of their own) from the Power BI Desktop. To do this, you can choose Power BI Dataset under the Data Hub;
Thin reports will have the same attributes as the Power BI reports connected using Live Connection. They give you the ability to create report-level measures, but the modeling is limited beyond that point. Unless you create a composite model using DirectQuery to a Power BI dataset.
A shared dataset is a dataset that is shared between multiple reports. When that dataset gets refreshed, all of those reports will have the new data. A shared dataset is one step closer to the multi-developer tenant in the Power BI environment.
How to create a shared dataset?
Any Power BI dataset can be a shared dataset. To use it as a shared dataset, first, you need to publish your PBIX file to the Power BI Service. After the publication, you will usually have a dataset and a report. The dataset can then be used for creating other reports.
The question that may arise now is; what if I want to create a dataset without a report?
The answer is; that at the time of writing this blog article, using Power BI Desktop, it is not possible to do that. You can create a Power BI file (which includes the dataset and report in one), then publish it to the service. After publishing it to the service, you can then delete the report associated with that dataset. The challenge for this method, however, is that any future updates in the Power BI dataset and re-publish of that will re-publish the report again, and you have to go and remove it. It might be easier just to ignore the report associated with the dataset. Or simply just use that report for troubleshooting your dataset.
Sharing Datasets Across Multiple Workspaces
For a long time, sharing datasets was only possible inside a workspace. You could not use a dataset from workspace one as the source for a report in workspace 2. However, the feature became available a few years ago, and you can share the dataset even across multiple workspaces. When you get data from a Power BI dataset through the Power BI Desktop, you have the option to select which dataset you want to get data from;
External Dataset: How does Shared Dataset work behind the scene?
When you get data from a Power BI dataset which is workspace 1, and then save your report in workspace 2, you will see something called an External dataset (it was called Linked Dataset previously). The fact is that what you see is just a link. Power BI will bring a link to that dataset into the new workspace. This link helps you to understand when the dataset gets refreshed last time.
Here is what an External dataset looks like in the lineage view.
An External dataset is not a copy, It is a link to the original dataset.
Why Shared Dataset?
Now the million-dollar question is that why should you use a shared Dataset? What is good about it? What are its main benefits? Let’s see that through an example.
Let’s assume you are the Power BI Developer of a Sales.PBIX file in your team. Your team recently hired a data analyst with good visualization skills named Maggie. Maggie wants to build some visualization on your Sales.PBIX file, However, you want to do some modeling (writing calculations, bringing more tables, adding relationships, etc.) simultaneously. So how would you do this?
Suppose you give Maggie a copy of your Sales.PBIX file, and name it Maggie’s Sales.PBIX, she can build new visualization, but now her version is different from your Sales.PBIX file. What if you want to merge your changes (new calculations and tables, etc.) into her file? This brings lots of headaches in managing two different versions of the file.
Instead of copying files, you can create a shared dataset, and Maggie creates a Power BI thin report connected to the same dataset. This way, maintaining the solution will be much easier in the future. Whenever you update your dataset, Maggie needs to refresh the file to get the new changes. A shared dataset separates the modeling layer of your Power BI solution from the rest of it. Like the Dataflow that separates the ETL layer.
Shared Dataset in the Power BI Architecture
I have written a lot of articles about the architecture of Power BI; in fact, I gathered them in a book. However, in one article specifically, I explained how Dataflow and Shared datasets could play an important role in the multi-developer tenant of Power BI implementation.
In a nutshell, using the Dataflow makes sure that you can bring the data well prepared in a central area, which you can call a centralized data warehouse in the Azure Data Lake. And using the shared datasets, you can build data models that multiple reports can use. Here is how the architecture works in the diagram view;
Instead of having silos of Power BI reports and files everywhere, You can build an architecture that works best with multiple developers, less redundancy in the data, in the code, and the logic, and an easier maintenance approach. I highly recommend reading this article to learn more about this architecture and learn how the shared dataset located in this architecture is a key element.
Endorsement: Certified and Promoted Datasets
When Power BI developers use the function of Getting data from Power BI dataset, They see all datasets from all workspaces that they have access to. This might be a bit confusing. There might be tons of datasets shared in the environment. The developer ends up with questions such as: Which of these is the one I can use? Which of these are valid to use? which of these are reconciled and tested? Which of these are reliable to use? Etc.
A labeling system is added to the Power BI datasets, which helps in this scenario. You can mark some of the datasets as Certified or Promoted. To certify a dataset, there is an approval process that can assure the dataset has passed some of the tests. You can clarify through this labeling system what datasets are good to be used as the source and what are not. You can build the concept of Gold, Silver, and Bronze datasets. Gold datasets are fully tested, reconciled, and then down to other levels, whereas the Bronze dataset is a dataset that hasn’t been through any testing yet.
To use this labeling system, the creator of the dataset can go to the setting of the dataset;
In the settings, you can set the Endorsement level as below;
As you can see, the Certified option might not be available. The Power BI tenant administrator has the authority to enable that labeling and give access to whom needed in the Tenant Settings;
You can also set if promoted or certified content is discoverable throughout the tenant or not under the same Tenant settings of Admin Portal.
The endorsement labeling system helps Power BI developers then see what is the level of certification that a dataset has to be used as a shared dataset, and then can select based on that respectively;
You can use a shared dataset to have centralized data models serving multiple reports. You can reduce the maintenance time, the redundancy of the code, and the data through this approach. Having the labeling system of the certified or promoted dataset is also a great way of putting some process and governance in place to ensure the shared datasets have been through some process of testing and reconciling.
I suggest reading these links to study more;
- Live Connection in Power BI, Thin reports, and report-level measures
- Composite Model in Power BI
- DirectQuery to Power BI Dataset
- Dataflows in Power BI
- Power BI Multi-developer architecture
Are you using shared datasets in your implementation? If yes, I like to hear about your experience. If not, I like to know why you don’t find it useful for your implementation. Either way, please share your thoughts in the comments below.
27 thoughts on “Power BI Shared Datasets: What is it? How does it work? and Why should you care?”
can we have shared dataset on power bi report server
Let’s hope for it. At the moment, the only way to do that is through SSAS. but hopefully shared dataset becomes available in PBIRS too
Great article. 2 questions.
1. Dataflows allow incremental refresh – but I appears when utilizing them in a .pbix and deploying to the service that it loads all data from the dataflow. Thus, efficiency from the dataflow incremental refresh is lost by the dataset having to load all the data again.
2. While I can build new reports on shared datasets, I cannot fortify the dataset with any other new information I want for those reports. Meaning, I can’t mashup data with anything new outside that existing model
Am I doing something wrong in either case?
At the moment, the connection from Dataflow to Power BI only supports Import Data. Later in the future, you would be able to do DirectQuery hopefully as well.
on the other hand side, when using Incremental Refresh, you can set that up also in the Power BI dataset, in addition to the Power BI dataflows. Incremental refresh in the dataflow makes sure the ETL runs on delta load, setting that up in the dataset, makes sure then model also runs under partitioning.
Regarding mashing up data of the shared dataset. I would say it is a good thing that you cannot do more data mashup with a live connection to the Power BI dataset, because then your source code will spread all over places, and you cannot easily find where you have done what! If you need to use the Power BI dataset as one of the sources, and do data mashup on that and combine it with other sources, then don’t use Get Data from Power BI dataset, You can use Power BI dataset as an Import data source (it is now possible with XMLA endpoint), and then you can do whatever you want with PQ there.
To follow up on Utwku’s point 2 above:
I have a shared dataset the pulls a number of tables from salesforce and does various cleaning/transformations of those tables, links them correctly, and has a bunch of measures.
I do have one situation in which a report needs to pull from that shared dataset AND from a few .csv files.
What is the best way to go about doing that, given this limitation?
I only need those few .csv files in the one report, so it seems silly to go back and add those to the shared dataset.
Are you using dataflows too?
if yes, then the tables that you have in your shared dataset, should be already populated through dataflows. so you can build another dataset, using those tables, and your extra .csv files.
Of course, this method, still won’t give you the measures in the other dataset.
Nice article. Thanks.
We are using a shared dataset however the issue is that when my Report which is workspace 2 and pointing to Workspace 1, my users are getting issue which says “You cannot see the content of this report because you do not have permissions to the underlying dataset. Please contact the dataset owner to request access”. Once I explicitly provide read access to those users on dataset in Workspace 1, then they are able to see the report. Is this true?
Yes, a report connected to shared dataset acts as a Live connection report. Which means the user who is going to see the report, would need to have access to the dataset.
is is possible to share a dataset, so someone else can build a report based on it, but without sharing the DB password
Yes, and the default is like that anyway.
when you create a dataset and publish it to the service, the second user who wants to use your dataset, don’t need to know the password to DB, because that is only needed for the shared dataset. The second user, just need to have access to the shared dataset.
Thank you Reza.
It is really enjoyable to read your posts.
Have a good day 😉
vote here if you like this idea
I don’t, unfortunately, understand what this idea is about. can you please explain in details what is the problem? (I’ve read the description in the idea page, but couldn’t understand that too)
we are using a shared dataset, but the issue we have is around source control.
I can source control the original Dataset and the Report, but the shared Report cannot be exported as Pbit or Pbix. While the Dataset is a live connection to the original, the Report is a new independent Report, with no connection to the original. How do I source control the new one?
Any ideas greatly appreciated
Where do you create the new reports? if you create the reports in Power BI Desktop using Get Data from the Power BI dataset, then you can save them as PBIX.
and you can use the source control system you want for it.
If for example I go, Get Data > PBI Dataset > “Commercial” ….. a dataset I have built that is published to the “Commercial” workspace, and I build a report, I can ONLY publish to the same workspace, “Commercial”.
If I then use the “Save a copy” functionality in app.powerbi.com, I can select a new Workspace for my report, “Marketing” for instance, but then this new instance of the report is orphaned. I can edit in the service as I please, but cannot export as pbix.
This means V1 of the report in “Commercial” is on my desktop, safe, but V” in “Marketing” could diverge without any version control.
You CAN publish your new report that is generated using Get Data > PBI Dataset > “Commercial” into another workspace. This would generate a new dataset in the second workspace, which is a link to your “Commercial” workspace. This new dataset is not a “real” dataset, it is just a link. If you want to make any changes in the dataset, the original one in the “Commercial” workspace needs to be changed.
Save a copy, creates a “real” dataset as a copy, and then you get into the problem of having multiple copies and the version control issue.
not sure I understand, sorry. “Save a copy” gives me the linked dataset, but an orphaned report, as the “Save a copy” functionality is only available on the report:
I can edit the new report, but not export it.
I see. I think I get where the problem is, finally!
When you use save a copy, and save it to another workspace, then you cannot download the PBIX from the second workspace.
I usually don’t notice these things, because I don’t use the save a copy feature
I start with a Power BI Desktop file, Get Data -> PBI Dataset , build the report, then publish it to the second workspace. If you use this approach, you CAN download the PBIX copy from the second workspace.
Does Shared data set concept work for Power BI report server too.
I want to import a dataset from PowerBI desktop for report server and create report using that.Is this feature available for free users too because we have not took any separate license for publishing the reports,I am uploading the data sets directly to report server.
You can use a shared dataset from the Power BI service, in a report that is hosted in Report Server. so it supports the shared dataset.
However, I guess your question what that; can you have a Power BI dataset hosted not in the service, but in the Report Server only, and use it as a shared dataset for other reports in that report server. If that was your question, the answer to that is no, not yet, but hopefully, that will come in the Report Server too.
Hi Reza, do you know why is not allowed to share a dataset using O365 groups? Makes no sense we can grant access using them but not to a dataset. Any idea?
Depend on the method you use, you can!
If your shared dataset is hosted in an organizational workspace, then you can share it with an O365 group by simply adding that group as a Contributor to that workspace.
I want to have 2 roles in out reporting team: administrators and report builders:
– Administrators create dataflows and datasets that can be used inside the organization
– Report builders can use these dataflows and datasets to build reports.
I don’t see the possiblity to limit the access for report builders. I want to give them enough access to use the created dataflows and datasets when building reports, but without edit rights. Is this possible?
If my first question is possible, is there a way to see the content of a dataflow/set? Now you need edit rights before you can see the content of dataflows/sets. I was thinking of a tool or a plugin to view the content, but I can’t find a solution for this.
Thanks for your feedback.
Unfortunately, the only option for Report builders is to have Edit access on the dataset.
This can be achieved with Contributor role in the workspace.
However, if you are worried that they might change the content in this workspace, you can create another workspace, create a linked dataset in there, and get Report Builders to use the other one, however, even in that case, they need to have access to the original dataset.
Another option is to use Power BI Apps, and enable the “user can work with underlying data option”
The Power BI security roles are not yet persona based on the developer type. All developers considered the same, even though the report developers needs a different access from dataset developers. This might change in the future though, the whole concept of decoupling dataflows, datasets, and visualization is new, and I guess there will be security roles in the future to support that concept.