Linked Entities and Computed Entities; Dataflows in Power BI Part 4


In previous articles, I explained what a dataflow is and where to use it, how to create a dataflow, and what the Common Data Model is. In this article, I explain one of the differences between a dataflow and Power Query in Power BI Desktop: Linked Entities and Computed Entities. Not all entities in a dataflow are the same; there are different types of entities, and they help in different scenarios. You also need to consider different licensing for some of the entity types. Let’s see what these two types of entities are. To learn more about Power BI, read the Power BI from Rookie to Rock Star book.

Prerequisite

To understand the content of this article, it is good to have a general understanding of what a dataflow is and the scenarios for using one:

Part 1: What are the Use Cases of Dataflow for You in Power BI?

Part 2: Getting Started with Dataflow in Power BI

Part 3: What is the Common Data Model and Why Should I Care?

Entity Types in Dataflow

Unlike Power Query in Power BI Desktop, not all entities are the same. Some entities are derived from other entities, and some are built on top of an existing entity. In Power Query in Power BI Desktop, we simply call them all entities; entities have different value structures (such as List, Record, and Table), but they are all entities nonetheless. When it comes to using them in a dataflow, however, there are some differences. In this article, I focus on Computed Entities and Linked Entities.

Computed Entity

Have you ever used the Reference option in the Power Query Editor of Power BI Desktop? If not, check out the explanation about it here. Using Reference, you can create a new query that references the original query, with some additional steps in it. Reference is a great option for creating multiple branches of data transformations from one entity. Here is how you get a reference from a query in the Power Query Editor in Power BI Desktop:

A referenced query in the Power Query Editor is a query just like the original query, with the only difference that the original query is the prerequisite step of the new query.
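Under the hood, a referenced query is simply a Power Query (M) query whose first step points at the original query. As a rough sketch only (the Orders query and the OrderDate column here are illustrative assumptions, not names from the article):

```m
// Referenced query (illustrative M sketch): its Source step is the
// original Orders query, so Orders evaluates before this query does.
let
    // Prerequisite step: the original query itself
    Source = Orders,
    // Additional steps belong only to the referenced query;
    // the original Orders query stays intact
    RecentOrders = Table.SelectRows(Source, each [OrderDate] >= #date(2019, 1, 1))
in
    RecentOrders
```

This is exactly what the Reference command generates for you; the only hand-written part in practice would be the extra transformation steps after `Source`.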

Reference in Dataflow: Computed Entity

You can generate a reference from a query in Power Query in a dataflow too; however, the outcome is slightly different from the experience in Power BI Desktop. Here is how you create a reference in a dataflow:

The query generated by the Reference command, however, is different: it is called a Computed Entity.

You can see in the screenshot above that the icon for a computed entity is different from normal entities.

Usages of Computed Entity

There are many uses for referencing another query. For example, you may want to create a sub-table from the original query while keeping the original query intact. Say you want a grouped (or aggregated) version of the main query: you can use Reference to create a computed entity, and then use a Group By transformation to get a smaller subset of the data in the new table.
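The Group By scenario above can be sketched in M. As an illustration only (the entity and column names Orders, CustomerKey, and SalesAmount are assumptions, not from the article), the computed entity might look like this:

```m
// Computed entity (illustrative M sketch): references the Orders entity
// and aggregates it, leaving the original Orders entity untouched.
let
    // In a dataflow, this reads the stored result of Orders
    // from the data lake, not the original data source
    Source = Orders,
    Grouped = Table.Group(
        Source,
        {"CustomerKey"},
        {
            {"TotalSales", each List.Sum([SalesAmount]), type number},
            {"OrderCount", each Table.RowCount(_), Int64.Type}
        }
    )
in
    Grouped
```

The referencing step (`Source = Orders`) is what makes this a computed entity; the Group By is just one example of a transformation you might apply on top of it.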

Group By is not the only reason to create a computed entity, but it works as a good example. It is, however, an important step if you are building Aggregations in Power BI.

Computed Entity Vs. Reference in Power BI Desktop

Although the computed entity looks like the Reference option in the Power Query Editor of Power BI Desktop, there are some differences in the way it is processed. Let’s check them out.

Original Query Will Load First, Computed Entity Will Extract from there

If you have a reference query in the Power Query Editor, at refresh time both queries are extracted from the data source and then loaded into the Power BI model. In a dataflow, however, the process is different. The original query (the query we referenced from; in the example above, the Orders table) extracts data from the data source, applies data transformations, and then loads the result into Azure Data Lake. The second query, the computed entity (in the example above, Orders (2)), does not extract data from the data source. The computed entity reads the stored result of the original entity from Azure Data Lake and then applies its transformations on that.

The transformations of a computed entity are applied to the data that sits in the dataflow section of Azure Data Lake Storage, not to the data source.

Source Query should be set as Enable Load

Don’t get me wrong: you can create a reference even from a query whose load is disabled. However, that is not called a computed entity, and as a result, it won’t have the special refresh behavior from the staging area that I mentioned. The new query is only considered a computed entity if the original query is set to Enable load (because that way, the original query’s output is stored in Azure Data Lake Storage).

Computed Entity is a Premium Only Feature

At the time of writing this article, the computed entity is a Premium-only feature. You cannot use it with a Power BI Pro account.

Linked Entity

Computed entities are good for creating another branch of data transformation from another entity. Sometimes, however, you may not want another set of transformations. You just want a copy of an entity that exists in another dataflow, inside this dataflow, as part of this data mart. You may not want to apply any data transformation at all: a mere copy, or let’s say just a link. This type of entity is called a Linked Entity.

Creating a Linked Entity

To create a linked entity, when you are in a dataflow, click on Add linked entity.

You can set the credentials for the Power BI dataflows.

Then select the dataflow that you want to get the entity from.

The new entity, or let’s say the Linked Entity, will be created with a specific icon, which shows it is different from other entity types.

Usages of Linked Entity

Linked entities can be used in scenarios where you create multiple data marts through multiple dataflows. You may want to re-use an entity from one dataflow in another one. For example, the Product entity that you used in the Sales dataflow (or let’s say data mart) might also be useful as part of the Production dataflow.

Things You Need to Know About Linked Entity

Similar to computed entities, linked entities also come with some features and limitations. Let’s check them out.

The Linked Entity Cannot be Changed

You cannot apply any data transformation to a linked entity. As you recall from a few steps ago, a linked entity is just a link to the original entity, which lives in another dataflow. If you want to change the transformations, you have to go to the original query and change it there.

Linked Entity is a Premium Only Feature

You cannot refresh a dataflow that contains a linked entity with a Power BI Pro account. Power BI Premium is needed to refresh a linked entity.

Summary

Linked entities are links to entities in other dataflows; computed entities are a new branch of transformation referenced from another entity in the dataflow. There are use cases for each of these entity types, but the main thing to know is that each comes with specific features and limitations. In this article, we talked about those features. If you like the dataflow series, stay tuned for the next article; in the meantime, you can read the previous articles in this series:

Part 1: What are the Use Cases of Dataflow for You in Power BI?

Part 2: Getting Started with Dataflow in Power BI

Part 3: What is the Common Data Model and Why Should I Care?

Part 4: Linked Entities and Computed Entities (this article)

Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an author, trainer, speaker, and consultant. He has a BSc in Computer Engineering and more than 20 years’ experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for nine continuous years (from 2011 until now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD, and co-founder and co-organizer of the Difinity conference in New Zealand.
His articles on different aspects of technology, especially MS BI, can be found on his blog: https://radacad.com/blog.
He has written several books on MS SQL BI and is writing more. He was an active member of online technical forums such as MSDN and Experts-Exchange, a moderator of the MSDN SQL Server forums, and is an MCP, MCSE, and MCITP in BI. He is the leader of the New Zealand Business Intelligence user group, and the author of the very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content, and of Power BI Pro Architecture, published by Apress.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday, and SQL user groups, and he is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution; he is a data enthusiast.

17 thoughts on “Linked Entities and Computed Entities; Dataflows in Power BI Part 4”

  • Thanks again for the informative article.
    One question: if a few DAX measures need to be created using the dataset loaded by a dataflow, do you know whether this can be done in the dataflow? In other words, after loading a dataset with a dataflow (e.g., a CSV file), can we use DAX formulas to create measures within the dataflow to enrich the original dataset?
    I hope I have explained the question clearly.

    • Hi Steven.
      Dataflows are Power Query scripts running in the cloud, not DAX. If you have DAX measures, they won’t be calculated this way, and you wouldn’t want them pre-calculated either, because DAX measures are used when you want to give the user a dynamic answer based on the selection of criteria in a Power BI report. If the calculation is something that can be pre-calculated, then you have to change it from DAX to a Power Query transformation. I wrote about DAX measures and their dynamic nature here.

  • Wanted to point out that while you can’t modify a linked entity, you can link an entity and disable load (i.e. stage it) and then reference it to create a computed entity. This is useful when you need to take an authoritative data flow and perform additional transformations on it for other marts.

    • Hi Micah.
      It is possible to do that, but it won’t be a computed entity anymore. The whole purpose of the computed entity is to get the data from the result of another query that is already loaded into the Azure Data Lake storage of the dataflow. When you disable the load of the previous query, this is not the case: you would have a new query that reads data from the data source again.

      • Reza – that is kind of correct but misleading….
        If you create a dataflow and have 4 source tables you want to union together, you can disable load on the source entities and then do a Table Append as New. The data from your sources is brought into Azure as staging/in-memory data and then union-ed via Azure compute (I’m pretty sure the compute isn’t done via the gateway, but I’ll have to check my gateway traces again). The Append-as-New entity is not querying your sources again; it is using the data brought into Azure memory.

        A more important example: building a dataflow on top of other dataflows. We have a dataflow for each of our source systems, and then create dataflows on top of these to union similar tables into consolidated entities. The second-tier dataflows are only connected to existing dataflows; we disable load on the source dataflow entities and then do a Table Append as New. This does not end up querying our on-premises source systems; it brings the first-tier dataflow data into memory and then performs a union/append on that data. Although it is not technically a computed entity, since it brings the dataflow data into memory and then unions it instead of unioning based on the data already existing in Azure storage, it is surprisingly fast and much more efficient compared to when we were doing all of this via one transformation in our gateway. We’ve seen a 5x improvement in load/transform times via this approach.

        Another one: any time you want to do something that would require a computed entity, just disable load on the entity after performing the computed functions, then create an enabled entity that only references the end result of the disabled-load table.
        All of this is done with just Pro licenses – not using computed or linked entities. It is a bit less efficient due to having to bring existing Azure Data Lake data into memory, but nonetheless very close to computed and linked entities, ultimately getting close to the same results.

        • Hi Miles.
          I believe I have written it in a way that could be read incorrectly.
          You are right. What I meant (though I probably haven’t conveyed the message correctly in my wording) is this: using a computed entity on top of another query that is set to Enable load means we are reading from data that has already been loaded once. If we set the first query to Disable load, then both queries are processed again: the first query, and also the second one. However, depending on the transformations, it might sometimes use temporary memory to store the result of the first query, and the second query reads from there. So altogether it is still processing, rather than reading from staged data. I would say the difference is whether you want to stage the data in an intermediate table or not.
          Cheers
          Reza

  • Hi Reza,
    Thanks for explaining this concept, but I have a question: I am getting data from a SharePoint folder (say data.xlsx) and loading only one month of data into that file, and when the second month starts, I’ll replace the previous month’s data with the new month’s (in the same file, data.xlsx).
    Now how should I append month 2 under month 1 and use it as a dataflow in Power BI Desktop?

  • Hi Reza, thanks for the article. Would you know the details of the comment in the Microsoft Docs that says “A linked entity can’t be joined with a regular entity that gets its data from an on-premises data source”? Does it mean that it’s not possible to have a linked entity from an online source and have the computed entity use on-premises data? Thanks!

    • Hi Adolfo
      You can use that linked entity as a normal entity (get the data from the source again) as a workaround for this limitation. And I don’t think this is a limitation if you use Power Platform dataflows.
      Cheers
      Reza

  • Hi Miles, Reza, I wanted to revisit your discussion from last year as I’m still a bit confused. I have the case that Miles described: I have two workspaces, one for ingestion and one for transformation. In the transformation workspace, I create a linked entity to the ingestion workspace. I am using Premium capacity.

    From here there are a few options, and I’m not sure which is the “right” way forward. Let’s assume I have a small set of PQ transform steps to apply.

    Option 1A: mark the linked entity as disable load, create a reference entity, apply PQ steps, and ensure the entity is loaded
    Option 1B: mark the linked entity as enable load, create a reference entity, apply PQ steps, and ensure the entity is loaded

    Option 2A: mark the linked entity as disable load, apply PQ steps, create the final referenced entity
    Option 2B: mark the linked entity as enable load, apply PQ steps, create the final referenced entity (not possible, as linked entities can’t have transformation steps and be loaded)

    I only really want to see one final entity in the transformation workspace, so I prefer Options 1A and 2A, which have the linked entity disabled.

    The advice seemed to be that you need the linked entity enabled for your referenced entity to be considered a computed entity, i.e., Option 1B above?

    What does “enable load” actually do behind the scenes? Visibly it makes the entity visible, but is it doing something behind the scenes with ADLSg2? The originating entity has already been loaded into ADLSg2, so surely this doesn’t load another set of that data?

    Thanks !

    Steve

    • Hi Steve. It took me multiple reads to understand the difference between the options you mentioned 😊
      If you already have a dataflow building an entity, then the entity is loaded into ADLSg2; there is no point in creating another entity just to have it enable-load.
      Another point is that the linked entity cannot be edited (or have more transformations added), and I guess that is why you are referencing it from another entity.
      I’d just reference from that linked entity, mark the linked entity as disable load, and do the remainder of the transformations in the new entity.
      Enable load means the entity will be stored in ADLSg2. If it is not enabled for load, it is not really stored anywhere; it is just temporarily loaded, used in other entities, and then wiped out.
      Cheers
      Reza

  • Hi Reza,
    Great article, thanks !
    I am starting to use dataflows with a “Main” folder as the source. The “Main” folder contains two sub-folders: a “Previous Year” folder with past years’ data (from 2015 to the previous year) and a “Current Year” folder with the current year’s data.
    Is there any performance difference (especially for refresh) between sourcing/querying those two sub-folders separately, disabling their load, and then appending them with a blank query, versus sourcing the “Main” folder directly with enable load?
    Would appreciate your advice. Regards,
    Alex

    • Hi Alex,
      I don’t exactly understand what you mean by “folder” in your explanation.
      If your question is: in a dataflow, you are creating an entity for each year and then appending them all to have all the years in a final query, and you want to compare it with the scenario where one query fetches all the years?
      The answer highly depends on the type of data source. If it supports query folding, then the performance might be the same. If it doesn’t, then each query will query the entire data and then filter it, so combining them at the end will be slower.
      Cheers
      Reza

