Linked Entities and Computed Entities; Dataflows in Power BI Part 4

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

In previous articles, I explained what is the Dataflow and where to use it, I also explained how to create a dataflow, and what is the common data model. In this article, I’m explaining one of the differences between Dataflow and the Power Query in Power BI Desktop, which is Linked Entities and Computed Entities. Not all entities in the Dataflow are the same, there are different types of entities, and they may help in different scenarios. You would also need to consider a different licensing method for some of the entity types. Let’s see what are these two types of entities. To learn more about Power BI, read Power BI book from Rookie to Rock Star.

Prerequisite

To understand the content of this article, it is good to have general information about what dataflow is, and scenarios of using dataflow:

Part 1: What are the Use Cases of Dataflow for You in Power BI?

Part 2: Getting Starting with Dataflow in Power BI

Part 3: What is the Common Data Model and Why Should I Care?

Entity Types in Dataflow

Unlike Power Query in the Power BI Desktop, not all entities are the same. Some entities are derived from other entities, and some built on top of an existing entity. In the Power Query in Power BI Desktop, we simply call them all as entities, Entities have different value structure (such as List, Record, and Table), but still, they are all entities. However, when it comes to using them in the Dataflow, there are some differences. In this article, I’m focusing on Computed Entities and Linked Entities.

Computed Entity

Have you ever used the Reference option in the Power Query Editor of Power BI Desktop? If not, check out the explanation about it here. Using the Reference, you can create a new query, which is referencing the original query, with some additional steps in it. Reference is a great option to create multiple branches of data transformations from one entity. Here is how you get a reference from a query in Power Query Editor in Power BI Desktop;

A Referenced query in Power Query editor will be a query like the original query, which the only difference, that the original query is the prerequisite step of this new query.

Reference in Dataflow: Computed Entity

You can generate a reference from a query in Power Query in Dataflow too, however, the outcome would be slightly different than the experience in Power BI Desktop. Here is how you create a reference in Dataflow;

The query that will be generated using the Reference command, however, is different, it is called Computed Entity

You can see in the screenshot above that the icon for a computed entity is different from normal entities.

Usages of Computed Entity

There are many usages of referencing from another query. For example, you may want to create another sub table from the original query, and you want to keep the original query intact. Let’s say, you want to create a grouped (or aggregated) version of the main query. and you may end up using the reference to create a computed entity, and then using Group By transformation to have a smaller subset of data in this new table.

Group by is not the only reason to create a computed entity but can work as a good example. If we are talking about Aggregations in Power BI, it is an important step though.

Computed Entity Vs. Reference in Power BI Desktop

Although the computed entity looks like the Reference option in the Power Query Editor of Power BI Desktop. There are some differences in the way it processes. Let’s check them out.

Original Query Will Load First, Computed Entity Will Extract from there

If you have a reference query in Power Query Editor, both queries, at the time of refresh, will be extracted from the data source and then loaded into the Power BI model. In dataflow, however, the process is different. The original query (the query that we referenced from it. In the example above; Orders table), will be extracting data from the data source, applying data transformations on it, and then loading it into Azure Data Lake. The second query, or let’s say the computed entity (in the example above Orders (2)), will not extract data from the data source. The computed entity will extract data from the result of the original table in the Azure data lake, and then apply transformations on it.

The transformations of Computed Entity will apply on the data that seats in the dataflow section of azure data lake storage, not on the data source.

Source Query should be set as Enable Load

Don’t get me wrong. You can create a reference even from a query that is the disabled load. However, that is not called a Computed entity, and as a result, it won’t have the special behavior that I mentioned about the data refresh from the staging area. It will be only considered as a computed entity if the original query is set as enable load (because that way the original query’s output will be stored in Azure data lake storage).

Computed Entity is a Premium Only Feature

At the time of writing this article, The computed entity is a premium only feature. You cannot use it on a Power BI pro account.

Linked Entity

Computed entities are good for creating another branch of data transformation through another entity. However, sometimes, you may not want to do another set of transformations. You want to just get a copy of the same entity that you have in another dataflow, in this dataflow as part of this data mart. You may not want to make any data transformation at all, just a mere copy, or let’s say just a link. This type of entity is called a Linked Entity.

Creating a Linked Entity

To create a linked entity, when you are in a dataflow, click on Add linked entity.

You can set the credentials for the Power BI dataflows.

Then select the dataflow that you want to get the entity from it.

The new entity, or let’s say the Linked Entity, will be created with a specific icon, which shows it is different from other entity types.

Usages of Linked Entity

Linked Entity can be used for scenarios that you want to create multiple data marts through multiple dataflows. You may want to re-use one of the entities from one of the dataflows in the other one. For example, the Product entity, which you have used in the sales dataflow (or let’s say data mart), might be also useful as part of the Production dataflow.

Things You Need to Know About Linked Entity

Similar to the computed entities, linked entities also comes with some limitations and features. let’s check them out.

The Linked Entity Cannot be Changed

You cannot apply any data transformation on a Linked Entity. As you recall from a few steps ago, a Linked entity is just a link to the original entity, which is an entity in another dataflow. If you want to change the transformation, you have to go to the original query and change it.

Linked Entity is a Premium Only Feature

You cannot refresh a dataflow with a linked entity on it with a Power BI pro account. Power BI premium is needed to get a linked entity refreshed.

Summary

Linked Entities are linked to other entities in other dataflows, Computed entities are a new branch of transformation referenced from another entity in the dataflow. There are use cases for each of these types of entities. but the main thing to know is that there are some specific features and limitations for these two types of entities. In this article, we talked about these features. If you like the dataflow series, stay tuned for the next article in this series, and in the meantime, you can read previous articles in this series;

Part 1: What are the Use Cases of Dataflow for You in Power BI?

Part 2: Getting Starting with Dataflow in Power BI

Part 3: What is the Common Data Model and Why Should I Care?

Part 4: Linked Entities and Computed Entities (this article)

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for eight continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: http://www.radacad.com/blog.
He wrote some books on MS SQL BI and also is writing some others, He was also an active member on online technical forums such as MSDN and Experts-Exchange, and was a moderator of MSDN SQL Server forums, and is an MCP, MCSE, and MCITP of BI. He is the leader of the New Zealand Business Intelligence users group. He is also the author of very popular book Power BI from Rookie to Rock Star, which is free with more than 1700 pages of content and the Power BI Pro Architecture published by Apress.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.

9 Comments

  • Thank again for the informed article.
    One question, if there are a few DAX measures needs to be created using the dataset loaded by Data Flow, do you know if this is possible to be done in Data Flow? In other words, after loading a dataset by Data Flow (eg, a CSV file), can we use DAX formula to create measures within Data Flow to enrich the original datasets?
    I hope I had explained the question clearly to you.

    • Hi Steven.
      Dataflows are Power Query scripts running in the cloud, not DAX. If you have DAX measures, that won’t be calculated this way. and you won’t want it to be pre-calculated also. because DAX measures are used when you want the ability to give the user a dynamic answer based on the selection of criteria in Power BI report, If this is something that can be pre-calculated, then you have to change that calculation from DAX to Power Query transformation. I wrote about DAX measures and their dynamic nature here.

  • Wanted to point out that while you can’t modify a linked entity, you can link an entity and disable load (i.e. stage it) and then reference it to create a computed entity. This is useful when you need to take an authoritative data flow and perform additional transformations on it for other marts.

    • Hi Micah.
      It is possible to do that. but that won’t be a computed entity anymore. the whole purpose of the computed entity is to get that from the result of another query which is already loaded in Azure data lake storage of dataflow. when you disable the load of the previous query, this won’t be the case. You would have a new query that reads data from the data source again.

      • Reza – that is kind of correct but misleading….
        If you create a dataflow and have 4 source tables you want to union together, you can disable load on the source entities and the do a table append as new, the data from your sources will be brought into Azure as staging/in memory data and then via azure compute (I’m pretty sure the compute isn’t done via gateway but i’ll have to check my gateway traces again) is union-ed. The Append as new entity is not querying your sources again, it is using the data brought into Azure memory.

        A more important example…building a dataflow on top of other dataflows. We have a dataflow for each of our source systems and then create dataflows on top of these to union similar tables to created consolidated entities. The 2nd tier dataflows are only connected to existing data flows, we disable load on the source dataflow entities and then do a table append as new. This does not end up querying our on-premise source systems, it brings the 1st tier dataflow data into memory and then performs a union/append on that data. Although it is not technically a computed entity as it is bring the dataflow data into memory and then unioning instead of just unioning based on the data already existing in Azure storage, it is surprisingly fast and more more efficient compared to when we were doing all of this via one transformation in our gateway. We’ve seen a 5x improving to load/transform times via this approach.

        Another one is any time you want to do something that would require a computed entity, just disable load on the entity after performing computed functions, then create an enabled entity that only references the end result of the disabled load table.
        All of this is done with just pro licenses…not using computed or linked entities, a bit less efficient due to having to bring existing azure data lake data into-memory, but nonetheless very close to computed and linked entities and ultimately getting the close to the same results.

        • Hi Miles.
          I believe I have written it in a way that it could be read incorrectly.
          You are right. However, I meant this in my sentences (probably I haven’t conveyed the message correctly in my wording); using a computed entity on top of another query which is “enable load”; means we are reading from a data that is already loaded once. If we set the first query to “disable load”, it processes the query again; the first query, and also the second one. However, DEPENDS on the transformations, sometimes, it might use a temporary memory to store the result of the first query and the second query reads from there. So the process altogether is still processing, rather than reading it from a staged data. I would say the difference is what do you want to stage the data in an intermediate table or not.
          Cheers
          Reza

Leave a Reply

Your email address will not be published. Required fields are marked *