Reference vs Duplicate in Power BI; Power Query Back to Basics

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

When you work with tables and queries in Power Query and Power BI, you get the option to copy them through these actions: Duplicate, or Reference. It has been always a question in my sessions and courses that what is the actual difference between these two actions. The explanation is simple but very important to understand. Because when you know the difference, you will use it properly. In this short blog post, I’ll explain what is the Reference, and the difference of that with Duplicate. To learn more about Power BI; read Power BI from Rookie to Rock Star book.

Duplicate

If you are looking to copy an entire query with all of its steps, then Duplicate is your friend. Let’s see this in action. As an example, Let’s assume that we got the data from a web page that shows us the best seller’s movies information. If you have done the movies example of my book previously, the website is BoxOfficeMojo. Here is the link to the page:

https://www.boxofficemojo.com/alltime/world/?pagenum=1&p=.htm

In Power Query, we got data From Web, and selected this source;

Let’s say for this table, we do some transformations. For example; removing extra “^” character from the last column (Year column);

and then some other transformations, so we end up with a number of steps for this query.

After doing all these transformations, you realize that this data is only for the first hundred best seller movies because that web page doesn’t have the remaining movies. To get the remaining, you need to navigate to page 2, which has a different URL, but the same data structure.

Well, what you need to do? You have to do all those steps on page 2 as well. Let’s keep this example static and basic, (Because in complex scenarios when you have many pages, you may use functions and parameters to loop through all pages and combine them all together. If you are interested to learn about that, read my post here). Let’s say you want to do all those steps that you have done for page one, now for page two. To do that; you can leverage Duplicate.

Create a duplicate of Box Office Mojo (I called it; Box Office Mojo Page 1)

When you create the Duplicate query, it will be an exact copy of the first query, with all steps of it. These two queries are exactly like each other. No difference!

Duplicate copies a query with all the applied steps of it as a new query; an exact copy.

After creating the copy, then you can go to the source step to change the URL:

Using Duplicate, you managed to copy a query with all steps in it, and then make changes in your new query. Your original query is intact.

Duplicate is the option to choose, when you want to copy a query, but do a different configuration in steps.

Reference

Reference is another way of copying a query, However, the big difference is that; When you reference a query, the new query will have only one step: sourcing from the original query. A referenced query, will not have the applied steps of the original query. Let’s see this option in action. Continuing the example above; let’s say we want to create a new query that is the result of combining the page 1 and page 2 result. However, we do NOT want to change any of the existing queries, because we want to use those as the source for other operations.

With a right click on Box Office Mojo Page 1, I can create a Reference.

Reference will create a new query which is a copy of the Box Office Mojo Page 1, but only contains one single step:

The only step in the new query is sourcing the data from the original query. What does it mean? It means if you make changes in the original query, then this new query will be impacted.

Reference will create a new query which has only one step: Getting data from the original query.

Now we can use this query, to append to the Box Office Mojo Page 2;

The result would be a query that contains both pages;

To learn more about append and the difference of that with Merge, read my blog post here. In this example; we used Reference option to create a copy of the original query, and then continue some extra steps. There are many other usages for the Reference.

Reference is a good choice, when you want to branch a query into different pathes. One path that follows a number of steps, and another that follows a different steps, and both are sharing some steps in the original query.

After doing the append in this example, it is a good idea to uncheck the enable load on Page 1 and Page 2 queries to save some memory in Power BI.

Query Dependency

Finding out that which query is dependent (or referenced from) which query can be a bit challenging when you have too many queries. That is why we have the Query Dependency menu option in the View tab of Power Query;

For our example above, this is the query dependency diagram;

Duplicate vs Reference

Now that you know there are two options when you copy a query, let’s have a closer look at their difference.

Isolation from the Original or Dependency to the Original

Duplicate creates a new copy with all the existing steps. The new copy will be isolated from the original query. You can make changes in the original or the new query, and they will NOT affect each other. Reference, on the other hand, is a new copy with only one single step: getting data from the original query. If you make a change in the original query, the new query will be impacted. For example; If you remove a column from the original query, the new query will not have it if it used the Reference method for copying.

Limitation of the Reference

You can not use referenced queries in all situations. As an example; If you have a Query 1, and then you created a reference from that as Query 2. You cannot use the result from Query 2 in Query 1! It will create a circular reference. You are combining a query with a reference to the query itself, It is impossible!

Some actions that invoke Reference or Duplicate

There are some actions in the Power Query that trigger Reference or Duplicate, let’s check those options:

Append Queries as New / or Merge Queries as New is a Reference action

These two actions are creating a reference from the original query and then they do Append or Merge with other queries.

Add as New Query is a Duplicate action

Believe it or not, when you right click on a column or cell and select Add as New Query, you are creating a duplicate of the original query.

This can be misleading sometimes, because you may expect the new query to source from the original, and with the change of original, this query also to change. However, the truth is that this is a duplicate action, and after this action, your original query and the new copy will be isolated from each other.

Copy and Paste is Neither Duplicate Nor Reference!

This is another misconception that Copy and Paste are similar to Duplicate. It is not, and it is not Reference either. When you do this action on a simple query (I mean a query that is not sourced from any other queries), then you get a result similar to Duplicate.

But when you do the Copy and Paste on a query that is sourced from other queries; the result is a copy of all original queries. Here is the result of Copy and Paste on Box Office Mojo All Pages (which is sourced from Page 1 and Page 2);

Summary

Duplicate and Reference are two different actions, and they are also different from Copy and Paste of a query. Duplicate will give you an exact copy of the query with all steps, Reference will create a reference to the original query instead as a new query. Duplicate is a good option to choose when you want the two copies to be isolated from each other, Reference is a good option when you create different branches from one original query. There are some actions in Power Query that trigger Duplicate or Reference as listed in this blog post. Hope this was a good post for you to understand the difference between these two actions clearly, and use them wisely from now on.

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

7 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *