Reference vs Duplicate in Power BI; Power Query Back to Basics

When you work with tables and queries in Power Query and Power BI, you get the option to copy them through these actions: Duplicate, or Reference. It has been always a question in my sessions and courses that what is the actual difference between these two actions. The explanation is simple but very important to understand. Because when you know the difference, you will use it properly. In this short blog post, I’ll explain what is the Reference, and the difference of that with Duplicate. To learn more about Power BI; read Power BI from Rookie to Rock Star book.

Video

Duplicate

If you are looking to copy an entire query with all of its steps, then Duplicate is your friend. Let’s see this in action. As an example, Let’s assume that we got the data from a web page that shows us the best seller’s movies information. If you have done the movies example of my book previously, the website is BoxOfficeMojo. Here is the link to the page:

https://www.boxofficemojo.com/alltime/world/?pagenum=1&p=.htm

In Power Query, we got data From Web, and selected this source;

Let’s say for this table, we do some transformations. For example; removing extra “^” character from the last column (Year column);

and then some other transformations, so we end up with a number of steps for this query.

After doing all these transformations, you realize that this data is only for the first hundred best seller movies because that web page doesn’t have the remaining movies. To get the remaining, you need to navigate to page 2, which has a different URL, but the same data structure.

Well, what you need to do? You have to do all those steps on page 2 as well. Let’s keep this example static and basic, (Because in complex scenarios when you have many pages, you may use functions and parameters to loop through all pages and combine them all together. If you are interested to learn about that, read my post here). Let’s say you want to do all those steps that you have done for page one, now for page two. To do that; you can leverage Duplicate.

Create a duplicate of Box Office Mojo (I called it; Box Office Mojo Page 1)

When you create the Duplicate query, it will be an exact copy of the first query, with all steps of it. These two queries are exactly like each other. No difference!

Duplicate copies a query with all the applied steps of it as a new query; an exact copy.

After creating the copy, then you can go to the source step to change the URL:

Using Duplicate, you managed to copy a query with all steps in it, and then make changes in your new query. Your original query is intact.

Duplicate is the option to choose, when you want to copy a query, but do a different configuration in steps.

Reference

Reference is another way of copying a query, However, the big difference is that; When you reference a query, the new query will have only one step: sourcing from the original query. A referenced query, will not have the applied steps of the original query. Let’s see this option in action. Continuing the example above; let’s say we want to create a new query that is the result of combining the page 1 and page 2 result. However, we do NOT want to change any of the existing queries, because we want to use those as the source for other operations.

With a right click on Box Office Mojo Page 1, I can create a Reference.

Reference will create a new query which is a copy of the Box Office Mojo Page 1, but only contains one single step:

The only step in the new query is sourcing the data from the original query. What does it mean? It means if you make changes in the original query, then this new query will be impacted.

Reference will create a new query which has only one step: Getting data from the original query.

Now we can use this query, to append to the Box Office Mojo Page 2;

The result would be a query that contains both pages;

To learn more about append and the difference of that with Merge, read my blog post here. In this example; we used Reference option to create a copy of the original query, and then continue some extra steps. There are many other usages for the Reference.

Reference is a good choice, when you want to branch a query into different pathes. One path that follows a number of steps, and another that follows a different steps, and both are sharing some steps in the original query.

After doing the append in this example, it is a good idea to uncheck the enable load on Page 1 and Page 2 queries to save some memory in Power BI.

Query Dependency

Finding out that which query is dependent (or referenced from) which query can be a bit challenging when you have too many queries. That is why we have the Query Dependency menu option in the View tab of Power Query;

For our example above, this is the query dependency diagram;

Duplicate vs Reference

Now that you know there are two options when you copy a query, let’s have a closer look at their difference.

Isolation from the Original or Dependency to the Original

Duplicate creates a new copy with all the existing steps. The new copy will be isolated from the original query. You can make changes in the original or the new query, and they will NOT affect each other. Reference, on the other hand, is a new copy with only one single step: getting data from the original query. If you make a change in the original query, the new query will be impacted. For example; If you remove a column from the original query, the new query will not have it if it used the Reference method for copying.

Limitation of the Reference

You can not use referenced queries in all situations. As an example; If you have a Query 1, and then you created a reference from that as Query 2. You cannot use the result from Query 2 in Query 1! It will create a circular reference. You are combining a query with a reference to the query itself, It is impossible!

Some actions that invoke Reference or Duplicate

There are some actions in the Power Query that trigger Reference or Duplicate, let’s check those options:

Append Queries as New / or Merge Queries as New is a Reference action

These two actions are creating a reference from the original query and then they do Append or Merge with other queries.

Add as New Query is a Duplicate action

Believe it or not, when you right click on a column or cell and select Add as New Query, you are creating a duplicate of the original query.

This can be misleading sometimes, because you may expect the new query to source from the original, and with the change of original, this query also to change. However, the truth is that this is a duplicate action, and after this action, your original query and the new copy will be isolated from each other.

Copy and Paste is Neither Duplicate Nor Reference!

This is another misconception that Copy and Paste are similar to Duplicate. It is not, and it is not Reference either. When you do this action on a simple query (I mean a query that is not sourced from any other queries), then you get a result similar to Duplicate.

But when you do the Copy and Paste on a query that is sourced from other queries; the result is a copy of all original queries. Here is the result of Copy and Paste on Box Office Mojo All Pages (which is sourced from Page 1 and Page 2);

Summary

Duplicate and Reference are two different actions, and they are also different from Copy and Paste of a query. Duplicate will give you an exact copy of the query with all steps, Reference will create a reference to the original query instead as a new query. Duplicate is a good option to choose when you want the two copies to be isolated from each other, Reference is a good option when you create different branches from one original query. There are some actions in Power Query that trigger Duplicate or Reference as listed in this blog post. Hope this was a good post for you to understand the difference between these two actions clearly, and use them wisely from now on.

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

22 thoughts on “Reference vs Duplicate in Power BI; Power Query Back to Basics

  • Reza, that was a great article. Do you have any articles about the Power BI 70-778 certification?
    Specifically on how to prepare for the exam.

  • Hi Reza,
    I have a question:
    If I have two queries with reference, does Power BI read once and use it for both queries or read the origin twice?
    Thanks

    • Hi Salvador
      No. Behind the scene, it will be two queries. That is why you can see in one of my other posts, I mentioned if the purpose is just referencing, then creating a calculated table in DAX can be a better option. Referencing in Power Query is the choice when you want to do branching and continue with a set of transformations on top of it.

      • Thanks for this – excellent article.
        Have you found a way to “materialize” data from the first query so that when calling a second query that references it it doesnt trigger a refresh of the data in the 1st query. I think this is the same sort of thought that Salvador is having. Thanks. Andrew

        • Hi Andrew
          The referencing does not necessarily materialize the original query. Check out the computed entity in dataflow for a feature like that.
          Cheers
          Reza

  • Great article. Is one of them better than the other in terms of refreshing data and the size of a PBIX file that is importing the data?

    • Hi Nicole.
      No difference in the size of the PBIX file or performance. behind the scene would be two queries to the data source. It is just the difference in branching rather than copying.
      Cheers
      Reza

  • I have a similar question in regards to using Power Pivot in Excel. You said to do this in Power BI : “After doing the append in this example, it is a good idea to uncheck the enable load on Page 1 and Page 2 queries”. I assume I do this same thing in Power Pivot as only creating a connection when selecting “Load to”, and deselecting “Add to Data Model.” When I click on the properties of the query, I get the option to “Enable Background Refresh.” How is that related to this article, or more specifically, when do I select this option (Extract, Transform, and/or Load queries)? Thanks for sharing.

    • Hi Roman.
      Your understand is correct. In PowerPivot, it is controlled with creating a connection only.
      I believe the “Enable Background Refresh” option that you see is similar to “Include in Report Refresh” that we have in Power BI Desktop. It means that this query will be part of the refresh process (which should be in this case), but won’t be loaded individually to the model.

  • Hi Reza!

    quick question on reference/dataflow/incremental refresh. We have 2 queries on dataflow with Premium capacity.
    First query : SQL connection, no modification, incremental refresh applied
    Second query : Reference to the first query, query has many steps.

    The pbix only import the second query. Is it a good way to work with reference/dataflow and incremental refresh?

    Thanks,

    Joel

    • Hi Joel
      It really depends on what you want to do
      Is the referencing from the first query because you MAY use that in other queries in the future?
      Is this because you MAY want to switch data sources in the future and you don’t want your whole query to not work as a result?
      depends on what is the goal here, there are different answers
      Cheers
      Reza

  • Very helpful article. Thanks verymuch Sir. I now understand the difference between Duplicating a query and Referencing a query. What I still don’t understand is the difference between Copy and Paste, and Duplicate.

    • copy paste will also create a copy of all other queries that this query is referenced from. but duplicate, just create a copy of this query.
      Cheers
      Reza

  • Hi Reza, Thanks very useful article. I have a Python script in Table 1 that downloads invoices, then it process the data in PowerBI. Each time I make a change to the table, it re-runs the python script. I thought if I create Table 2 by referencing (not duplicating) the first query. Now each time I make a change to table 2 it will not need to refresh table1. But yet it keeps refreshing both tables thus re-creating both tables.

    Any ideas?

    • referencing a query does not necessarily materialize the original query.
      If you want to do something like that the computed entity in the dataflow might be a better option for you
      Cheers
      Reza

  • Hi Resa,

    i have table 1 from SQL query and i need the same table 1 data in the another table 2. the another table 2 should not refresh and the data has to load from table 1 in power query editor.

    • You can have a duplicate of your table 1, and just change the first few lines of it so that it Gets data from the other data source. something like what I explained here but without parameters
      Cheers
      Reza

    • It depends on many things.
      If you run in Power Query dataflows, then a referenced query is using compute engine, so much better for consuming less resources. because it reads from the source once.
      if you run in Power Query in Power BI Desktop, most likely both are doing the same querying from the source, especially if query folding isn’t happening.
      Cheers
      Reza

Leave a Reply