Grouping in Power Query; Getting The Last Item in Each Group

2016-08-22_10h40_46

Power BI or Power Query in Excel (or Get Data and Transform as the new name of it) can do many data transformations. One of these transformations is grouping rows by number of fields. If you use the graphical user interface in Power Query for Power BI or Excel you have number of options to get some aggregated results such as count of rows, maximum or minimum, average of each group, or sum… But there are still heaps of operations that you cannot do through GUI, such as getting the last item in each group, or first item. Fortunately with M (Power Query Formula Language) you can apply any of these operations you want. In this post I’ll show you how to get benefit of both; start with GUI to write the Group by command for you, and then customize it in M script to achieve what you want. If you like to learn more about Power BI read Power BI online book; from Rookie to Rock Star. If you like to learn more about Power Query, start with Introduction to Power Query.

Learning Objectives for this section

By completing example of this post you will learn;

  • How to perform Group By in Power Query/Power BI
  • How to leverage pre-defined aggregation functions in Group By GUI window
  • How to extend grouping possibilities with changes in Power Query script

Prerequisite

For this example I will be using AdventureWorksDW sample Microsoft SQL Server database. You can download it from here.

Scenario

Scenario that I want to solve as an example is this:

FactInternetSales has sales transaction information for each customer, by each product, each order date and some other information. We want to have a grouped table by customer, which has the number of sales transaction by each customer, total sales amount for that customer, the first and the last sales amount for that customer. First and last defined by the first and last order date for the transaction.

Get Data

Let’s start by getting data from SQL Server, Choose AdventureWorksDW as the source database, and select DimCustomer and FactInternetSales as the only tables for this example. Click on Edit to move into Power Query window.

2016-08-22_09h33_54

Group By Transformation

FactIntenetSales table is the one we want to apply all transformations in. So Click on FactInternetSales first, then from Transform Tab, select Group By option as the first menu option.

2016-08-22_09h35_42

This will open the Group By dialog window with some configuration options

2016-08-22_09h37_34

By default Group By happens on the selected columns. because usually the first column is the selected column (in our table ProductKey), then the column mentioned under group by section is also ProductKey. You can change this to another column and add or remove columns in this section.

Choose the Group By Field

Based on your final grain of the output table the group by field will be defined. In this example we want the final table to have one record per Customer, so CustomerKey (which is the identifier for each customer) should be our Group By Column.

2016-08-22_09h43_42

Note that you can add as many fields as you want in the Group By section. the result would be one record per combination of unique values from all these fields.

Add Aggregation Fields

Group by result would be one record per each unique combination of all fields set in the “group by ” section. In addition you can also have some aggregated columns. Here are list of operations you can have by default:

2016-08-22_09h46_41

Most of items above are self explanatory. For example; when you want to count number of sales transaction. you can use Count Rows. If you want total Sales amount for each group you can choose Sum, and then in the Column section choose the column as SalesAmount. All Rows will generate a sub table in each element of the aggregated table that contains all rows in that group.

Columns that I want to create in this section are:

Order Count (Count Rows), Total Revenue (Sum of Sales Amount), Order Details (All Rows)

2016-08-22_09h50_34

Adding aggregated columns is as easy as that. Now If you click on OK, you will see the result;

2016-08-22_09h55_06

As you can see Order Count and Total Revenue show the aggregated result of each group, and Order Details (if you click not on the “Table” itself,  but on a blank area on that cell) contains the actual rows in each group. This detailed view can be used for many other calculations or transformations later on. In many cases you will find the All rows option useful.

First and Last Item in each Group

Getting some default aggregation was as easy as selecting them in Group By window. However not all types of operations are listed there. For example in detailed table above you can see that customer 11003 had 9 sales transaction,, and they happened in different Order dates, getting the first and last order date is easy with Max and Min operations. However getting the sales amount or product key associated with that record, or in the other words getting the first and last item in each group isn’t possible through GUI. Fortunately we can use M (Power Query formula language) to achieve this easily.

Sort By Date

To get the first or last item in each group I have to order the table based on that date column. Sorting is possible simply through GUI. and I have to apply that to the step before group by operation. So from the right hand side applied steps list I’ll select Navigation (which is the step before Grouped Rows);

2016-08-22_10h16_49

Now in this view you can order simply by clicking on OrderDateKey and Choose Sort Ascending.

2016-08-22_10h18_02

This will create another step, and asks you do you want to INSERT this step here?

2016-08-22_10h19_14

Click on Insert to confirm you want to insert it here before the Grouped Rows. and then you will see a Sorted Rows step added before Grouped Rows. This means Grouped Rows will use the output of Sorted Rows step as the input for grouping (which is exactly what we want).

2016-08-22_10h21_20

Now you can go to the Grouped Rows step to see the result hasn’t changed but the sub tables are sorted now. All we need from here is to get the first and last item in the sub table.

** If you want to sort based on multiple column simply go to the Advanced Editor and add as many as sections you want to Table.Sort input parameters.

Because Power Query by default goes through query folding, it is recommended that after this step, you add a usage of Table.Buffer, to make sure that the sorting runs before the grouping. That can be an step witht his code: = Table.Buffer(#”Sorted Rows”)

List.First and List.Last

Fortunately Power Query has a bunch of operations on List that we can use. List.First will return the first item in the list (based on the order defined in the list), and List.Last will return the last item. So let’s use them in the Group By operation to fetch first and last sales amount.

To make changes here you need to go to script editor in Power Query which can be achieve via Advanced Editor option in Home tab. You have to make sure that you are in the FactInternetSales Query first.

2016-08-22_10h25_54

Advanced Editor will show you M script that build the output and the group by command as well.

let
    Source = Sql.Databases("."),
    AdventureWorksDW2012 = Source{[Name="AdventureWorksDW2012"]}[Data],
    dbo_FactInternetSales = AdventureWorksDW2012{[Schema="dbo",Item="FactInternetSales"]}[Data],
    #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"OrderDateKey", Order.Ascending}}),
    Custom1 = Table.Buffer(#"Sorted Rows"),
    #"Grouped Rows" = Table.Group(Custom1, {"CustomerKey"}, {{"Order Count", each Table.RowCount(_), type number}, {"Total Revenue", each List.Sum([SalesAmount]), type number}, {"Order Details", each _, type table}})
in
    #"Grouped Rows"

The script in above code section created automatically when you did transformations through GUI. The line with Table.Group is the line that does all the grouping and aggregation. It is a long line, so let me format it better for easier understanding;

let
    Source = Sql.Databases("."),
    AdventureWorksDW2012 = Source{[Name="AdventureWorksDW2012"]}[Data],
    dbo_FactInternetSales = AdventureWorksDW2012{[Schema="dbo",Item="FactInternetSales"]}[Data],
    #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"OrderDateKey", Order.Ascending}}),
    Custom1 = Table.Buffer(#"Sorted Rows"),
    #"Grouped Rows" = Table.Group(Custom1, 
                                            {"CustomerKey"}, 
                                            {
                                                {"Order Count", each Table.RowCount(_), type number}, 
                                                {"Total Revenue", each List.Sum([SalesAmount]), type number}, 
                                                {"Order Details", each _, type table}
                                            }
                                    )
in
    #"Grouped Rows"

Script below is the same script. I just put some enters and tabs to format it better for reading. The above section shows Table.Group section of the script. As you can see Table.Group gets a table as input, which is the #”Sorted Rows” table from the previous step. The group by field is “CustomerKey”. and then a set of aggregated columns one after each other (which is highlighted in code above). Each column has name of the column, type of transformation (or aggregation), and the data type of the column. for example:

let
    Source = Sql.Databases("."),
    AdventureWorksDW2012 = Source{[Name="AdventureWorksDW2012"]}[Data],
    dbo_FactInternetSales = AdventureWorksDW2012{[Schema="dbo",Item="FactInternetSales"]}[Data],
    #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"OrderDateKey", Order.Ascending}}),
    Custom1 = Table.Buffer(#"Sorted Rows"),
    #"Grouped Rows" = Table.Group(Custom1, 
                                            {"CustomerKey"}, 
                                            {
                                                {"Order Count", each Table.RowCount(_), type number}, 
                                                {"Total Revenue", each List.Sum([SalesAmount]), type number}, 
                                                {"Order Details", each _, type table}
                                            }
                                    )
in
    #"Grouped Rows"

Total Revenue is the name of column. calculation for this column is Sum of [SalesAmount] which is one of the fields in the table, and the output is of type number.

So by now you should thinking of how each is to create a new aggregated column here; by adding similar column in the script. I add the new column after Order Details column, so I need an extra comma (,) after that line, and the new lines would be;

let
    Source = Sql.Databases("."),
    AdventureWorksDW2012 = Source{[Name="AdventureWorksDW2012"]}[Data],
    dbo_FactInternetSales = AdventureWorksDW2012{[Schema="dbo",Item="FactInternetSales"]}[Data],
    #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"OrderDateKey", Order.Ascending}}),
    Custom1 = Table.Buffer(#"Sorted Rows"),
    #"Grouped Rows" = Table.Group(Custom1, 
                                            {"CustomerKey"}, 
                                            {
                                                {"Order Count", each Table.RowCount(_), type number}, 
                                                {"Total Revenue", each List.Sum([SalesAmount]), type number}, 
                                                {"Order Details", each _, type table},
                                                {"First SalesAmount", each List.First([SalesAmount]), type number}, 
                                                {"Last SalesAmount", each List.Last([SalesAmount]), type number} 
                                            }
                                    )
in
    #"Grouped Rows"

Marked lines above uses List.First and List.Last on the exact same structure that List.Sum worked. because we have already sorted the table based on OrderDate so the first item would be the first sales transaction, and the last item would be the last.

Here is the output of this change:

2016-08-22_10h40_46

You can see that the first and the last SalesAmount picked correctly from each group as two new columns.

** Note that with adding some changes in script editor that are not supported through GUI, you will loose the GUI configuration window section. As a result of the change here you cannot change Grouping configuration in GUI anymore, if you want to change it, you have to go to Advanced Editor for this section. So if you are a GUI fan, better to apply all required configuration first, and then add extra logic in the code.

 

Save

Save

Save

Reza Rad on FacebookReza Rad on LinkedinReza Rad on TwitterReza Rad on Youtube
Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer engineering; he has more than 20 years’ experience in data analysis, BI, databases, programming, and development mostly on Microsoft technologies. He is a Microsoft Data Platform MVP for 12 continuous years (from 2011 till now) for his dedication in Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is author of more than 14 books on Microsoft Business Intelligence, most of these books are published under Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, Power Query books series, Row-Level Security in Power BI and etc.
He is an International Speaker in Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. And He is a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution, he is Data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.

21 thoughts on “Grouping in Power Query; Getting The Last Item in Each Group

    • Hi Julian,

      You need to restore .bak file into your SQL Server database engine instance. If you are not familiar with SQL Server, or you haven’t done such thing before let me know and I upload excel exported tables somewhere for you.

      Cheers,
      Reza

  • Hello Reza,

    Thanks a lot for your blog and articles, it helps me a lot. I followed this article and it was really helpful figuring out the “Group by” capability. I have a question though, I have an order book (excel files) representing all the order lines that my company received. I grouped by all those lines to get an aggregate view of each shipment (like you did in the articles) and have no problems with the count of order lines in each shipment or the sum of weight. However, I’d like to see how many distinct articles I have in each order (each order lines contains a “material key” column) but the Countdistinct option doesn’t allow me to chose only a column (it’s made for the entire rows). Is there a way I could include the distinct count option in this query ?

    I’m currently using a calculated table with the SUMMARIZE option which allows me to get to the expected result but it would be great to avoid extra tables.

    One more thing, after aggregating my order lines, and used the “All rows” option as explained in your tutorial, can I use those information (the order details) right away in the Report view ? Because I don’t see those infos anymore in report view.

    Thanks again a lot for your work !

    Renaud

    • Hi Renaud,

      Thanks for your kind words,
      Not sure if I get your question correctly. better to email me the file to: reza at this website’s domain dot com
      Using Group by here would be better than summarize as you mentioned because we won’t add extra tables into the memory.
      All rows option is not for report view to pick up. report view doesn’t understand table types of Power Query. All rows is for doing all other sorts of aggregation if you want to in your group by statement, because you have all detailed rows, so you can do whatever you want. for example your distinct count option I believe should be possible through an all rows starting point in group by.

      Cheers,
      Reza

  • Hello Reza,
    I’m trying to reproduce your example with one of my tables, but the grouping function ignore the sorting step, and I’m not able to obtain the correct first and last items. I’m using the last version of july 2017 of Power by desktop and a Postgresql source.
    Have you some advice on a possible cause of that?
    Thank you

      • Hi Reza,

        I am facing similar problem as Nick. As i update the data, the group by feature is not working.
        Can you please help me to solve this problem

        • Hi Jahn
          I just realized in the steps I explained here, one step is missing. Table.Buffer step is now added, add it to your script and it should work.
          if not, please send me your PBIX file to check
          Cheers
          Reza

  • Hi Reza,

    I couldn’t download the SQL file, it does not recognize the bak extension and I am not familiar with SQL. Can you send me the same excel tables ? Many thanks.

  • hi sir,

    good day…

    i desperately need your help ..i dont know this website blog still active or not ..if active then please reply .

    i neeed help related to the power query.

    thanks and regards,
    sanket

Leave a Reply