Grouping in Power Query; Getting The Last Item in Each Group

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2016-08-22_10h40_46

Power BI or Power Query in Excel (or Get Data and Transform as the new name of it) can do many data transformations. One of these transformations is grouping rows by number of fields. If you use the graphical user interface in Power Query for Power BI or Excel you have number of options to get some aggregated results such as count of rows, maximum or minimum, average of each group, or sum… But there are still heaps of operations that you cannot do through GUI, such as getting the last item in each group, or first item. Fortunately with M (Power Query Formula Language) you can apply any of these operations you want. In this post I’ll show you how to get benefit of both; start with GUI to write the Group by command for you, and then customize it in M script to achieve what you want. If you like to learn more about Power BI read Power BI online book; from Rookie to Rock Star. If you like to learn more about Power Query, start with Introduction to Power Query.

Learning Objectives for this section

By completing example of this post you will learn;

  • How to perform Group By in Power Query/Power BI
  • How to leverage pre-defined aggregation functions in Group By GUI window
  • How to extend grouping possibilities with changes in Power Query script

Prerequisite

For this example I will be using AdventureWorksDW sample Microsoft SQL Server database. You can download it from here.

Scenario

Scenario that I want to solve as an example is this:

FactInternetSales has sales transaction information for each customer, by each product, each order date and some other information. We want to have a grouped table by customer, which has the number of sales transaction by each customer, total sales amount for that customer, the first and the last sales amount for that customer. First and last defined by the first and last order date for the transaction.

Get Data

Let’s start by getting data from SQL Server, Choose AdventureWorksDW as the source database, and select DimCustomer and FactInternetSales as the only tables for this example. Click on Edit to move into Power Query window.

2016-08-22_09h33_54

Group By Transformation

FactIntenetSales table is the one we want to apply all transformations in. So Click on FactInternetSales first, then from Transform Tab, select Group By option as the first menu option.

2016-08-22_09h35_42

This will open the Group By dialog window with some configuration options

2016-08-22_09h37_34

By default Group By happens on the selected columns. because usually the first column is the selected column (in our table ProductKey), then the column mentioned under group by section is also ProductKey. You can change this to another column and add or remove columns in this section.

Choose the Group By Field

Based on your final grain of the output table the group by field will be defined. In this example we want the final table to have one record per Customer, so CustomerKey (which is the identifier for each customer) should be our Group By Column.

2016-08-22_09h43_42

Note that you can add as many fields as you want in the Group By section. the result would be one record per combination of unique values from all these fields.

Add Aggregation Fields

Group by result would be one record per each unique combination of all fields set in the “group by ” section. In addition you can also have some aggregated columns. Here are list of operations you can have by default:

2016-08-22_09h46_41

Most of items above are self explanatory. For example; when you want to count number of sales transaction. you can use Count Rows. If you want total Sales amount for each group you can choose Sum, and then in the Column section choose the column as SalesAmount. All Rows will generate a sub table in each element of the aggregated table that contains all rows in that group.

Columns that I want to create in this section are:

Order Count (Count Rows), Total Revenue (Sum of Sales Amount), Order Details (All Rows)

2016-08-22_09h50_34

Adding aggregated columns is as easy as that. Now If you click on OK, you will see the result;

2016-08-22_09h55_06

As you can see Order Count and Total Revenue show the aggregated result of each group, and Order Details (if you click not on the “Table” itself,  but on a blank area on that cell) contains the actual rows in each group. This detailed view can be used for many other calculations or transformations later on. In many cases you will find the All rows option useful.

First and Last Item in each Group

Getting some default aggregation was as easy as selecting them in Group By window. However not all types of operations are listed there. For example in detailed table above you can see that customer 11003 had 9 sales transaction,, and they happened in different Order dates, getting the first and last order date is easy with Max and Min operations. However getting the sales amount or product key associated with that record, or in the other words getting the first and last item in each group isn’t possible through GUI. Fortunately we can use M (Power Query formula language) to achieve this easily.

Sort By Date

To get the first or last item in each group I have to order the table based on that date column. Sorting is possible simply through GUI. and I have to apply that to the step before group by operation. So from the right hand side applied steps list I’ll select Navigation (which is the step before Grouped Rows);

2016-08-22_10h16_49

Now in this view you can order simply by clicking on OrderDateKey and Choose Sort Ascending.

2016-08-22_10h18_02

This will create another step, and asks you do you want to INSERT this step here?

2016-08-22_10h19_14

Click on Insert to confirm you want to insert it here before the Grouped Rows. and then you will see a Sorted Rows step added before Grouped Rows. This means Grouped Rows will use the output of Sorted Rows step as the input for grouping (which is exactly what we want).

2016-08-22_10h21_20

Now you can go to the Grouped Rows step to see the result hasn’t changed but the sub tables are sorted now. All we need from here is to get the first and last item in the sub table.

** If you want to sort based on multiple column simply go to the Advanced Editor and add as many as sections you want to Table.Sort input parameters.

List.First and List.Last

Fortunately Power Query has a bunch of operations on List that we can use. List.First will return the first item in the list (based on the order defined in the list), and List.Last will return the last item. So let’s use them in the Group By operation to fetch first and last sales amount.

To make changes here you need to go to script editor in Power Query which can be achieve via Advanced Editor option in Home tab. You have to make sure that you are in the FactInternetSales Query first.

2016-08-22_10h25_54

Advanced Editor will show you M script that build the output and the group by command as well.

The script in above code section created automatically when you did transformations through GUI. The line with Table.Group is the line that does all the grouping and aggregation. It is a long line, so let me format it better for easier understanding;

Script below is the same script. I just put some enters and tabs to format it better for reading. The above section shows Table.Group section of the script. As you can see Table.Group gets a table as input, which is the #”Sorted Rows” table from the previous step. The group by field is “CustomerKey”. and then a set of aggregated columns one after each other (which is highlighted in code above). Each column has name of the column, type of transformation (or aggregation), and the data type of the column. for example:

Total Revenue is the name of column. calculation for this column is Sum of [SalesAmount] which is one of the fields in the table, and the output is of type number.

So by now you should thinking of how each is to create a new aggregated column here; by adding similar column in the script. I add the new column after Order Details column, so I need an extra comma (,) after that line, and the new lines would be;

Marked lines above uses List.First and List.Last on the exact same structure that List.Sum worked. because we have already sorted the table based on OrderDate so the first item would be the first sales transaction, and the last item would be the last.

Here is the output of this change:

2016-08-22_10h40_46

You can see that the first and the last SalesAmount picked correctly from each group as two new columns.

** Note that with adding some changes in script editor that are not supported through GUI, you will loose the GUI configuration window section. As a result of the change here you cannot change Grouping configuration in GUI anymore, if you want to change it, you have to go to Advanced Editor for this section. So if you are a GUI fan, better to apply all required configuration first, and then add extra logic in the code.

 

Save

Save

Save

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Reza Rad
Reza Rad is an Author, Trainer, Speaker and DW/BI Consultant. He has a BSc in Computer engineering; he has more than 15 years’ experience in databases, programming and development mostly on Microsoft technologies. He is a Microsoft MVP in Data Platform for seven continues years (from 2011 till now) for his dedication in Microsoft BI. He is author of some SQL Server and BI books, and also Power BI online book; from Rookie to Rock Star.

6 Comments

    • Hi Julian,

      You need to restore .bak file into your SQL Server database engine instance. If you are not familiar with SQL Server, or you haven’t done such thing before let me know and I upload excel exported tables somewhere for you.

      Cheers,
      Reza

  • Hello Reza,

    Thanks a lot for your blog and articles, it helps me a lot. I followed this article and it was really helpful figuring out the “Group by” capability. I have a question though, I have an order book (excel files) representing all the order lines that my company received. I grouped by all those lines to get an aggregate view of each shipment (like you did in the articles) and have no problems with the count of order lines in each shipment or the sum of weight. However, I’d like to see how many distinct articles I have in each order (each order lines contains a “material key” column) but the Countdistinct option doesn’t allow me to chose only a column (it’s made for the entire rows). Is there a way I could include the distinct count option in this query ?

    I’m currently using a calculated table with the SUMMARIZE option which allows me to get to the expected result but it would be great to avoid extra tables.

    One more thing, after aggregating my order lines, and used the “All rows” option as explained in your tutorial, can I use those information (the order details) right away in the Report view ? Because I don’t see those infos anymore in report view.

    Thanks again a lot for your work !

    Renaud

    • Hi Renaud,

      Thanks for your kind words,
      Not sure if I get your question correctly. better to email me the file to: reza at this website’s domain dot com
      Using Group by here would be better than summarize as you mentioned because we won’t add extra tables into the memory.
      All rows option is not for report view to pick up. report view doesn’t understand table types of Power Query. All rows is for doing all other sorts of aggregation if you want to in your group by statement, because you have all detailed rows, so you can do whatever you want. for example your distinct count option I believe should be possible through an all rows starting point in group by.

      Cheers,
      Reza

  • Hello Reza,
    I’m trying to reproduce your example with one of my tables, but the grouping function ignore the sorting step, and I’m not able to obtain the correct first and last items. I’m using the last version of july 2017 of Power by desktop and a Postgresql source.
    Have you some advice on a possible cause of that?
    Thank you

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">