Power BI or Power Query in Excel (or Get Data and Transform as the new name of it) can do many data transformations. One of these transformations is grouping rows by number of fields. If you use the graphical user interface in Power Query for Power BI or Excel you have number of options to get some aggregated results such as count of rows, maximum or minimum, average of each group, or sum… But there are still heaps of operations that you cannot do through GUI, such as getting the last item in each group, or first item. Fortunately with M (Power Query Formula Language) you can apply any of these operations you want. In this post I’ll show you how to get benefit of both; start with GUI to write the Group by command for you, and then customize it in M script to achieve what you want. If you like to learn more about Power BI read Power BI online book; from Rookie to Rock Star. If you like to learn more about Power Query, start with Introduction to Power Query.
Learning Objectives for this section
By completing example of this post you will learn;
- How to perform Group By in Power Query/Power BI
- How to leverage pre-defined aggregation functions in Group By GUI window
- How to extend grouping possibilities with changes in Power Query script
Prerequisite
For this example I will be using AdventureWorksDW sample Microsoft SQL Server database. You can download it from here.
Scenario
Scenario that I want to solve as an example is this:
FactInternetSales has sales transaction information for each customer, by each product, each order date and some other information. We want to have a grouped table by customer, which has the number of sales transaction by each customer, total sales amount for that customer, the first and the last sales amount for that customer. First and last defined by the first and last order date for the transaction.
Get Data
Let’s start by getting data from SQL Server, Choose AdventureWorksDW as the source database, and select DimCustomer and FactInternetSales as the only tables for this example. Click on Edit to move into Power Query window.
Group By Transformation
FactIntenetSales table is the one we want to apply all transformations in. So Click on FactInternetSales first, then from Transform Tab, select Group By option as the first menu option.
This will open the Group By dialog window with some configuration options
By default Group By happens on the selected columns. because usually the first column is the selected column (in our table ProductKey), then the column mentioned under group by section is also ProductKey. You can change this to another column and add or remove columns in this section.
Choose the Group By Field
Based on your final grain of the output table the group by field will be defined. In this example we want the final table to have one record per Customer, so CustomerKey (which is the identifier for each customer) should be our Group By Column.
Note that you can add as many fields as you want in the Group By section. the result would be one record per combination of unique values from all these fields.
Add Aggregation Fields
Group by result would be one record per each unique combination of all fields set in the “group by ” section. In addition you can also have some aggregated columns. Here are list of operations you can have by default:
Most of items above are self explanatory. For example; when you want to count number of sales transaction. you can use Count Rows. If you want total Sales amount for each group you can choose Sum, and then in the Column section choose the column as SalesAmount. All Rows will generate a sub table in each element of the aggregated table that contains all rows in that group.
Columns that I want to create in this section are:
Order Count (Count Rows), Total Revenue (Sum of Sales Amount), Order Details (All Rows)
Adding aggregated columns is as easy as that. Now If you click on OK, you will see the result;
As you can see Order Count and Total Revenue show the aggregated result of each group, and Order Details (if you click not on the “Table” itself, but on a blank area on that cell) contains the actual rows in each group. This detailed view can be used for many other calculations or transformations later on. In many cases you will find the All rows option useful.
First and Last Item in each Group
Getting some default aggregation was as easy as selecting them in Group By window. However not all types of operations are listed there. For example in detailed table above you can see that customer 11003 had 9 sales transaction,, and they happened in different Order dates, getting the first and last order date is easy with Max and Min operations. However getting the sales amount or product key associated with that record, or in the other words getting the first and last item in each group isn’t possible through GUI. Fortunately we can use M (Power Query formula language) to achieve this easily.
Sort By Date
To get the first or last item in each group I have to order the table based on that date column. Sorting is possible simply through GUI. and I have to apply that to the step before group by operation. So from the right hand side applied steps list I’ll select Navigation (which is the step before Grouped Rows);
Now in this view you can order simply by clicking on OrderDateKey and Choose Sort Ascending.
This will create another step, and asks you do you want to INSERT this step here?
Click on Insert to confirm you want to insert it here before the Grouped Rows. and then you will see a Sorted Rows step added before Grouped Rows. This means Grouped Rows will use the output of Sorted Rows step as the input for grouping (which is exactly what we want).
Now you can go to the Grouped Rows step to see the result hasn’t changed but the sub tables are sorted now. All we need from here is to get the first and last item in the sub table.
** If you want to sort based on multiple column simply go to the Advanced Editor and add as many as sections you want to Table.Sort input parameters.
Because Power Query by default goes through query folding, it is recommended that after this step, you add a usage of Table.Buffer, to make sure that the sorting runs before the grouping. That can be an step witht his code: = Table.Buffer(#”Sorted Rows”)
List.First and List.Last
Fortunately Power Query has a bunch of operations on List that we can use. List.First will return the first item in the list (based on the order defined in the list), and List.Last will return the last item. So let’s use them in the Group By operation to fetch first and last sales amount.
To make changes here you need to go to script editor in Power Query which can be achieve via Advanced Editor option in Home tab. You have to make sure that you are in the FactInternetSales Query first.
Advanced Editor will show you M script that build the output and the group by command as well.
let Source = Sql.Databases("."), AdventureWorksDW2012 = Source{[Name="AdventureWorksDW2012"]}[Data], dbo_FactInternetSales = AdventureWorksDW2012{[Schema="dbo",Item="FactInternetSales"]}[Data], #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"OrderDateKey", Order.Ascending}}), Custom1 = Table.Buffer(#"Sorted Rows"), #"Grouped Rows" = Table.Group(Custom1, {"CustomerKey"}, {{"Order Count", each Table.RowCount(_), type number}, {"Total Revenue", each List.Sum([SalesAmount]), type number}, {"Order Details", each _, type table}}) in #"Grouped Rows"
The script in above code section created automatically when you did transformations through GUI. The line with Table.Group is the line that does all the grouping and aggregation. It is a long line, so let me format it better for easier understanding;
let Source = Sql.Databases("."), AdventureWorksDW2012 = Source{[Name="AdventureWorksDW2012"]}[Data], dbo_FactInternetSales = AdventureWorksDW2012{[Schema="dbo",Item="FactInternetSales"]}[Data], #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"OrderDateKey", Order.Ascending}}), Custom1 = Table.Buffer(#"Sorted Rows"), #"Grouped Rows" = Table.Group(Custom1, {"CustomerKey"}, { {"Order Count", each Table.RowCount(_), type number}, {"Total Revenue", each List.Sum([SalesAmount]), type number}, {"Order Details", each _, type table} } ) in #"Grouped Rows"
Script below is the same script. I just put some enters and tabs to format it better for reading. The above section shows Table.Group section of the script. As you can see Table.Group gets a table as input, which is the #”Sorted Rows” table from the previous step. The group by field is “CustomerKey”. and then a set of aggregated columns one after each other (which is highlighted in code above). Each column has name of the column, type of transformation (or aggregation), and the data type of the column. for example:
let Source = Sql.Databases("."), AdventureWorksDW2012 = Source{[Name="AdventureWorksDW2012"]}[Data], dbo_FactInternetSales = AdventureWorksDW2012{[Schema="dbo",Item="FactInternetSales"]}[Data], #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"OrderDateKey", Order.Ascending}}), Custom1 = Table.Buffer(#"Sorted Rows"), #"Grouped Rows" = Table.Group(Custom1, {"CustomerKey"}, { {"Order Count", each Table.RowCount(_), type number}, {"Total Revenue", each List.Sum([SalesAmount]), type number}, {"Order Details", each _, type table} } ) in #"Grouped Rows"
Total Revenue is the name of column. calculation for this column is Sum of [SalesAmount] which is one of the fields in the table, and the output is of type number.
So by now you should thinking of how each is to create a new aggregated column here; by adding similar column in the script. I add the new column after Order Details column, so I need an extra comma (,) after that line, and the new lines would be;
let Source = Sql.Databases("."), AdventureWorksDW2012 = Source{[Name="AdventureWorksDW2012"]}[Data], dbo_FactInternetSales = AdventureWorksDW2012{[Schema="dbo",Item="FactInternetSales"]}[Data], #"Sorted Rows" = Table.Sort(dbo_FactInternetSales,{{"OrderDateKey", Order.Ascending}}), Custom1 = Table.Buffer(#"Sorted Rows"), #"Grouped Rows" = Table.Group(Custom1, {"CustomerKey"}, { {"Order Count", each Table.RowCount(_), type number}, {"Total Revenue", each List.Sum([SalesAmount]), type number}, {"Order Details", each _, type table}, {"First SalesAmount", each List.First([SalesAmount]), type number}, {"Last SalesAmount", each List.Last([SalesAmount]), type number} } ) in #"Grouped Rows"
Marked lines above uses List.First and List.Last on the exact same structure that List.Sum worked. because we have already sorted the table based on OrderDate so the first item would be the first sales transaction, and the last item would be the last.
Here is the output of this change:
You can see that the first and the last SalesAmount picked correctly from each group as two new columns.
** Note that with adding some changes in script editor that are not supported through GUI, you will loose the GUI configuration window section. As a result of the change here you cannot change Grouping configuration in GUI anymore, if you want to change it, you have to go to Advanced Editor for this section. So if you are a GUI fan, better to apply all required configuration first, and then add extra logic in the code.
Hi Reza,
Could you please provide me a blank workbook covering the replated tables? I don’t know how to deal with the file with extension .bak after downloading the zip file from http://msftdbprodsamples.codeplex.com. Thanks for your great help in advance.
Julian Chen
Hi Julian,
You need to restore .bak file into your SQL Server database engine instance. If you are not familiar with SQL Server, or you haven’t done such thing before let me know and I upload excel exported tables somewhere for you.
Cheers,
Reza
Hello Reza,
Thanks a lot for your blog and articles, it helps me a lot. I followed this article and it was really helpful figuring out the “Group by” capability. I have a question though, I have an order book (excel files) representing all the order lines that my company received. I grouped by all those lines to get an aggregate view of each shipment (like you did in the articles) and have no problems with the count of order lines in each shipment or the sum of weight. However, I’d like to see how many distinct articles I have in each order (each order lines contains a “material key” column) but the Countdistinct option doesn’t allow me to chose only a column (it’s made for the entire rows). Is there a way I could include the distinct count option in this query ?
I’m currently using a calculated table with the SUMMARIZE option which allows me to get to the expected result but it would be great to avoid extra tables.
One more thing, after aggregating my order lines, and used the “All rows” option as explained in your tutorial, can I use those information (the order details) right away in the Report view ? Because I don’t see those infos anymore in report view.
Thanks again a lot for your work !
Renaud
Hi Renaud,
Thanks for your kind words,
Not sure if I get your question correctly. better to email me the file to: reza at this website’s domain dot com
Using Group by here would be better than summarize as you mentioned because we won’t add extra tables into the memory.
All rows option is not for report view to pick up. report view doesn’t understand table types of Power Query. All rows is for doing all other sorts of aggregation if you want to in your group by statement, because you have all detailed rows, so you can do whatever you want. for example your distinct count option I believe should be possible through an all rows starting point in group by.
Cheers,
Reza
Hello Reza,
I’m trying to reproduce your example with one of my tables, but the grouping function ignore the sorting step, and I’m not able to obtain the correct first and last items. I’m using the last version of july 2017 of Power by desktop and a Postgresql source.
Have you some advice on a possible cause of that?
Thank you
Hi Nick,
Can you please share your M code? maybe sorting didn’t applied?
Cheers
Reza
Hi Reza,
I am facing similar problem as Nick. As i update the data, the group by feature is not working.
Can you please help me to solve this problem
Hi Jahn
I just realized in the steps I explained here, one step is missing. Table.Buffer step is now added, add it to your script and it should work.
if not, please send me your PBIX file to check
Cheers
Reza
How does the data aggregate if you simply select the “All Columns” option? Will it Sum? First Record, Avg?
Great Post!
All Columns will return a Table in every cell. this would be a sub table for every group. then you can apply any types of aggregations you want on that table.
Cheers
Reza
You are a true genius! Thank you for this post!
Thanks Shabnam jan
great to see you here
Hi Reza,
I couldn’t download the SQL file, it does not recognize the bak extension and I am not familiar with SQL. Can you send me the same excel tables ? Many thanks.
Hi Demet, download section of our website has some issues at the moment. we will fix it soon 🙂
Cheers
Reza
Thank you so much for this post! Very useful!
Hi Reza,
Thanks for this you have just made my life a lot easier 🙂
Reza! Thanks a lot! This is a great explanation! You’re most kind to have shared. Cheers.
hi sir,
good day…
i desperately need your help ..i dont know this website blog still active or not ..if active then please reply .
i neeed help related to the power query.
thanks and regards,
sanket
Hi Sanket
please feel free to ask your question here
Cheers
Reza
Worked like a charm five years after posting. Thanks, Reza!
Hello Reza,
it is so powerful. I have to learn more aboout M
Thanks a lot
Georgios