In Previous section you learned about Power Query through an example of data mash-up of movies. Also prior than that you’ve learned about Power BI and its components in Power BI online book from rookie to rockstar. In this section I would like to start exploration of different data sources in Power BI, and I want to start that with an Excel source. Excel source seems to be an easy one, but in the other hand it is one of the most common sources of the data. In this section I want to share some tips about Excel data source and then I show an example of working with Olympic data source in an Excel file.
Excel Data Source
Power Query or Power BI can connect to many data sources, one of the supported data sources in Excel. Power Query for Excel get data from Excel in this way:
Power BI Desktop connects to Excel through Get Data experience
For getting data from excel you just need to specify path of the file. After specifying the file, Power Query will analyze contents of the file and distinguish all sheets and tables in the file and list them in the Navigator dialog box as preview;
As you see in the Navigator dialog box screenshot above, Power Query will distinguish Tables and Sheets and show appropriate icon for any of these types (note to the highlighted section in screenshot above).
Loading Excel Tables into Power Query
If you have data in an Excel Table then it can be easily detected and picked by Power Query as you see in screenshot above, CashFlow is a table in Excel. screenshot below shows the Excel table for CashFlow;
As you can see in screenshot below Power Query (or Power BI) fetched the table fully
Loading Excel Sheets into Power Query
You can get data directly from Excel sheets as well, No matter if you have tables or not. Power Query will always read the data from Excel sheets from all cells that contains data. if you have even two data sets in one Excel sheet Power Query still read that and load it correctly. Here is an example of Excel sheet with two data sets;
When you load this sheet in Power Query only data range of data cells up to the last cell’s column and row will be fetched;
What Happens If Excel Contains Formatting?
Any formatting such as color, font, data type of cells would be ignored when it loads into Power Query. The reason is that Power Query is a data mash-up tools not a modelling or visualization tools. You can apply these formatting later in the model (Power Pivot) or in the report (Power View). As an example; if your data values contains decimal points such as 12.94 and in Excel you’ve formatted cell to have zero decimal points (Excel will show cell value as 13 in this case). Power Query still fetches the original value which is 12.94 .
What Happens to Power View Sheets in Excel?
You can have Power View sheets in Excel, and your Power View sheets can contain Data values, such as table. However Power Query won’t load anything from a Power View Sheet at this stage. Because usually Power View sheet uses a data source you can use that source directly in Power Query. Power View data source might be sources such as Pivot Table or SSAS Tabular connection that can be both used in Power Query directly.
Pivot Tables and Pivot Charts?
Obviously you cannot fetch data from Pivot Charts! Why? because chart is a visualization element, so same principal that I said for Power View in above paragraph works here, connect to Pivot Chart’s source directly. You can get data from PivotTables however. PivotTables data can be fetched exactly as they shaped in Excel file with same structure of columns and rows. For example, if you have a Pivot Table like below screenshot:
After loading that in Power Query, you will see the Pivot Table exactly the same (without formatting);
What If Your Excel Table Has Merged Cells?
Merged Cells in Excel are commonly used. You can have tables with cells merged vertically or horizontally or even both. Here is an example of a table with merged cells;
Power Query still reads cells in their original detailed format. It means merged cells won’t be merged in Power Query, they will be seen as separate cells.
You can use transformations such as Fill Down to fill null values in the remaining cells of merged cells later on. Also note in above screenshot that green highlighted column is a calculated column. This calculated value will be fetched in Power Query as static values.
Example Excel Data Source: Olympic Games
I’ve found list of all medalists in Olympic games from the very first game (1896) till 2008 (unfortunately 2012 london games are not included in this data set). This is fortunately a public list made available by The Guardian, that you can download here. The list is well structured with a main sheet for all medalists as below;
As you see in above sheet all medalists with their main sport category and discipline and detailed event are available. Name of Athletes and their gender and Medals as well as the Olympic game (year) all is listed. Countries information listed as three character code. This three character codes are avaiable as a reference in another sheet of the file named IOC Country Codes as below;
Data is well structured and loading it into Power BI would be just matter of seconds! Start getting this information by Get Data From Excel and then address the downloaded Excel file. In Navigator dialog box choose All Medalists and IOC Country Codes sheets both to be checked. Also note that All Medalists sheet’s data shows in non-merged cell style (as you’ve learned earlier in this section; merged cells will be un-merged in Power Query). After selecting these two sheets click on Edit.
There are just few changes that we need to make;
All Medalists query contains four heading rows which we don’t need them (it is just title and disclaimer) so better to remove those;
Go to All Medalists query and then click on Remove Rows in the Home menu. From the Remove Rows popup menu choose Remove Top Rows
Remove Top 4 Rows in the dialog box by entering 4. and the result will look like below;
Use First Row As Headers
Headers of query as you see in screenshot above are Column1, Column2…. Fortunately in the data set the first row contains column headers. We can simply set that to be used for column headers in Power Query. in the Home tab click on Use First Row As Headers.
Data set will looks like screenshot below after above change
For IOC Country Codes query just set the first row as headers. No more changes is required
Close and Load
After above changes we can now load the result set into Power BI model to build report for it. You can simply click on Close and Apply menu button in Home Tab
Power BI will load result sets into memory for further modeling and reporting.
A bit of Modeling
This section is not about data modeling. However I want you to be able to play with this data set and build some nice reports with it (if you can’t wait till modeling and visualization chapters of this book). So Let’s create the relationship between All medalists and IOC Country Code. For doing this go to Relationship tab in Power BI Desktop and the click on Manage Relationship. Then create a new relationship as below;
After relationship has been created you will see it in the relationship diagram
Now it’s your turn to play with this data set and make some nice reports and visualization items. Here is what I’ve built with this data set:
Don’t be panic if you can’t build report above. I will go through step by step process of building this report later in the Reporting Chapters.
In this section you’ve learned some tips of working with Excel Data source from Power Query or Power BI Desktop. You’ve learned that merged cells would be loaded into Power Query Un-merged. Formatting won’t be considered at the time of loading data into Power Query, and Power Query can load data from Pivot Tables, Excel Tables, and Sheets. You’ve also learned a real world example of fetching Olympic medalists data from Excel file into Power BI Desktop. In next sections I will get you through the journey of Getting data from some other data sources.