Most of the time, you would like to know how the data quality looks like in your tables. Although, this is a process that you mainly check at the time of data preparation, having such report for end users, is also helpful, because sometimes they see an odd outcome, and wonder where this is coming from. A data profiling report tab in your Power BI report tells them more about your data, and if there are any anomalies in it. In this article, I will show you an easy way to build a data profiling report in Power BI. If you like to learn more about Power BI, read Power BI book from Rookie to Rock Star.
What is Data Profiling?
Data profiling is the process of understanding more about the data. Any information that can help to understand the data would be helpful. For example, you probably like to know how many unique values you have in the column, what is the minimum values, what is the maximum, the average, standard deviation and etc. This is what Data Profiling will provide for you.
Profiling in Power Query Editor
There is a data profiling section in Power Query Editor, which helps at the time of data preparation to understand the nature of the data. This can be done by selecting the Column Profiling and Quality and Distribution details in the View tab of the Power Query Editor of Power BI Desktop.
The profiling data is great, but unfortunately not available for end users. I’m going to show you how to get that details as a report to the end user.
Power Query Function: Table.Profile
Table.Profile is a Power Query function that processes a table and gives you a table of all columns and data profiling information of those columns. Here is how Table.Profile function works;
To see how the function works, you can start with a New Source in Power Query Editor, and then Blank Query;
In the new blank query, in the formula bar (if you don’t see the formula bar, check the formula bar option in the View tab of the Power Query Editor), type below expression:
Note that this code is not complete yet, we need to provide a table as the input of this function.
Note that everything in Power Query is case sensitive when you type something. Type the table name inside the bracket. like below:
If the table name has space or any special characters in it, then you should follow the rule of variable namings for it, and put that inside double quote with a # at the beginning, for example, Fact Internet Sales should be written like below;
The good thing is that the intellisense is sometimes helpful, and you can just select the item from there.
The Profiling Data
The profiling data that you get from Table.Profile function is like below;
This will give me the list of all columns in the table, one at each row, and the information such as minimum value, maximum, average, standard deviation, count, null count, and distinct count. Great! Now we have this data as a table in Power Query, and that means we can load it into Power BI, You can call this query Data Profiling or something like that, then Close and Apply the Power Query Editor window to load it into the Power BI.
Data Profiling Report
After loading the data into Power BI, you’ll have the table with all columns, and it can be used in any visuals. I prefer to have a tab in my report (report page) named Profiling report with all details needed.
The rest of it is playing with visuals and slicers to give you the visualization of the data profiling in the report. Here is a simple example I built;
Enhance it more with Exception Report
Data profiling report is good to understand the nature of the data. However, it would be more complete, if you add also the Exception report in it. I have explained in three articles about how to do that, and here is where to start:
Information about the data in each column and each table is useful information some time to be shared with the end user. In this article, you learned that using Table.Profile function, you can fetch this information, and then load it into Power BI, and visualize it in the report. For a real-world error proof and comprehensive Power BI report, I do always recommend bringing the profiling report and the exception report in your implementation.