Prediction via KNN (K Nearest Neighbours) KNN Power BI: Part 3

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

fronpage

K Nearest Neighbour (KNN ) is one of those algorithms that are very easy to understand and it has a good level of accuracy in practice. In Part One of this series, I have explained the KNN concepts. In Part 2 I have explained the R code for KNN, how to write R code and how to evaluate the KNN model. In this post, I want to show how to do KNN in Power BI.

If you do not have the Power BI Desktop, install it from https://powerbi.microsoft.com/en-us/

In power BI, Click on “Get Data” to import the data into Power BI, the data set . we have used the same dataset as in Part Two. The dataset contains the patient data such as : their diagnosis and laboratory results (31 columns).

1

You will see (number 3) data in right side of the Power BI.

We want to clean the data first, hence we click on the “Edit Queries” in Power BI to do some data cleaning and also apply R scripts for KNN Model creation (Number 1 and 2).

2

By clicking on the “Edit Query”, we will see the “Query Editor” windows.

First of all, we want to remove the “ID” column. ID attributes does not have impact on prediction results. Hence, we right click on the “ID” column in power bi (number 1 and 2). and remove the ID column from data set.

4

Another data cleaning approach is about replacing “B” value with “Benign ” and “M” with “Malignant” in Diagnosis column. To do that, we right click on the diagnosis column (number 1). Then click on the “Replace Value” (number 2) in Transform tabe. In replace values place, for “Value To Find” type “B” (number 3) then “Replace With” the “Benign”. Do the same for Malignant.

 

5

The result of the applied query will be look like below picture:

6

Another data cleaning is about data normalization. normalization has been explained in Part Two. We want to convert all the measurement value in same scale. hence we click on “Transform” tab, then in transform tab click on the data set (any numeric column). In this step we are using R scripts to normalize data. So, we click on the R scripts to perform normalization.

7

we write the same code we have in Part 2. the whole data (wis_bc_data) will be hold in “dataset”. for doing normalization, we first write a function (number 1) and the function will be store in”normalize” variable. Then we apply normalize function on dataset. (number 3). we want to apply the function on numeric data not text (diagnosis column) hence, we refer to dataset[2:31] that means apply function on column 2 to column 31. the result of the function is data frame that will be sored in “wbcd_n” variable.

 

8-1

In next step, we are going to create test and train data set (Part 2). in this example, we going to put aside row number 1 to 469 for training and creating model and from row number 470 to 569 for testing the model. Finally, data is  ready, now we able to train  model and create KNN algorithm.

8-2

We already  installed package “Class” in R studio. Now we able to call function KNN to predict the patient diagnosis. KNN function accept the training data set and test data set as second arguments. moreover the prediction label also need for result. we want to use KNN based on the discussion on Part 1, to identify the number K (K nearest Neighbor), we should calculate the square root of observation. here for 469 observation the K is 21. the result is “wbcd_test_pred” holds the result of the KNN prediction. however, we want to have the result beside the real data  so we store the test data set in “output” variable, then add the separate column to store the prediction result (number 3). Then click on “OK” to apply the R scripts on data.

We will see below output:

9

for each output (Data frame in R scripts), we will have a table value. we store the final result in “output” in our R scripts,  so we click on the Value (table) in front of the “output” name (see above picture). The result will be look like below picture. In output, we will have the “Test” dataset and in the last column, we will have the prediction results. If you look at the left bottom side, you will see 100 rows that are the number of test case we have.

10

After R transformation, just click on the “Close&Apply” to see the result in visualization.

 

11

However, sometimes you want also see the patient ID in result data set. So what we do is to change the R code, as below :

output<-dataset[470:569]

That means, we just get the whole data set from row 470 to 569. This data set is original one that contains the patient ID.

 

14

Hence, just close and apply the query.

In the below visualization, I have shown the result of the prediction and the real data. In current dataset we have patient  ID, real data about the doctor’s diagnosis and the predicted diagnosis by KNN. in Real Diagnosis filter (picture:number 1) we able to select patient that “Malignant” and in second filter we able to choose the prediction result “Benign”. this will show us the cases that prediction could not work properly.

we have 2 cases that the KNN predict in wrong way. as you see in below picture.

 

15

In the next series, I will talk about the other algorithms such as Neural Network (Deep Learning), Time series, and Decision Tree.

Moreover, there is an upcoming series on Azure ML which will be start soon.

Save

Save

Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail
rssyoutuberssyoutube
Leila Etaati
Dr. Leila Etaati is Principal Data Scientist, BI Consultant, and Speaker. She has over 10 years’ experience working with databases and software systems. She was involved in many large-scale projects for big sized companies. Leila has PhD of Information System department, University of Auckland, MS and BS in computer science. Leila is Microsoft Data Platform MVP.

7 Comments

  • Hello,
    You have nice contents in your blog, which are helpful to learn. By following your contents, I was trying to do the same on my machine. Unfortunately I am having the following issue. Do you mind to tell me, how can I fix this? I appreciate your help.
    Formula.Firewall: Query ‘wisc_bc_data’ (step ‘Run R Script’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.
    Thanks.
    Hello Thanks for that!
    yes it sometimes may ask, can you set it as public? could you find any option that said make it publick, also look at the below weblog on power query may helps
    http://www.excelguru.ca/blog/2015/03/11/power-query-errors-please-rebuild-this-data-combination/

  • Hi Dr. Leila,
    I’ve raised an issue in response to the subject part II pending for your moderation. Please take a look at your convenience.
    With regard to this post – part 3, I thought I could follow along your coach step by step in PowerBI Desktop to come out a result as expected. Unfortunately,
    In the step using the following R scripts to normalize data I also got stuck and the error message read ” unexpected end of input”
    normalize <- function(x) {
    return ((x – min(x)) / (max(x) – min(x))) }
    wbcd_n <- as.data.frame(lapply(wbcd[2:31], normalize))
    wbcd_train <- wbcd_n[1:469, ]
    wbcd_test <- wbcd_n[470:569, ]
    wbcd_train_labels <- dataset[1:469, 1] # in part II this line read wbcd_train_labels <- wbcd[1:469, 1]
    wbcd_test_labels <- dataset[470:569, 1] # in part II this line read wbcd_test_labels <- wbcd[470:569, 1]
    install.packages("class")
    library("Class")
    wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 21)
    output <- wbcd_test
    output$result <- wbcd_test_pred
    Would you please spare your time to help me dig out what's the problem. I would appreciate it.
    Julian

    • Hi Julian,
      sure, in power BI you not able to run install.packages(“”), you just able to call library…so first install.packages(“class”) in your local R studio (what ever version you have) then just in power bi R editor write library(“class”). hope it works

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">