Microsoft SQl Server ML Services: RevoScaleR Package

 

RevoScaleR Package

RevoScaleR is packages created by Revolution Analytics (own by Microsoft) with the aim of importing, transforming, and analyzing data at scale. There are different categories of functions for the data store, imports and save as, data transformation, draw some charts such as histogram, line and so forth, descriptive analysis, predictive analysis, package management and so forth [3].

In this section, I will introduce some of the function that may use. The example will be shown in SQL Server 2017.

rxLinMod

rxLinMode, fits a linear model to data. Is a regression model for linear data. In below example, I am going to show you how I use this function for creating a model for predicting the strength of the concrete regarding input variables such as the amount of water, cement, ash, and age. Calculates predictions for fitted models. This function creates a model based on the formula and training data set.

As you can you see in below code, the first line is “EXECUTE sp_execute_external_script @language=N’R’” which is calling a stored procedure that allows the user to write an R script.

EXECUTE sp_execute_external_script @language=N'R'
 ,@script = N'require("RevoScaleR")
 formula=strength~cement+ash+age+water
Model_1=rxLinMod(formula,data=inputDataSet)
Model=data.frame(payload = as.raw(serialize(Model_1, connection=NULL)))'
  ,@input_data_1 = N'Select strength,cement,ash,age,water from [dbo].[concrete]; '
 , @output_data_1_name = N'Model'
 ,@input_data_1_name = N'inputDataSet'
  WITH RESULT SETS (([concreteModel] varbinary(max) NOT NULL  ));

From second to the fifth line, is the relevant R code that stores in @script parameter. The code contains referring to RevoScaler package by using function require. There is no need to install this package as t exists in ML/ R services. The second R code line is about creating a formula. Our aim is to predict the strength of concert regarding some other variables, the formula that able to create this is formula=strength~cement+ash+age+water which stores the formula in “formula” variable. The third line, create a model using rxLinMod function to create a linear model from the formula. Finally, the last line stores the formula in an output variable named “Model”.

@input_data_1 stores the data collected from SQL database using SQL query “N’Select strength,cement,ash,age,water from [dbo].[concrete]”. Finally, the output variable “Model” will store the linear model.

After creating the model, I want to store the model for future to reuse it for predicting the new data. To store the model, I use the below codes:

IF NOT EXISTS (SELECT 1 FROM sys.objects
                                                                WHERE               object_id = OBJECT_ID(N'[dbo].[LinearModel]')
                                                                                AND [type] IN (N'U'))
BEGIN
                CREATE TABLE [dbo].[LinearModel](
                                [Model] [varbinary](MAX) NULL
                );
END

Then, we need to Insert the model into the “LinearModel” table.

 Insert into [dbo].[LinearModel]
EXECUTE sp_execute_external_script @language=N'R'
 ,@script = N'require("RevoScaleR")
 formula=strength~cement+ash+age+water
Model_1=rxLinMod(formula,data=inputDataSet)
Model=data.frame(payload = as.raw(serialize(Model_1, connection=NULL)))'
  ,@input_data_1 = N'Select strength,cement,ash,age,water from [dbo].[concrete]; '
 , @output_data_1_name = N'Model'
 ,@input_data_1_name = N'inputDataSet'

The model already stored in the database. Now in another store procedure we are going to use the model.

declare @rx_model varbinary(max) = (select [Model] from [dbo].[LinearModel] );
   exec sp_execute_external_script
                                                                                @language = N'R'
                                                                  , @script = N'
                                                                  require("RevoScaleR");
                                                                  cdr_model<-unserialize(rx_model);
                                                                  predictions <- rxPredict(modelObject = cdr_model,
                         data = PredictionData)
                                                                  prediction <-as.data.frame(predictions);'
                , @input_data_1 = N'Select strength,cement,ash,age,water from [dbo].[concrete]'
                , @input_data_1_name = N'PredictionData'
                , @output_data_1_name=N'prediction'
                , @params = N'@rx_model varbinary(max)'
                , @rx_model = @rx_model
                with result sets (("prediction" float not null ))  ;

 

There is another input for store procedure that is not just dataset, it contains an input for models that we already stored in the database.

By running the stored procedure the below prediction result has been shown.

 

There are other models for the aim regression analysis such as rxLogit that use for logistic regression, rxGlm that generalized the linear model to data and so forth. Moreover, for other analysis such as prediction of the group (classification), there are some algorithms such as rxDTree, rxBTree,rxDForest, and rxNaiveBayes [3].

[1] https://docs.microsoft.com/en-us/sql/advanced-analytics/install/sql-r-services-windows-install

[2] https://docs.microsoft.com/en-nz/sql/ssms/download-sql-server-management-studio-ssms

[3] https://docs.microsoft.com/en-us/machine-learning-server/r-reference/revoscaler/revoscaler

 

Leila Etaati on LinkedinLeila Etaati on TwitterLeila Etaati on Youtube
Leila Etaati
Trainer, Consultant, Mentor
Leila is the first Microsoft AI MVP in New Zealand and Australia, She has Ph.D. in Information System from the University Of Auckland. She is the Co-director and data scientist in RADACAD Company with more than 100 clients in around the world. She is the co-organizer of Microsoft Business Intelligence and Power BI Use group (meetup) in Auckland with more than 1200 members, She is the co-organizer of three main conferences in Auckland: SQL Saturday Auckland (2015 till now) with more than 400 registrations, Difinity (2017 till now) with more than 200 registrations and Global AI Bootcamp 2018. She is a Data Scientist, BI Consultant, Trainer, and Speaker. She is a well-known International Speakers to many conferences such as Microsoft ignite, SQL pass, Data Platform Summit, SQL Saturday, Power BI world Tour and so forth in Europe, USA, Asia, Australia, and New Zealand. She has over ten years’ experience working with databases and software systems. She was involved in many large-scale projects for big-sized companies. She also AI and Data Platform Microsoft MVP. Leila is an active Technical Microsoft AI blogger for RADACAD.

Leave a Reply