RevoScaleR Package
RevoScaleR is packages created by Revolution Analytics (own by Microsoft) with the aim of importing, transforming, and analyzing data at scale. There are different categories of functions for the data store, imports and save as, data transformation, draw some charts such as histogram, line and so forth, descriptive analysis, predictive analysis, package management and so forth [3].
In this section, I will introduce some of the function that may use. The example will be shown in SQL Server 2017.
rxLinMod
rxLinMode, fits a linear model to data. Is a regression model for linear data. In below example, I am going to show you how I use this function for creating a model for predicting the strength of the concrete regarding input variables such as the amount of water, cement, ash, and age. Calculates predictions for fitted models. This function creates a model based on the formula and training data set.
As you can you see in below code, the first line is “EXECUTE sp_execute_external_script @language=N’R’” which is calling a stored procedure that allows the user to write an R script.
EXECUTE sp_execute_external_script @language=N'R' ,@script = N'require("RevoScaleR") formula=strength~cement+ash+age+water Model_1=rxLinMod(formula,data=inputDataSet) Model=data.frame(payload = as.raw(serialize(Model_1, connection=NULL)))' ,@input_data_1 = N'Select strength,cement,ash,age,water from [dbo].[concrete]; ' , @output_data_1_name = N'Model' ,@input_data_1_name = N'inputDataSet' WITH RESULT SETS (([concreteModel] varbinary(max) NOT NULL ));
From second to the fifth line, is the relevant R code that stores in @script parameter. The code contains referring to RevoScaler package by using function require. There is no need to install this package as t exists in ML/ R services. The second R code line is about creating a formula. Our aim is to predict the strength of concert regarding some other variables, the formula that able to create this is formula=strength~cement+ash+age+water which stores the formula in “formula” variable. The third line, create a model using rxLinMod function to create a linear model from the formula. Finally, the last line stores the formula in an output variable named “Model”.
@input_data_1 stores the data collected from SQL database using SQL query “N’Select strength,cement,ash,age,water from [dbo].[concrete]”. Finally, the output variable “Model” will store the linear model.
After creating the model, I want to store the model for future to reuse it for predicting the new data. To store the model, I use the below codes:
IF NOT EXISTS (SELECT 1 FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[LinearModel]') AND [type] IN (N'U')) BEGIN CREATE TABLE [dbo].[LinearModel]( [Model] [varbinary](MAX) NULL ); END
Then, we need to Insert the model into the “LinearModel” table.
Insert into [dbo].[LinearModel] EXECUTE sp_execute_external_script @language=N'R' ,@script = N'require("RevoScaleR") formula=strength~cement+ash+age+water Model_1=rxLinMod(formula,data=inputDataSet) Model=data.frame(payload = as.raw(serialize(Model_1, connection=NULL)))' ,@input_data_1 = N'Select strength,cement,ash,age,water from [dbo].[concrete]; ' , @output_data_1_name = N'Model' ,@input_data_1_name = N'inputDataSet'
The model already stored in the database. Now in another store procedure we are going to use the model.
declare @rx_model varbinary(max) = (select [Model] from [dbo].[LinearModel] ); exec sp_execute_external_script @language = N'R' , @script = N' require("RevoScaleR"); cdr_model<-unserialize(rx_model); predictions <- rxPredict(modelObject = cdr_model, data = PredictionData) prediction <-as.data.frame(predictions);' , @input_data_1 = N'Select strength,cement,ash,age,water from [dbo].[concrete]' , @input_data_1_name = N'PredictionData' , @output_data_1_name=N'prediction' , @params = N'@rx_model varbinary(max)' , @rx_model = @rx_model with result sets (("prediction" float not null )) ;
There is another input for store procedure that is not just dataset, it contains an input for models that we already stored in the database.
By running the stored procedure the below prediction result has been shown.
There are other models for the aim regression analysis such as rxLogit that use for logistic regression, rxGlm that generalized the linear model to data and so forth. Moreover, for other analysis such as prediction of the group (classification), there are some algorithms such as rxDTree, rxBTree,rxDForest, and rxNaiveBayes [3].
[1] https://docs.microsoft.com/en-us/sql/advanced-analytics/install/sql-r-services-windows-install
[2] https://docs.microsoft.com/en-nz/sql/ssms/download-sql-server-management-studio-ssms
[3] https://docs.microsoft.com/en-us/machine-learning-server/r-reference/revoscaler/revoscaler