Creating an R Component

How to create a custom R component for use in analyses.

Before creating the R component, ensure that the following requirements are met:
  • The R script is written in a valid R function format.
  • The R script executes in the R GUI console.
  • The R script has at least one main function.
  • The packages required to run the R script are installed either on your machine or on the SAP HANA server.
  • The R script written for In-Database analysis returns a DataFrame.
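As a rough illustration of these requirements, the sketch below shows a script with a single main function that takes a data frame and returns a list holding a data frame. The function name ScaleColumn, its parameters, and the list element name out are hypothetical and chosen only for this example. Running the script in the R console first is one way to confirm that it executes.

```r
# Hypothetical main function: rescales one numeric column to the 0-1 range.
ScaleColumn <- function(InputDataFrame, TargetColumn)
{
  values <- as.numeric(InputDataFrame[[TargetColumn]])
  rng <- range(values, na.rm = TRUE)
  scaled <- (values - rng[1]) / (rng[2] - rng[1])
  # Append the result as a new column so the output is still a data frame.
  output <- cbind(InputDataFrame, Scaled = scaled)
  return(list(out = output))
}

# Quick console check with a built-in dataset.
check <- ScaleColumn(head(mtcars), "mpg")
str(check$out)
```

The list element that holds the data frame (out in this sketch) is the kind of name the wizard later asks for as the output data frame.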
Consider the following best practices when writing the R script:
  • The R script written for In-Proc analysis returns a DataFrame.
  • Type conversion of the output is recommended; for example, if a column has numeric values, convert it explicitly using the as.numeric function.
  • For categorical variables used in the R script, specify the variable using the as.factor function.
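The type-conversion practices above can be sketched as follows. This is a minimal example, not part of the product: the function name ConvertTypes, its parameters, and the sample column names are hypothetical.

```r
# Hypothetical function illustrating explicit type conversion.
ConvertTypes <- function(InputDataFrame, NumericColumn, CategoryColumn)
{
  output <- InputDataFrame
  # Values that arrive as text are converted to numeric explicitly.
  output[[NumericColumn]] <- as.numeric(output[[NumericColumn]])
  # Categorical variables are declared as factors.
  output[[CategoryColumn]] <- as.factor(output[[CategoryColumn]])
  return(list(out = output))
}

# Small made-up data frame in which every column starts as character.
sample_data <- data.frame(amount = c("1.5", "2", "3.25"),
                          region = c("north", "south", "north"),
                          stringsAsFactors = FALSE)
converted <- ConvertTypes(sample_data, "amount", "region")
str(converted$out)
```

Converting column by column, rather than calling as.numeric on a whole data frame, keeps the result a data frame with one well-defined type per column.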
  1. In the Predict room, under the list of components on the right, choose the add component icon, and then choose R Component.
    The Create New Custom-R Component wizard appears.
  2. On the General page, enter the following information:
    1. In the Component Name text box, enter My component.
    2. From the Component Type dropdown list, select Algorithms.
    3. In the Component Description text box, enter R component for Simple Linear Regression.
  3. Choose Next.
    The Script page appears.
  4. On the Script page, choose Load Script to select a file to upload.
    Note You can also write the script directly in the text box, or copy and paste the following sample R script into it.
    Note Refer to the comments in the following sample to help you understand the R function format and write your own R script.
    #The following is a sample script for a simple linear regression component.
    #You must write the script in a valid R function format.
    #The function and variable names in the R script are user-defined; any name that is valid in R can be used.
    #The following is the argument description for the primary function SLR:
    #InputDataFrame: Dataframe in R that contains the output of the parent component.
    #The following two parameters are received from the user through the property view:
    #IndependentColumn: Column name that you want to use as an independent variable for the component.
    #DependentColumn: Column name that you want to use as a dependent variable for the component.
    
    SLR<-function(InputDataFrame, IndependentColumn, DependentColumn)
    {
    #Formatting the final string to pass to the "lm" function.
    finalString<-paste(DependentColumn, "~", IndependentColumn);
    #Calling the "lm" function on "InputDataFrame" and storing the output model in "slr_model".
    slr_model<-lm(as.formula(finalString), data=InputDataFrame);
    #To get the predicted values for the Training data set, call the "predict" function with this model and
    #input dataframe, which is represented by "InputDataFrame".
    result<-predict(slr_model, InputDataFrame); # Storing the predicted values in the "result" variable.
    output<- cbind(InputDataFrame, result); # combining "InputDataFrame" and "result" to get the final table.
    plot(slr_model); #Plotting model visualization.
    #returnvalue: function must always return a list that contains results "out", and model variable
    #"slrmodel", if present.
    #The output variable stores the final result.
    #The model variable is used for model scoring.
    return (list(slrmodel=slr_model, out=output))
    }
    
    #The following is the argument description for the model scoring function "SLRModelScoring":
    #InputDataFrame: Dataframe in R that contains the output of the parent component.
    #IndependentColumn: Column name to be used as independent variables for the component.
    #Model: Model variable that is used for scoring.
    
    SLRModelScoring <- function (InputDataFrame, IndependentColumn, Model)
    {
    #Calling the "predict" function to get the predicted values with "Model" and "InputDataFrame".
    #drop=FALSE keeps the original column name so that "predict" can match it to the model.
    predicted<-predict(Model, InputDataFrame[, IndependentColumn, drop=FALSE], level=0.95);
    #Combining "InputDataFrame" and "predicted" to get the final table.
    output <- cbind(InputDataFrame, predicted); 
    #returnvalue: function should always return a list that contains the result ("modelresult").
    #The output variable stores the final result.
    return(list(modelresult=output))
    }
    

    The following two examples show how to convert an R script to a valid R function format recognized by Expert Analytics:

    Example 1: R script
     dataFrame<-read.csv("C:\\CSVs\\Iris.csv")
     attach(dataFrame)
     set.seed(4321)
     kmeans_model<-kmeans(data.frame(`SepalLength`,`SepalWidth`,`PetalLength`,`PetalWidth`),
       centers=5,iter.max=100,nstart=1,algorithm="Hartigan-Wong")
     kmeans_model$cluster

    Example 1: R function format (recognized by Expert Analytics)
     kmeansfunction<-function(dataFrame,independent,
       Clustersize,Iterations,algotype,numberofinitialdsets)
     {
     set.seed(4321)
     kmeans_model<-kmeans(data.frame(dataFrame[,independent]),
       centers=Clustersize,iter.max=Iterations,nstart=numberofinitialdsets,
       algorithm=algotype)
     output<- cbind(dataFrame, kmeans_model$cluster);
     boxplot(output);
     return (list(out=output));
     }

    Example 2: R script
     dataFrame<-read.csv("C:\\Datasets\\cnr\\Iris.csv")
     attach(dataFrame)
     library(rpart)
     cnr_model<-rpart(Species~PetalLength+PetalWidth+SepalLength+SepalWidth, method="class")
     predict(cnr_model, dataFrame, type=c("class"))

    Example 2: R function format (recognized by Expert Analytics)
     cnrFunction<-function(dataFrame,IndependentColumns,dep)
     {
     library(rpart);
     formattedString<-paste(IndependentColumns, collapse = '+');
     finalString<-paste(paste(dep, "~" ), formattedString);
     #Passing the data frame explicitly so that the column names in the formula can be resolved.
     cnr_model<-rpart(as.formula(finalString), data=dataFrame, method="class");
     output<- predict(cnr_model, dataFrame,type=c("class"));
     out<- cbind(dataFrame, output);
     return (list(result=out,modelcnr=cnr_model));
     }

     cnrFunctionmodel<-function(dataFrame,ind,modelcnr,type)
     {
     output<-predict(modelcnr,data.frame(dataFrame[,ind]),type=type);
     out<- cbind(dataFrame, output);
     return (list(result=out));
     }

    Note

    Declare parameters for the model scoring function in the primary function, except for Input Dataframe and Input Model Variable Name, which you select from the dropdown lists.

  5. In the Primary Function Details section, enter the following information:
    1. From the Primary Function Name dropdown list, select SLR.
    2. From the Input DataFrame dropdown list, select InputDataFrame.
    3. In the Output DataFrame box, enter out.
    4. Select the Option to save the model checkbox.
      The Model Variable Name field is enabled, and Model Scoring Function Details appears.
    5. In the Model Variable Name field, enter slrmodel.
    6. Select the Show Summary and Option to export as PMML checkboxes.
  6. In the Model Scoring Function Details section, enter the following information:
    1. From the Model Scoring Function Name dropdown list, select SLRModelScoring.
    2. From the Input DataFrame dropdown list, select InputDataFrame.
    3. In the Output DataFrame field, enter modelresult.
    4. From the Input Model Variable Name dropdown list, select Model.
  7. Choose Next.
    The Settings page appears.
  8. In the Output Table Definition section of Primary Function Settings, perform the following substeps:
    1. Choose Consider None.
    2. From the Data Type dropdown list, select Integer.
    3. In the New Predicted Column Name box, enter Predicted column.
  9. In the Property View Definition section, perform the following substeps:
    1. Under Property Display Name, in the Independent column box, enter Independent Column.
    2. From the Control Type dropdown list, select Column Selector (Single) as the control type for the Independent column.
    3. Under Property Display Name, in the Dependent column box, enter Dependent Column.
    4. From the Control Type dropdown list, select Column Selector (Single) as the control type for the Dependent column.
  10. In the Output Table Definition section of Model Scoring Settings, choose Consider all columns from previous component.
  11. From the Data Type dropdown list, select Integer.
  12. In the New Predicted Column Name box, enter Output Column.
  13. In the Property View Definition section, perform the following substeps:
    1. In the Property Display Name box, enter Independent column.
    2. From the Control Type dropdown list, select Column Selector (Single) as the control type for the Independent column.
  14. Choose Finish.
Depending on the type of analysis performed, you can now use the R component to create a model, just as you would with any other component.
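Before uploading a script through the wizard, it can help to run the primary function and the model scoring function together in the R console. The sketch below does this with compact variants of the sample functions. As adaptations for this example, they pass the data frame to lm explicitly and omit the plot call. The built-in cars dataset and the new_rows data frame are assumptions made for illustration only.

```r
# Compact variant of the primary function, adapted for a console check.
SLR <- function(InputDataFrame, IndependentColumn, DependentColumn)
{
  finalString <- paste(DependentColumn, "~", IndependentColumn)
  slr_model <- lm(as.formula(finalString), data = InputDataFrame)
  result <- predict(slr_model, InputDataFrame)
  output <- cbind(InputDataFrame, result)
  # "out" holds the result table; "slrmodel" holds the model for scoring.
  return(list(slrmodel = slr_model, out = output))
}

# Compact variant of the model scoring function.
SLRModelScoring <- function(InputDataFrame, IndependentColumn, Model)
{
  # drop = FALSE keeps the column name so predict can match it to the model.
  predicted <- predict(Model, InputDataFrame[, IndependentColumn, drop = FALSE])
  output <- cbind(InputDataFrame, predicted)
  return(list(modelresult = output))
}

# Train on the built-in "cars" dataset, then score rows the model has not seen.
trained <- SLR(cars, "speed", "dist")
new_rows <- data.frame(speed = c(10, 20))
scored <- SLRModelScoring(new_rows, "speed", trained$slrmodel)
print(scored$modelresult)
```

The list element names used here (out, slrmodel, modelresult) match the values entered for Output DataFrame and Model Variable Name in the wizard steps.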