This article gives a short overview of how to use JRI to call R from within a Java application. In particular, it shows how to use this technology for on-demand predictions based on R models.
Note: Trivial aspects such as constant definitions or exception handling are omitted in the provided code snippets.
JRI is a Java/R interface that provides a Java API to R functionality. A JavaDoc specification of this interface (org.rosuda.JRI.Rengine) is available online, and the project homepage describes how to set up JRI in various environments.
Classification or numeric prediction models embedded in R scripts can originate from legacy implementations or from a conscious decision to use R for a certain use case. Typically, a static set of data is used to train and validate a model, which can then be applied to another static set of unclassified data. However, this approach aims at deriving general insights from data sets rather than at predicting concrete instances. In particular, it is insufficient for systems with real-time user interaction, such as custom welcome screens that depend on the estimated value of a user who has just registered.
Once R and the JRI package are installed and the corresponding JARs have been added to the build path, any Java application can instantiate org.rosuda.JRI.Rengine. The following simple example demonstrates how we can use the R interface.
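Such an example could look roughly like the sketch below. To keep it compilable without the native JRI library, the call into R is passed in as a function; the class name HelloR and the R variable name greeting are illustrative assumptions. With JRI on the build path, the function would simply wrap Rengine.eval(String), as indicated in the comment.

```java
import java.util.function.Function;

/** Minimal sketch: two separate evaluations that rely on one shared R context. */
final class HelloR {

    /**
     * @param eval sends one command to R and returns the result as a string.
     *             With JRI this could be wired up as follows (assumption):
     *               Rengine engine = new Rengine(new String[] {"--vanilla"}, false, null);
     *               Function<String, String> eval = cmd -> engine.eval(cmd).asString();
     */
    static String greet(Function<String, String> eval) {
        eval.apply("greeting <- 'Hello R World'"); // first call defines a variable in R
        return eval.apply("greeting");             // second call reads it back
    }
}
```

The second command only works because the variable defined by the first one is still present in the context of the same engine.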
Calling Rengine.eval(String) corresponds to typing commands into the R console and hence provides access to any required functionality. Note that even in this trivial example, two separate calls share a common context, which is maintained throughout the lifecycle of an Rengine instance. Objects of org.rosuda.JRI.REXP encapsulate any output from R to the user. Depending on the evaluated command, methods other than REXP.asString() may be more suitable for extracting its result (see JavaDoc).
Even though it would be possible to implement large R scripts in Java by passing each statement to Rengine.eval(String), this is much less convenient than writing or even reusing traditional .R scripts. So let’s have a look at how we can achieve the same result with a slightly different solution.
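One way to do this is to let R execute the script file through its source() function. The sketch below only builds the command string; the class and method names are hypothetical, and the resulting string would be passed to Rengine.eval(String).

```java
/** Sketch: builds the R command that executes a .R script from the file system. */
final class ScriptCommands {

    /** Returns source("<path>") with a path that R can parse on any platform. */
    static String sourceCommand(String scriptPath) {
        String normalized = scriptPath
                .replace('\\', '/')      // R accepts forward slashes, also on Windows
                .replace("\"", "\\\""); // escape double quotes inside the R string literal
        return "source(\"" + normalized + "\")";
    }
}
```

Assuming the script defines a variable called greeting, a call such as engine.eval(ScriptCommands.sourceCommand(path)) followed by engine.eval("greeting").asString() would return its value.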
Any .R script can be executed like this and all variables it adds to the context will be accessible via JRI. However, this code does not work if the Java application is packaged as a JAR or WAR archive because the .R script will not have a valid absolute path. In this case, copying the script to a regular folder (e.g. java.io.tmpdir) at runtime and passing the temporary file to R is a feasible workaround.
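The workaround can be sketched with the standard library alone. The class name ScriptExtractor and the resource name in the usage note are assumptions; Files.createTempFile places the copy in the directory given by java.io.tmpdir.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch: copies a script bundled inside a JAR/WAR to a regular temporary file. */
final class ScriptExtractor {

    /** Copies the given stream to a new temp file ending in .R and returns its path. */
    static Path toTempFile(InputStream script) {
        try (InputStream in = script) {
            Path tmp = Files.createTempFile("script-", ".R");
            tmp.toFile().deleteOnExit(); // clean up when the JVM terminates
            // createTempFile already created an empty file, so it must be replaced
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A typical call would be ScriptExtractor.toTempFile(MyApp.class.getResourceAsStream("/model.R")), where MyApp and /model.R stand for your own class and resource. The absolute path of the returned file can then be handed to R.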
Now that we know how to execute .R scripts using JRI, we are able to integrate prediction models based on R into a Java application. The only remaining question is: how can R access the data required for training the model? The easiest way is to use the following file-based approach. We will build a linear regression model that predicts y from x1 and x2.
1. Extract suitable data from the Java persistence layer and store it in a temporary .csv file.
2. Pass the location of this file to R using JRI.
3. Execute a .R script to build the model. The script needs to be syntactically compatible with the extracted .csv file.
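The Java side of steps 1 and 2 might look like the following sketch. The class name, the column order (y, x1, x2) and the R variable name csvPath are assumptions made for illustration; the training script from step 3 is only outlined in a comment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch: exports training data to CSV and tells R where to find it. */
final class TrainingData {

    /** Converts rows of {y, x1, x2} into CSV lines, starting with a header line. */
    static List<String> toCsvLines(List<double[]> rows) {
        List<String> lines = new ArrayList<>();
        lines.add("y,x1,x2");
        for (double[] r : rows) {
            lines.add(r[0] + "," + r[1] + "," + r[2]);
        }
        return lines;
    }

    /** Step 1: stores the training rows in a temporary .csv file. */
    static Path writeCsv(List<double[]> rows) {
        try {
            Path csv = Files.createTempFile("training-", ".csv");
            csv.toFile().deleteOnExit();
            return Files.write(csv, toCsvLines(rows));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 2: R command that stores the file location in the variable csvPath. */
    static String assignPathCommand(String csvPath) {
        return "csvPath <- \"" + csvPath.replace('\\', '/') + "\"";
    }

    // Step 3 (assumed script content, executed via source(...)):
    //   trainingData <- read.csv(csvPath)
    //   model <- lm(y ~ x1 + x2, data = trainingData)
}
```

The header line written by toCsvLines must match the column names used in the model formula, which is what "syntactically compatible" means in practice.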
After executing this script, the resulting model will be available for any future calls until the entire application or the Rengine instance is re-initialized.
With the knowledge we already have, predicting a new instance (x1,x2) with unknown y is now pretty straightforward:
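A prediction call could be sketched as follows, assuming the trained model is stored in an R variable named model and was fitted on columns x1 and x2. As before, the call into R is passed in as a function so that the sketch compiles without JRI; the class name Predictor is hypothetical.

```java
import java.util.function.ToDoubleFunction;

/** Sketch: asks an already trained R model for a single prediction. */
final class Predictor {

    /** Builds the predict() call for one new instance; expects finite values. */
    static String predictCommand(double x1, double x2) {
        return "predict(model, data.frame(x1 = " + x1 + ", x2 = " + x2 + "))";
    }

    /**
     * @param evalAsDouble evaluates a command in R and returns a numeric result.
     *                     With JRI this could be (assumption):
     *                       cmd -> engine.eval(cmd).asDoubleArray()[0]
     */
    static double predict(ToDoubleFunction<String> evalAsDouble, double x1, double x2) {
        return evalAsDouble.applyAsDouble(predictCommand(x1, x2));
    }
}
```

Because the model stays in the engine's context, each prediction costs only one evaluation and no retraining.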
If you have any feedback, please write to Christian.Kroemer@comsysto.com!