Rliblinear is an R interface to the C++ LIBLINEAR library, which solves classification and regression problems having millions of instances and features. Essentially, this is a very fast, large-scale support vector machine library that has no desire to play in fancy-pants kernel spaces. The R interface automatically expands categorical features (i.e. factors) with $c$ levels into $c$ binary dimensions.
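To make the factor expansion concrete, here is a minimal sketch in base R. It uses `model.matrix` purely as an illustration of turning a factor with $c$ levels into $c$ binary columns; it is an assumption that this matches what Rliblinear does internally, and the variable names are made up for the example.

```r
# Illustration only: base R, not Rliblinear's internal code.
data(iris)

# A factor with c = 3 levels ...
species <- iris$Species
nlevels(species)

# ... becomes c = 3 binary (0/1) columns, one per level.
# The "- 1" drops the intercept so that every level gets its own column.
expanded <- model.matrix(~ species - 1)
dim(expanded)
head(expanded, 3)

# Each row has exactly one 1, in the column of its level.
all(rowSums(expanded) == 1)
```

Every instance ends up with a single 1 in the column for its level and 0 elsewhere, which is the form a linear model can consume.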
Given training vectors $\vec x_i \in \real^n, i = 1,\ldots, l$ and a label vector $\vec y \in \real^{l}$ such that $y_i \in \{1, -1\}$, LIBLINEAR builds a weight vector $\vec w$. This weight vector is a linear predictive model; the decision function is just
\[ \mathrm{sign}\left(\vec w^\mathrm{T} \vec x + b\right), \]
with $b = 0$ unless bias = TRUE.
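The decision function above can be sketched in a few lines of base R. The helper `decision` and the numbers are hypothetical, chosen only to show the arithmetic; this is not part of the Rliblinear API.

```r
# Hypothetical helper, not part of Rliblinear: sign(w^T x + b).
decision <- function(w, x, b = 0) {
  sign(sum(w * x) + b)
}

w <- c(0.5, -1.0)   # made-up weight vector
x <- c(2.0,  0.5)   # made-up instance

decision(w, x)          # w^T x = 1.0 - 0.5 = 0.5, so the result is  1
decision(w, x, b = -1)  # 0.5 - 1 = -0.5,          so the result is -1
```

Note that R's `sign` returns 0 for an instance lying exactly on the decision boundary.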
LIBLINEAR solves seven different problems:
L2-regularized logistic regression \[ \min_{\vec w} \quad \tfrac{1}{2}\vec w^{\mathrm{T}}\vec w + C \sum_{i=1}^{l} \log\left( 1 + \exp(-y_{i} \vec w^{\mathrm T } \vec x_{i}) \right) \]
L1-regularized logistic regression \[ \min_{\vec w} \quad \norm{\vec w}_1 + C \sum_{i=1}^{l} \log\left(1 + \exp(-y_{i} \vec w^{\mathrm T}\vec x_{i})\right) \]
L2-regularized L1-loss support vector classification primal \[ \min_{\vec w} \quad \tfrac{1}{2}\vec w^{\mathrm{T}} \vec w + C \sum_{i=1}^{l} \max\left(0, 1 - y_{i} \vec w^{\mathrm T}\vec x_{i} \right) \]
L2-regularized L2-loss support vector classification primal \[ \min_{\vec w} \quad \tfrac{1}{2}\vec w^{\mathrm{T}} \vec w + C \sum_{i=1}^{l} \max\left(0, 1 - y_{i} \vec w^{\mathrm T}\vec x_{i} \right)^{2} \]
L2-regularized L1-loss support vector classification dual \[ \min_{\vec\alpha} \quad \tfrac{1}{2}\vec\alpha^{\mathrm{T}} \mathbf{Q} \vec\alpha - \norm{\vec\alpha}_1 \qquad \text{subject to} \quad 0 \le \alpha_{i} \le C, \quad i = 1,\ldots, l \] where $Q_{ij} = y_i y_j \vec x_i^\mathrm{T} \vec x_j$.
L2-regularized L2-loss support vector classification dual \[ \min_{\vec \alpha} \quad \tfrac{1}{2}\vec\alpha^{\mathrm{T}} \mathbf{\overline{Q}} \vec\alpha - \norm{\vec\alpha}_1 \qquad \text{subject to} \quad 0 \le \alpha_{i} < \infty, \quad i = 1,\ldots, l \] where $\mathbf{\overline{Q}} = \mathbf Q + \mathbf D$, with $\mathbf D$ diagonal and $D_{ii} = \frac{1}{2C}$.
L1-regularized L2-loss support vector classification \[ \min_{\vec w} \quad \norm{\vec w}_1 + C \sum_{i=1}^{l} \max\left(0, 1 - y_{i} \vec w^{\mathrm T}\vec x_{i} \right)^{2} \]
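To see how the regularizer and the loss term combine, the following base R sketch evaluates three of the primal objectives above for a fixed weight vector. The function names and the toy data are made up for illustration; the code only evaluates the objectives and does not perform the minimization that LIBLINEAR carries out.

```r
# Hypothetical helpers: evaluate an objective for a given w (no optimization).
margins <- function(w, X, y) y * as.vector(X %*% w)   # y_i * w^T x_i

l2_logistic <- function(w, X, y, C = 1) {   # L2-regularized logistic regression
  0.5 * sum(w^2) + C * sum(log1p(exp(-margins(w, X, y))))
}
l2_hinge <- function(w, X, y, C = 1) {      # L2-regularized L1-loss SVC
  0.5 * sum(w^2) + C * sum(pmax(0, 1 - margins(w, X, y)))
}
l1_sq_hinge <- function(w, X, y, C = 1) {   # L1-regularized L2-loss SVC
  sum(abs(w)) + C * sum(pmax(0, 1 - margins(w, X, y))^2)
}

# Made-up toy data: three instances (rows), two features, labels in {1, -1}.
X <- rbind(c(1, 0), c(0, 1), c(1, 1))
y <- c(1, -1, 1)
w <- c(1, -1)

# Margins are (1, 1, 0), so only the third instance incurs hinge loss (1 - 0 = 1).
l2_hinge(w, X, y)      # 0.5 * 2 + 1 = 2
l1_sq_hinge(w, X, y)   # 2 + 1^2     = 3
l2_logistic(w, X, y)
```

Raising `C` puts more weight on fitting the training data relative to keeping $\vec w$ small.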
A quick example of using Rliblinear on R’s built-in iris dataset:
library(Rliblinear)
data(iris)
liblinear(data   = iris[, 1:4],
          labels = iris[, 5],
          cross  = 10,
          type   = 'l2l2_svm_dual',
          cost   = 1)
# => 0.95333 (cross-validation accuracy)
# build a model using two thirds of the iris set
l = nrow(iris)
training_indexes = sample(1:l, (2/3)*l)
model = liblinear(data   = iris[training_indexes, 1:4],
                  labels = iris[training_indexes, 5],
                  type   = 'l2l2_svm_dual',
                  cost   = 1)
# compute the accuracy on the remaining third of the iris set
predictions = predict(model, iris[-training_indexes, 1:4])
sum(predictions == iris[-training_indexes, 5]) / length(predictions)
# => 0.98 (varies with the random split)
Rliblinear expands factor variables into binary dimensions.
For instance, if we pass in just the iris labels as the data, Rliblinear expands the factor into three binary dimensions and LIBLINEAR returns three weight vectors, one per class (one-versus-all multiclass scheme).
data(iris)
model = liblinear(data   = iris[, 5],
                  labels = iris[, 5],
                  type   = 'l2l2_svm_dual',
                  cost   = 1,
                  bias   = FALSE)
#Rliblinear labels expanded dimensions with ' = '
model$w
# =>
#      data = setosa data = versicolor data = virginica
# [1,]     0.9897152        -0.9897737       -0.9899944
# [2,]    -0.9908603         0.9890476       -0.9901037
# [3,]    -0.9898116        -0.9907606        0.9903688
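As a rough sketch of how three one-versus-all weight vectors turn into a single class prediction, the base R snippet below scores a one-hot encoded instance against each row of the weight matrix and picks the largest score. The weights are copied from the output above; the assumption that the rows correspond to setosa, versicolor and virginica in that order is inferred from the positive diagonal, and this is not necessarily how Rliblinear's `predict` is implemented.

```r
# Weights copied from the printed model$w; the row order is an assumption.
classes <- c("setosa", "versicolor", "virginica")
W <- matrix(c( 0.9897152, -0.9897737, -0.9899944,
              -0.9908603,  0.9890476, -0.9901037,
              -0.9898116, -0.9907606,  0.9903688),
            nrow = 3, byrow = TRUE,
            dimnames = list(classes, paste("data =", classes)))

# A one-hot encoded instance whose factor level is "versicolor".
x <- c(0, 1, 0)

scores <- as.vector(W %*% x)   # one score per one-versus-all classifier
classes[which.max(scores)]     # "versicolor"
```

Only the classifier whose own dimension is switched on produces a positive score, so the highest score recovers the original level.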
You can install the latest version via git:
git clone http://github.com/lynaghk/Rliblinear/
R CMD build Rliblinear
R CMD INSTALL Rliblinear