SMIDAS is a C++ implementation of the stochastic mirror descent algorithm proposed in
It can be used for l_{1}-regularized loss minimization for both classification and regression problems. This means that it (approximately) solves the following minimization problem:
min_{w} (1/m) ∑_{i=1, … , m} L(w^{T} x_{i}, y_{i}) + λ|w|_{1}
Currently supported loss functions are the logistic loss [L(a,b) = log(1+exp(-ab))], the hinge loss [L(a,b) = max(0,1-ab)], and the squared loss [L(a,b) = (a-b)^{2}]. The x_{i}'s are example vectors with d coordinates x_{i,1}, … x_{i,d}. The y_{i}'s are labels that are either ±1 (classification) or arbitrary floats (regression). The logistic and hinge losses only make sense in a classification context but the squared loss can be used in either. λ is the regularization parameter. SMIDAS is designed to run fast even for large values of m and d and can exploit the sparsity in the examples (i.e. if a lot of coordinates in the example vectors are zero).
The source code is released under the GNU General Public License. The authors are not responsible for any consequences arising out of the use of the code, so please use it at your own risk. We do, however, appreciate receiving bug reports from you. Please direct them to Ambuj Tewari via email.
If you use SMIDAS in your research, please cite it as
After downloading the file smidas.tar.gz into a suitable directory, unpack it by typing
tar zxvf smidas.tar.gz
This will create a directory called SMIDAS with the following 9 files in it:
make
This will create an executable named smidas (plus a couple of object files).
./smidasand the following usage message is displayed:
USAGE: ./smidas [options]All 5 command line options are optional and receive their default values indicated above if they are not supplied by the user. The options lambda, loss and iters set the regularization parameter, the loss function to use, and the number of iterations for which the SMIDAS algorithm will be run, respectively. SMIDAS requires a learning rate parameter η. The option eta is meant to supply this. The printout option selects the interval after which a summary of current state of the optimization will be printed on the standard output. The following information about the current weight vector w is included in this summary:SMIDAS -- L_1 regularized loss minimization OPTIONS: -lambda regularization parameter (default = 0.001) -iters number of iterations (default = 1000) -eta learning rate (default = 0.001) -loss 0 for logistic, 1 for hinge, 2 for squared loss (default = 0) -printout after how many iterations to print w (default 1000)
Conceptually, we often think of the data as a huge m x d matrix with the examples sitting as rows of this matrix. Viewed this way, smidas expects its input data file to be in a sparse row major format as described below:
Line 1: m d Line 2: N i1 v1 i2 v2 ... ⋮ Line m+1: N i1 v1 i2 v2 ...
The first line tells smidas that there are m examples each with d features or coordinates. The second line gives non-zero features for example 1. The number N is the total number of non-zero features for that examples. This is followed by N pairs of the form i1 v1. The pair i1 v1 means that feature (i1+1) has value v1. Note that, following C/C++ convention, the indices i1, i2, ... start from zero. For example, if features 1, 3 and 10 are the only non-zero features for example 1, and these have values 0.2, 0.9 and -0.7 respectively, then line 2 would read:
3 0 0.2 2 0.9 9 -0.7The example indices need not appear in increasing order.
Line 3 then gives values for example 2, Line 4 for example 3, and so on till Line m+1.
0 0 0 0.2 0.2 0 0 0.3 0 -0.5 0.4 0.6 0 0 0The labels for the 3 examples are 1,1 and -1 respectively. Now, run smidas with logistic loss, the default value (0.001) of the regularization parameter, and learning rate 0.01, by typing
./smidas -eta 0.01 -iters 500000 -printout 50000 test_examples test_labelsThis runs smidas for 500,000 iterations printing a summary line every 50,000 iterations. You should see an output that looks something like
110 200000 21.8226 5 0.110917 0.13274 0 250 400000 28.8088 5 0.0572377 0.0860465 0 380 600000 32.9352 5 0.0384127 0.0713479 0 500 800000 35.8222 5 0.0290536 0.0648759 0 610 1000000 37.9964 5 0.0234737 0.0614701 0 730 1200000 39.7255 5 0.0197936 0.0595191 0 850 1400000 41.1593 5 0.0171696 0.0583289 0 960 1600000 42.37 5 0.0152157 0.0575857 0 1080 1800000 43.405 5 0.0137119 0.057117 0 1210 2000000 44.3091 5 0.0125135 0.0568226 0The actual output might be slightly different due to the randomness in the algorithm but the objective function should converge to 0.0551 for this toy example.