README file for cocluster

1. Run with default values, like

       ./cocluster-linux -r 50 -c 50 ../classic3

   This will generate 50 row clusters and 50 column clusters. With the
   default values:

     - The input file is in CCS format, with tfn scaling.
     - No first variation (local update).
     - No prior.
     - Random initialization.
     - The threshold for the batch loop is 0.001 * mutual information of
       the original matrix.
     - The threshold for first variation is -0.0000001 * mutual
       information of the original matrix.

   The output file will contain two lines: the first line contains the
   row cluster IDs for every row and the second line contains the column
   cluster IDs for every column.

2. Parameters

   -a ALGORITHM
       specifies the coclustering algorithm to run.
       ALGORITHM can be one of
         i   information-theoretic coclustering algorithm (default)
         e   euclidean coclustering algorithm (1st mssrcc)
         r   minimum residue coclustering algorithm (2nd mssrcc)

   -c NUM
       specifies the number of column clusters.

   -d LEVEL
       specifies the dump level up to which dump output is shown.
       LEVEL can be one of
         0   little information is printed (default)
         1   detailed obj. func. value of each iteration is printed
         2   detailed obj. func. value of each iteration is printed and
             written to a dump file
         3   all intermediate results, including cluster centroids,
             condensed matrix etc., are written to a dump file.
             Used for debugging only.

   -e TYPE THRESHOLD
       specifies a threshold for either batch update or local search.
       TYPE can be one of
         b   threshold for batch loop (default THRESHOLD is .001)
         i   threshold for first variation (default THRESHOLD is
             -.0000001)

   -F FORMAT FILENAME
       specifies the input matrix's format.
       FORMAT can be one of
         d   dense matrix
         s   sparse matrix in CCS format
         t   transpose of a dense matrix
       FILENAME is the prefix of the input matrix. Note that the user
       can specify the usual path structures.

   -I
       takes inverse rows (the default is not to take inverse rows).

   -i INITIALIZATION
       specifies the initialization method.
       INITIALIZATION can be one of
         r   randomly assign a cluster ID to each row and column
         s   read initial cluster IDs from a file which lists each
             co-cluster in the following format:
               #rows #columns rowID rowID.... columnID columnID... ...
         S FILENAME
             read initial cluster labels from FILENAME, which has
             2 lines: the first line contains the row cluster IDs for
             every row and the second line contains the column cluster
             IDs for every column. Note that this parameter is related
             to the "-O s" parameter.

   -l NUM
       specifies the length of local search (i.e. first variation chain
       length).

   -O TYPE FILENAME
       specifies the output file format and name.
       TYPE can be one of
         b   lists each co-cluster in the following format:
               #rows #columns rowID rowID.... columnID columnID... ...
         s   simple format which has 2 lines: the first line contains
             the row cluster IDs for every row and the second line
             contains the column cluster IDs for every column. Note
             that this parameter is related to the "-i S" parameter.

   -p PRIOR
       specifies the prior.
       PRIOR can be one of
         u   uniform prior
         h   max entropy prior
         n   no prior (default)

   -R NUM
       specifies the number of random runs; the objective function
       value is averaged over the specified number of runs.

   -r NUM
       specifies the number of row clusters.

   -T TYPE FILENAME
       specifies the type of the true label file and the corresponding
       filename.
       TYPE can be one of
         r   row true labels
         c   column true labels

   -t SCALING
       specifies the scaling type of the matrix (default is tfn).

   -V NUM
       specifies the variation of the batch update algorithm (default
       is 0, not taking variation).
       NUM can be one of
         0   not taking variation (i.e., single row/col batch update,
             updating compressed matrix)
         1   single row/col batch update, not updating compressed matrix
         2   multiple row/col batch update, updating compressed matrix
         3   multiple row/col batch update, not updating compressed
             matrix
         4   flip a coin to select either row or col batch update,
             updating compressed matrix

   -v
       prints the version number of the coclustering package.
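
The simple two-line label format described above (the default output, "-O s", and the file read by "-i S") is easy to post-process outside the package. The following Python sketch is not part of cocluster; the function names read_simple_labels and group_by_cluster are hypothetical, and it assumes that the cluster IDs on each line are integers separated by whitespace, which this README does not state explicitly.

```python
from collections import defaultdict
from pathlib import Path


def read_simple_labels(path):
    """Parse a two-line label file: row cluster IDs, then column cluster IDs.

    Assumption (not confirmed by the README): IDs are whitespace-separated
    integers, one ID per row on line 1 and one ID per column on line 2.
    """
    lines = [ln for ln in Path(path).read_text().splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 non-empty lines, found {len(lines)}")
    row_labels = [int(tok) for tok in lines[0].split()]
    col_labels = [int(tok) for tok in lines[1].split()]
    return row_labels, col_labels


def group_by_cluster(labels):
    """Map each cluster ID to the list of 0-based positions assigned to it."""
    groups = defaultdict(list)
    for index, cluster_id in enumerate(labels):
        groups[cluster_id].append(index)
    return dict(groups)
```

For example, calling read_simple_labels on an output file and then group_by_cluster on the returned row labels gives, for each row cluster, the positions of the rows assigned to it. Positions are counted from 0 here as a choice of this sketch, not as a convention of the package.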