README file for cocluster

1. Run with default values, for example:

./cocluster-linux -r 50 -c 50 ../classic3

  This generates 50 row clusters and 50 column clusters.
  The input matrix is read in CCS format with tfn scaling (the default).
  No first variation (local update) is performed.
  No prior is used.
  Initialization is random.
  The threshold for the batch loop is 0.001 * mutual information of the original matrix,
  and the threshold for first variation is -0.0000001 * mutual information of the original matrix.
  The output file contains two lines:
    the first line contains the row cluster ID of every row,
    and the second line contains the column cluster ID of every column.
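
The two-line output described above can be read back with a few lines of Python.
This is a minimal sketch, not part of the cocluster package: it assumes the IDs
are integers separated by whitespace, which this README does not state. The
function names and the sample labels are made up for illustration.

```python
def parse_simple_labels(text):
    """Parse two lines: row cluster IDs first, then column cluster IDs."""
    # Assumption: IDs are whitespace-separated integers; blank lines are ignored.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 non-empty lines, got {len(lines)}")
    row_labels = [int(tok) for tok in lines[0].split()]
    col_labels = [int(tok) for tok in lines[1].split()]
    return row_labels, col_labels


def cluster_sizes(labels):
    """Count how many rows (or columns) fall into each cluster ID."""
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    return sizes


# Made-up labels for a 5 x 3 matrix.
sample = "0 1 1 0 2\n1 0 1\n"
rows, cols = parse_simple_labels(sample)
row_sizes = cluster_sizes(rows)
print(rows, cols, row_sizes)
```

For a real run, pass the contents of the output file instead of the sample string.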


2. Parameters

-a ALGORITHM
   specifies coclustering algorithm to run.
   ALGORITHM can be one of
     i  information-theoretic coclustering algorithm (default)
     e  Euclidean coclustering algorithm (1st MSSRCC)
     r  minimum residue coclustering algorithm (2nd MSSRCC)
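
To make the difference between the "e" and "r" options concrete, the sketch below
evaluates two sum-squared residue objectives for a fixed co-clustering. It assumes
that "1st mssrcc" and "2nd mssrcc" refer to the two residue definitions commonly
used for minimum sum-squared residue coclustering: a_ij - a_IJ for the first and
a_ij - a_iJ - a_Ij + a_IJ for the second. This README does not spell the
definitions out, so treat that mapping as an assumption. The function names are
illustrative and not taken from the package.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def squared_residues(A, row_labels, col_labels):
    """Return (first, second) sum-squared residues of a co-clustering of A."""
    n_rows, n_cols = len(A), len(A[0])
    first = second = 0.0
    for i in range(n_rows):
        # Rows that share a row cluster with row i.
        rows_I = [r for r in range(n_rows) if row_labels[r] == row_labels[i]]
        for j in range(n_cols):
            # Columns that share a column cluster with column j.
            cols_J = [c for c in range(n_cols) if col_labels[c] == col_labels[j]]
            a_IJ = mean(A[r][c] for r in rows_I for c in cols_J)  # co-cluster mean
            a_iJ = mean(A[i][c] for c in cols_J)  # mean of row i over cluster J
            a_Ij = mean(A[r][j] for r in rows_I)  # mean of column j over cluster I
            first += (A[i][j] - a_IJ) ** 2
            second += (A[i][j] - a_iJ - a_Ij + a_IJ) ** 2
    return first, second


# Additive 2 x 2 matrix placed in a single co-cluster.
A = [[1.0, 2.0], [3.0, 4.0]]
first, second = squared_residues(A, [0, 0], [0, 0])
print(first, second)
```

In this example the first residue is nonzero because the entries differ from the
co-cluster mean, while the second residue vanishes because the matrix is a sum of
a row effect and a column effect.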

-c NUM
   specifies number of column clusters.

-d LEVEL 
   specifies the dump level, which controls how much dump output is shown.
   LEVEL can be one of
     0  little information is printed (default)
     1  detailed obj. func. value of each iteration is printed
     2  detailed obj. func. value of each iteration is printed and written to a dump file
     3  all intermediate results, including cluster centroids, condensed matrix, etc., are written to the dump file. Used for debugging only.

-e TYPE THRESHOLD
   specifies the threshold for either batch update or local search.
   TYPE can be one of
     b  threshold for batch loop (default THRESHOLD is .001)
     i  threshold for first variation (default THRESHOLD is -.0000001)
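
Section 1 states that each threshold is multiplied by the mutual information of
the original matrix. The sketch below shows how the effective thresholds could be
computed for a small matrix. It assumes the matrix is normalized to a joint
distribution p = A / sum(A) and that the logarithm is base 2; neither detail is
given in this README, and the function name is illustrative.

```python
import math


def mutual_information(A):
    """Mutual information of the joint distribution p = A / sum(A), in bits."""
    total = sum(sum(row) for row in A)
    row_sums = [sum(row) for row in A]
    col_sums = [sum(col) for col in zip(*A)]
    mi = 0.0
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            if a > 0:  # zero entries contribute nothing
                # p_ij * log2(p_ij / (p_i * p_j)), written with raw entries
                mi += (a / total) * math.log2(a * total / (row_sums[i] * col_sums[j]))
    return mi


A = [[2, 0], [0, 2]]  # made-up matrix with perfectly dependent rows and columns
mi = mutual_information(A)
batch_threshold = 0.001 * mi       # default for "-e b"
fv_threshold = -0.0000001 * mi     # default for "-e i"
print(mi, batch_threshold, fv_threshold)
```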

-F FORMAT FILENAME
   specifies input matrix's format. 
   FORMAT can be one of 
     d  dense matrix
     s  sparse matrix in CCS format
     t  transpose of a dense matrix
   FILENAME is the prefix of the input matrix. Note that the usual path structures may be used.
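
For readers unfamiliar with CCS, the sketch below builds the three arrays of a
compressed column storage representation from a small dense matrix. It assumes
that CCS here means the standard compressed column storage scheme with 0-based
indices; the on-disk layout that cocluster expects is not described in this
README, so the code only illustrates the in-memory idea. The function name is
illustrative.

```python
def dense_to_ccs(A):
    """Convert a dense matrix (list of rows) to CCS arrays.

    Returns (values, row_indices, col_ptr): the nonzeros of column j are
    values[col_ptr[j]:col_ptr[j + 1]], and row_indices holds their row numbers.
    """
    values, row_indices, col_ptr = [], [], [0]
    n_rows, n_cols = len(A), len(A[0])
    for j in range(n_cols):          # walk the matrix column by column
        for i in range(n_rows):
            if A[i][j] != 0:
                values.append(A[i][j])
                row_indices.append(i)
        col_ptr.append(len(values))  # end of column j
    return values, row_indices, col_ptr


A = [[1, 0, 2],
     [0, 0, 3],
     [4, 5, 0]]
values, row_indices, col_ptr = dense_to_ccs(A)
print(values, row_indices, col_ptr)
```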

-I
   enables taking inverse rows (by default, inverse rows are not taken).

-i INITIALIZATION
   specifies initialization method. 
   INITIALIZATION can be one of 
     r  randomly assign cluster ID to each row and column
     s  read initial cluster IDs from a file that lists each co-cluster in the following format:
	#rows #columns
	rowID rowID....
	columnID columnID...
	...
     S  FILENAME
        read initial cluster labels from FILENAME, which has 2 lines:
        the first line contains the row cluster ID of every row
        and the second line contains the column cluster ID of every column.
        Note that this parameter corresponds to the "-O s" parameter.

-l NUM
   specifies the length of local search (i.e., first variation chain length).

-O TYPE FILENAME
   specifies output file format and name.
   TYPE can be one of 
     b  lists each co-cluster in the following format:
	#rows #columns
	rowID rowID....
	columnID columnID...
	...
     s  simple format which has 2 lines:
        the first line contains the row cluster ID of every row
        and the second line contains the column cluster ID of every column.
        Note that this parameter corresponds to the "-i S" parameter.
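
The relationship between the two output formats can be sketched by converting
simple labels into the block listing. This is an illustration only: it assumes
0-based, space-separated IDs and emits the co-clusters in order of row cluster,
then column cluster. None of these details are specified in this README, and the
function name is illustrative.

```python
def labels_to_blocks(row_labels, col_labels):
    """Render per-row and per-column cluster IDs as a block listing.

    Each co-cluster produces three lines:
      #rows #columns
      rowID rowID ...
      columnID columnID ...
    """
    row_members, col_members = {}, {}
    for i, label in enumerate(row_labels):
        row_members.setdefault(label, []).append(i)
    for j, label in enumerate(col_labels):
        col_members.setdefault(label, []).append(j)

    lines = []
    for r in sorted(row_members):        # assumed ordering: row cluster first
        for c in sorted(col_members):    # then column cluster
            rows, cols = row_members[r], col_members[c]
            lines.append(f"{len(rows)} {len(cols)}")
            lines.append(" ".join(map(str, rows)))
            lines.append(" ".join(map(str, cols)))
    return "\n".join(lines)


# Made-up labels: 3 rows in 2 row clusters, 2 columns in 2 column clusters.
text = labels_to_blocks([0, 1, 0], [1, 0])
print(text)
```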

-p PRIOR
   specifies prior.
   PRIOR can be one of
     u  uniform prior
     h  max entropy prior
     n  no prior (default)

-R NUM
   specifies the number of random runs; the objective function value is averaged over the specified number of runs.

-r NUM
   specifies number of row clusters.

-T TYPE FILENAME
   specifies type of true label file and corresponding filename.
   TYPE can be one of 
     r  row true labels
     c  column true labels

-t SCALING
   specifies the scaling type of the matrix (default is tfn).

-V NUM
   specifies the type of variation of the batch update algorithm (default is 0, not taking variation).
   NUM can be one of
     0  not taking variation (i.e., single row/col batch update, updating compressed matrix)
     1  single row/col batch update, not updating compressed matrix
     2  multiple row/col batch update, updating compressed matrix
     3  multiple row/col batch update, not updating compressed matrix
     4  flip a coin to select either row or col batch update, updating compressed matrix

-v 
   prints the version number of the coclustering package.