Setting up the Java Environment for CS 371R:
Information Retrieval and Web Search



[A] Running the code from the existing Linux installation on the department fileserver:

1. Setting up the classpath for java

On tcsh or csh shells in Unix: setenv CLASSPATH '.:/u/mooney/ir-code'
On bash shell in Unix: export CLASSPATH='.:/u/mooney/ir-code'

Instead of typing these every time you run the code, you can add the appropriate line to your .cshrc file (for tcsh or csh), or to your .bashrc or .profile file (for bash).

If you add it to your .bashrc, run source ~/.bashrc to update the CLASSPATH variable. You will need to do this every time you open a new terminal window.

If you add it to your .profile, run source ~/.profile or open a new terminal window to set the CLASSPATH. You will not need to run source ~/.profile again, because .profile is sourced every time you open a new terminal window.

You can verify that CLASSPATH has been updated by running the command: echo $CLASSPATH
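Beyond echoing the variable, it can help to check that Java itself can find the course classes. The following is a minimal sketch, not part of the course code: ClasspathCheck is a hypothetical helper name, and it assumes you compile and run it from a directory that is on your CLASSPATH (for example the current directory, since "." is included above).

```java
// Hypothetical diagnostic helper; it is not part of the ir package.
class ClasspathCheck {

    /** Returns true if the named class can be found on the current classpath. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class used in this document; pass another name to check it instead.
        String name = args.length > 0 ? args[0] : "ir.vsr.InvertedIndex";
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println(name + (isLoadable(name) ? " was found" : " was NOT found"));
    }
}
```

To try it, save the code as ClasspathCheck.java, run javac ClasspathCheck.java, and then run java ClasspathCheck. If the class is reported as not found, the CLASSPATH setting is the first thing to recheck.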

2. Running the code

At the command prompt, type:
java ir.vsr.InvertedIndex -html /u/mooney/ir-code/corpora/curlie-science/

Follow the trace at www.cs.utexas.edu/users/mooney/ir-course/curlie-sample-trace.txt for a list of possible commands to try out. Open a Firefox browser before you run the code in order to have selected documents displayed in the browser.

[B] Making your own copy of the code and running it on the Linux installation on the department fileserver (necessary for projects):

1. Copy the ir sub-directory from the ir-code directory into your HOME directory

At the command prompt, type: cp -r /u/mooney/ir-code/ir $HOME

2. Setting up the classpath for java

On tcsh or csh shells in Unix: setenv CLASSPATH '.:/u/[your-login-name]'
On bash shell in Unix: export CLASSPATH='.:/u/[your-login-name]'

where [your-login-name] is your Unix login name.

If you copy the code into a different directory, make sure that CLASSPATH is set to the parent directory of the "ir" directory. In the example above, we set CLASSPATH to the $HOME directory because we copied the "ir" directory to the $HOME directory.
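The reason CLASSPATH must name the parent of the "ir" directory is that Java turns a package-qualified class name into a relative file path and looks for it under each CLASSPATH entry. The sketch below illustrates that mapping only. ClasspathLookup is a hypothetical name, and the code assumes a Unix-style, colon-separated classpath whose entries are all directories, as in this document. It is not how the JVM is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; it is not part of the ir package.
class ClasspathLookup {

    /**
     * For each entry in a colon-separated classpath, returns the file path
     * at which the class file for className would be expected.
     */
    static List<String> candidates(String classpath, String className) {
        // ir.vsr.InvertedIndex becomes ir/vsr/InvertedIndex.class
        String relative = className.replace('.', '/') + ".class";
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(":")) {
            if (entry.isEmpty()) {
                continue; // simplification: empty entries are skipped in this sketch
            }
            result.add(entry + "/" + relative);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String path : candidates(".:/u/mooney/ir-code", "ir.vsr.InvertedIndex")) {
            System.out.println(path);
        }
    }
}
```

With the classpath from section A, this prints ./ir/vsr/InvertedIndex.class and /u/mooney/ir-code/ir/vsr/InvertedIndex.class. If CLASSPATH pointed at the "ir" directory itself, the expected path would contain ir/ir/vsr/..., which does not exist.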

Instead of typing these every time you run the code, you can add the appropriate line to your .cshrc file (for tcsh or csh), or to your .bashrc or .profile file (for bash). (See section A.1 for more information on .bashrc and .profile.)

3. Running the code

At the command prompt, type: java ir.vsr.InvertedIndex -html /u/mooney/ir-code/corpora/curlie-science/

Follow the trace at www.cs.utexas.edu/users/mooney/ir-course/curlie-sample-trace.txt for a list of possible commands to try out. Open a Firefox browser before you run the code in order to have selected documents displayed in the browser.

4. Recompiling after modifying code in the ir directory

If you modify the file ABC.java, you can recompile it by running the following command from the directory that contains it: javac ABC.java

[C] Running the code under Windows

We do not directly support running the code under Windows, but if you wish to do so, a former student found that the following three changes allowed him to run it under Windows. It is up to you to determine any additional changes that are needed to get it to run in your own Windows environment.

1. Location of stopwords.txt

In Document.java in the ir.vsr package, you need to set the location of the stopwords.txt file. Download the file, store it locally, and change the declaration to something like the following:
protected static final String stopWordsFile = "C:/cs371r/ir/utilities/stopwords.txt";

2. Using the Browser

Next, you need to change the way the program opens a URL. To do so, change the exec line in Browser.java in the utilities package to something like the following:

For Internet Explorer:
Runtime.getRuntime().exec("C:/Program Files/Internet Explorer/IEXPLORE.EXE "+url);

For Firefox:
Runtime.getRuntime().exec("C:/Program Files/Mozilla Firefox/firefox.exe "+url);

If you are using OSX, modify the exec line in Browser.java following this Stack Overflow question.
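One thing to watch for with the exec lines above: when Runtime.exec is given a single command string, Java splits it on whitespace, so a program path that contains spaces (such as "C:/Program Files/...") may not be interpreted as intended on every Java version. Passing the program and the URL as separate array elements avoids that. The sketch below is one possible way to do this and is not the course's Browser.java. BrowserCommand and forOs are hypothetical names, the Windows path is the Firefox path given above, and the use of the "open" command on macOS and of a "firefox" executable on the PATH under Linux are assumptions about your machine.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical helper; it is not the Browser class from the utilities package.
class BrowserCommand {

    /** Builds an argument array: element 0 is the program, element 1 is the URL. */
    static String[] forOs(String osName, String url) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            // Path taken from the Firefox example above; adjust it to your installation.
            return new String[] {"C:/Program Files/Mozilla Firefox/firefox.exe", url};
        } else if (os.startsWith("mac")) {
            // Assumption: the macOS "open" command hands the URL to the default browser.
            return new String[] {"open", url};
        } else {
            // Assumption: a "firefox" executable is on the PATH.
            return new String[] {"firefox", url};
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BrowserCommand <url>");
            return;
        }
        // Each array element is passed as one argument, so spaces in the path are preserved.
        new ProcessBuilder(forOs(System.getProperty("os.name"), args[0])).start();
    }
}
```

The same array could be passed to Runtime.getRuntime().exec(String[]) if you prefer to keep the existing call style in Browser.java.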

3. Location of the corpus

When you run InvertedIndex, the corpus path argument must point to a folder on your own machine. Download the corpus directory to a local folder, and use that path as the argument.