Project 0 for CS 371R:
Due: Sept 17, 2015
Project 0 will NOT be graded and is entirely optional. It is simply an
exercise that walks you through the trace-collection and submission
procedure, so that any glitches with setting up the Java environment,
collecting traces, or submitting can be smoothed out early. If you
submit Project 0 and run into problems along the way, the TA can help you.
The TA will help with Project 0 problems until its submission deadline.
After that deadline, it is your responsibility to make sure that you know how to:
- Set up the Java environment correctly
- Collect traces
- Submit the required files
As discussed in class, a basic system for vector-space retrieval (VSR) is
available in /u/mooney/ir-code/ir/vsr/. See the Javadoc for this system. Use the
main method of InvertedIndex to index a set of documents and then process queries.
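To make the idea concrete, here is a minimal, self-contained sketch of vector-space retrieval in Java: an inverted index that maps each term to the documents containing it, TF-IDF weights with idf = ln(N/df), and ranking by cosine similarity between the query vector and each document vector. The class ToyVsr and its methods are hypothetical illustrations and are NOT the API of ir.vsr.InvertedIndex; the course code may tokenize, weight, and rank differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy index illustrating vector-space retrieval (TF-IDF + cosine).
 * This is NOT the API of the course's ir.vsr.InvertedIndex.
 */
public class ToyVsr {
    // Inverted index: term -> (document id -> raw term frequency in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> docNames = new ArrayList<>();
    // Cached Euclidean length of each document's TF-IDF vector; null means stale
    private double[] docLengths;

    /** Deliberately simple tokenizer: lowercase, split on non-letters, no stemming or stopwords. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds one document's term counts to the postings lists. */
    public void addDocument(String name, String text) {
        int id = docNames.size();
        docNames.add(name);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        docLengths = null; // document set changed, so lengths must be recomputed
    }

    /** idf = ln(N / df); terms never seen in the corpus get 0 and contribute nothing. */
    private double idf(String term) {
        Map<Integer, Integer> p = postings.get(term);
        return (p == null) ? 0.0 : Math.log((double) docNames.size() / p.size());
    }

    private void computeDocLengths() {
        docLengths = new double[docNames.size()];
        for (Map.Entry<String, Map<Integer, Integer>> e : postings.entrySet()) {
            double idf = idf(e.getKey());
            for (Map.Entry<Integer, Integer> d : e.getValue().entrySet()) {
                double w = d.getValue() * idf;
                docLengths[d.getKey()] += w * w;
            }
        }
        for (int i = 0; i < docLengths.length; i++) {
            docLengths[i] = Math.sqrt(docLengths[i]);
        }
    }

    /** Returns names of matching documents, ranked by decreasing cosine similarity. */
    public List<String> query(String text) {
        if (docLengths == null) {
            computeDocLengths();
        }
        Map<String, Integer> queryCounts = new HashMap<>();
        for (String term : tokenize(text)) {
            queryCounts.merge(term, 1, Integer::sum);
        }
        // Accumulate dot products by walking only the postings lists of query terms
        Map<Integer, Double> dot = new HashMap<>();
        double queryLenSq = 0.0;
        for (Map.Entry<String, Integer> q : queryCounts.entrySet()) {
            double idf = idf(q.getKey());
            double qw = q.getValue() * idf;
            queryLenSq += qw * qw;
            Map<Integer, Integer> p = postings.get(q.getKey());
            if (p == null) {
                continue;
            }
            for (Map.Entry<Integer, Integer> d : p.entrySet()) {
                dot.merge(d.getKey(), qw * d.getValue() * idf, Double::sum);
            }
        }
        // Normalize each positive dot product by both vector lengths to get the cosine
        double queryLen = Math.sqrt(queryLenSq);
        Map<Integer, Double> cosine = new HashMap<>();
        for (Map.Entry<Integer, Double> e : dot.entrySet()) {
            if (e.getValue() > 0) {
                cosine.put(e.getKey(), e.getValue() / (queryLen * docLengths[e.getKey()]));
            }
        }
        List<Integer> ranked = new ArrayList<>(cosine.keySet());
        ranked.sort((a, b) -> Double.compare(cosine.get(b), cosine.get(a)));
        List<String> result = new ArrayList<>();
        for (int id : ranked) {
            result.add(docNames.get(id));
        }
        return result;
    }
}
```

For example, after adding a few short documents with addDocument, calling query("cold fusion") returns the names of documents that share at least one weighted term with the query, best match first. Walking only the postings lists of the query terms is the point of an inverted index: documents that contain none of the query terms are never touched.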
You can use the web pages in /u/mooney/ir-code/corpora/dmoz-science/ as a set
of test documents. This corpus contains 900 pages: 300 random samples each from
the DMOZ indices for biology, physics, and chemistry.
See the sample trace for an example of using the system.
Open a Firefox browser before you run the code so that the
selected documents can be displayed in it.
- Set up your Java environment (see info at http://www.cs.utexas.edu/users/mooney/ir-course/java-info.html).
- Collect the trace using the Unix "script" utility (see Project Submission Info).
- Index the document collection at /u/mooney/ir-code/corpora/dmoz-science/ and
issue the queries "cold fusion" and "quantum mechanics".
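If the code does not run, a common cause is a class path that does not let Java find ir.vsr.InvertedIndex. The sketch below, a hypothetical EnvCheck class that is not part of the course code, prints the Java version and class path and then tries to load that class without initializing it. Run it with the same class path you use for the project; how that class path should be set is covered in the Java environment info linked above and is not assumed here.

```java
/**
 * Hypothetical helper (not part of the course code): reports whether a class
 * can be found on the current class path.
 */
public class EnvCheck {
    /** Returns true if the named class can be loaded; the class is not initialized. */
    public static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, EnvCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        String target = "ir.vsr.InvertedIndex"; // class named in the assignment text
        System.out.println(target + (onClasspath(target)
                ? " found"
                : " NOT found; check your CLASSPATH setting"));
    }
}
```

Loading with initialize set to false avoids running any static initializers in the class under test, so the check only answers whether the class file is reachable.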
Follow the general instructions for submitting files through Canvas,
as described in Project Submission Info.
For this assignment, you need to submit the following three files; the
files listed under "Turned In" on Canvas should be:
[PREFIX]_code.zip               InvertedIndex code in a zip file (*.java and *.class files)
[PREFIX]_experiment1_trace.txt  The trace file
[PREFIX]_report.pdf             A PDF report containing only the text "Hello World!"
The zip file should have the following contents:
    $ unzip -l proj0_jd1234_code.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
        21067  2015-08-24 12:57   ir/vsr/InvertedIndex.java
        10049  2015-08-26 17:26   ir/vsr/InvertedIndex.class
    ---------                     -------
        31116                     2 files
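Before submitting, you can confirm that the zip contains both files under their package directory (ir/vsr/) by running unzip -l as shown above. The following Java sketch performs the same check programmatically with java.util.zip.ZipFile; the class ZipCheck is hypothetical, and its list of required entries is taken from the example listing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipFile;

/**
 * Hypothetical pre-submission check (not part of the course code):
 * reports which required entries are missing from a code zip.
 */
public class ZipCheck {
    /** Entry names from the example listing; paths must include the package directory. */
    static final List<String> REQUIRED = Arrays.asList(
            "ir/vsr/InvertedIndex.java",
            "ir/vsr/InvertedIndex.class");

    /** Returns the required entry names that are not present in the zip file. */
    public static List<String> missingEntries(String zipPath, List<String> required) throws IOException {
        List<String> missing = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            for (String name : required) {
                if (zip.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        List<String> missing = missingEntries(args[0], REQUIRED);
        if (missing.isEmpty()) {
            System.out.println("OK: all required entries present");
        } else {
            System.out.println("Missing: " + missing);
        }
    }
}
```

Usage: java ZipCheck [PREFIX]_code.zip. A common mistake this catches is zipping the files without their ir/vsr/ directory, in which case the entries are named just InvertedIndex.java and InvertedIndex.class.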
If you prefer to work in Eclipse, there is also a brief guide on creating an Eclipse project from the class code.