Project 0 for CS 371R:
Information Retrieval and Web Search


Due: September 12, 2023 at 11:59 p.m.


Project 0 will NOT be graded and is entirely optional. It is simply an exercise that walks you through the trace-collection and submission procedure, and its sole purpose is to smooth out any glitches with Java environment setup, trace collection, or submission. If you submit Project 0 and run into problems along the way, the TA can help you resolve them.

The TA will help with Project 0 problems until its submission deadline. After that deadline, it will be your responsibility to make sure that you know how to:

  1. Set up the Java environment correctly
  2. Collect traces
  3. Submit the required files

Project 0

As discussed in class, a basic system for vector-space retrieval (VSR) is available in /u/mooney/ir-code/ir/vsr/. See the Javadoc for this system. Use the main method of InvertedIndex to index a set of documents and then process queries.
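The ir.vsr code is the reference system for this course; what follows is only a toy, self-contained sketch of the same general idea (TF-IDF term weights, an inverted index mapping each token to the documents that contain it, and ranking by cosine similarity) to illustrate what "index a set of documents and then process queries" involves. The class ToyVsr and its methods are hypothetical and are not part of the ir.vsr API; the real InvertedIndex may tokenize, weight, and rank differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy vector-space retrieval sketch (hypothetical; NOT ir.vsr.InvertedIndex). */
public class ToyVsr {
    // Inverted index: token -> (document name -> raw term frequency)
    private final Map<String, Map<String, Integer>> index = new HashMap<>();
    // Document name -> Euclidean length of its TF-IDF vector (filled in by finish())
    private final Map<String, Double> docLengths = new HashMap<>();
    private final List<String> docNames = new ArrayList<>();

    /** Lowercases text and splits on runs of non-alphanumeric characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Adds each token occurrence of the document to the inverted index. */
    public void addDocument(String name, String text) {
        docNames.add(name);
        for (String tok : tokenize(text)) {
            index.computeIfAbsent(tok, k -> new HashMap<>()).merge(name, 1, Integer::sum);
        }
    }

    /** idf = ln(N / df); 0 for tokens that are unknown or appear in every document. */
    private double idf(String token) {
        Map<String, Integer> postings = index.get(token);
        return postings == null ? 0.0 : Math.log((double) docNames.size() / postings.size());
    }

    /** Computes document vector lengths; call once after all documents are added. */
    public void finish() {
        for (Map.Entry<String, Map<String, Integer>> e : index.entrySet()) {
            double idf = idf(e.getKey());
            for (Map.Entry<String, Integer> p : e.getValue().entrySet()) {
                double w = p.getValue() * idf;
                docLengths.merge(p.getKey(), w * w, Double::sum);
            }
        }
        docLengths.replaceAll((doc, sumSq) -> Math.sqrt(sumSq));
    }

    /** Returns names of documents sharing a query term, ranked by cosine similarity. */
    public List<String> retrieve(String query) {
        Map<String, Integer> queryTf = new HashMap<>();
        for (String tok : tokenize(query)) queryTf.merge(tok, 1, Integer::sum);

        // Accumulate dot products, touching only documents in the query terms' postings
        Map<String, Double> dot = new HashMap<>();
        double queryLenSq = 0.0;
        for (Map.Entry<String, Integer> q : queryTf.entrySet()) {
            double idf = idf(q.getKey());
            double qw = q.getValue() * idf;
            queryLenSq += qw * qw;
            Map<String, Integer> postings = index.get(q.getKey());
            if (postings == null) continue;
            for (Map.Entry<String, Integer> p : postings.entrySet()) {
                dot.merge(p.getKey(), qw * p.getValue() * idf, Double::sum);
            }
        }

        // Normalize by both vector lengths; a positive dot product implies both lengths are > 0
        double queryLen = Math.sqrt(queryLenSq);
        Map<String, Double> cosine = new HashMap<>();
        for (Map.Entry<String, Double> s : dot.entrySet()) {
            if (s.getValue() > 0) {
                cosine.put(s.getKey(), s.getValue() / (docLengths.get(s.getKey()) * queryLen));
            }
        }
        List<String> ranked = new ArrayList<>(cosine.keySet());
        ranked.sort((a, b) -> {
            int c = Double.compare(cosine.get(b), cosine.get(a)); // higher score first
            return c != 0 ? c : a.compareTo(b);                    // tie-break by name
        });
        return ranked;
    }

    public static void main(String[] args) {
        ToyVsr vsr = new ToyVsr();
        vsr.addDocument("bio1", "Cell biology and the structure of DNA");
        vsr.addDocument("chem1", "Claims of cold fusion in an electrochemical cell");
        vsr.addDocument("phys1", "An introduction to quantum mechanics and wave functions");
        vsr.addDocument("phys2", "Classical mechanics of rigid bodies");
        vsr.finish();
        System.out.println(vsr.retrieve("cold fusion"));       // prints [chem1]
        System.out.println(vsr.retrieve("quantum mechanics")); // prints [phys1, phys2]
    }
}
```

The inverted index is what keeps query processing cheap: only documents listed under at least one query token are ever scored, rather than every document in the corpus.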

You can use the web pages in /u/mooney/ir-code/corpora/curlie-science/ as a set of test documents. This corpus contains 900 pages: 300 random samples from each of the Curlie indices for biology, physics, and chemistry.

See the sample trace of using the system. Open a Firefox browser before running the code so that selected documents can be displayed in it.

Your task

  1. Set up your Java environment (see http://www.cs.utexas.edu/users/mooney/ir-course/java-info.html).
  2. Collect the trace using the Unix "script" utility (see Project Submission Info).
  3. Index the document collection at /u/mooney/ir-code/corpora/curlie-science/ and enter the queries "cold fusion" and "quantum mechanics".

Submission Instructions

Follow the general instructions for submitting files using Gradescope as described in Project Submission Info. For this assignment, you need to submit the following files:
  1. InvertedIndex.java and InvertedIndex.class: the source file and its compiled class file.
  2. trace/curlie.txt: the trace file, placed in the 'trace' folder. If you change this name, the trace match test will fail.
  3. report.pdf: a PDF report containing only the text "Hello World!".
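Before uploading, it can help to confirm that every required file is present. The following is a minimal, hypothetical Java helper (CheckSubmission is not part of the course code). It assumes the files sit at exactly these relative paths inside the directory you are about to zip or upload; adjust REQUIRED if Project Submission Info specifies a different layout.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-submission check: reports required Project 0 files missing from a directory. */
public class CheckSubmission {
    // Names from the submission list; ASSUMED to be relative to the submission directory.
    // The trace must keep the exact path trace/curlie.txt or the trace match test fails.
    static final List<String> REQUIRED = Arrays.asList(
            "InvertedIndex.java", "InvertedIndex.class", "trace/curlie.txt", "report.pdf");

    /** Returns the required names that are not regular files under dir, in list order. */
    static List<String> missingFiles(Path dir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(dir.resolve(name))) missing.add(name);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Check the directory given on the command line, or the current directory
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingFiles(dir);
        if (missing.isEmpty()) {
            System.out.println("All required files found in " + dir);
        } else {
            System.out.println("Missing: " + missing);
            System.exit(1); // non-zero exit status signals an incomplete submission
        }
    }
}
```

For example, running `java CheckSubmission /path/to/submission` lists any missing names and exits with status 1 if something is absent.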


If you prefer to work in Eclipse, there is also a brief guide on creating an Eclipse project from the class code.