In this project, along with the SPECTER2 Base Model, we shall be using the Proximity Adapter to embed documents and the Adhoc Query Adapter to embed queries. We will compare two variants of SPECTER2 embeddings, one with just the base model and one with adapter modules attached to the base model. Each of the two cases is described below.
We have precomputed embeddings for the CF documents and queries stored as space-separated 768-dimensional vectors in the following files:
/u/mooney/ir-code/embeddings/specter2_base/docs/RN-00001...RN-01239
/u/mooney/ir-code/embeddings/specter2_base/queries/Q001...Q100
/u/mooney/ir-code/embeddings/specter2_adapter/docs/RN-00001...RN-01239
/u/mooney/ir-code/embeddings/specter2_adapter/queries/Q001...Q100
'Deep' versions of the Experiment and ExperimentRated classes used in Project 2 are available as ir.eval.DeepExperiment and ir.eval.DeepExperimentRated. They produce precision-recall and NDCG plots evaluating the DeepRetriever. They use the same 'queries' and 'queries-rated' files as the normal experiment code, but also take a directory of query embeddings containing an embedding file (a list of real values) for each query. The query embedding directory should have files named Q1, ..., Qn giving the embeddings of the queries in the order in which they appear in the original 'queries' file. The files are sorted lexicographically by name, and this order must correspond to the order in the queryFile, so file numbers should have leading '0's as needed to sort properly, i.e. Q001, Q002, ... Q099, Q100. A sample DeepExperimentRated trace is here.
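To illustrate why the leading zeros matter, here is a minimal sketch of lexicographic sorting on query file names. The class and method names (QueryFileOrderSketch, paddedName, sortedCopy) are hypothetical and are not part of the ir.* course code; the sketch only uses Java's standard string ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo class; not part of the ir.* course code. */
class QueryFileOrderSketch {

    /** Builds a name such as Q001, zero-padded to the given number of digits. */
    static String paddedName(int queryNumber, int width) {
        return String.format("Q%0" + width + "d", queryNumber);
    }

    /** Returns a copy sorted by String's natural (lexicographic) order. */
    static List<String> sortedCopy(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> unpadded = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (int i : new int[] {1, 2, 10, 100}) {
            unpadded.add("Q" + i);
            padded.add(paddedName(i, 3));
        }
        // Without padding, Q10 and Q100 sort before Q2.
        System.out.println(sortedCopy(unpadded)); // prints [Q1, Q10, Q100, Q2]
        // With padding, lexicographic order matches numeric order.
        System.out.println(sortedCopy(padded));   // prints [Q001, Q002, Q010, Q100]
    }
}
```

With unpadded names, the tenth query's embedding would be paired with the second query in the queryFile, which is why the padded naming is required.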
Implement and evaluate such a simple hybrid approach by writing the following classes: ir.vsr.HybridRetriever, ir.eval.HybridExperiment, and ir.eval.HybridExperimentRated. The HybridRetriever should combine a DeepRetriever with a normal InvertedIndex to produce a simple weighted linear combination of the results. The evaluation code can be fairly easily generated by properly combining code from the deep and original versions of Experiment and ExperimentRated. The main methods for the Hybrid Experiment classes should take the following args:
Command args: [DIR] [EMBEDDIR] [QUERIES] [QUERYDIR] [LAMBDA] [OUTFILE] where:
DIR is the name of the directory whose files should be indexed.
EMBEDDIR is the name of the directory whose files have embeddings of the documents in DIR.
QUERIES is a file of queries paired with relevant docs (see queryFile).
QUERYDIR is the name of the directory where the query embeddings are stored in files Q1...Qn
LAMBDA is the weight in [0,1] placed on the deep cosine similarity, with (1-LAMBDA) placed on the VSR cosine similarity.
OUTFILE is the name of the file to put the output. The plot data for the recall-precision curve is stored in this file, and a gnuplot file for the graph has the same name with a ".gplot" extension.
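The LAMBDA weighting described above can be sketched as follows. This is only an illustration under stated assumptions: it works on plain maps from document name to score rather than on the actual DeepRetriever and InvertedIndex result types, the class and method names (HybridScoreSketch, combine) are hypothetical, and treating a document missing from one result list as having score 0 in that list is an assumption, not something the assignment specifies.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of the weighted linear combination; not the course API. */
class HybridScoreSketch {

    /**
     * Combines two score maps as lambda * deep + (1 - lambda) * vsr.
     * Assumption: a document absent from one map contributes 0 for that map.
     */
    static Map<String, Double> combine(Map<String, Double> deepScores,
                                       Map<String, Double> vsrScores,
                                       double lambda) {
        // Written this way so that NaN is rejected as well.
        if (!(lambda >= 0.0 && lambda <= 1.0)) {
            throw new IllegalArgumentException("LAMBDA must be in [0,1]: " + lambda);
        }
        // Score every document that appears in either result set.
        Set<String> docs = new HashSet<>(deepScores.keySet());
        docs.addAll(vsrScores.keySet());

        Map<String, Double> combined = new TreeMap<>();
        for (String doc : docs) {
            double deep = deepScores.getOrDefault(doc, 0.0);
            double vsr = vsrScores.getOrDefault(doc, 0.0);
            combined.put(doc, lambda * deep + (1.0 - lambda) * vsr);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only.
        Map<String, Double> deep = Map.of("RN-00001", 0.8, "RN-00002", 0.2);
        Map<String, Double> vsr = Map.of("RN-00001", 0.4, "RN-00003", 0.6);
        System.out.println(combine(deep, vsr, 0.5));
    }
}
```

With LAMBDA = 1 the ranking depends only on the deep scores, and with LAMBDA = 0 only on the VSR scores. A real HybridRetriever would still need to sort the combined scores into a ranked result list.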
For your experiments with HybridRetriever use the SPECTER2 Base embeddings and cosine similarity for the DeepRetriever (so it uses the same general metric as InvertedIndex constrained to be between 0 and 1).
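For reference, cosine similarity between two embeddings stored as space-separated values can be sketched as below. This is an illustrative, self-contained example and not the DeepRetriever implementation; the class and method names (EmbeddingCosineSketch, parseVector, cosine) are hypothetical, and returning 0 for a zero-length vector is an assumption made to avoid division by zero.

```java
/** Hypothetical sketch; the provided DeepRetriever may compute this differently. */
class EmbeddingCosineSketch {

    /** Parses one line of whitespace-separated real values into a vector. */
    static double[] parseVector(String line) {
        String[] tokens = line.trim().split("\\s+");
        double[] vector = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Throws NumberFormatException on malformed or empty input.
            vector[i] = Double.parseDouble(tokens[i]);
        }
        return vector;
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // Assumption: treat a zero vector as having no similarity.
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Tiny 3-dimensional stand-ins for the 768-dimensional embeddings.
        double[] doc = parseVector("0.5 0.1 -0.2");
        double[] query = parseVector("0.4 0.0 -0.1");
        System.out.println(cosine(doc, query));
    }
}
```

Note that for general real-valued vectors cosine similarity lies in [-1, 1]; this sketch does not clamp or rescale the value.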
Use the ir.eval.DeepExperimentRated class to evaluate the embeddings. The commands to be run are as follows:
java ir.eval.ExperimentRated /u/mooney/ir-code/corpora/cf/ /u/mooney/ir-code/queries/cf/queries-rated results/vsr
java ir.eval.DeepExperimentRated /u/mooney/ir-code/embeddings/specter2_base/docs /u/mooney/ir-code/queries/cf/queries-rated /u/mooney/ir-code/embeddings/specter2_base/queries results/specter2_base
java ir.eval.DeepExperimentRated /u/mooney/ir-code/embeddings/specter2_adapter/docs /u/mooney/ir-code/queries/cf/queries-rated /u/mooney/ir-code/embeddings/specter2_adapter/queries results/specter2_adapter
java ir.eval.HybridExperimentRated /u/mooney/ir-code/corpora/cf/ /u/mooney/ir-code/embeddings/specter2_base/docs /u/mooney/ir-code/queries/cf/queries-rated /u/mooney/ir-code/embeddings/specter2_base/queries 0.5 results/hybrid05
You should submit your work on Gradescope. In submitting your solution, follow the general course instructions on submitting projects on the course homepage. Along with that, follow these specific instructions for Project 4:
code/ - A folder containing all your code, added and modified *.java and *.class files. Please do not modify the original java files but extend each class and override the appropriate methods.
report.pdf - A PDF report of your experiment as described above with the plots referenced in the instructions.
results/ - A folder containing the data files used to generate your plots with the following contents:
vsr, vsr.ndcg
specter2_base, specter2_base.ndcg
specter2_adapter, specter2_adapter.ndcg
hybrid, hybrid.ndcg
Make sure that these files match the output of the code that you submit.
Name
---------------------------------------
ir/vsr/HybridRetriever.java
ir/eval/HybridExperiment.java
ir/eval/HybridExperimentRated.java
Name
---------------------------------------
vsr vsr.ndcg
specter2_base specter2_base.ndcg
specter2_adapter specter2_adapter.ndcg
specter2_base_cos specter2_base_cos.ndcg
specter2_adapter_cos specter2_adapter_cos.ndcg
hybrid03 hybrid03.ndcg
hybrid05 hybrid05.ndcg
hybrid07 hybrid07.ndcg
hybrid08 hybrid08.ndcg
hybrid09 hybrid09.ndcg
Please make sure that your code compiles and runs on the UTCS lab machines.
# Test a single hybrid experiment
java -cp /u/mooney/ir-code:code ir.eval.HybridExperimentRated \
/u/mooney/ir-code/corpora/cf \
/u/mooney/ir-code/embeddings/specter2_base/docs \
/u/mooney/ir-code/queries/cf/queries-rated \
/u/mooney/ir-code/embeddings/specter2_base/queries \
0.5 results/hybrid05
# Verify output files created
ls -lh results/hybrid05*
# Should show: hybrid05, hybrid05.gplot, hybrid05.ndcg, hybrid05.ndcg.gplot
# Check format (PR: 11 lines, NDCG: 10 lines)
wc -l results/hybrid05 results/hybrid05.ndcg
This project includes an automated autograder on Gradescope. Its score is for reference only and does not directly determine your final grade. The autograder provides feedback on:
| Test | Points | Checks |
|---|---|---|
| 1.1 File Structure | 5 pts | All required Java files present in correct package structure |
| 1.2 Compilation | 10 pts | Code compiles without errors (or pre-compiled .class files work) |
| 2.1 Experiment Execution | 15 pts | All 10 experiment configurations run successfully |
| 2.2 Numerical Accuracy | 20 pts | Hybrid results match reference solution within tolerance |
VSR and deep experiments (specter2_base, etc.) use the base framework and are not graded by the autograder, but you still need to submit those results for your report.