CS 371R Information Retrieval and Web Search: Course Syllabus

Chapter numbers refer to the textbook Introduction to Information Retrieval (Manning, Raghavan, and Schütze).

Introduction: Chapter 1.
Goals and history of IR. The impact of the web on IR.
Basic IR Models: Chapters 1 & 6.
Boolean and vector-space retrieval models; ranked retrieval; text-similarity metrics; TF-IDF (term frequency/inverse document frequency) weighting; cosine similarity.
Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval: Chapters 2 & 6.
Simple tokenizing, stop-word removal, and stemming; inverted indices; efficient processing with sparse vectors; Java implementation.
Experimental Evaluation of IR: Chapter 8.
Performance metrics: recall, precision, F-measure, and NDCG; evaluations on benchmark text collections.
Query Operations: Chapters 9 & 3.
Relevance feedback; query expansion.
Text Representation: Section 5.1 and Chapter 10.
Word statistics; Zipf's law; Porter stemmer; morphology; index term selection; using thesauri.
Web Search: Chapters 19, 20, & 21.
Search engines; spidering; metacrawlers; directed spidering; link analysis (e.g. hubs and authorities, Google PageRank); shopping agents.
Text Categorization: Chapters 13 & 14.
Categorization algorithms: Rocchio, nearest neighbor, and naive Bayes. Applications to information filtering and organization.
Language-Model-Based Retrieval: Chapter 12.
Using naive Bayes text classification for ad hoc retrieval. Improved smoothing for document retrieval.
Text Clustering: Chapters 16 & 17.
Clustering algorithms: agglomerative clustering; k-means; expectation maximization (EM). Applications to web search and information organization.
Recommender Systems: Read the paper by Herlocker et al.
Collaborative filtering and content-based recommendation of documents and products.
Ethical Issues in IR:
Privacy; fairness; fake news and disinformation; filter bubbles; viewpoint diversity; fostering extremism; internet addiction.
Information Extraction and Integration:
Extracting data from text; semantic web; collecting and integrating specialized information on the web.
Question Answering:
Semantic parsing. Question answering from structured data and text.
Deep Learning for IR:
Word embeddings. Neural language models.
