CS 371R:
Information Retrieval and Web Search
Instructor
Raymond J. Mooney, office hours: Tu 11am-12pm, Thu 2-3pm (in person in GDC 3.806 or on Zoom by appointment via email)
Teaching Assistant
Priyanka Mandikal, mandikal@cs.utexas.edu, office hours: Mon, Tue 8-9am on Zoom
Time and Place
Fall, 2023; TuTh 9:30-11:00 AM; GDC 5.302 (in-person, no Zoom option).
General Course Information
Basic course information
Course syllabus
Information on course Java code
Information on submitting projects
Textbook
Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
Programming Projects
Project 0: Optional Software Test
(due 9/12)
Project 1: Vector Space Retrieval
(due 9/20)
Project 2: Evaluating Performance of Query Operations
(due 10/4)
Project 3: Web Spidering and Link Analysis
(due 11/1,
NOTE:
parts 1 and 2 due 10/24 and 10/26 respectively)
Project 4: Deep Learning
(due 11/29)
Exams
Midterm: Tuesday, October 10, (
Equation sheet 1
) (
NDCG Equations
)(
Link analysis algorithms sheet
)
Previous year's midterm in PDF
Final: Monday, December 11,
BUR 108
, 1:00 pm-3:00 pm, (
Equation sheet 2
) (
Equation sheet 3
) (
Learning algorithms sheet
) (
Perceptron algorithm sheet
)
Previous year's final in PDF
See /u/mooney/ir-code/solns/ for sample solutions.
PowerPoint Presentations
Introduction (
PowerPoint
) (
PDF
) (
PDF handout
)
Boolean and Vector-Space Retrieval Models (
PowerPoint
) (
PDF
) (
PDF handout
)
Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval (
PowerPoint
) (
PDF
) (
PDF handout
)
Performance Evaluation of Information Retrieval Systems (
PowerPoint
) (
PDF
) (
PDF handout
)
Query Operations (Relevance Feedback / Query Expansion) (
PowerPoint
) (
PDF
) (
PDF handout
)
Text Properties and Languages (
PowerPoint
) (
PDF
) (
PDF handout
)
Web Search: Introduction (
PowerPoint
) (
PDF
) (
PDF handout
)
Web Search: Spidering (
PowerPoint
) (
PDF
) (
PDF handout
)
Web Search: Link Analysis (
PowerPoint
) (
PDF
) (
PDF handout
)
Automated Text Categorization and Neural Network Learning (
PowerPoint
) (
PDF
) (
PDF handout
)
Automated Text Categorization: IR, kNN, and Naive Bayes (
PowerPoint
) (
PDF
) (
PDF handout
)
Language Models (
PowerPoint
) (
PDF
) (
PDF handout
)
Language-Model Based Retrieval (
PowerPoint
) (
PDF
) (
PDF handout
)
Deep Learning (
PowerPoint
) (
PDF
) (
PDF handout
)
Recommender Systems (
PowerPoint
) (
PDF
) (
PDF handout
)
Ethical Issues in IR (
PowerPoint
) (
PDF
) (
PDF handout
)
Clustering, Information Extraction, and Semantic Parsing (
PowerPoint
) (
PDF
) (
PDF handout
)
Servlet Demos
Simple Search Engine
Java Course Code
A jar file for the course Java code is available
here
.
JavaDoc for Course Code
All packages
Vector-Space Retrieval
Performance Evaluation
Web Utilities
Text Classifiers
Utilities
Related Courses
Information Retrieval Course at UMass
Web Search and Mining Course at Stanford
Information Retrieval and Web Agents Course at Johns Hopkins
Intelligent Information Retrieval Course at DePaul
Miscellaneous Links
ACM Special Interest Group on Information Retrieval (SIGIR)
Text REtrieval Conference (TREC)
World-Wide Web Consortium (W3C)
On-line textbook on Information Retrieval by C. J. van Rijsbergen (1979)
Information Retrieval Links
UMass Center for Intelligent Information Retrieval
Bibliography on Zipf's Law
Web Robots Pages
Prosecuting Bots for Trespassing (e.g. eBay v. Bidder's Edge)
(or try a Google search on "robots.txt lawsuit")
Search Engine Watch
Search Tools for Web Sites
History of Search Engines
Scientific American articles on
XML
and the
Semantic Web
Web IR and IE
Reading List on Machine Learning and Information Retrieval
Repository of Online Information Sources Used in Information Extraction Tasks
Bibliography on Automated Text Categorization
Recommender Systems Links
NY Times article on Text Mining
Relevant Books Written for the General Public
Weaving the Web: The original design and ultimate destiny of the World Wide Web, by its inventor
, Tim Berners-Lee with Mark Fischetti, 1999.
Speeding the Net: The Inside Story of Netscape and How It Challenged Microsoft
, Joshua Quittner, Michelle Slatalla, 1998.
The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture
, John Battelle, 2005.
The Google Story
, David Vise and Mark Malseed, 2005.
Planet Google: One Company's Audacious Plan To Organize Everything We Know
, Randall Stross, 2008.
In The Plex: How Google Thinks, Works, and Shapes Our Lives
, Steven Levy, 2011.
Linked: The New Science of Networks: How Everything is Connected to Everything Else and What it Means for Science, Business and Everyday Life
, A.L. Barabasi, 2002. (book on the statistical properties of the Web and other graph structures in nature)
The Long Tail: Why the Future of Business is Selling Less of More
, Chris Anderson, 2006. (book on how Zipfian power laws also describe the range of choices available to consumers on the web)
mooney@cs.utexas.edu