Department of Computer Sciences Distinguished Lecture Series- Jeffrey S. Vitter/Texas A&M University: "Compressed Data Structures and Searching Document Collections for the Most Relevant Documents," ACE 2.402, Friday, November 13, 2009, 11:00 am

Contact Name: 
Jenna Whitney
Nov 13, 2009 11:00am - 12:05pm

There is a sign-up schedule for this talk that can be found at


Type of Talk: Departmetn of Computer Sciences Distinguished Lecture


Speaker/Affiliation: Jeffrey S. Vitter/ Texas A&M University

Date/Time: Friday, November 13, 2009, 11:00 am

Location: ACES 2.


Host: Vijaya Ramachnadran

Talk Title: "Compressed Data Struct

ures and Searching Document Collections for the Most Relevant Documents"

Talk Abstract:
The world is drowning in data. A key challenge is how

to make use of this data in a meaningful way. This talk addresses compresse

d data structures, a new paradigm for fast search with small space usage.

We address in particular how to search huge document collections and, for

a given query pattern, to find the most relevant documents that contain th

at pattern. "Most relevant" may mean the documents that contain the greates

t number of instances of the pattern or the documents with the highest Page
Rank (such as used by Google). Inverted indexes do not handle general patt

ern search. Suffix trees and suffix arrays support general pattern search,
but they are too expensive in terms of space usage, and in addition, the

y require the finding of every occurrence of the pattern, which can be ver

y expensive when the number of pattern occurrences is much larger than the

number of documents. We improve upon the results of Muthukrishnan with a li

near-space data structure that yields optimum time performance. We also dev

elop a more succinct search structure whose size is proportional to the siz

e of the documents when compressed using a high-order entropy method.

Joint work with Wing-Kai Hon and Rahul Shah

Speaker Bio:
Jeff Vitter
is professor of computer science and engineering at Texas A&M University.

From 2008 to 2009, he served as provost and executive vice president for a

cademics at Texas A&M. Previously he served six years as the Frederick L. H

ovde Dean of the College of Science and Professor of Computer Science at Pu

rdue University. From 1993 to 2002, Dr. Vitter held a distinguished profes

sorship at Duke University, and he served as chair of the Department of Co

mputer Science at Duke from 1993-2001 and as co-director and a founding mem

ber of Duke''s Center for Geometric and Biological Computing from 1997-2002

. He received a PhD in computer science from Stanford University in 1980. D

r. Vitter is a Fellow of the ACM and IEEE.

One theme in Dr. Vitter''s

research and teaching is how to alleviate the I/O bottleneck between fast i

nternal memory and slow external storage (such as disk) that can occur when
processing massive data sets. He is credited as a founder of the field of

external memory algorithms.