Department of Computer Sciences Distinguished Lecture Series- Jeffrey S. Vitter/Texas A&M University: "Compressed Data Structures and Searching Document Collections for the Most Relevant Documents," ACE 2.402, Friday, November 13, 2009, 11:00 am
There is a sign-up schedule for this talk that can be found at
http://www.cs.utexas.edu/department/webevent/utcs/events/cgi/list_events.cg
i
Type of Talk: Departmetn of Computer Sciences Distinguished Lecture
Series
Speaker/Affiliation: Jeffrey S. Vitter/ Texas A&M University
Date/Time: Friday, November 13, 2009, 11:00 am
Location: ACES 2.
402
Host: Vijaya Ramachnadran
Talk Title: "Compressed Data Struct
ures and Searching Document Collections for the Most Relevant Documents"
Talk Abstract:
The world is drowning in data. A key challenge is how
to make use of this data in a meaningful way. This talk addresses compresse
d data structures, a new paradigm for fast search with small space usage.
We address in particular how to search huge document collections and, for
a given query pattern, to find the most relevant documents that contain th
at pattern. "Most relevant" may mean the documents that contain the greates
t number of instances of the pattern or the documents with the highest Page
Rank (such as used by Google). Inverted indexes do not handle general patt
ern search. Suffix trees and suffix arrays support general pattern search,
but they are too expensive in terms of space usage, and in addition, the
y require the finding of every occurrence of the pattern, which can be ver
y expensive when the number of pattern occurrences is much larger than the
number of documents. We improve upon the results of Muthukrishnan with a li
near-space data structure that yields optimum time performance. We also dev
elop a more succinct search structure whose size is proportional to the siz
e of the documents when compressed using a high-order entropy method.
Joint work with Wing-Kai Hon and Rahul Shah
Speaker Bio:
Jeff Vitter
is professor of computer science and engineering at Texas A&M University.
From 2008 to 2009, he served as provost and executive vice president for a
cademics at Texas A&M. Previously he served six years as the Frederick L. H
ovde Dean of the College of Science and Professor of Computer Science at Pu
rdue University. From 1993 to 2002, Dr. Vitter held a distinguished profes
sorship at Duke University, and he served as chair of the Department of Co
mputer Science at Duke from 1993-2001 and as co-director and a founding mem
ber of Duke''s Center for Geometric and Biological Computing from 1997-2002
. He received a PhD in computer science from Stanford University in 1980. D
r. Vitter is a Fellow of the ACM and IEEE.
One theme in Dr. Vitter''s
research and teaching is how to alleviate the I/O bottleneck between fast i
nternal memory and slow external storage (such as disk) that can occur when
processing massive data sets. He is credited as a founder of the field of
external memory algorithms.
;
- About
- Research
- Faculty
- Awards & Honors
- Undergraduate
- Graduate
- Careers
- Outreach
- Alumni
- UTCS Direct