UTCS Colloquia - Robert Grimm/New York University, "Parsing All of C by Taming the Preprocessor", ACES 2.402

Contact Name: 
Jenna Whitney
Date: 
Jul 19, 2011 10:30am - 11:30am

There is a sign-up schedule for this event that can be found at

http://apps.cs.utexas.edu/talkschedules/cgi/list_events.cgi

Type o

f Talk: UTCS Colloquia

Speaker/Affiliation: Robert Grimm/New York Univ

ersity

Talk Audience: UTCS Faculty, Graduate Students, Undergraduate
Students and Outside Interested Parties

Date/Time: Tuesday, July 19

, 2011, 10:30 a.m.

Location: ACES 2.402

Host: Kathryn McKinley

n
Talk Title: Parsing All of C by Taming the Preprocessor

Talk Abstr

act:
Given the continuing popularity of C for building large-scale
prog

rams, such as Linux, Apache, and Bind, it is critical to provide
effe

ctive tool support, including, for example, code browsing, bug
findin

g, and automated refactoring. Common to all such tools is a
need to par

se C. But C programs contain not only the C language
proper but also pre

processor invocations for file inclusion
(#include), conditional compila

tion (#if, #ifdef, and so on), and
macro definition/expansion (#define

). Worse, the preprocessor is a
textual substitution system, which is

oblivious to C constructs and
operates on individual tokens. At the same
time, the preprocessor is
indispensable for improving C''s expressivity

, abstracting over
software/hardware dependencies, and deriving variati

ons from the same
code base. Linux 2.6.38, for example, depends on abo

ut 18,000 header
files for file inclusion, 10,000 configuration variab

les for
conditional compilation, and 616,000 macros for code expansion.

In this talk, I present a new tool for parsing all of C, including

arbitrary preprocessor use. Our tool, which is called SuperC, is
bas

ed on a systematic analysis of all interactions between lexing,
preproce

ssing, and parsing to ensure completeness. It first lexes and
preproces

ses source code while preserving conditionals. It then
parses the result
using a novel variant of LR parsing, which
automatically splits parsers
when encountering a conditional and joins
them again when reaching the s

ame input in the same state. The result
is a well-formed AST, containin

g static choice nodes for conditionals.
While the parsing algorithm and e

ngine are new, neither grammar nor LR
parser table generator need to cha

nge. I discuss the results of our
problem analysis, the parsing algorit

hm itself, the pragmatics of
building a real-world tool, and a demonstr

ation on the x86 version of
the Linux kernel.

This is joint work wit

h Paul Gazzillo.

Speaker Bio:
Robert Grimm received his Ph.D. in 200

2 from the University of
Washington for his work on system support for pe

rvasive applications.
He then became a faculty member in the Department o

f Computer Science
at New York University, where he continues to teach.
His research
covers both systems and languages, including stream proces

sing and
multilingual programming, respectively.