UTCS Colloquium: Lars Nyland/NVIDIA, "The Simplest CUDA Program (worth writing)," ACES 2.402, Monday, February 11, 2008, 9:30 a.m.

Contact Name: Jenna Whitney
Feb 11, 2008 9:30am - 10:30am

There is a sign-up schedule for this event:

Type of talk: UTCS Colloquium

Speaker/Affiliation: Lars Nyland/NVIDIA


Date/Time: Monday, February 11, 2008, 9:30 a.m.


Location: ACES 2.402

Host: Keshav Pingali

Talk Title: The Simplest CUDA Program (worth writing)

Talk Abstract
CUDA is the programming language that gives direct access to the
high-performance computing hardware built by NVIDIA (although there
is no reason it is restricted to that domain). It adds a few features
to the C programming language to create, terminate, and synchronize
threads and to specify how the different memories shall be used (main
memory, device memory, and user-controlled cache memory).

Why even consider writing CUDA programs? There are two reasons:
reducing the time to result and improving the quality of the result.
The hardware that runs CUDA applications can run some applications
hundreds of times faster than similar code on a high-performance CPU,
although not all applications can see this kind of improvement. The
extra computational horsepower can also be used to improve the quality
of results rather than achieving the same results more quickly (or any
mix in between).

In this talk, I will present a simple yet time-consuming algorithm
that can be drastically sped up using CUDA. I'll present the CUDA
implementation of signal correlation (cross-correlation) in 1D and 2D.
The algorithm is simple, even though the reasons it works so well may
be more subtle. I'll show the C implementation, followed by a simple
port to CUDA, and conclude with modifications to further improve
performance (by making better use of key components of the hardware).
As a necessary step along the way, I'll describe the G80 hardware
currently being sold by NVIDIA and show several examples of successful
computing results using such hardware.
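For readers unfamiliar with the algorithm named in the abstract, here is a minimal sketch of 1D cross-correlation in plain C. It is not the speaker's code. The function name `cross_correlate_1d`, its parameters, and the choice to compute only fully overlapping ("valid") lags are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not from the talk): 1D cross-correlation.
 * For each lag in 0 .. n_signal - n_pattern, computes
 *     out[lag] = sum over i of signal[lag + i] * pattern[i]
 * Only lags where the pattern fully overlaps the signal are produced,
 * so out must have room for n_signal - n_pattern + 1 values. */
void cross_correlate_1d(const float *signal, size_t n_signal,
                        const float *pattern, size_t n_pattern,
                        float *out)
{
    /* Nothing to compute if the pattern is empty or longer than the signal. */
    if (n_pattern == 0 || n_pattern > n_signal) {
        return;
    }

    size_t n_out = n_signal - n_pattern + 1;
    for (size_t lag = 0; lag < n_out; ++lag) {
        float sum = 0.0f;
        for (size_t i = 0; i < n_pattern; ++i) {
            sum += signal[lag + i] * pattern[i];
        }
        out[lag] = sum;
    }
}
```

Each output value depends only on the inputs, not on any other output, so the iterations of the outer loop are independent of one another. That independence is what makes a loop like this a plausible candidate for assigning one output element per thread on parallel hardware.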


Speaker Bio:
Lars Nyland is a senior architect in the "compute" group at NVIDIA,
where he designs, develops, and tests architectural features to support
non-traditional uses of graphics processors. Prior to joining NVIDIA,
Lars was an associate professor of computer science at the Colorado
School of Mines in Golden, Colorado. He ran the Thunder Graphics Lab,
where demanding computational applications were coupled with immersive
3D graphics. Between Lars' PhD and his position in Colorado, he was a
member of the research faculty at UNC Chapel Hill, where he was a
member of the high-performance computing and image-based rendering
groups. Some notable achievements were the development of the
scene digitizer and its use at Monticello to provide an immersive
experience for visitors to the New Orleans Museum of Art's Jefferson
and Napoleon exhibit. He also spent considerable time studying N-Body
algorithms, parallelizing N-Body algorithms for Molecular Dynamics,
and parallel programming languages. Lars earned his PhD at Duke
University in 1991 under the direction of John Reif, exploring
high-level parallel programming languages.