CS 378 Natural Language Processing

Elaine Rich

Schedule of Topics

Class Policies:
  Textbook
  Contact information
  Office hours
  Grading

Homeworks

Projects

NLP Resources on the Web

Lectures



Course Description
Natural languages (English, Chinese, Swahili, etc.) have evolved over thousands of years as efficient vehicles for human-to-human communication. Enter computers. What's the connection? In this class we'll attempt to answer that question. In particular, we will look at the following more specific questions:

Is there any reason a computer should know English or Chinese or Swahili?
Answer: Yes. There are several "killer apps" for natural language processing, including retrieving information from the web, translating documents from one language to another, and providing spoken front ends to all kinds of application programs.
Why is natural language processing hard? Stated another way: What would a program have to know to be able to work effectively with natural language input?
Answer: Three kinds of things:

Properties of the language itself at many levels: the facts about sounds, syllables, words, sentences, and sensible paragraphs and dialogues.
Knowledge about the things that are being talked or written about.
The rules that map between linguistic structures and their meanings.

What's the current state of the art in natural language processing?
Answer: We'll look at a variety of systems and see what they can do. To get a sample, take a look at the demo systems you'll find listed on the class Resources page.

What computational techniques are available for working with natural languages?
Answer: There are a variety of techniques at each level of linguistic analysis. In this class, we will focus primarily on typed inputs, although we will look briefly at spoken language. There are two main families of techniques for working with natural language:

Logical techniques, which depend on symbolic models of languages. These models typically take the form of rules for forming sounds into words, rules (grammars) for forming words into sentences, rules for mapping between words and their meanings, and rules for reasoning about the meanings themselves.
Statistical techniques, which depend on patterns that can be extracted automatically from large linguistic corpora.
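To make the contrast between the two families concrete, here is a minimal Python sketch. It is not taken from the course materials. The toy grammar, the lexicon, the three-sentence corpus, and the function names cky_recognize and bigram_probabilities are all made up for illustration. The first half uses hand-written rules to decide whether a word sequence is a sentence (using the standard CKY chart algorithm as one possible rule-based method); the second half estimates word-to-word probabilities purely from counts in a corpus.

```python
from collections import Counter, defaultdict

# Logical (symbolic) approach: a hand-written toy grammar in Chomsky
# normal form. Every rule and word below is an illustrative assumption.
BINARY_RULES = {            # (B, C) -> categories A such that A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {                 # word -> possible parts of speech
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}


def cky_recognize(words):
    """Return True if the toy grammar derives the word list as an S."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= BINARY_RULES.get((left, right), set())
    return "S" in chart[0][n]


# Statistical approach: no hand-written rules, only counts from a corpus.
def bigram_probabilities(sentences):
    """Maximum-likelihood estimates of P(next word | previous word)."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[prev][nxt] += 1
    return {
        prev: {word: count / sum(nexts.values())
               for word, count in nexts.items()}
        for prev, nexts in pair_counts.items()
    }


TOY_CORPUS = [              # made-up sentences standing in for a real corpus
    "the dog chased the cat",
    "the cat chased the dog",
    "the dog slept",
]

if __name__ == "__main__":
    print(cky_recognize("the dog chased the cat".split()))
    print(cky_recognize("dog the chased".split()))
    print(bigram_probabilities(TOY_CORPUS)["the"])
```

The rule-based half can only accept what its author wrote rules for, but it explains its answer in terms of structure (NP, VP, S). The statistical half needs no linguistic rules at all, but what it knows depends entirely on the corpus it has seen.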

Text
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Daniel Jurafsky, James H. Martin, Keith Vander Linden, Nigel Ward. Prentice-Hall, 2000. The book's web site contains a lot of useful information, including an errata sheet and a chapter-by-chapter list of resources on the web.

Contact Information
Elaine Rich - ear@cs.utexas.edu