

For corporate sponsorship opportunities, please visit our corporate sponsorship page.
Tuition scholarships for computationally-oriented students are being sponsored by the North American chapter of the Association for Computational Linguistics.
For a full list of all Institute courses, please follow the Courses link on the left.
Course registration will open on Friday, January 19th. You must go through the process of administrative registration before you can register for courses.
Dan Jurafsky, Christopher Manning (T/F 3:45-5:30 PM)
Lecture series on industry applications of computational linguistics, natural language processing, and speech and dialogue processing. Lectures will be given by industry professionals. You will have a chance to hear what computational linguists are doing in industry, the problems they get to work on, and their advice on how you should prepare yourself if you're interested in computational linguistics in industry.
Course Areas: Computational Linguistics
Prerequisites: Recommended but not required: concurrent enrollment in other computational linguistics course(s).
Martha Palmer (M/TH 3:45-5:30 PM)
One of the great challenges of Natural Language Processing is the multitude of choices that language gives us for expressing the same thing in different ways, whether we are considering translations into other languages, or simply paraphrases within the same language. In this course we will examine several different styles of semantic representation that have been proposed by different researchers, often with the goal of creating a more abstract representation that could match several different paraphrases or even translations. We will investigate the inherent difficulties in defining these representations as well as their potential utility in different types of NLP applications. We will explore in particular the use of annotated data and machine learning techniques as an increasingly popular approach to building and making use of both shallow and deep semantic representations, and will also discuss briefly the allure of applying machine learning techniques to text without annotations.
Course Areas: Computational Linguistics, Semantics/Pragmatics
Prerequisites: Undergraduate Semantics and a familiarity with and interest in Natural Language Processing.
Dan Klein (M/TH 1:30-3:15 PM)
This course will present an overview of statistical methods for learning grammar from data. We will start with an introduction to probabilistic models of syntax, considering both their adequacy as linguistic models and their place in language processing systems. We will then discuss how such models can be learned from data. On one extreme, one can learn grammars from fully parsed corpora (very highly specified data which would be unavailable to a human learner). On the other extreme, one can learn grammars from entirely unlabeled streams of words (in some ways a harder task than human learners face). In both cases, surprising amounts of linguistic structure can be induced. We will discuss various attempts at grammar learning systems, why they have variously succeeded or failed, and how they relate to classical learnability arguments.
Course Areas: Computational Linguistics, Empirical Methods, Language Acquisition, Morphology/Syntax
Prerequisites: The course will assume basic knowledge of probability and statistics, as well as some familiarity with basic parsing methods.
Steven Bird, Ewan Klein (M/TH 1:30-3:15 PM)
This course is an introduction to the principal methods and theories in computational linguistics. Topics to be covered include: tokenization, tagging, chunking, grammars, parsing, and semantics. More specialized topics may be added if time is available. The course will have an empirical focus and a strong reliance on annotated linguistic corpora. Laboratory sessions will give participants hands-on experience in writing programs to manipulate and analyze linguistic data, including annotated corpora and language documentation, using the Python programming language and the Natural Language Toolkit (NLTK). Assessment will involve completion of some programming tasks. This course will also be of interest to instructors who are considering adoption of NLTK.
Course Areas: Computational Linguistics, Empirical Methods
Required Presession Courses: Python Programming for Linguists (waived for participants who are already experienced in Python)
Emily M. Bender, Dan Flickinger, Stephan Oepen (T/F 10:15 AM-12 PM)
Precision grammar implementation is the practice of encoding linguistic constraints in ever larger, machine-readable grammar fragments and testing those fragments against hand-constructed test suites as well as naturally occurring text. By using the machine to compare the grammar to the data, grammar engineers are able to test their hypotheses against thousands of sentences in mere minutes, test their analyses of different phenomena for consistency, and test their hypotheses against corpus data that goes beyond the carefully selected examples needed in analyses of particular linguistic phenomena. This class combines lectures and hands-on laboratory sessions to explore the methodology and implications of precision grammar implementation, including basic grammar engineering techniques; treebank annotation, using the English Resource Grammar and other existing large resource grammars; test suite development in the context of multilingual grammar engineering; and machine translation as an application for precision grammars.
Course Areas: Computational Linguistics, Empirical Methods, Morphology/Syntax
Prerequisites: Familiarity with formal syntax and basic computer use. No programming experience required.
Required Presession Courses: Introduction to HPSG
Dan Jurafsky (M/TH 3:45-5:30 PM)
Introduction to automatic speech recognition and speech synthesis. In speech recognition we will learn key algorithms in the noisy channel paradigm, focusing on the standard 3-state Hidden Markov Model (HMM), including the Viterbi decoding algorithm and the Baum-Welch training algorithm. We will also learn about representations of the acoustic signal such as MFCCs, and the use of Gaussian Mixture Models (GMMs) and context-dependent triphones for acoustic modeling. Finally, we will cover N-gram language modeling and perplexity. In speech synthesis we will focus on concatenative synthesis, covering text normalization, grapheme-to-phoneme conversion, prosodic modeling, and waveform synthesis. We will also give a brief overview of other speech processing tasks, such as speaker and language ID and the use of forced alignment for automatic phonetic labeling. The course will involve lectures and programming homework.
Course Areas: Computational Linguistics, Phonetics/Phonology
Prerequisites: Strictly required: programming ability, a class in phonetics, and some probability theory (the probability theory can be acquired from the presession Mathematics Refresher course). Recommended: any basic introduction to computational linguistics or to artificial intelligence.
Required Presession Courses: Mathematics Refresher for Computational Linguistics or equivalent
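For readers unfamiliar with the Viterbi decoding algorithm named in the description above, the following is a minimal sketch in Python and is not taken from the course materials. The two states (`sil`, `speech`), the three observation symbols, and all probabilities are made-up toy values chosen only for illustration; a real recognizer would use acoustic feature vectors and work in log space.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s][observations[t]]
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (hypothetical numbers): silence vs. speech, observed as energy levels.
states = ["sil", "speech"]
start_p = {"sil": 0.6, "speech": 0.4}
trans_p = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.4, "speech": 0.6},
}
emit_p = {
    "sil": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
}

path, prob = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
print(path, prob)
```

With these toy values the decoder stays in `sil` for the first two observations and switches to `speech` on the high-energy one.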
Julia Hirschberg (T/F 10:15 AM-12 PM)
The course will discuss current issues in spoken dialogue systems, including the modeling of human-human behavior such as turn-taking, grounding, timing, lexical entrainment, feedback, and clarification strategies; automatic speech act identification, error detection and correction strategies; and the technologies supporting such systems and how we evaluate them. Examples will be drawn from commercial and research spoken dialogue systems. This course is suitable for all levels and will be conducted as lecture/discussion.
Course Areas: Computational Linguistics, Discourse
Christopher Manning (T/F 8-9:45 AM)
Over the last decade, statistical parsing has transformed our ability to produce automatic, high-accuracy parses of arbitrary human language text. This course aims to teach from the basics up to the state of the art in this domain. It will begin by reviewing the phenomena that motivated statistical approaches to parsing, context-free grammars (CFGs), and probabilistic CFGs. Next it will present basic parsing algorithms, concentrating on generalized CKY and A* parsing algorithms, and discuss treebanks, their design and nature, and the methods of building and evaluating parsers based on them. The course will then turn to the well-known and successful Collins and Charniak generative parsing models of the late 1990s, and discuss issues such as smoothing, head lexicalization, engineering for efficiency, and what kinds of information parsers use and need. Finally, we will turn to discriminative methods of parsing, and discuss both parse re-ranking techniques and the direct construction of discriminative parsers.
Course Areas: Computational Linguistics, Empirical Methods, Quantitative Methods
Prerequisites: Basic familiarity with notions of probability theory. Reasonable competence with mathematical notation. A developed sense of algorithmic thinking, such as from a CS algorithms course.
Patrick Blackburn, Johan Bos (M/TH 8-9:45 AM)
This is an introductory course in computational semantics for natural language. It focuses on building and performing inference with Discourse Representation Structures (DRSs). In particular, it shows how to build DRSs computationally, and how to use a logical inference architecture to carry out linguistically interesting work. We will work with DRSs in which reference to events is possible, and within this setting we shall give a uniform treatment of binding theory (reflexive and anaphoric pronouns) and presupposition. Throughout the course we will emphasize the interdependence between semantic construction and inference, and we will show how our approach can be combined with robust parsing methods. Ideally, students attending this course will have some familiarity with predicate logic, and know some basic syntax, but beyond that, the course is self-contained. In particular, we will be teaching Discourse Representation Theory (DRT) from scratch, and we will not be assuming that our students have any background in computational linguistics.
Course Areas: Computational Linguistics, Discourse, Semantics/Pragmatics
Prerequisites: This course is not particularly technical (we won't be delving deeply into implementation techniques or the underlying logical ideas). However, some familiarity with first-order predicate logic would be useful, and students who feel comfortable thinking computationally are likely to get more out of the course.
Kevin Knight, Philip Resnik, Philipp Koehn (T/F 1:30-3:15 PM)
The statistical approach to machine translation provides a set of techniques for (1) automatically learning translation knowledge from bilingual data, and (2) applying that knowledge to translate previously unseen sentences. When it was first introduced, statistical MT was far too slow and inaccurate to be useful -- it was an interesting lab experiment. Now, statistical MT significantly outperforms other methods in many language pairs and domains, at speeds permitting commercial applications like foreign news broadcast translation. What made this possible? How has use of phrasal and syntactic knowledge helped? This course will cover the basic theory, the major technical advances of the past few years, and known limitations. Topics will include the architecture of statistical MT systems, "phrase based" translation models, synchronous grammar models, n-gram language models and their role, decoding (i.e. search for good translation hypotheses guided by a model), and formal evaluation of MT technology. A special emphasis of this course will be on the application of syntactic knowledge in the translation process.
Fermín Moscoso del Prado Martín (7-9 PM)
Steven Bird, Ewan Klein (3:45-5:45 PM)
This course is an introduction to programming and algorithmic problem solving in the Python programming language. Python is well-suited to linguistic programming and is particularly easy for novice programmers to learn. Topics to be covered include: fundamental data types, control structures, functions, regular expressions, simple tokenization, arrays, dictionaries, files, and corpora. Laboratory sessions will give participants hands-on experience in writing simple Python programs.
Course Areas: Computational Linguistics
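As a rough illustration of the kind of program the topics above point toward (regular expressions, simple tokenization, dictionaries), here is a minimal sketch that is not drawn from the course materials. The function names `tokenize` and `word_frequencies`, the regular expression, and the sample sentence are all assumptions made for this example; it uses only the Python standard library.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens, keeping simple contractions."""
    # One or more letters, optionally followed by an apostrophe and more letters.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(tokens):
    """Map each token to the number of times it occurs."""
    return Counter(tokens)


sample = "The cat sat on the mat. The mat didn't move."
tokens = tokenize(sample)
freqs = word_frequencies(tokens)

# Print the three most frequent tokens with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

A `Counter` is a dictionary subclass, so `freqs["the"]` works like an ordinary dictionary lookup; real corpus work would need a more careful tokenizer than this single pattern.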