CS 421: Natural Language Processing

Fall 2020

Contact Information

Professor: Natalie Parde (parde@uic.edu)
Office Hours: Virtual, Tuesday 9:30 - 10:30 a.m. / Thursday 3:00 - 4:00 p.m. CST
 
Teaching Assistant: Usman Shahid (hshahi6@uic.edu)
Office Hours: Virtual, Wednesday / Friday 9:00 - 11:00 a.m. CST
 
Piazza: https://piazza.com/uic/fall2020/cs421

What is this class about?

Natural language processing (NLP) is the subfield of artificial intelligence that focuses on automatically understanding and generating natural language (e.g., Arabic, Navajo, Spanish, or English). It is crucial to many everyday applications: if you've searched for something online or engaged in dialogue with one of your devices today, you've already made use of many different NLP technologies. This class will provide an introduction to the foundations and most popular applications of natural language processing, through a combination of readings, videos, short assignments, and projects. Topics covered will include text preprocessing, part-of-speech tagging, syntactic and dependency parsing, language modeling, word embeddings, text classification, and dialogue systems, among others.

Textbooks

Readings, learning content, and (some) assignments for this class will be drawn from the following source:
- Daniel Jurafsky and James H. Martin. Speech and Language Processing (3rd Edition). Draft, 2019.

This textbook is still being written; its current draft can be freely accessed at the link above.

Deliverables

This is a 400-level course, designed for both graduate students and advanced undergraduates. Depending on your classification, you may have enrolled in either the four-hour version (grad students) or the three-hour version (undergrad students). The two versions have slightly different requirements; the biggest difference is that grad students are required to complete a semester-long research highlight. Undergrads may opt to complete a research highlight as well, in which case their final course grade will be determined according to the same breakdown as that used for graduate students; however, this extra work is not a requirement.

Some further details about the work you will be expected to complete for this course are provided below:

Grading rubrics for all deliverables will be posted in their descriptions, and solutions for assignments will be posted after grading is complete. You are encouraged to use these solutions to further your understanding of the course material.

Final course grades will be determined according to the following breakdowns:

Schedule

Below is a list of course topics, readings, and deadlines, by week. Links to short, prerecorded videos about content pertaining to a week's topic will be posted approximately one week after the in-class premiere. Links to the accompanying slides can be found in the video description. The version of the schedule here is subject to change. All deliverables are due by 12:00 p.m. (noon) CST on the specified due date.

Week: 8/24-8/28
Topic: Introduction and Dialogue Systems and Chatbots
Readings: Chapter 26
Video Links:
- Properties of Human-Human Conversation
- Rule-Based Chatbots
- Corpus-Based Chatbots
- Frame-Based Dialogue Systems
- Dialogue Management

Week: 8/31-9/4
Topic: Text Preprocessing and Edit Distance
Readings: Chapter 2
Deliverables: Chatbot: WOZ Study (9/4)
Video Links:
- Regular Expressions
- Finite State Automata
- Finite State Transducers
- Text Tokenization
- Edit Distance

Week: 9/7-9/11
Topic: Hidden Markov Models and Language Modeling with N-Grams
Readings: Appendix A and Chapter 3
Deliverables: Assignment 1 (9/11)
Video Links:
- Hidden Markov Models
- Forward Probabilities
- Viterbi Algorithm
- N-Grams and Maximum Likelihood Estimation
- Evaluating Language Models
- N-Gram Smoothing Techniques

Week: 9/14-9/18
Topic: Naive Bayes and Text Classification
Readings: Chapter 4
Deliverables: Chatbot: Design (9/18)
Video Links:
- Introduction to Naive Bayes
- Training a Naive Bayes Classifier using BOW Features
- Naive Bayes as a Language Model
- Evaluating Text Classification Models
- Multi-Label and Multinomial Classification

Week: 9/21-9/25
Topic: Logistic Regression and Conditional Random Fields
Readings: Chapter 5
Deliverables: Assignment 2 (9/25)
Video Links:
- Basic Logistic Regression Classifier
- Cross-Entropy Loss
- Gradient Descent
- Conditional Random Fields

Week: 9/28-10/2
Topic: Vector Semantics and Embeddings
Readings: Chapter 6
Deliverables: Chatbot: Dialogue Manager (10/2)
Video Links:
- Basic Word Representations
- TF-IDF
- Cosine Similarity
- Word2Vec
- Other Word Embedding Types

Week: 10/5-10/9
Topic: Neural Networks and Neural Language Models
Readings: Chapter 7
Deliverables: Assignment 3 (10/9)
Video Links:
- Feedforward Neural Network Basics
- Building Blocks for Neural Networks
- Activation Functions
- Combining Computational Units
- Neural Language Models

Week: 10/12-10/16
Topic: Part-of-Speech Tagging and Constituency Grammars
Readings: Chapters 8 and 12
Deliverables: Chatbot: Natural Language Generation (10/16)
Video Links:
- POS Tagsets
- Statistical POS Tagging
- Context-Free Grammars

Week: 10/19-10/23
Topic: Rule-Based and Statistical Constituency Parsing
Readings: Chapters 13 and 14
Deliverables: Assignment 4 (10/23)
Video Links:
- Top-Down and Bottom-Up Parsing
- CKY Algorithm
- Earley Parsing
- Partial Parsing
- Probabilistic CKY Algorithm
- Probabilistic Lexicalized CFGs
- Probabilistic CCG Parsing

Week: 10/26-10/30
Topic: Dependency Parsing and Logical Representations of Sentence Meaning
Readings: Chapters 15 and 16
Deliverables: Assignment 5 (10/30)
Video Links:
- Dependency Relations
- Transition-Based Dependency Parsing
- Graph-Based Dependency Parsing
- Components of Representational Systems
- Model-Theoretic Semantics
- First-Order Logic
- Description Logics

Week: 11/2-11/6
Topic: Word Senses and WordNet and Semantic Role Labeling
Readings: Chapters 19 and 20
Deliverables: Chatbot: Natural Language Understanding (11/6)
Video Links:
- Overview of WordNet
- Word Sense Disambiguation
- Overview of FrameNet
- Semantic Role Labeling
- Semantic Roles
- Selectional Restrictions

Week: 11/9-11/13
Topic: Lexicons for Sentiment, Affect, and Connotation
Readings: Chapter 21
Deliverables: Assignment 6 (11/13)
Video Links:
- Creating Sentiment and Affect Lexicons
- Supervised Learning of Word Sentiment
- Affect Recognition

Week: 11/16-11/20
Topic: Coreference Resolution
Readings: Chapter 22
Deliverables: Chatbot: Evaluation (11/20)
Video Links:
- Overview of Coreference Resolution
- Coreference Tasks
- The Mention-Pair Architecture
- Neural Coreference Resolution Models
- Winograd Schema Problems

Week: 11/23-11/27
Topic: Discourse Coherence
Readings: Chapter 23
Deliverables: Assignment 7 (11/25)
Video Links:
- Rhetorical Structure Theory
- Penn Discourse Treebank
- Discourse Parsing
- Centering Theory
- Entity Grid Model
- Global Coherence

Week: 11/30-12/4
Topic: Wrap-Up and Research Highlight Videos
Deliverables: Research Highlights (11/30)

Week: 12/7-12/11

Final Notes

This website is provided partially for student convenience, partially for my own record-keeping purposes, and partially for the benefit of others who are not able to enroll in the course but who may find the content interesting for one reason or another. It is not a substitute for the course pages on Blackboard and Gradescope, or the course discussion board on Piazza! Please refer to those sources for copies of the full syllabus, assignments, grading rubrics, submission links, and other useful information. If you are not enrolled in the course but would like to request access to those materials, please send me an email introducing yourself and explaining why you would like to have access to them.

Happy studying!