Natalie Parde

CS 521: Statistical Natural Language Processing

Spring 2020

Contact Information

Professor: Natalie Parde (parde@uic.edu)
Office Hours: Tuesday 1:30 - 2:30 p.m. / Thursday 3:00 - 4:00 p.m.
 
Piazza: https://piazza.com/uic/spring2020/cs521

What is this class about?

Natural language processing (NLP) is increasingly driven by statistical and neural methods. These techniques are leveraged in many everyday applications, including intelligent virtual assistants (e.g., Siri or Alexa), machine translation systems (e.g., Google Translate), and language models (such as those used for predictive text). This class will introduce advanced topics in statistical and neural NLP, and provide an overview of active research in those topic areas, through a combination of readings, paper presentations and critiques, a midterm exam, and a semester-long project. Topics covered will include logistic regression, vector semantics, neural networks, coreference resolution, discourse coherence, language modeling, language generation, figurative language processing, and multimodal NLP, among others.

Textbooks

Readings for the first part of the semester will be drawn from the following source:
- Daniel Jurafsky and James H Martin. Speech and Language Processing (3rd Edition). Draft, 2019.

Note that this resource is a draft of the upcoming third edition of Speech and Language Processing. Chapter order and content are subject to change throughout the semester (I'll try to update the course website whenever I notice this occurring, but feel free to ping me if you think something seems out of date). Readings for the second part of the semester will be drawn primarily from journals and conference proceedings. Some suggested papers for each topic in the second part of the semester are provided in the course syllabus. You are perfectly welcome to present a paper that is not in the list of suggestions, as long as I approve it.

Assignments and Exams

This course acts as a middle ground between UIC's introductory, lecture-based NLP course (CS 421) and the advanced, seminar-style NLP courses (CS 594) periodically offered by the department. Most of the coursework will be closer to what you would expect to find in a seminar-style course, but the general format of the class will contain elements of both. For the first portion of the semester, traditional lectures will be given about different statistical and neural techniques; this portion will conclude with a midterm exam. For the second portion of the semester, students will present and critique research from recent NLP papers. Some further details about the work you will be expected to complete for this course are provided below:
  • Paper Critiques: Each "paper discussion" week, you will be required to submit a short (~ one page double-spaced) critique of one of the papers being discussed. You will also do this twice in the first part of the semester, critiquing any paper of your choice that is relevant to one of the lecture topics from the previous three weeks. The critique should include a brief summary of the paper, highlights of aspects of the paper that are particularly good or should be improved, an analysis of the soundness of the methodology and evaluation, and an explanation of whether or not the conclusions drawn by the authors are justified. I've posted several example paper critiques on Blackboard to provide some guidance as to what a good paper critique might look like.
  • Exam: There will be one exam, at the conclusion of the first part of the semester. The exam will contain a mixture of multiple choice, true/false, and free-response questions.
  • Paper Presentation: You will be required to present an overview and critical analysis of one paper in the second part of the semester. The presentation should summarize the work, present the paper's strengths, and (diplomatically!) present its weaknesses. I've posted an example presentation slide deck on Blackboard to provide some guidance as to what this might look like.
  • Project: A central component of this course is the semester-long project. You may complete your project individually or in pairs (if completing the project in a pair, the workload needs to scale accordingly). You'll be afforded considerable flexibility in selecting your project topics. Ideally, if you're working on a thesis or dissertation, you'll be able to incorporate the work resulting from this course into your research. A short (two-page) project proposal will be due near the end of the first part of the semester, detailing your plans and defining your research objectives.
  • Project Write-Up: You will be required to write a conference/journal-style paper about your project, including a literature review, methodology, evaluation, and conclusions. The formatting can vary, depending on which venue you're writing it for. You can view this as an easy way to have a paper ready to submit by the end of the semester, complete with feedback from a faculty member!

Grading rubrics will be posted on Blackboard. Final course grades will be determined according to the following breakdown:
  • Project Proposal: 7%
  • Exam: 20%
  • Paper Critiques: 28% (4% per paper critique)
  • Paper Presentation: 10%
  • Project Presentation: 10%
  • Project Implementation: 10%
  • Project Write-Up: 15%
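
As a quick check of how these weights combine, here is a minimal Python sketch. The weights are copied from the breakdown above; the function name `final_grade`, the assumption that each component is scored on a 0-100 scale, and the sample scores are all hypothetical and for illustration only. The rubrics on Blackboard remain the authoritative source.

```python
# Weights copied from the grade breakdown above (percent of the final grade).
WEIGHTS = {
    "Project Proposal": 7,
    "Exam": 20,
    "Paper Critiques": 28,  # 4% per paper critique
    "Paper Presentation": 10,
    "Project Presentation": 10,
    "Project Implementation": 10,
    "Project Write-Up": 15,
}


def final_grade(scores):
    """Return the weighted course percentage.

    `scores` maps each component name to a score in [0, 100]
    (an assumed scale; the actual rubrics may differ).
    """
    # Sanity check: the listed weights should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student: full marks everywhere except 80/100 on the exam.
sample_scores = {name: 100 for name in WEIGHTS}
sample_scores["Exam"] = 80
print(final_grade(sample_scores))  # 96.0
```

Because the exam carries 20% of the grade, losing 20 points on it lowers the final percentage by 4 points in this example.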

Schedule

The most recent version of the course schedule is available below. This schedule is subject to change, so check back regularly for updates! I'll post my own lecture slides in the "Downloads" column soon after they are presented in class. All deadlines are at 12 p.m. (noon) unless otherwise stated.

Important Notice (3/11/20): This class now meets online.



| Week | Topic | Readings | Deliverables | Downloads |
| --- | --- | --- | --- | --- |
| 1/13 - 1/17 | Introduction and Language Modeling | Chapter 3 | | Introduction to CS 521; Language Modeling |
| 1/20 - 1/24 | Data Collection and Logistic Regression | Chapter 5 | | Data Collection; Logistic Regression |
| 1/27 - 1/31 | Vector Semantics, Word2Vec, and GloVe | Chapter 6 | 1/31: Paper Critique (Topic from 1/13 - 1/31) | Vector Semantics; Word2Vec and GloVe |
| 2/3 - 2/7 | Feedforward Neural Networks and Convolutional Neural Networks | Chapter 7 | | Feedforward Neural Networks; Backpropagation and Convolutional Neural Networks |
| 2/10 - 2/14 | Recurrent Neural Networks, Encoder-Decoder Models, and Attention | Chapter 9, Chapter 10 | 2/14: Project Proposal | Recurrent Neural Networks; LSTMs, GRUs, Encoder-Decoder Models, and Attention |
| 2/17 - 2/21 | Coreference Resolution and Discourse Coherence | Chapter 22, Chapter 23 | 2/21: Paper Critique (Topic from 2/3 - 2/21) | Coreference Resolution; Discourse Coherence |
| 2/24 - 2/28 | Review and Exam (2/27) | | | Exam Review |
| 3/2 - 3/6 | Neural Language Modeling | A neural probabilistic language model; Language models are unsupervised multitask learners | 3/2: Paper Critique | |
| 3/9 - 3/13 | Contextual Word Embeddings | Deep contextualized word representations; BERT: Pre-training of deep bidirectional transformers for language understanding; Universal language model fine-tuning for text classification | 3/9: Paper Critique | |
| 3/16 - 3/20 | Spring Break | | 3/16: Paper Critique (No Late Penalty Until 3/30) | |
| 3/23 - 3/27 | Spring Break | | | |
| 3/30 - 4/3 | Natural Language Generation and Knowledge-based NLP | Deep reinforcement learning for dialogue generation; The curious case of neural text degeneration; Learning neural templates for text generation; Non-monotonic sequential text generation; K-BERT: Enabling Language Representation with Knowledge Graph | 3/30: Paper Critique | |
| 4/6 - 4/10 | Multimodal NLP and Sarcasm Detection | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations; Are you looking? Grounding to multiple modalities in vision-and-language navigation; Learning multi-modal word representation grounded in visual context; Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal | 4/6: Paper Critique | |
| 4/13 - 4/17 | Metaphor Interpretation and Healthcare Applications | End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories; Automatic prediction of linguistic decline in writings of subjects with degenerative dementia; Detecting influenza outbreaks by analyzing Twitter messages; Large-scale analysis of counseling conversations: An application of natural language processing to mental health | 4/13: Paper Critique | |
| 4/20 - 4/24 | Project Presentations | | | |
| 4/27 - 5/1 | Project Presentations | | 5/1: Project Implementation and Write-Up | |
| 5/4 - 5/8 | Finals Week (No Class) | | | |


Final Notes

This website is provided partially for student convenience, partially for my own record-keeping purposes, and partially for the benefit of others who are not able to enroll in the course but who may find the content interesting for one reason or another. It is not a substitute for the course page on Blackboard or the course discussion board on Piazza! Please refer to those sources for copies of the full syllabus, assignments, grading rubrics, submission links, and other useful information. If you are not enrolled in the course but would like to request access to those materials, please send me an email introducing yourself and explaining why you would like to have access to them.

Happy studying!