Natalie Parde

CS 521: Statistical Natural Language Processing

Spring 2021

Contact Information

Professor: Natalie Parde (parde@uic.edu)
Office Hours: Tuesday 9:30 - 10:30 a.m. / Thursday 3:00 - 4:00 p.m.
 
Piazza: piazza.com/uic/spring2021/2021springcs52143232

What is this class about?

Natural language processing (NLP) is increasingly driven by statistical and neural methods. These techniques are leveraged in many everyday applications, including intelligent virtual assistants (e.g., Siri or Alexa), machine translation systems (e.g., Google Translate), and language models (such as those used for predictive text). This class will introduce advanced topics in statistical and neural NLP, and provide an overview of active research in those topic areas, through a combination of readings, paper presentations and critiques, and a semester-long project. Topics covered will include data collection, common neural architectures for NLP, machine translation, question answering, automated speech recognition, natural language generation, bias in NLP, and NLP applications, among others.

Textbooks

Readings for the first part of the semester will be drawn from the following source:
- Daniel Jurafsky and James H Martin. Speech and Language Processing (3rd Edition). Draft, 2020.

Note that this resource is a draft of the upcoming third edition of Speech and Language Processing. Chapter order and content are subject to change throughout the semester (I'll try to update the course website whenever I notice this occurring, but feel free to ping me if you think something seems out of date). Readings for the second part of the semester will be drawn primarily from journals and conference proceedings. Some suggested papers for each topic in the second part of the semester are provided in the course syllabus. You are welcome to present a paper that is not in the list of suggestions, as long as I approve it.

Assignments

This course acts as a middle ground between UIC's introductory, lecture-based NLP course (CS 421) and the advanced, seminar-style NLP courses (CS 532) offered by the department. The coursework will be closer to what you would expect to find in a seminar-style course, but the general format of the class will contain elements of both. For the first portion of the semester, lectures will be given about different statistical and neural techniques that are common in natural language processing, both at a fundamental level (e.g., deep learning architectures) and for specific applications (e.g., question answering). For the second portion of the semester, students will present and critique research from recent NLP papers. Some further details about the work you will be expected to complete for this course are provided below:
  • Paper Critiques: Most "research" weeks, you will submit a short (~ one page) critique of one of the papers being presented and discussed. You will also do this twice in the first part of the semester, critiquing any paper of your choice that is relevant to one of the lecture topics covered thus far. The critique should include a brief summary of the paper, highlights of aspects of the paper that are particularly good or should be improved, an analysis of the soundness of the methodology and evaluation, and an explanation of whether or not the conclusions drawn by the authors are justified. I've posted several example paper critiques on Blackboard to provide some guidance as to what a good paper critique might look like. You will need to submit a total of seven paper critiques (two in the first part of the semester, and five in the second part of the semester).
  • Paper Presentation Video: You will create a video overview and critical analysis of one paper in the second part of the semester. The video should summarize the work, present the paper's strengths, and (diplomatically!) present its weaknesses. I've posted an example presentation slide deck on Blackboard to provide some guidance as to what this might look like; however, feel free to creatively incorporate the required material however you prefer. Paper presentation videos should be 15 minutes long.
  • Paper Discussion: You will contribute to intriguing discussions of the papers presented during research weeks, either synchronously (during group discussion-style office hours) or asynchronously (by posting a stimulating question or comment on the paper's dedicated Piazza post). You will need to contribute to synchronous or asynchronous discussion for at least five papers (spread across at least five separate "research" weeks). Feel free to contribute more beyond that when papers interest you!
  • Project: A central component of this course is the semester-long project. You may complete your project individually or in pairs (if completing the project in a pair, the workload should scale accordingly). You'll be afforded considerable flexibility in selecting your project topics—ideally, if you're working on a thesis or dissertation, you'll be able to incorporate the work resulting from this course into your research. The project will comprise four different deliverables:
    • Proposal Video: You will create a short (5-minute) video detailing your plans and defining your research objectives, due at the end of the first part of the semester.
    • Project Source: You will submit the source code (either directly or through a link to the repository) used for your project, along with well-documented instructions for replicating the work.
    • Project Video: You will create a 10-15 minute video presenting your methodology, evaluation, and key findings to the class. The format of this video should be similar to what you would see in a conference or workshop presentation. Feel free to browse the ACL Anthology for this year's ACL presentation videos as inspiration.
    • Project Report: You will write a conference/journal-style paper about your project, including a literature review, methodology, evaluation, and conclusions. The formatting can vary depending on the venue you're writing it for. You can view this as an easy way to have a paper ready to submit by the end of the semester, complete with feedback from a faculty member! Project reports should be 2,500 to 4,500 words long, not including references.

Grading rubrics will be posted on Blackboard. Final course grades will be determined according to the following breakdown:
  • Research Discussion:
    • Paper Presentation Video: 15%
    • Paper Discussion: 10% (2% per week)
  • Paper Critiques: 35% (5% per paper critique; two in lecture weeks and five in research weeks)
  • Project:
    • Proposal Video: 10%
    • Project Source: 5%
    • Project Video: 10%
    • Project Report: 15%
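
As a rough illustration of how the breakdown above combines into a final percentage, here is a minimal Python sketch. The weights come from the list above. The function name `final_percentage`, the component keys, and the assumption that each component is summarized as a single fraction between 0 and 1 are hypothetical; the actual grading rubrics are on Blackboard.

```python
# Weights (percent of the final grade) copied from the breakdown above.
WEIGHTS = {
    "paper_presentation_video": 15,
    "paper_discussion": 10,   # 2% per week, five weeks
    "paper_critiques": 35,    # 5% per critique, seven critiques
    "proposal_video": 10,
    "project_source": 5,
    "project_video": 10,
    "project_report": 15,
}

# Sanity check: the components should account for the whole grade.
assert sum(WEIGHTS.values()) == 100


def final_percentage(scores):
    """Return the weighted final percentage (hypothetical helper).

    `scores` maps each component name in WEIGHTS to the fraction of
    credit earned for it, between 0.0 and 1.0 (an assumption made for
    this sketch, not the official rubric).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, calling `final_percentage` with every component set to `1.0` yields the full 100 percent, and lowering only `"project_report"` to `0.5` removes half of that component's 15 points.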

Schedule

The most recent version of the course schedule is available below. This schedule is subject to change, so check back regularly for updates! All deadlines are at 12 p.m. (noon) unless otherwise stated.


1/11 - 1/15: Introduction and Data Collection

1/18 - 1/22: Feedforward and Convolutional Neural Networks
  • Readings: Chapter 7

1/25 - 1/29: Deep Learning Architectures for Sequence Processing
  • Readings: Chapter 9
  • Due 1/25: Paper Critique (Topic from 1/11 - 1/29)

2/1 - 2/5: Machine Translation
  • Readings: Chapter 11
  • Due 2/1: Paper Selection Deadline

2/8 - 2/12: Question Answering
  • Readings: Chapter 23
  • Due 2/8: Paper Critique (Topic from 1/11 - 2/12)

2/15 - 2/19: Automatic Speech Recognition and Text-to-Speech
  • Readings: Chapter 26

2/22 - 2/26: Project Proposals
  • Due 2/22: Project Proposal

3/1 - 3/5: Contextual Word Embeddings
  • Readings:
    • Deep Contextualized Word Representations
    • Universal Language Model Fine-Tuning for Text Classification
    • BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding
    • TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data
    • ELECTRA: Pre-Training Text Encoders as Discriminators Rather Than Generators
  • Due 3/1: Paper Critique

3/8 - 3/12: Sustainable NLP
  • Readings:
    • Energy and Policy Considerations for Deep Learning in NLP
    • Quantifying the Carbon Emissions of Machine Learning
    • Green AI
    • DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter
  • Due 3/8: Paper Critique

3/15 - 3/19: Low-Resource Languages
  • Readings:
    • The State and Fate of Linguistic Diversity and Inclusion in the NLP World
    • Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations
    • Attention-Informed Mixed-Language Training for Zero-Shot Cross-Lingual Task-Oriented Dialogue Systems
    • Unsupervised Cross-Lingual Representation Learning at Scale
    • Simulated Multiple Reference Training Improves Low-Resource Machine Translation
  • Due 3/15: Paper Critique

3/22 - 3/26: Spring Break

3/29 - 4/2: Natural Language Generation
  • Readings:
    • Learning Neural Templates for Text Generation
    • Pun Generation with Surprise
    • Non-Monotonic Sequential Text Generation
    • Synthetic QA Corpora Generation with Roundtrip Consistency
    • Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge
  • Due 3/29: Paper Critique

4/5 - 4/9: Multimodal NLP
  • Readings:
    • The Neurosymbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision
    • Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
    • Visually Grounded Neural Syntax Acquisition
    • LXMERT: Learning Cross-Modality Encoder Representations from Transformers
    • Grounded Language Learning Fast and Slow
  • Due 4/5: Paper Critique

4/12 - 4/16: Bias in NLP
  • Readings:
    • Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
    • Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings
    • Perturbation Sensitivity Analysis to Detect Unintended Model Biases
    • Social Biases in NLP Models as Barriers for Persons with Disabilities
    • Language (Technology) is Power: A Critical Survey of "Bias" in NLP
  • Due 4/12: Paper Critique

4/19 - 4/23: NLP Applications
  • Readings:
    • Detecting Cognitive Impairments by Agreeing on Interpretations of Linguistic Features
    • Finding Your Voice: The Linguistic Development of Mental Health Counselors
    • Decoding Brain Activity Associated with Literal and Metaphoric Sentence Comprehension Using Distributional Semantic Models
    • Improving Segmentation for Technical Support Problems
    • Multi-Label and Multilingual News Framing Analysis
  • Due 4/19: Paper Critique

4/26 - 4/30: Project Videos
  • Due 4/26: Project Video and Source

5/3 - 5/7: Finals Week (No Class)
  • Due 5/3: Project Report


Final Notes

This website is provided partially for student convenience, partially for my own record-keeping purposes, and partially for the benefit of others who are not able to enroll in the course but who may find the content interesting for one reason or another. It is not a substitute for the course page on Blackboard or the course discussion board on Piazza! Please refer to those sources for copies of the full syllabus, assignments, grading rubrics, submission links, and other useful information. If you are not enrolled in the course but would like to request access to those materials, please send me an email introducing yourself and explaining why you would like to have access to them.

Happy studying!