CS 421: Natural Language Processing
Fall 2025
Contact Information
What is this class about?
Natural language processing (NLP) is the subfield of artificial intelligence that focuses on automatically understanding and generating human language. It is pervasive in modern technology; popular examples include large language models and chatbots. This class will provide an introduction to and historical overview of natural language processing foundations and tasks, through readings, lectures, short assignments, and projects. Topics covered will include text preprocessing, part-of-speech tagging, language modeling, language representations, text classification, and dialogue systems, among others.
Textbooks
Readings, learning content, and (some) assignments for this class will be drawn from the following source:
- Daniel Jurafsky and James H. Martin. Speech and Language Processing (3rd Edition). Draft, 2025.
This textbook is still being written; its current draft is freely available online.
Deliverables
This is a 400-level course, designed for both graduate students and advanced undergraduates. Depending on your classification, you may have enrolled in either the four-hour version (grad students) or the three-hour version (undergrad students). There are slightly different requirements for the two versions of the course, with the biggest difference being that students in the four-hour version will be required to research an advanced NLP topic over the course of the semester and teach it during the last week of class. Undergrads may opt to complete this component as well if they would like, in which case their final course grade will be determined according to the same breakdown as that used for graduate students; however, doing this extra work is certainly not a requirement. Some further details about the work you will be expected to complete for this course are provided below:
- Python Bootcamp (Assignment 0): One introductory coding "bootcamp" assignment must be submitted before the first standard deliverable, to ensure the necessary Python proficiency.
- Assignments: Four assignments will be due over the course of the semester (due dates are indicated on the course calendar). These assignments will contain a mix of theoretical and coding questions. Code should be written in Python.
- Project: All students will complete a semester-long project, divided into two deliverables (due dates are indicated on the course calendar). Code, when applicable, should be written in Python.
- Practical Tutorial: Students will work collaboratively on a "production team" to develop a practical tutorial corresponding to one week's topic. Topics are predefined on Blackboard, and students can sign up for a week of their choice on a first-come, first-served basis. Tutorials will be submitted as 20-minute videos, to be shown as part of the Thursday lecture for the specified week.
- New Topic Instruction: Students enrolled in the four-hour version of the course (graduate students and some BS/MS students) will work individually or in groups of up to four students to teach a new NLP topic that is not otherwise covered in the CS 421 syllabus. This two-part course component will require both a topic proposal in mid-semester and the instruction itself (delivered as either a video or live presentation, depending on the student/group's choice) at the end of the semester. Exact instruction length will be determined after topic proposals are received, based on the total number of topics covered (instruction will be allocated the same length of time regardless of group size).
Grading rubrics will be posted in the deliverables' descriptions. Final course grades will be determined according to the following breakdowns:
- Undergraduate Students (3-Hour Version):
  - Python Bootcamp (Assignment 0): 4%
  - Project: 30% (15% for each deliverable)
  - Assignments: 48% (12% for each assignment)
  - Practical Tutorial: 18%
- Graduate Students (4-Hour Version):
  - Python Bootcamp (Assignment 0): 2%
  - Project: 20% (10% for each deliverable)
  - Assignments: 40% (10% for each assignment)
  - Practical Tutorial: 18%
  - New Topic Instruction: 20% (5% proposal; 15% final deliverable)
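As a rough illustration of how these percentages combine, the sketch below computes a final course grade as a weighted sum. It assumes that each component is scored as a percentage from 0 to 100 and that components are combined linearly; neither detail is stated above. The function name `final_grade` and the sample scores are hypothetical.

```python
# Component weights (percent of the final grade), copied from the breakdowns above.
UNDERGRAD_WEIGHTS = {
    "Python Bootcamp": 4,
    "Project": 30,
    "Assignments": 48,
    "Practical Tutorial": 18,
}
GRAD_WEIGHTS = {
    "Python Bootcamp": 2,
    "Project": 20,
    "Assignments": 40,
    "Practical Tutorial": 18,
    "New Topic Instruction": 20,
}


def final_grade(scores, weights):
    """Return the weighted final grade on a 0-100 scale.

    Assumption: `scores` maps each component name to a percentage (0-100)
    and the components combine by a simple weighted sum.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights) / 100


# Hypothetical scores for a student in the three-hour version.
sample_scores = {
    "Python Bootcamp": 100,
    "Project": 90,
    "Assignments": 85,
    "Practical Tutorial": 95,
}
print(final_grade(sample_scores, UNDERGRAD_WEIGHTS))
```

An undergraduate who opts into New Topic Instruction would be graded with `GRAD_WEIGHTS` instead, per the policy described above.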
Schedule
Below is a list of course topics, readings, deadlines, and slides by week. The version of the schedule here is subject to change. All deliverables are due by 12:00 p.m. (noon) CST on the specified due date. Deliverables listed in purple are for graduate students only; undergraduates enrolled in the three-hour version of CS 421 do not need to submit these.
| Week | Topic | Readings | Deliverables | Slides |
| --- | --- | --- | --- | --- |
| 8/26-8/28 | Introduction and Dialogue Systems and Chatbots | Chapter 15 | — | Download |
| 9/2-9/4 | Text Preprocessing and Edit Distance | Chapter 2 | — | Download |
| 9/9-9/11 | N-Gram Language Models and Hidden Markov Models | Chapter 3 and Appendix A | Assignment 0 (9/12); Assignment 1 (9/12) | — |
| 9/16-9/18 | Text Classification | Chapters 4 and 5 | — | — |
| 9/23-9/25 | Vector Semantics | Chapter 6 | Assignment 2 (9/26) | — |
| 9/30-10/2 | Deep Learning for NLP | Chapters 7 and 9 (just skim!) | — | — |
| 10/7-10/9 | Generative AI and Practical Guidelines for Data-Driven NLP | Chapters 10–12 (just skim!) | Project Part 1 (10/10) | — |
| 10/14-10/16 | Syntactic Parsing | Chapters 8, 17, and 18 | New Topic Proposal (10/17) | — |
| 10/21-10/23 | Semantic Parsing | Chapters 19 and 21 | Assignment 3 (10/24) | — |
| 10/28-10/30 | Temporality and Affect | Chapters 20 and 22 | — | — |
| 11/4-11/6 | Word Sense Disambiguation and Coreference Resolution | Appendix G and Chapter 23 | — | — |
| 11/11-11/13 | Discourse Coherence and Question Answering | Chapters 24 and 14 | Assignment 4 (11/14) | — |
| 11/18-11/20 | Automated Speech Recognition and Text-to-Speech Synthesis | Chapter 16 | — | — |
| 11/25 | Co-Working Day | — | Project Part 2 (11/26) | — |
| 12/2-12/4 | New Topic Instruction | — | Videos (12/1) or Presentations (In Class) | — |
| 12/9-12/11 | — | — | — | — |
Final Notes
This website is provided partially for student convenience, partially for my own record-keeping purposes, and partially for the benefit of others who are not able to enroll in the course but who may find the content interesting for one reason or another. It is not a substitute for the course pages on Blackboard and Gradescope, or the course discussion board on Piazza! Please refer to those sources for copies of the full syllabus, assignments, grading rubrics, submission links, and other useful information. If you are not enrolled in the course but would like to request access to those materials, please send me an email introducing yourself and explaining why you would like to have access to them. If you use these materials for your own work, please cite the publication here.
Happy studying!