CS 421: Natural Language Processing
What is this class about?
Natural language processing (NLP) is the subfield of artificial intelligence that focuses on automatically understanding and generating natural language (e.g., Arabic, Navajo, Spanish, or English). It is crucial to many everyday applications: if you've searched for something online or engaged in dialogue with one of your devices today, you've already made use of many different NLP technologies. This class will provide an introduction to the foundations and most popular applications of natural language processing, through a combination of readings, videos, short assignments, and projects. Topics covered will include text preprocessing, part-of-speech tagging, syntactic and dependency parsing, language modeling, word embeddings, text classification, and dialogue systems, among others.
Readings, learning content, and (some) assignments for this class will be drawn from the following source:
- Daniel Jurafsky and James H. Martin. Speech and Language Processing (3rd Edition). Draft, 2019.
This textbook is still being written; its current draft can be freely accessed at the link above.
This is a 400-level course, designed for both graduate students and advanced undergraduates. Depending on your classification, you may have enrolled in either the four-hour version (grad students) or the three-hour version (undergrad students). There are slightly different requirements for the two versions of the course, with the biggest difference being that grad students will be required to complete a semester-long research highlight. Undergrads may opt to complete a research highlight as well if they would like, in which case their final course grade will be determined according to the same breakdown as that used for graduate students; however, doing this extra work is certainly not a requirement. Some further details about the work you will be expected to complete for this course are provided below:
- Assignments: Seven short assignments will be due over the course of the semester (due dates are indicated on the course calendar). These assignments will contain a mix of theoretical and coding questions. Code should be written in Python.
- Chatbot Project: All students will build their own chatbot over the course of the semester. This project will be divided into six short deliverables (due dates are indicated on the course calendar). Code, when applicable, should be written in Python.
- Research Highlight: Graduate students (and any undergraduates who choose to do so) will complete a semester-long research highlight, due the week before finals week. The research highlight can be either (1) a written literature review and video overview, or (2) a custom project and video overview. Research highlights can be completed individually or in pairs; if done in pairs, the submission must be accompanied by a statement detailing which component(s) each student worked on, signed by both students.
Grading rubrics for all deliverables will be posted in their descriptions, and solutions for assignments will be posted after grading is complete. You are encouraged to use these solutions to further your understanding of the course material. Final course grades will be determined according to the following breakdowns:
- Undergraduate Students:
  - Chatbot Project: 30% (5% for each deliverable)
  - Assignments: 70% (10% for each assignment)
- Graduate Students:
  - Chatbot Project: 15% (2.5% for each deliverable)
  - Assignments: 70% (10% for each assignment)
  - Research Highlight: 15% (7.5% for the literature review or custom project, and 7.5% for the video overview)
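If you would like to estimate your own final grade, the breakdowns above can be sketched as a weighted average in Python. This is only an illustrative sketch, not the official grade calculation: the names `WEIGHTS` and `final_grade` are hypothetical, and it assumes each component score is your average percentage (0 to 100) across that component's equally weighted deliverables.

```python
# Percentage weights copied from the breakdowns above.
# The dictionary and function names are hypothetical, chosen for illustration.
WEIGHTS = {
    "undergraduate": {"chatbot": 30, "assignments": 70},
    "graduate": {"chatbot": 15, "assignments": 70, "research_highlight": 15},
}


def final_grade(track, scores):
    """Return the weighted final percentage for one student.

    track:  "undergraduate" or "graduate"
    scores: maps each component name to an average percentage (0-100).
            Assumption: deliverables within a component count equally.
    """
    weights = WEIGHTS[track]
    # Each breakdown should account for the full 100% of the grade.
    assert sum(weights.values()) == 100
    return sum(weight * scores[name] for name, weight in weights.items()) / 100


if __name__ == "__main__":
    undergrad = final_grade("undergraduate", {"chatbot": 90, "assignments": 80})
    print(f"Undergraduate estimate: {undergrad:.1f}%")
```

Undergraduates who opt into the research highlight would use the "graduate" weights, since their grade then follows the graduate breakdown.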
Below is a list of course topics, readings, and deadlines, by week. Links to short, prerecorded videos about content pertaining to a week's topic will be posted approximately one week after the in-class premiere. The version of the schedule here is subject to change. All deliverables are due by 12:00 p.m. (noon) CST on the specified due date.
||Introduction and Dialogue Systems and Chatbots
Properties of Human-Human Conversation
Frame-Based Dialogue Systems
||Text Preprocessing and Edit Distance
||Chatbot: WOZ Study (9/4)
Finite State Automata
Finite State Transducers
||Hidden Markov Models and Language Modeling with N-Grams
||Appendix A and Chapter 3
||Assignment 1 (9/11)
||Naive Bayes and Text Classification
||Chatbot: Design (9/18)
||Logistic Regression and Conditional Random Fields
||Assignment 2 (9/25)
||Vector Semantics and Embeddings
||Chatbot: Dialogue Manager (10/2)
||Neural Networks and Neural Language Models
||Assignment 3 (10/9)
||Part-of-Speech Tagging and Constituency Grammars
||Chapters 8 and 12
||Chatbot: Natural Language Generation (10/16)
||Rule-Based and Statistical Constituency Parsing
||Chapters 13 and 14
||Assignment 4 (10/23)
||Dependency Parsing and Logical Representations of Sentence Meaning
||Chapters 15 and 16
||Assignment 5 (10/30)
||Word Senses and WordNet and Semantic Role Labeling
||Chapters 19 and 20
||Chatbot: Natural Language Understanding (11/6)
||Lexicons for Sentiment, Affect, and Connotation
||Assignment 6 (11/13)
||Chatbot: Evaluation (11/20)
||Assignment 7 (11/25)
||Wrap-Up and Research Highlight Videos
||Research Highlights (11/30)
This website is provided partially for student convenience, partially for my own record-keeping purposes, and partially for the benefit of others who are not able to enroll in the course but who may find the content interesting for one reason or another. It is not a substitute for the course pages on Blackboard and Gradescope, or the course discussion board on Piazza! Please refer to those sources for copies of the full syllabus, assignments, grading rubrics, submission links, and other useful information. If you are not enrolled in the course but would like to request access to those materials, please send me an email introducing yourself and explaining why you would like to have access to them.