Natalie Parde

CS 421: Natural Language Processing

Fall 2019

Contact Information

Professor: Natalie Parde (parde@uic.edu)
Office Hours: SEO 1132, Tuesday 1:30 - 2:30 p.m. / Thursday 3:00 - 4:00 p.m.
 
Teaching Assistant: Usman Shahid (hshahi6@uic.edu)
Office Hours: SELW 4029, Wednesday 12:00 - 2:00 p.m.
 
Piazza: https://piazza.com/uic/fall2019/cs421

What is this class about?

Natural language processing (NLP) is the subfield of artificial intelligence that focuses on automatically understanding and generating natural language (e.g., Arabic, Navajo, Spanish, or English). It is crucial to many everyday applications: if you've searched for something online or engaged in dialogue with one of your devices today, you've already made use of many different NLP technologies. This class will provide an introduction to the foundations and most popular applications of natural language processing, through a combination of readings, short assignments, exams, and (for grad students and optionally undergrads) a semester-long project. Topics covered will include text preprocessing, part-of-speech tagging, syntactic and dependency parsing, language modeling, word embeddings, statistical and neural models, dialogue systems, question answering, and machine translation, among others.

Textbooks

Readings and (some) assignments for this class will be drawn from the following sources:
- Daniel Jurafsky and James H Martin. Speech and Language Processing (2nd Edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2009.
- Daniel Jurafsky and James H Martin. Speech and Language Processing (3rd Edition). Draft, 2019.

The third edition of the book is still being written; its current draft can be freely accessed at the link above. The second edition is available for purchase from the UIC bookstore and other retailers, and can generally be found at affordable prices since it's been out for a while. You'll ultimately be responsible for content from those official sources, so if you purchase a different version (e.g., an international version of the second edition), make sure to check for any misaligned material.

Assignments and Exams

This is a 400-level course, designed for both graduate students and advanced undergraduates. Depending on your classification, you may have enrolled in either the four-hour version (grad students) or the three-hour version (undergrad students). There are slightly different requirements for the two versions of the course, with the biggest difference being that grad students will be required to complete a semester-long project. Undergrads may opt to complete a project as well if they would like, in which case their final course grade will be determined according to the same breakdown as that used for graduate students; however, doing this extra work is certainly not a requirement. Some further details about the work you will be expected to complete for this course are provided below:
  • Assignments: Five short assignments will be due over the course of the semester (current due dates are indicated on the course calendar). These assignments will contain a mix of theoretical and coding questions. Code should be written in Python.
  • Project: Graduate students (and any undergraduates who choose to do so) will complete a semester-long project, due the week before finals week. The project will be selected from one of the following topics: (1) build your own chatbot, (2) build your own essay grader, or (3) custom project (requires pre-approval). Deliverables will include a (well-documented!) working implementation, a ~2500-word report, and a short presentation the week before finals week. Projects can be completed individually or in pairs; if done in pairs, the submission must be accompanied by a statement detailing which component(s) each student worked on, signed by both students.
  • Exams: There will be three exams over the course of the semester: two (non-cumulative) midterms, and one (cumulative) final exam. The exams will contain a mixture of multiple choice, true/false, and free-response questions.

Grading rubrics for assignments, exams, and the project will be posted on Gradescope. Once grading is complete for a given assignment or exam, the solution will also be posted. You are encouraged to use these solutions to further your understanding of the course material and to prepare for future exams. Final course grades will be determined according to the following breakdowns:
  • Undergraduate Students:
    • Exams: 50% (15% for each midterm, and 20% for the final exam)
    • Assignments: 50% (10% for each assignment)
  • Graduate Students:
    • Exams: 40% (12% for each midterm, and 16% for the final exam)
    • Assignments: 40% (8% for each assignment)
    • Project: 20% (6% for the implementation, 6% for the presentation, and 8% for the report)
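As a rough illustration of the arithmetic behind these breakdowns, the sketch below computes a weighted course percentage. The weights are copied from the lists above; the function name, the dictionary keys, and the sample scores are hypothetical, and the sketch assumes every component is scored from 0 to 100. It does not cover rounding or letter-grade cutoffs, which are not specified here.

```python
# Weights (in percent) copied from the breakdowns above.
# All names below are hypothetical and chosen only for illustration.
ASSIGNMENTS = [f"assignment{i}" for i in range(1, 6)]

UNDERGRAD_WEIGHTS = {
    "midterm1": 15, "midterm2": 15, "final_exam": 20,
    **{name: 10 for name in ASSIGNMENTS},
}

GRAD_WEIGHTS = {
    "midterm1": 12, "midterm2": 12, "final_exam": 16,
    **{name: 8 for name in ASSIGNMENTS},
    "project_implementation": 6,
    "project_presentation": 6,
    "project_report": 8,
}


def course_percentage(scores, weights):
    """Return the weighted average of 0-100 scores using percent weights."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights) / 100


# Made-up sample scores, not real data.
sample = {"midterm1": 80, "midterm2": 90, "final_exam": 85}
sample.update({name: 100 for name in ASSIGNMENTS})
undergrad_total = course_percentage(sample, UNDERGRAD_WEIGHTS)

sample.update({
    "project_implementation": 90,
    "project_presentation": 90,
    "project_report": 90,
})
grad_total = course_percentage(sample, GRAD_WEIGHTS)

print(undergrad_total, grad_total)
```

An undergraduate who opts into the project would be graded with the graduate weights, as described earlier in this section.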

Schedule

The most recent version of the course schedule is available below. This schedule is subject to change, so check back regularly for updates! I'll post my own lecture slides in the "Downloads" column soon after they are presented in class. For the readings, (v2) corresponds to the second edition of the Jurafsky and Martin text, and (v3) corresponds to the third edition draft.


| Week | Topic | Readings | Deliverables | Downloads |
|---|---|---|---|---|
| 8/26-8/30 | Introduction, Text Preprocessing, and Edit Distance | (v2) Chapter 1 (all), Chapter 2 (2.1), Chapter 3 (3.8-3.11) | | Introduction to CS 421; Text Preprocessing and Edit Distance |
| 9/2-9/6 | Automata, Transducers, and Hidden Markov Models | (v2) Chapter 2 (2.2), Chapter 3 (3.1-3.7), Chapter 6 (6.1-6.5) | Assignment 1: 9/6 by 12 p.m. (noon) | Automata, Transducers, and Hidden Markov Models |
| 9/9-9/13 | Part-of-Speech Tagging and Formal Grammars | (v2) Chapter 5 (all), Chapter 12 (all) | | Part-of-Speech Tagging and Formal Grammars |
| 9/16-9/20 | Syntactic and Dependency Parsing | (v2) Chapter 13 (all), (v3) Chapter 14 (all) | Assignment 2: 9/20 by 12 p.m. (noon) | Syntactic and Dependency Parsing |
| 9/23-9/27 | First-Order Logic and Review/Catch-Up | (v2) Chapter 17 (all) | | First-Order Logic; Midterm 1 Review |
| 9/30-10/4 | Exam 1 (10/1) and N-Gram Language Modeling | (v2) Chapter 4 (4.1-4.7) | | Language Models & N-grams |
| 10/7-10/11 | Word Embeddings | (v3) Chapter 6 (all) | Assignment 3: 10/11 by 12 p.m. (noon) | Word Embeddings |
| 10/14-10/18 | Naïve Bayes, Text Classification, and Evaluation Metrics | (v3) Chapter 4 (all) | | Naïve Bayes, Text Classification, and Evaluation Metrics |
| 10/21-10/25 | Neural Networks and Neural Language Models | (v3) Chapter 7 (all) | Assignment 4: 10/25 by 12 p.m. (noon) | Neural Networks and Neural Language Models |
| 10/28-11/1 | Sequence Processing with Recurrent Networks and Review/Catch-Up | (v3) Chapter 9 (all) | | Sequence Processing with Recurrent Networks; Midterm 2 Review |
| 11/4-11/8 | Exam 2 (11/5) and Information Extraction | (v2) Chapter 22 (all) | | Information Extraction |
| 11/11-11/15 | Dialogue Systems and Chatbots | (v3) Chapter 26 (all), (v2) Chapter 24 (24.2) | | Dialogue Systems and Chatbots |
| 11/18-11/22 | Question Answering and Summarization | (v2) Chapter 23 (23.3-23.7), (v3) Chapter 25 (all) | Assignment 5: 11/22 by 12 p.m. (noon) | Question Answering and Summarization |
| 11/25-11/29 | Machine Translation | (v2) Chapter 25 (25.1-25.9) | | Machine Translation |
| 12/2-12/6 | Project Presentations and Review | | Code/Paper: 12/2 by 12 p.m. (noon) | Final Exam Review |
| 12/9-12/13 | Exam 3 (12/11, 10:30 a.m. - 12:30 p.m.) | | | |


Final Notes

This website is provided partially for student convenience, partially for my own record-keeping purposes, and partially for the benefit of others who are not able to enroll in the course but who may find the content interesting for one reason or another. It is not a substitute for the course pages on Blackboard and Gradescope, or the course discussion board on Piazza! Please refer to those sources for copies of the full syllabus, assignments, grading rubrics, submission links, and other useful information. If you are not enrolled in the course but would like to request access to those materials, please send me an email introducing yourself and explaining why you would like to have access to them.

Happy studying!