Natalie Parde

CS 594: Language and Vision

Spring 2019

Contact Information

Professor: Natalie Parde
Blackboard: CS 594 Language and Vision (33648) 2019 Spring
Office: SEO 1132
Office Hours: Tuesday/Thursday 2:00-3:00 p.m.

What is this class about?

Researchers in artificial intelligence are increasingly applying multimodal solutions to traditional problems, especially within the realm of natural language processing. In particular, synthesizing NLP with computer vision allows intelligent systems to harness both visual and linguistic information to generate content and derive meaning. This seminar course will introduce you to current research in fundamental language + vision problems, and provide you with a scientific background in relevant application areas. By the end of the course, you will have gained exposure to core concepts through a combination of: lectures on fundamental principles of NLP, CV, and deep learning; paper discussions; and a semester-long project in a focus area that you select. Topics covered will include grounded language learning, physically situated dialogue, automated image captioning, automated video description, visual question answering, text-to-image generation, visual story entailment, and language disambiguation via images.

Textbooks and Readings

Recommended reading for the first three weeks (lectures on fundamental principles of NLP, CV, and deep learning) will be from the following sources:
- Dan Jurafsky and James H Martin. Speech and language processing, volume 3. Pearson London, 2014
- Richard Szeliski. Computer vision: algorithms and applications. Springer Science & Business Media, 2010
- Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015

Free versions of these resources can be accessed at the links above. Reading materials for the remainder of the course will consist of conference and journal papers, all accessible online. A list of suggested discussion papers is provided on Blackboard. You're also welcome to suggest discussion papers that are not on this list (all suggested papers are subject to my approval). I'll post the list of papers that will be discussed each week once the paper selections have been finalized.


This is a seminar-style class. Instead of having traditional homework and exams, we'll have presentations and projects! Specifically, your course work will consist of the following:
  • Paper Critiques: Each "paper discussion" week, you will be required to submit a short (~ one page double-spaced) critique of one of the papers being discussed. The critique should include a brief summary of the paper, highlights of aspects of the paper that are particularly good or should be improved, an analysis of the soundness of the methodology and evaluation, and an explanation of whether or not the conclusions drawn by the authors are justified. I've written an example paper critique and posted it on Blackboard to provide some guidance as to what a good paper critique might look like.
  • Paper Presentations: Each paper discussion will be led by three students: one will provide an overview of the paper, one will present the paper's strengths, and one will (diplomatically!) present the paper's weaknesses. You will be required to fill each role once. I've posted example presentations on Blackboard to provide some guidance as to what each of these components might look like.
  • Project Presentations: You will be required to give three short presentations about your semester-long project: a proposal at the beginning of the semester, a mid-semester project update, and an end-of-semester final presentation. The first two presentations will each be five minutes with an additional minute for questions, and the final presentation will be ten minutes with two additional minutes for questions.
  • Project Write-Up: You will be required to write a conference/journal-style paper about your project, including a literature review, methodology, evaluation, and conclusions. The formatting can vary, depending on which venue you're writing it for. You can view this as an easy way to have a paper ready to submit by the end of the semester, complete with feedback from a faculty member!
  • Project: A central component of this course is the semester-long project. You'll complete your projects independently (if you and one or more of your classmates are working collaboratively on ongoing research, please clearly define separate sub-projects), and your projects must be relevant to the central theme of the course (language and vision). I'd encourage you to run your proposed topic by me before your project proposal, just to make sure you're on the right track. That being said, you'll be afforded considerable flexibility in selecting your project topics. Ideally, if you're working on a thesis or dissertation, you'll be able to incorporate the work resulting from this course into your research.

I've posted my grading rubrics for each of these assignments on Blackboard, in the interest of transparency. Your final course grade will be determined according to the following breakdown:
  • Paper Critiques: 24% (3% per paper critique)
  • Paper Presentations: 26% (10% overview, 8% pros, and 8% cons)
  • Project Presentations: 20% (5% proposal, 5% update, and 10% final presentation)
  • Project Write-Up: 15%
  • Project: 15%
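As a rough illustration of how the breakdown above combines into a single number, here is a minimal Python sketch. The component names, the 0-to-1 score scale, and the final_grade function are assumptions made for this example only; they are not part of the official rubrics on Blackboard. The count of eight critiques is inferred from 24% at 3% per critique.

```python
# Weights copied from the grading breakdown, as percentages of the final grade.
# The dictionary keys are hypothetical names chosen for this illustration.
WEIGHTS = {
    "paper_critiques": 24.0,             # 3% per critique, so 8 critiques
    "presentation_overview": 10.0,
    "presentation_pros": 8.0,
    "presentation_cons": 8.0,
    "project_proposal": 5.0,
    "project_update": 5.0,
    "project_final_presentation": 10.0,
    "project_write_up": 15.0,
    "project": 15.0,
}


def final_grade(scores):
    """Return a weighted final grade out of 100.

    `scores` maps each component name to a fraction between 0 and 1
    (an assumed scale; the real rubrics may score differently).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Sanity check: the listed percentages should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100.0
    example = {name: 0.9 for name in WEIGHTS}
    print(round(final_grade(example), 2))
```

The weights add up to exactly 100, so a score of 1.0 on every component corresponds to a final grade of 100.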


The most recent version of the course schedule is available below. This schedule is subject to change, so check back regularly for updates! I'll post my own lecture slides in the "Downloads" column soon after they are presented in class. If/when students give permission, I will also post their presentation slides for others to download. Note that whether or not you give permission for me to post your slides has zero bearing on your course grade: if you'd like them to be made available, that's great, and if not, that's perfectly fine as well.

| Week | Topic | Deliverables | Downloads |
|---|---|---|---|
| 1/14-1/18 | Introduction and NLP Overview | | Introduction to CS 594 |
| 1/21-1/25 | NLP and CV Overview | Paper Selection: 1/26 by 11:59 p.m. | Introduction to NLP; Introduction to Computer Vision |
| 1/28-2/1 | Deep Learning Overview | Pros and Cons Selections: 2/2 by 11:59 p.m. | Introduction to Deep Learning |
| 2/4-2/8 | Project Proposals | In-Class Presentations | |
| 2/11-2/15 | Principles of Grounded Language Learning. Papers: Multimodal Machine Learning: A Survey and Taxonomy | Paper Critique: 2/11 by 12:00 p.m. | Principles of Grounded Language Learning |
| 2/18-2/22 | Game-based Grounded Language Learning. Papers: Interactive Language Acquisition with One-Shot Visual Concept Learning through a Conversational Game; Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy"; Grounding Language through Evolutionary Language Games | Paper Critique: 2/18 by 12:00 p.m. | Game-based Grounded Language Learning |
| 2/25-3/1 | Physically Situated Dialogue. Papers: Visual Dialog; Learning to Recognize Novel Objects in One Shot through Human-Robot Interactions in Natural Language Dialogues | Paper Critique: 2/25 by 12:00 p.m. | Physically Situated Dialogue |
| 3/4-3/8 | Visual Dependency Parsing and Visual Sentiment Analysis. Papers: Image Description using Visual Dependency Representations; Cross-Media Learning for Image Sentiment Analysis in the Wild; A Review of Affective Computing: From Unimodal Analysis to Multimodal Fusion | Paper Critique: 3/4 by 12:00 p.m. | Visual Dependency Parsing and Visual Sentiment Analysis |
| 3/11-3/15 | Automated Image Captioning and Image-Text Alignment. Papers: Grounded Compositional Semantics for Finding and Describing Images with Sentences; Show, Attend and Tell: Neural Image Caption Generation with Visual Attention; Deep Visual-Semantic Alignments for Generating Image Descriptions | Paper Critique: 3/11 by 12:00 p.m. | Automated Image Captioning and Image-Text Alignment |
| 3/18-3/22 | Automated Video Description and Visual Story Entailment. Papers: Sequence to Sequence - Video to Text; The Amazing Mysteries of the Gutter: Drawing Inferences between Panels in Comic Book Narratives | Paper Critique: 3/18 by 12:00 p.m. | Automated Video Description and Visual Story Entailment |
| 3/25-3/29 | Spring Break | | |
| 4/1-4/5 | Project Updates | In-Class Presentations | |
| 4/8-4/12 | Text-to-Image Generation and Visual Question Answering. Papers: Text to 3D Scene Generation with Rich Lexical Grounding; Visual7w: Grounded Question Answering in Images; From Recognition to Cognition: Visual Commonsense Reasoning | Paper Critique: 4/8 by 12:00 p.m. | Text-to-Image Generation and Visual Question Answering |
| 4/15-4/19 | Language Disambiguation via Images. Papers: Black Holes and White Rabbits: Metaphor Identification with Visual Features; Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes; Illustrative Language Understanding: Large-Scale Visual Grounding with Image Search | Paper Critique: 4/15 by 12:00 p.m. | Language Disambiguation via Images |
| 4/22-4/26 | Project Presentations | In-Class Presentations (Some Students); Final Project: 4/22 by 12:00 p.m. | |
| 4/29-5/3 | Project Presentations | In-Class Presentations (Some Students); Final Paper: 5/3 by 12:00 p.m. | |
| 5/6-5/10 | Finals Week (No Class) | | |

Final Notes

This website is provided partially for student convenience, partially for my own record-keeping purposes, and partially for the benefit of others who are not able to enroll in the course but who may find the content interesting for one reason or another. It is not a substitute for the official course page on Blackboard, or the course discussion board on Piazza! Please refer to those sources for copies of the full syllabus, assignment descriptions, example assignments, grading rubrics, submission links, and other useful information. If you are not enrolled in the course but would like to request access to those materials, please send me an email introducing yourself and explaining why you would like to have access to them.

Happy studying!