"For now, what is important is not finding the answer, but looking for it." - Douglas R. Hofstadter
My primary research interests lie in two main areas:
Creative (primarily figurative) language
Grounded language learning
Furthermore, I am interested in deploying research in those areas to facilitate cognitive and psychological well-being. Below, I provide a brief overview of the projects in which I have been involved to date; for additional details, please refer to the relevant papers listed (if applicable) or send me an email. My most recent work has been conducted in UIC's Natural Language Processing Laboratory, which I co-direct. Most of my earlier research was conducted in UNT's Human Intelligence and Language Technologies Laboratory.
Visual storytelling is the generation of a cohesive, sequential set of descriptions across multiple images. It differs from image captioning in that the generated text is subjective (and often metaphoric) and hinges on greater context (the order in which the images are presented). We performed a thorough error analysis of current visual storytelling models, specifying common error types and providing examples of each. Based on these errors, we made recommendations for improving current models, and we are now developing our own improved model. More details about our error analysis can be found below:
We developed an automated approach for dementia detection that learns to predict individuals' Alzheimer's disease or related dementia (ADRD) status from the linguistic patterns in their transcribed speech samples. The speech samples were collected during a short, conversational picture description task. We used a deep CNN-LSTM architecture designed to leverage both implicitly learned and targeted linguistic features, achieving high performance (F1 > 0.9) when classifying individuals into ADRD and control groups. More details about this ongoing work can be found here:
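To make the general architecture concrete, here is a minimal sketch of a CNN-LSTM text classifier of the kind described above, written in PyTorch. The layer sizes, the two-class output head, and the `targeted_feats` input standing in for the targeted linguistic features are all illustrative assumptions, not the exact configuration from our work.

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """Sketch: convolutions capture local n-gram patterns in a transcript,
    an LSTM models longer-range sequence structure, and hand-crafted
    (targeted) linguistic features are concatenated before the output."""

    def __init__(self, vocab_size, emb_dim=100, n_filters=64,
                 hidden_dim=64, n_targeted_feats=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(n_filters, hidden_dim, batch_first=True)
        # Final LSTM state + targeted features -> ADRD vs. control logits.
        self.out = nn.Linear(hidden_dim + n_targeted_feats, 2)

    def forward(self, token_ids, targeted_feats):
        x = self.embed(token_ids)                     # (batch, seq, emb)
        x = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, filters, seq)
        _, (h_n, _) = self.lstm(x.transpose(1, 2))    # h_n: (1, batch, hidden)
        return self.out(torch.cat([h_n[-1], targeted_feats], dim=1))
```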
For my dissertation work, I developed a human-robot book discussion system that focuses its discussions on particularly novel or creative metaphors in the books being discussed. A central component of this project was developing an accurate metaphor novelty scoring model: the system needed to avoid asking questions about conventional metaphors (e.g., spending an hour on homework) and instead ask only about metaphors the reader was unlikely to have encountered regularly (e.g., frowning like a thunderstorm). However, prior work on computational metaphor processing had confined itself to metaphor detection (determining whether or not a fragment of text is a metaphor), without extending to the question of how novel a given metaphor might be.
I built a deep neural network to predict metaphor novelty for new word pairs along a continuous scale. I investigated a wide array of features for the task, to determine which data characteristics commonly used for detecting metaphors transferred well to this new scoring problem and which feature types were particularly well-suited to it. I also compared my scoring model against a high-performing metaphor detection approach, modified to produce continuous labels rather than discrete 1s and 0s, to provide supporting evidence that scoring metaphor novelty is a distinct task from simple metaphor detection (rather than merely the same task solved with a regression model).
I found that my approach outperformed the standard metaphor detection model by more than 60%. I also found that a combination of syntactic (POS tags, syntactic relation type, and word distance) and semantic (word embeddings) features proved most beneficial for this task, while some features known to perform well in metaphor detection (concreteness, imageability, and sentiment) were not particularly useful for scoring metaphor novelty. Somewhat surprisingly, features based on learned topic models also fell into this category of less-useful features; this suggests that the same conceptual mappings are generating both conventional and novel metaphors, with these mappings merely being linguistically instantiated in different ways. Many more details about my work on automatically scoring metaphor novelty can be found here:
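As a rough illustration of this kind of feature-based scoring model, the sketch below trains a small feed-forward regressor on concatenated semantic and syntactic features for each word pair. The feature dimensions and the random placeholder data are invented for illustration; only the feature families (word embeddings, POS tags, relation type, word distance) come from the description above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def pair_features(emb_a, emb_b, pos_onehot, rel_onehot, word_distance):
    """One row per word pair: word embeddings (semantic) plus one-hot POS
    tags, syntactic relation type, and token distance (syntactic)."""
    return np.concatenate([emb_a, emb_b, pos_onehot, rel_onehot,
                           [word_distance]])

# Placeholder data: 100-d embeddings, 10 POS tags, 10 relation types.
rng = np.random.default_rng(0)
X = rng.random((1000, 100 + 100 + 10 + 10 + 1))
y = rng.random(1000) * 3          # continuous novelty scores on [0, 3]

model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500)
model.fit(X, y)
print(model.predict(X[:3]))       # continuous scores, not binary labels
```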
To evaluate my work on automatically scoring metaphor novelty, I built a large, publicly available dataset of syntactically related word pairs labeled for metaphor novelty on a continuous scale. The word pairs were extracted from running text in the VU Amsterdam Metaphor Corpus (VUAMC), the most widely used metaphor detection dataset to date. The VUAMC comprises text fragments from news articles, academic publications, fiction narratives, and transcribed conversations; within these fragments, individual words are labeled as metaphors. I extracted 18,439 syntactically related pairs of nouns, verbs, adjectives, and adverbs in which at least one of the two words was tagged as a metaphor in the VUAMC, and collected annotations for the word pairs from Amazon Mechanical Turk workers on a scale from 0 to 3, with 3 meaning the word pair formed a highly novel metaphor. This dataset will be described in our upcoming LREC paper:
The dataset can be downloaded here. I am currently developing a complementary corpus of syntactically related word pairs extracted from Project Gutenberg books, also labeled for metaphor novelty; when finished, this dataset will be publicly available as well.
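For a sense of what the pair extraction step might look like, here is a sketch using spaCy's dependency parser to pull syntactically related content-word pairs out of running text. The VUAMC metaphor-tag lookup and the exact relation inventory are omitted; this illustrates the idea rather than the pipeline actually used to build the dataset.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}

def extract_pairs(text):
    """Return (dependent, head, relation, word distance) tuples for
    syntactically related content-word pairs."""
    doc = nlp(text)
    return [(tok.lemma_, tok.head.lemma_, tok.dep_, abs(tok.i - tok.head.i))
            for tok in doc
            if tok.pos_ in CONTENT_POS
            and tok.head.pos_ in CONTENT_POS
            and tok.head is not tok]

print(extract_pairs("She spent an hour on her homework."))
# e.g. [('hour', 'spend', 'dobj', 2)]
```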
Automatically Aggregating Crowdsourced Labels
When building my metaphor novelty dataset, I collected multiple annotations for each word pair via Amazon Mechanical Turk, so to determine the best "true" score for each instance, I had to decide how best to aggregate its crowdsourced annotations. Since judging the novelty of a given metaphor is a difficult, somewhat subjective task even for humans, I wanted to avoid standard label aggregation techniques such as taking the average or majority annotation (both strategies can be skewed by confused or malicious workers).
Instead, I built a supervised regression model to predict "gold standard" label aggregations from features extracted from the crowdsourced annotations themselves. These features primarily captured aspects of each item's label distribution and of correlations among workers. When I evaluated this new aggregation strategy against other common label aggregation techniques, on both my dataset and third-party crowdsourcing datasets, I found that my method predicted aggregations closer to the gold standard values than the other methods did. My label aggregation dataset and my source code for this approach are available here, and many more details about the approach can be found in the paper here:
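A minimal sketch of this idea, with hypothetical features: each item's crowd labels are summarized into a feature vector (distribution statistics plus a reliability-weighted mean as a stand-in for the worker correlation features), and a regressor trained against expert gold scores predicts the aggregate. The specific features, the random forest, and the toy numbers are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def aggregation_features(labels, worker_reliability):
    """Summarize one item's crowd labels (0-3 scale) into features.
    `worker_reliability` is a hypothetical per-annotator correlation
    with the other workers, computed over the whole dataset."""
    labels = np.asarray(labels, dtype=float)
    return np.array([
        labels.mean(),
        np.median(labels),
        labels.std(),                              # disagreement
        labels.max() - labels.min(),               # spread
        np.average(labels, weights=worker_reliability),
    ])

# Train on items that also have expert "gold standard" scores (y).
X = np.array([aggregation_features([0, 1, 1, 3], [0.9, 0.8, 0.7, 0.2]),
              aggregation_features([2, 3, 3, 3], [0.9, 0.8, 0.7, 0.2])])
y = np.array([1.0, 3.0])

aggregator = RandomForestRegressor(n_estimators=100).fit(X, y)
print(aggregator.predict(X))      # predicted aggregate label per item
```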
In addition to my dissertation project, a major project that I worked on throughout the course of my Ph.D. was "I Spy," which focuses on enabling robots to automatically ground language using local visual information captured during game-based interactions. My research advisor (Rodney D. Nielsen) and I originally conceptualized this project as part of a month-long summer school I attended in Athens, Greece, at the National Center for Scientific Research; I developed the original source code there in collaboration with researchers from N.C.S.R. as well as with visiting researchers from the University of Texas at Arlington. The project won both First Prize and People's Choice Award at the summer school, competing against projects developed by other teams of researchers from around the world.
In the basic setup, a robot is first placed in front of an everyday object, and it captures images of the object from different angles and distances using its built-in cameras. A user then provides a natural-language description of the object. The robot parses the description and identifies keywords, or concepts, for which it builds models grounded in visual features extracted from the captured images. For example, if a robot has learned about an apple and a mug and both were described as red, the robot's mental model of the word "red" would contain images of those two objects. As the robot learns about additional red objects, this model expands to include more diverse samples of "red" objects, allowing the robot to learn better-grounded models over time.
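Here is a minimal sketch of that grounding idea, with invented feature vectors: each concept accumulates visual features from objects described with that word, and new images are scored against the stored examples. The real system's feature extraction and per-concept models are more involved; this only illustrates the incremental grounding.

```python
import numpy as np

class GroundedConcept:
    """A word model grounded in visual features: every object described
    with this word contributes its image features as an example."""

    def __init__(self, word):
        self.word = word
        self.examples = []                 # feature vectors seen so far

    def add_object(self, image_features):
        self.examples.append(np.asarray(image_features, dtype=float))

    def score(self, image_features):
        # Cosine similarity to the mean of stored examples; a stand-in
        # for whatever per-concept classifier the real system trains.
        proto = np.mean(self.examples, axis=0)
        x = np.asarray(image_features, dtype=float)
        return float(proto @ x / (np.linalg.norm(proto) * np.linalg.norm(x)))

red = GroundedConcept("red")
red.add_object([0.9, 0.1, 0.1])            # e.g., features from an apple
red.add_object([0.8, 0.2, 0.1])            # e.g., features from a mug
print(red.score([0.85, 0.15, 0.1]))        # high score -> plausibly "red"
```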
During a guessing game, the robot then attempts to determine which of a set of objects placed in front of it the human player has in mind. It does so by asking questions featuring the concepts it has learned (e.g., "Is it red?"), to which the human gives positive or negative responses. These responses, combined with the images the robot captures during the game itself, serve as additional feedback for improving its existing language models. Eventually, the robot reaches a confidence threshold for one of the objects and makes a guess, winning or losing the game accordingly.
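Continuing the sketch above (and reusing the hypothetical `GroundedConcept` class), the guessing game can be framed as maintaining a belief over the candidate objects and reweighting it after each yes/no answer until one object crosses a confidence threshold. The update rule and the threshold value here are illustrative, not the system's actual inference.

```python
import numpy as np

def play_round(object_features, concepts, answer_fn, threshold=0.8):
    """object_features: one visual feature vector per candidate object.
    concepts: learned GroundedConcept models to ask about.
    answer_fn: stand-in for the human's yes/no reply to "Is it <word>?"."""
    belief = np.full(len(object_features), 1.0 / len(object_features))
    for concept in concepts:
        # How well each object matches the concept, clipped to [0, 1].
        fit = np.clip([concept.score(f) for f in object_features], 0.0, 1.0)
        belief *= fit if answer_fn(concept.word) else (1.0 - fit)
        belief /= belief.sum()             # renormalize the belief
        if belief.max() >= threshold:
            break                          # confident enough to guess
    return int(np.argmax(belief))          # index of the robot's guess
```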
Since returning from the summer school in Athens, I have continued working on this project with many of the undergraduate and high school students whom I mentor. Additional information about the project can be found in the following paper:
As a side project, I worked on developing a domain-general approach to sarcasm detection. I trained my sarcasm detection model on tweets that Twitter users had self-tagged as either #sarcasm (the positive class) or #happiness, #sadness, #anger, #fear, #disgust, or #surprise (the negative class; these tweets were assumed to express emotion in a non-sarcastic way), and applied it to sarcastic and non-sarcastic Amazon product reviews. I developed a variety of syntactic and semantic features for the task, including features based on word and sentence polarity, on pointwise mutual information, and on two different bag-of-words models. I experimented with models trained only on tweets, only on product reviews, on a combination of the two, and on a combination of the two with an added domain adaptation transformation step; the last approach worked best and outperformed prior sarcasm detection work that learned only from in-domain data. More information about this project, as well as a comprehensive error analysis, can be found in the following papers:
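The sketch below illustrates one standard domain adaptation transformation of the kind mentioned, Daumé III's "frustratingly easy" feature augmentation, which triples each feature vector into shared, tweets-only, and reviews-only copies so a single classifier can learn which signals transfer across domains. It is shown here as an illustration of this kind of step, not necessarily the exact transformation used in the project; the feature matrices and labels are random placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def augment(X, domain):
    """Daumé III-style feature augmentation: (shared, tweets, reviews)
    copies of each feature vector, zeroing the other domain's copy."""
    zeros = np.zeros_like(X)
    if domain == "tweets":
        return np.hstack([X, X, zeros])
    return np.hstack([X, zeros, X])        # product reviews

# Placeholder feature matrices and sarcasm labels for the two domains.
rng = np.random.default_rng(0)
X_tweets, y_tweets = rng.random((500, 40)), rng.integers(0, 2, 500)
X_reviews, y_reviews = rng.random((100, 40)), rng.integers(0, 2, 100)

# Train a single classifier on the union of both transformed domains.
X_all = np.vstack([augment(X_tweets, "tweets"),
                   augment(X_reviews, "reviews")])
y_all = np.concatenate([y_tweets, y_reviews])
clf = LogisticRegression(max_iter=1000).fit(X_all, y_all)
```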
In the past, I have occasionally collaborated with researchers from UNT's Center for Information and Cyber Security on NLP- and cognitive science-based aspects of work in the cybersecurity domain. In one study, I worked with these collaborators and additional researchers from UNT's College of Business and UNT's Department of Electrical Engineering on an electroencephalogram (EEG) and eye tracking study to determine whether there are neural signatures or gaze patterns associated with the performance of malicious computer activity (e.g., hacking). My primary role was data acquisition: I helped design the experimental setup, ran participants, and monitored the EEG and eye tracking systems during data collection. Additional studies using the collected data are still underway, but one paper co-authored on this project won the Dr. Hermann Zemlicka Award for Most Visionary Paper at the Gmunden Retreat on NeuroIS; that paper can be found here: