Natural Language Processing

I use natural language processing (NLP) to analyze text from participants and automate analyses that typically require subjective scoring by human raters. By developing NLP tools for the memory research community, I aim to reduce scoring burden and enable large-scale and longitudinal studies of autobiographical memory.

Project 1: Automatically identify, classify, and count details in memories with natural language processing

To study memory, researchers ask participants to describe events from their past. To quantify the content of these memories, researchers typically use a procedure known as the autobiographical interview (AMI) to split each narrative into event-related details and other details (e.g., general information).

The AMI is widely used in memory research, appearing in over 300 studies. However, it is also extremely time-consuming: transcribing, scoring, and counting details typically takes several hours per participant. As a result, large studies with this procedure are often impractical, and even small studies are laborious.

To reduce scoring burden and enable large studies of memory, we developed an approach to automatically score responses with natural language processing. We fine-tuned an existing language model (DistilBERT) to estimate the amount of event and non-event content in each sentence, then aggregated these sentence-level predictions to obtain detail estimates for each narrative. We evaluated the model by comparing manual scores with automated scores in five datasets and found that it performed well across all of them, as reported in a preprint of our paper. To make automated scoring available to other researchers, we provide a Colab notebook that is intended to be used without additional coding.
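The sentence-then-aggregate structure described above can be sketched as follows. This is a minimal illustration, not our actual implementation: the real pipeline uses a fine-tuned DistilBERT model, whereas here a trivial word-length heuristic stands in for the model, and all function names are hypothetical.

```python
def score_sentence(sentence: str) -> dict:
    """Stand-in for the fine-tuned model: returns estimated counts of
    event and non-event details in one sentence. The heuristic below
    (longer words count as event details) is purely for illustration."""
    words = sentence.split()
    event = sum(1 for w in words if len(w) > 3)
    return {"event": event, "non_event": len(words) - event}

def score_narrative(narrative: str) -> dict:
    """Split a narrative into sentences, score each sentence, and
    aggregate the predictions into narrative-level detail estimates."""
    sentences = [s.strip() for s in narrative.split(".") if s.strip()]
    totals = {"event": 0, "non_event": 0}
    for sentence in sentences:
        preds = score_sentence(sentence)
        totals["event"] += preds["event"]
        totals["non_event"] += preds["non_event"]
    return totals

narrative = "We walked along the pier at sunset. My sister loves the ocean."
print(score_narrative(narrative))  # e.g. {'event': 7, 'non_event': 5}
```

In the real pipeline, `score_sentence` would run the fine-tuned model (and sentence splitting would use a proper tokenizer rather than splitting on periods), but the aggregation step is the same.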

Project 2: Quantifying the amount of schematic content in memories

When we imagine or remember an event, we draw on schematic information, or generalized ideas about what is typical of a situation.

To study the role of schematic information in remembered or imagined events, we developed a method for quantifying the amount of typical content present in narratives. This method, which we adapted from a common text analysis approach (e.g., Tausczik & Pennebaker, 2010), involves comparing the language used in each narrative to a relevant lexicon, or list of words. We first constructed new lexicons with GloVe for each memory cue, then used those word lists to count the number of typical details in each memory. For example, to quantify the amount of schematic content in a memory about the beach, we counted instances of beach lexicon words (e.g. ‘wave’, ‘sand’, ‘sunny’) present in the narrative.
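The counting step above can be sketched in a few lines. This assumes a lexicon has already been built (in our pipeline, from GloVe embeddings for each cue); the small hand-written beach lexicon below is just for illustration and is not our actual word list.

```python
import re

def count_lexicon_words(narrative: str, lexicon: set) -> int:
    """Count tokens in the narrative that appear in the cue's lexicon."""
    tokens = re.findall(r"[a-z']+", narrative.lower())
    return sum(1 for token in tokens if token in lexicon)

# Illustrative lexicon for a 'beach' cue (hypothetical, hand-picked).
beach_lexicon = {"wave", "waves", "sand", "sunny", "ocean", "shore", "towel"}

memory = ("The sand was warm and the waves were loud; "
          "we stayed on the shore until it got sunny.")
print(count_lexicon_words(memory, beach_lexicon))  # prints 4
```

Dividing this count by the narrative's length gives a rate of schematic content that is comparable across narratives of different lengths.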

We report a preliminary application of this method in our paper, which was recently accepted at Consciousness and Cognition. A validation project is underway.

Project 3a: Using NLP to study suicidal thinking

In progress.

Project 3b: Using NLP to cluster symptoms of depression

In progress.

Posted on:
May 23, 2021