
Language models have grown increasingly powerful at performing complex tasks, motivating the study of both their behavior and their internals. However, distinct research communities often pursue these two objectives in isolation. As a result, we lack robust, standardized interpretability methods for comprehensively assessing LM behavior in complex, real-world scenarios. This workshop promotes research and discussion on the interplay between model behavior and model internals to address this gap. We aim to explore how understanding internal mechanisms can enhance our knowledge of complex model behaviors, and vice versa, by addressing key questions such as:

How can we jointly evaluate model behavior and internals?
How do model interventions influence behavior, internals, and their interplay?
How can we disentangle the influence of internal dynamics from external behavior?
How do behavioral and internal evaluations align? Where and why do they differ?
How do model size, architecture, and pre-training data influence the link between internals and behavior?

Organizers: Leshem Choshen, Vagrant Gautam, Yufang Hou, Anne Lauscher, Tamar Rott Shaham, Andreas Waldis

Steering Committee: Jacob Andreas, David Bau, Yonatan Belinkov, Iryna Gurevych, Kyle Mahowald

News

We are super happy that 25 accepted papers will be presented at the INTERPLAY workshop.
🚨2nd call for pre-reviewed papers published! Submit your pre-reviewed paper, including reviews and response letter, here.🚨
We have extended the submission deadline to the 30th of June.
We are looking for reviewers; register here.
Call for papers published!
Workshop accepted at COLM ’25!

Important Dates

June 30 - Submission due

July 10 - Submission due (pre-reviewed papers)

July 24 - Acceptance notification

October 10 - Workshop day

First Workshop on the Interplay of Model Behavior and Model Internals

A Workshop at the Conference on Language Modeling (COLM ’25)
