Schedule (Friday, October 10th @ Room 518C)

09:00 am: Opening remarks


09:10 am: Invited talks

Sarah Wiegreffe, Assistant Professor, Department of Computer Science, University of Maryland

If steering is the answer, what was the question?

John Hewitt, Assistant Professor of Computer Science, Columbia University


10:20 am: Workshop paper talks and coffee break (11:05 am)

Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking. (Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye)

Localizing Persona Representations in LLMs (Celia Cintas, Miriam Rateike, Erik Miehling, Elizabeth M. Daly, Skyler Speakman)

Causal Interventions Reveal Shared Structure Across English Filler–Gap Constructions (Sasha Boguraev, Christopher Potts, Kyle Mahowald)

How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence (Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, Himabindu Lakkaraju, Shichang Zhang)

Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps (Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, Yonatan Belinkov)

One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs (Jacob Dunefsky, Arman Cohan)


12:00 pm: Organized lunch


01:00 pm: Invited talks

Kyle Mahowald, Assistant Professor in Linguistics at the University of Texas at Austin

The INTERPLAY Between Verbal Representations and Verbal Behavior

Aaron Mueller, Assistant Professor of Computer Science and Data Science at Boston University

Building a More Predictive Science of Language Model Behaviors with Interpretability


02:10 pm: Poster session


03:20 pm: Round table discussion and coffee break

Moderated by Aaron Mueller, John Hewitt, Kanishka Misra, Kyle Mahowald, Marius Mosbach


04:50 pm: Closing remarks


05:00 pm: Workshop social TBD