Schedule (Friday, October 10th @ Room 518C)
09:00 am: Opening remarks
09:10 am: Invited talks
Sarah Wiegreffe, Assistant Professor, Department of Computer Science, University of Maryland
If steering is the answer, what was the question?
John Hewitt, Assistant Professor of Computer Science, Columbia University
10:20 am: Workshop paper talks and coffee break (11:05 am)
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking (Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye)
Localizing Persona Representations in LLMs (Celia Cintas, Miriam Rateike, Erik Miehling, Elizabeth M. Daly, Skyler Speakman)
Causal Interventions Reveal Shared Structure Across English Filler–Gap Constructions (Sasha Boguraev, Christopher Potts, Kyle Mahowald)
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence (Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, Himabindu Lakkaraju, Shichang Zhang)
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps (Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, Yonatan Belinkov)
One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs (Jacob Dunefsky, Arman Cohan)
12:00 pm: Organized lunch
01:00 pm: Invited talks
Kyle Mahowald, Assistant Professor in Linguistics at the University of Texas at Austin
The INTERPLAY Between Verbal Representations and Verbal Behavior
Aaron Mueller, Assistant Professor of Computer Science and Data Science at Boston University
Building a More Predictive Science of Language Model Behaviors with Interpretability
02:10 pm: Poster session
03:20 pm: Round table discussion and coffee break
Moderated by Aaron Mueller, John Hewitt, Kanishka Misra, Kyle Mahowald, Marius Mosbach
04:50 pm: Closing remarks
05:00 pm: Workshop social (TBD)