Carnegie Mellon University at EMNLP 2025
CMU researchers are presenting 50 papers at the Thirtieth Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), held November 4–9 in Suzhou, China. This includes 27 papers in the main conference, 19 papers in the Findings track, 2 system demonstration papers, and 2 industry track papers. This blog post provides aggregated information about the EMNLP 2025 papers published by CMU researchers.
The key areas addressed are visualized below (covering 30 of the 50 papers), illustrating the breadth of NLP and machine learning research being conducted at CMU:

Note: All information in this post was obtained through the ACL Anthology API and the EMNLP 2025 Presentation Information spreadsheet. Please contact the CMU ML Blog editors if you would like any information added or changed.
Table of Contents
- Main Conference Papers
  - Special Theme: Interdisciplinary Recontextualization of NLP
  - Multimodality and Language Grounding to Vision, Robotics and Beyond
  - Resources and Evaluation
  - Human-AI Interaction/Cooperation
  - Interpretability, Model Editing, Transparency, and Explainability
  - Mathematical, Symbolic, and Logical Reasoning in NLP
  - Generalizability and Transfer
  - NLP Applications
  - Safety and Alignment in LLMs
  - Natural Language Generation
  - Question Answering
  - Multilinguality and Language Diversity
  - Computational Social Science, Cultural Analytics, and NLP for Social Good
  - AI/LLM Agents
  - Code Models
  - Summarization
  - Retrieval-Augmented Language Models
  - Phonology, Morphology and Word Segmentation
  - Low-resource Methods for NLP
- Findings Papers
  - Special Theme: Interdisciplinary Recontextualization of NLP
  - Resources and Evaluation
  - Human-AI Interaction/Cooperation
  - Interpretability, Model Editing, Transparency, and Explainability
  - Multilinguality and Language Diversity
  - AI/LLM Agents
  - Code Models
  - Retrieval-Augmented Language Models
  - Speech Processing and Spoken Language Understanding
  - Semantics: Lexical, Sentence-Level Semantics, Textual Inference, and Other Areas
  - Ethics, Bias, and Fairness
  - Dialogue and Interactive Systems
  - LLM Efficiency
Main Conference Papers
Special Theme: Interdisciplinary Recontextualization of NLP
Spontaneous Giving and Calculated Greed in Language Models
Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics
Multimodality and Language Grounding to Vision, Robotics and Beyond
Social Genome: Grounded Social Reasoning Abilities of Multimodal Models
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code Generation
Resources and Evaluation
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
Human-AI Interaction/Cooperation
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design
Interpretability, Model Editing, Transparency, and Explainability
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies
Mathematical, Symbolic, and Logical Reasoning in NLP
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
Agentic-R1: Distilled Dual-Strategy Reasoning
Generalizability and Transfer
SOCIAL SCAFFOLDS: A Generalization Framework for Social Understanding Tasks
Searching for the Most Human-like Emergent Language
NLP Applications
PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs
Safety and Alignment in LLMs
Anecdoctoring: Automated Red-Teaming Across Language and Place
Natural Language Generation
CIE: Controlling Language Model Text Generations Using Continuous Signals
Question Answering
Table-R1: Inference-Time Scaling for Table Reasoning Tasks
Multilinguality and Language Diversity
Grounding Multilingual Multimodal LLMs With Cultural Knowledge
Computational Social Science, Cultural Analytics, and NLP for Social Good
Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication
AI/LLM Agents
On the Fine-Grained Planning Abilities of VLM Web Agents
Code Models
An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation
Summarization
Summarizing Speech: A Comprehensive Survey
Retrieval-Augmented Language Models
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers
Phonology, Morphology and Word Segmentation
Morpheme Induction for Emergent Language
Low-resource Methods for NLP
Language Models Can be Efficiently Steered via Minimal Embedding Layer Transformations
Findings Papers
Special Theme: Interdisciplinary Recontextualization of NLP
FicSim: A Dataset for Multi-Faceted Semantic Similarity in Long-Form Fiction
Resources and Evaluation
SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone
mrCAD: Multimodal Communication to Refine Computer-aided Designs
Human-AI Interaction/Cooperation
Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
Interpretability, Model Editing, Transparency, and Explainability
Linear Steerability in Language Models: When It Emerges and How It Evolves
Predicting Language Models’ Success at Zero-Shot Probabilistic Prediction
Multilinguality and Language Diversity
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
AI/LLM Agents
FLAIRR-TS – Forecasting LLM-Agents with Iterative Refinement and Retrieval for Time Series
Code Models
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Retrieval-Augmented Language Models
GAMIC: Graph-Aligned Molecular In-context Learning for Molecule Analysis via LLMs
Speech Processing and Spoken Language Understanding
SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions
CAARMA: Class Augmentation with Adversarial Mixup Regularization
Semantics: Lexical, Sentence-Level Semantics, Textual Inference, and Other Areas
Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications
Ethics, Bias, and Fairness
Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models
Dialogue and Interactive Systems
LLM Efficiency
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
System Demonstrations
AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories
BioGraphia: A LLM-Assisted Biological Pathway Graph Annotation Platform
Industry Track Papers
Leveraging LLMs to Streamline the Review of Public Funding Applications
Semantic Agreement Enables Efficient Open-Ended LLM Cascades