Professor Freda Shi and her collaborators Changbing Yang, Franklin Ma and Jian Zhu from the University of British Columbia have received an Outstanding Paper Award at EMNLP 2025, the 30th Conference on Empirical Methods in Natural Language Processing. Their paper, LingGym: How Far Are LLMs from Thinking Like Field Linguists?, introduces a new benchmark that evaluates how effectively large language models perform meta-linguistic reasoning.
“Congratulations to Freda and her colleagues,” said Raouf Boutaba, University Professor and Director of the Cheriton School of Computer Science. “Their research offers valuable insight into how well large language models can interpret and reason in low-resource languages, and opens promising avenues for linguistic analysis and language documentation.”

Freda Shi is an Assistant Professor at the Cheriton School of Computer Science and a Faculty Member at the Vector Institute, where she holds a Canada CIFAR AI Chair. Her research interests are in computational linguistics and natural language processing. She works towards a deeper understanding of natural language and human language processing mechanisms, as well as how these insights can inform the design of more efficient, effective, safe and trustworthy NLP systems. She is particularly interested in learning language through grounding, computational multilingualism, and related machine learning aspects.
Professor Shi leads the CompLING Lab, a research group dedicated to exploring human language through computational methods.
More about this award-winning research
Researchers are exploring how large language models can assist and accelerate scientific discoveries, but comparatively little work has examined how LLMs can advance the social sciences. LLMs capable of reasoning about meta-linguistic knowledge have the potential to become powerful tools for language documentation, linguistic hypothesis testing, and typological research.
Linguists often generalize across languages by examining structures such as morphology, syntax and word order. Models with strong meta-linguistic reasoning could assist this process. Examples include proposing morpheme segmentations (breaking words into their smallest meaningful units), supplying glosses for each morpheme (explanations of what each morpheme means), identifying patterns or counterexamples to test hypotheses, and comparing structural features across languages.
To this end, Professor Shi and her colleagues developed LingGym, a task-oriented benchmark that evaluates the capacity of LLMs for meta-linguistic reasoning using interlinear glossed text (a way of presenting a sentence from one language with a word-by-word or morpheme-by-morpheme breakdown directly underneath) and grammatical descriptions extracted from 18 endangered and low-resource languages.
Unlike previous work focused on specific downstream tasks, LingGym evaluates whether LLMs can generalize linguistic inference across low-resource languages and structures not seen during their training. The researchers conducted a controlled evaluation known as word-gloss inference, in which models infer a missing word or gloss from context, with inputs that vary in how much structured linguistic information is provided, such as gloss lines, grammatical explanations and translations.
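To make the word-gloss inference setup concrete, the sketch below assembles a toy evaluation item: an interlinear glossed sentence with one gloss masked out, plus optional structured cues (a translation line, a grammar note) that can be toggled on or off. The function name, field labels and example data are illustrative assumptions, not LingGym's actual format.

```python
# Sketch of a word-gloss inference item: a model must infer the masked gloss
# from interlinear glossed text (IGT) plus optional structured linguistic cues.
# All names and data here are illustrative, not taken from the benchmark.

MASK = "[MASK]"

def build_prompt(words, glosses, mask_index, translation=None, grammar_note=None):
    """Assemble a prompt whose gloss line has one entry masked out."""
    masked = [g if i != mask_index else MASK for i, g in enumerate(glosses)]
    lines = [
        "Words:   " + "  ".join(words),
        "Glosses: " + "  ".join(masked),
    ]
    # Structured cues are optional, mirroring evaluations that vary how much
    # linguistic information the model receives.
    if translation:
        lines.append("Translation: " + translation)
    if grammar_note:
        lines.append("Grammar note: " + grammar_note)
    lines.append(f"What is the gloss for the masked word ({MASK})?")
    return "\n".join(lines)

# Toy example with invented data: the model should recover the verb's gloss.
prompt = build_prompt(
    words=["ni-ta-soma", "kitabu"],
    glosses=["1SG-FUT-read", "book"],
    mask_index=0,
    translation="I will read a book.",
    grammar_note="Verbs carry subject and tense prefixes.",
)
print(prompt)
```

Varying which cues are passed in (gloss line only, plus translation, plus grammar notes) is one simple way to probe how much each layer of structured information helps a model's reasoning.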
The researchers found that incorporating structured linguistic cues leads to consistent improvements in reasoning performance across all models. Their results highlight both the promise and current limitations of LLMs for typologically informed linguistic analysis and low-resource language documentation.
The research team plans to expand LingGym to support more diverse and in-depth use cases, particularly for endangered and low-resource languages. To support LLM-assisted linguistic research more broadly, the LingGym benchmark is freely available on GitHub.
To learn more about the research on which this article is based, please see Changbing Yang, Franklin Ma, Freda Shi, Jian Zhu. LingGym: How Far Are LLMs from Thinking Like Field Linguists? In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 1314–1340, Suzhou, China. Association for Computational Linguistics.