Victor Zhong, Jimmy Lin awarded $1.64M NSERC Alliance grant to develop deep research agents for natural science research and development

Three-year, $3.5-million project with BASF Canada will develop intelligent research agents that can retrieve, reason over and act on complex scientific data

Professor Victor Zhong, principal investigator, and co-investigator Professor Jimmy Lin have been awarded $1,641,776 through the NSERC Alliance Grant program. This funding is complemented by more than $1.8 million in combined cash and in-kind contributions from industry partner BASF Canada, bringing the total project value to roughly $3.5 million.

Titled “Deep Research Agents for Natural Science R&D,” the three-year project will co-develop and deploy an end-to-end intelligent research assistant capable of retrieving, reasoning over, and acting on diverse experimental data sources. Developed in partnership with BASF Canada, the project aims to transform chemical informatics from static information retrieval into dynamic, action-oriented knowledge systems that accelerate scientific discovery.

The project will also play a significant role in training the next generation of researchers. Over its three years, it will support two postdoctoral researchers, six PhD students, and two master’s students.

Left to right: Professors Victor Zhong, Jimmy Lin

Victor Zhong is an Assistant Professor at the Cheriton School of Computer Science and a Canada CIFAR AI Chair at the Vector Institute. His research lies at the intersection of machine learning and natural language processing, with an emphasis on using language understanding to learn more generally and efficiently. He leads the R2L Lab, which builds intelligent generalist agents that read, reason and act across digital and physical environments.

Jimmy Lin is a Professor at the Cheriton School of Computer Science, where he holds the David R. Cheriton Chair in Software Systems. His research focuses on building tools that help users make sense of large amounts of data. He works at the intersection of information retrieval, natural language processing, and data management. He is a Fellow of the Association for Computing Machinery, a Fellow of the Association for Computational Linguistics, and a member of the SIGIR Academy.

More about this research

Innovation in the chemical sciences requires integrating diverse data sources — from reaction protocols and compound properties to safety data sheets and spectral data — that are often stored across separate, incompatible systems. This fragmentation creates major bottlenecks in research and development. As a result, researchers in the chemical sciences can spend a significant share of their time retrieving, cleaning and integrating data manually rather than focusing on discovery.

To evaluate a single reaction candidate, for instance, a chemist may need to consult reaction yields in legacy PDF documents, find corresponding nuclear magnetic resonance spectra in a separate repository, and verify safety requirements using independent inventory management systems. These tasks are typically manual, time-consuming and inefficient.

This reliance on manual knowledge curation hinders the adoption of FAIR (findable, accessible, interoperable, reusable) data principles in industrial research environments, limiting the effective use and reuse of digital information across workflows.

Overcoming these barriers requires more than improved data accessibility. It requires AI systems that can reason across diverse information sources and interact seamlessly with laboratory data.

Current AI methods, however, face limitations in such environments. Retrieval-augmented generation systems struggle with complex, hybrid scientific queries, while text-to-SQL systems require rigid schemas that are often poorly suited for unstructured lab notes. Moreover, neither system provides the compositional reasoning needed to integrate internal knowledge with specific experimental contexts. An effective system must not only interpret information but also take meaningful actions based on it.

This project aims to advance scientific discovery by developing a multimodal, general-purpose agent. The agent will combine computer vision, to interpret data across graphical user interfaces, with logical reasoning capabilities, to support experimental planning. This represents a measurable step forward from passive text-based processing systems toward autonomous scientific execution.

The research will be conducted with BASF Canada Inc. BASF’s cash contribution will support postdoctoral researchers and graduate students, cover direct research costs, and include $500,000 earmarked for research infrastructure. The partnership builds on BASF’s track record of integrating AI into research and development through its Global AI Innovation Center.

The overarching goal of this project is to develop a generalist agentic AI system capable of accelerating scientific discovery by unifying data retrieval, compositional reasoning, and tool use.

To achieve this goal, the project relies on the complementary expertise of its investigators. Professor Zhong will contribute expertise in frontier agentic behaviours and reasoning, aligning closely with the goals of BASF’s Global AI Innovation Center. Professor Lin brings expertise in scalable retrieval infrastructure to enable efficient processing of the large, complex datasets involved in industrial R&D.