Stuck on that paper? AI tool suggests citations and autocompletes writing | Cheriton School of Computer Science

Ever spent hours browsing through multiple websites because you couldn’t find the right source for your essay?

Fortunately, a Waterloo-led research team has created ScholarCopilot, an AI-powered tool that can make writing papers faster, smoother, and less stressful.

Users can write text directly in ScholarCopilot’s interface or upload an existing draft. When they click the “search citations” button, the tool analyzes their content and generates a list of academic sources. If the user chooses one of the recommendations, ScholarCopilot automatically creates the in-text citation.

After noticing a gap in AI writing assistants, PhD student Yubo Wang co-created ScholarCopilot alongside his supervisor, Dr. Wenhu Chen, and fellow Waterloo student Xueguang Ma.

“It’s almost like having an intelligent helper right by your side. ScholarCopilot helps students avoid common mistakes like wrong formatting, and gain confidence in their abilities,” explains Yubo Wang, a PhD candidate at the University of Waterloo’s David R. Cheriton School of Computer Science. He led the ScholarCopilot project alongside his supervisor, Dr. Wenhu Chen; fellow Waterloo students Xueguang Ma, Huaye Zeng, Zhiheng Lyu, Yuxuan Zhang, Benjamin Schneider and Yi Lu; Carnegie Mellon University’s Xiang Yue; and independent researcher Ping Nie.

ScholarCopilot doesn’t just benefit students but also the research community. “It can make scientific research more accessible to the public, fostering broader participation and engagement. It makes scholarly communications credible by improving academic writing and citation accuracy,” says Wang.

Video: A tutorial on ScholarCopilot, an AI-powered tool that can autocomplete writing and suggest citations (“scholar copilot demo video” on YouTube).

Wang and his co-researchers were inspired by their own struggles with writing academic papers. Existing AI writing tools often recommend irrelevant citations or, worse, make citations up, which makes them unreliable.

These tools retrieve citations before generating text, treating citation retrieval and text generation as separate steps. Oftentimes, this rigid approach cannot adapt to evolving writing contexts. For example, if a user is writing about pop music in the 1970s but later references the 1990s, conventional tools may fetch outdated or irrelevant sources.

In contrast, ScholarCopilot integrates citation retrieval directly into the writing process. While producing content, it generates “retrieval tokens”, which contain keywords from the document. These tokens are used to search the reference corpus for the most appropriate citations.
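The difference between the two approaches can be illustrated with a toy sketch. It is not the team’s implementation: the three-entry corpus, the keyword-overlap scoring, and the function names `retrieve_then_generate` and `interleaved` are all assumptions made for illustration, and the “retrieval token” is simplified to the set of keywords in the sentence being written.

```python
import re

# Hypothetical mini-corpus; the entries are invented for illustration only.
CORPUS = {
    "disco1970s": "disco and pop music trends in the 1970s",
    "grunge1990s": "pop and alternative music charts in the 1990s",
    "jazz1920s": "jazz recordings of the 1920s",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric keywords."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query_keywords, corpus=CORPUS):
    """Return the id of the corpus entry sharing the most keywords with the query."""
    return max(
        corpus,
        key=lambda doc_id: len(query_keywords & tokenize(corpus[doc_id])),
    )


def retrieve_then_generate(sentences):
    """Baseline: retrieve once from the opening sentence and reuse that source."""
    fixed_source = retrieve(tokenize(sentences[0]))
    return [(sentence, fixed_source) for sentence in sentences]


def interleaved(sentences):
    """Sketch of interleaving: build a fresh keyword query at every citation point."""
    return [(sentence, retrieve(tokenize(sentence))) for sentence in sentences]


draft = [
    "Pop music in the 1970s was shaped by disco",
    "By the 1990s pop charts had changed",
]

for sentence, source in retrieve_then_generate(draft):
    print("retrieve-first:", source, "<-", sentence)
for sentence, source in interleaved(draft):
    print("interleaved:   ", source, "<-", sentence)
```

In this sketch the retrieve-first baseline attaches the 1970s source to both sentences, while the interleaved version picks the 1990s source once the draft shifts decades. A real system would use learned representations rather than raw keyword overlap.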

A comparison between most AI tools (left) and ScholarCopilot (right). Most models use a “retrieve-then-generate” pipeline, which cannot account for changes in the text as it is written. ScholarCopilot instead combines “citation retrieval” and “text generation,” leading to more accurate citations.

The team found that ScholarCopilot outperforms most baselines, achieving a top-1 retrieval accuracy of 40.1%. “The system has a 40% chance of guessing what citation you need correctly on the first try,” explains Wang.
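For readers unfamiliar with the metric, top-1 retrieval accuracy is the share of citation slots where the system’s first-ranked suggestion is the correct reference. The sketch below shows one common way to compute it. The function name `top_k_accuracy` and the sample data are hypothetical and are not taken from the team’s evaluation.

```python
def top_k_accuracy(ranked_predictions, gold, k=1):
    """Fraction of queries whose correct item appears in the first k suggestions."""
    if not gold or len(ranked_predictions) != len(gold):
        raise ValueError("need one ranked list per gold label, and at least one query")
    hits = sum(
        1
        for ranked, correct in zip(ranked_predictions, gold)
        if correct in ranked[:k]
    )
    return hits / len(gold)


# Hypothetical example: five citation slots, each with three ranked suggestions.
ranked = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "a", "b"],
    ["a", "c", "b"],
    ["b", "c", "a"],
]
gold = ["a", "a", "a", "b", "b"]

print(top_k_accuracy(ranked, gold, k=1))  # 2 of 5 first suggestions are correct
print(top_k_accuracy(ranked, gold, k=2))  # 4 of 5 are correct within the top two
```

Widening `k` can only keep the score the same or raise it, which is why top-1 is the strictest version of the metric.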

They also conducted a user study, recruiting 10 students from various academic backgrounds who had experience with AI writing tools. Each participant wrote a paper on a topic within their area of expertise, then ranked ScholarCopilot against ChatGPT in different categories, including citation quality and user experience.

ScholarCopilot outranked ChatGPT in all categories, receiving a 100% approval rating for citation quality and scoring particularly well on citation accuracy and content quality. Around 80% of the participants stated they would use it in the future.

These results were surprising because ScholarCopilot is a 7-billion-parameter model, which is much smaller than most leading models, including ChatGPT and Claude. The key to ScholarCopilot’s success was its tailored training set, which comprised 500,000 academic papers, whereas most AI tools are trained on a broad range of topics to support general use.

Overall, these Waterloo researchers are creating paradigm shifts in AI and academia, saving students the time and stress of gruelling papers.

Despite public concern that AI impairs students’ skills, Wang emphasizes that “ScholarCopilot is designed not to replace students’ writing but rather assist them in handling mechanical tasks such as finding citations.”

“This allows students to focus more on critical tasks like reading, analytical, critical thinking and generating original ideas. Our design also encourages active human-AI collaboration, enabling students to remain fully engaged in the learning and writing process.”

Recently, the team open-sourced their work, allowing users to download their demo.

The research, ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations, was published in April.