PhD Seminar • Software Engineering • Scitix: Scalable Constraint-Based Type Inference for Code Snippets with Unknown Types

Monday, July 28, 2025 12:00 pm - 1:00 pm EDT (GMT -04:00)

Please note: This PhD seminar will take place online.

Yiwen Dong, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Chengnian Sun

Code snippets are commonly used in online developer communities to communicate ideas and algorithms. However, contextual information, such as dependencies and exact types, is often missing from code snippets, which makes them difficult to reuse. Some of the most successful automated techniques use logical constraints to infer the types and dependencies, but they do not work in practice because they require an exact knowledge base that contains all possible dependencies and exact types. Such a knowledge base is both computationally expensive for constraint solving and impossible to achieve in the presence of unknown types (e.g., user-defined types) in code snippets.

To this end, this seminar discusses a novel, scalable technique named Scitix. Our insight is two-fold. First, inspired by gradual typing, we represent certain unknown types as Any and ignore them during constraint solving, which improves performance and scalability. Second, our novel, iterative constraint-solving approach saves on computation and skips constraints involving unknown types. Our extensive evaluations show that these insights improve both performance and scalability compared to SnR (the state of the art). Specifically, Scitix achieves F1-scores of 96.6% and 88.7% on Stack Overflow and generated code snippets, respectively, using a large knowledge base of over 3,000 jars. In contrast, SnR consistently times out, yielding F1-scores close to 0%. Even with the smallest knowledge base, where SnR does not time out, Scitix reduces the number of errors by 79% and 37% compared to SnR. Furthermore, even with the largest knowledge base, Scitix reduces error rates by 20% and 78% compared to state-of-the-art LLMs. Scitix’s strong performance highlights its potential as a practical technique for type inference in real-world code snippets.
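
As a rough illustration of the first insight only, the toy sketch below resolves simple type names against a small knowledge base and marks any name the knowledge base does not contain as Any, skipping its constraints instead of failing. It is not Scitix's algorithm: the knowledge-base contents, the `infer` function, and the "called methods" constraint model are all assumptions made for this example, and the iterative solving step is not modeled.

```python
ANY = "Any"  # placeholder for a type the knowledge base does not know

# Hypothetical, hand-written knowledge base for illustration:
# simple name -> {fully qualified name: methods known for that type}
KNOWLEDGE_BASE = {
    "List": {
        "java.util.List": {"add", "get", "size"},
        "java.awt.List": {"add", "getItem", "select"},
    },
    "Date": {
        "java.util.Date": {"getTime"},
        "java.sql.Date": {"getTime", "valueOf"},
    },
}


def infer(usages):
    """Map each simple type name to its candidate fully qualified names.

    `usages` maps a simple type name to the set of method names that the
    snippet calls on it; each called method is treated as a constraint.
    """
    result = {}
    for name, called in usages.items():
        candidates = KNOWLEDGE_BASE.get(name)
        if candidates is None:
            # Unknown (e.g. user-defined) type: record Any and skip its
            # constraints rather than making the whole problem unsolvable.
            result[name] = ANY
            continue
        # Keep only candidates that declare every method the snippet calls.
        result[name] = sorted(
            fqn for fqn, known in candidates.items() if called <= known
        )
    return result


snippet_usages = {
    "List": {"add", "size"},
    "Date": {"valueOf"},
    "MyWidget": {"render"},  # not in the knowledge base
}
inferred = infer(snippet_usages)
```

Here `MyWidget` stands in for a user-defined type: it resolves to Any without affecting how `List` and `Date` are narrowed down to a single candidate each.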


Attend this PhD seminar virtually on Zoom.