Please note: This master’s thesis presentation will take place in DC 2314 and online.
Kathryn Carbone, Master’s candidate
David R. Cheriton School of Computer Science
Supervisors: Professors Robin Cohen, Lukasz Golab
Image analysis in high-throughput imaging domains with complex scenes, such as medicine, is challenging because of the time and labour constraints on domain experts. Visual entity linking (VEL) is a preliminary image processing task that links regions of interest (RoIs) to known entities in structured knowledge bases (KBs), thereby using knowledge to scaffold image understanding. We study a targeted VEL problem in which a specific user-highlighted RoI within the image is used to query a textual KB for information about that RoI. The result can support downstream tasks such as similar case retrieval and question answering. For example, a doctor reviewing an MRI scan may wish to obtain images with similar presentations of a medically relevant RoI, such as a brain tumor, for comparison. Once this RoI is linked to its corresponding KB document, an imaging database whose tags were generated automatically by VEL can be searched in a knowledge-aware manner, based on exact or semantically similar entity tag matching.
Cross-modal embedding models such as CLIP offer a straightforward solution: KB entries and either whole images or cropped RoIs are encoded separately, and the resulting learned representations are matched by a vector similarity search. However, using the whole image as the query may retrieve KB entries related to parts of the image other than the RoI. Conversely, using the RoI alone as the query ignores context, which is critical for recognizing and linking complex entities such as those found in medical images. To address these shortcomings, this thesis proposes VELCRO (visual entity linking with contrastive RoI alignment), which adapts an image segmentation model to VEL by using contrastive learning to align the contextual embeddings produced by its decoder with the KB. This strategy preserves the information contained in the surrounding image while focusing KB alignment specifically on the RoI. To accomplish this, VELCRO performs segmentation and contrastive alignment in one end-to-end model via a novel loss function that combines the two objectives. Experimental results on medical VEL show that VELCRO achieves an overall linking accuracy of 95.2%, compared to 83.9% for baseline approaches.
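The two ideas in the abstract, matching a query embedding to KB entries by vector similarity and training with a loss that combines segmentation and contrastive alignment, can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function names, the choice of binary cross-entropy for segmentation, the InfoNCE-style contrastive term, the `temperature` value, and the `weight` parameter are all assumptions made for illustration, and the embeddings are toy vectors rather than outputs of CLIP or a segmentation decoder.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def link_by_similarity(query_emb, kb_embs):
    """Dual-encoder style linking: return the index of the KB entry whose
    embedding has the highest cosine similarity with the query embedding."""
    sims = l2_normalize(kb_embs) @ l2_normalize(query_emb)
    return int(np.argmax(sims))

def contrastive_loss(roi_embs, kb_embs, targets, temperature=0.07):
    """InfoNCE-style loss (an assumed choice): each RoI embedding should be
    closer to its target KB entry than to every other KB entry."""
    logits = l2_normalize(roi_embs) @ l2_normalize(kb_embs).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

def segmentation_loss(pred_probs, mask, eps=1e-7):
    """Pixel-wise binary cross-entropy (an assumed choice of segmentation loss)."""
    p = np.clip(pred_probs, eps, 1.0 - eps)
    return float(-(mask * np.log(p) + (1 - mask) * np.log(1 - p)).mean())

def combined_loss(pred_probs, mask, roi_embs, kb_embs, targets, weight=1.0):
    """Hypothetical joint objective: segmentation term plus a weighted
    contrastive alignment term. The weighting scheme is an assumption."""
    return (segmentation_loss(pred_probs, mask)
            + weight * contrastive_loss(roi_embs, kb_embs, targets))

# Toy data: three KB entries and three RoI embeddings in a 3-d space.
kb_embs = np.eye(3)
roi_embs = np.eye(3)
pred_probs = np.array([[0.9, 0.1], [0.2, 0.8]])  # predicted mask probabilities
mask = np.array([[1, 0], [0, 1]])                # ground-truth RoI mask

linked = link_by_similarity(np.array([0.1, 0.9, 0.0]), kb_embs)
aligned = contrastive_loss(roi_embs, kb_embs, np.array([0, 1, 2]))
misaligned = contrastive_loss(roi_embs, kb_embs, np.array([1, 2, 0]))
total = combined_loss(pred_probs, mask, roi_embs, kb_embs, np.array([0, 1, 2]))
```

In this sketch the contrastive term is small when each RoI embedding already points at its correct KB entry and large when the targets are shuffled, which is the behaviour a joint objective of this kind would exploit during training.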
Join by:
- In-person: Go to DC 2314
- Online: Zoom