I am a PhD student in the Computer Vision Lab in FST, University of Macau, advised by Prof. Shu Kong. Prior to joining the University of Macau, I earned my Master’s degree in Statistics from Chongqing University of Technology and received my Bachelor’s degree in Engineering from Tongji University.
My current research focuses on computer vision with Vision Language Models. During my Master’s studies, my research focused on AI4Bio, specifically on predicting protein function by designing models that leverage both protein sequences and 3D structural data.
📝 Publications

Generating a Paracosm for Training-Free Zero-Shot Composed Image Retrieval
Tong Wang, Yunhan Zhao, Shu Kong
arXiv
We address zero-shot composed image retrieval (ZS-CIR) from first principles and propose Paracosm, a training-free method that generates a “mental image” (a synthetic proxy) for each multimodal query to facilitate matching with database images. Further, to mitigate the synthetic-to-real domain gap, it generates synthetic counterparts for database images and performs multimodal-to-multimodal matching between (1) the query's text combined with its synthetic image and (2) each real database image combined with its synthetic counterpart. Across four challenging benchmarks, Paracosm resoundingly outperforms existing ZS-CIR methods.

Active Learning via Vision-Language Model Adaptation with Open Data
Tong Wang, Jiaqi Wang, Shu Kong
arXiv
We propose Tail First Sampling (TFS), an embarrassingly simple yet effective active learning (AL) strategy that prioritizes sampling data from underrepresented classes for labeling. Extensive experiments on standard benchmark datasets demonstrate that our method achieves state-of-the-art performance, significantly surpassing existing methods.

