About me
I’m a Ph.D. student at CSE, HKUST, supervised by Prof. Binhang Yuan. I am also fortunate to collaborate with Prof. Wentao Zhang of the PKU-DCML Group. Previously, I worked as a research assistant at the DAIR Lab at Peking University, supervised by Prof. Bin Cui.
My research focuses mainly on efficient training for large language models and multimodal large language models.
Selected Work
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai, Zengjie Hu, Fupeng Sun, Jiantao Qiu, Yizhen Jiang, Guangxin He, Bohan Zeng, Conghui He, Binhang Yuan, Wentao Zhang
In submission
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
Tianyi Bai, Yuxuan Fan, Jiantao Qiu, Fupeng Sun, Jiayi Song, Junlin Han, Zichen Liu, Conghui He, Wentao Zhang, Binhang Yuan
In submission
Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration
Tianyi Bai, Ling Yang, Zhen Hao Wong, Fupeng Sun, Jiahui Peng, Xinlin Zhuang, Chi Zhang, Lijun Wu, Jiantao Qiu, Wentao Zhang, Binhang Yuan, Conghui He
ACL 2025 Main
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai, Hao Liang, Binwang Wan, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Conghui He, Binhang Yuan, Wentao Zhang
ACM Computing Surveys, in submission
Full list on Google Scholar.
Education
- Hong Kong University of Science and Technology
PhD in Computer Science and Engineering
September 2023 to Present
Intern & Work Experience
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
May 2024 to May 2025
Position: Research Intern
Projects:
  - Data-efficient LLM pretraining (supervised by Dr. Conghui He and Dr. Jiantao Qiu) -> Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
  - Synthetic Data Detection (supervised by Dr. Conghui He and Prof. Weijia Li) -> LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
- Peking University, Beijing, China
January 2024 to Present
Position: Research Intern, supervised by Prof. Wentao Zhang
Project:
  - Data-centric Generative AI -> A Survey of Multimodal Large Language Model from A Data-centric Perspective
- Peking University, Beijing, China
July 2021 to July 2023
Position: Research Intern in Prof. Bin Cui’s Group
Projects:
  - Transfer Learning for Bayesian Optimization -> First-author preprint review: Transfer Learning for Bayesian Optimization: A Survey
  - Transfer Learning based Hyperparameter Optimization -> KDD 2022: Transfer Learning based Search Space Design for Hyperparameter Tuning