Tianyi Bai
HKUST CSE Ph.D. / Agentic AI / Data-Centric AI
I work on agentic AI, multimodal visual reasoning, and data-centric AI. I am a Ph.D. student at HKUST, supervised by Prof. Binhang Yuan. I am fortunate to intern at Qwen, where I am supervised by Binyuan Hui and Junyang Lin, and I collaborate with Prof. Wentao Zhang at PKU DCAI.

Representative Work
Qwen3.5 / Qwen3.6
Leading Computer Use Agent capability work across RL infrastructure, annotation quality, data pipelines, training, evaluation, and failure analysis.
Qwen3-Coder
Contributed to the Browser Use Agent module, including browser interaction data construction, capability improvement, training pipeline support, and evaluation.
DataFlow
Responsible for the coding pipeline, including code data processing, quality filtering, pipeline orchestration, and preparation of training-ready code data.
Selected Work
Coding Agent / Technical Report
Qwen3-Coder-Next Technical Report
Contributor
Multimodal Geometry / ICML 2026
Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
Mentor
Multimodal Reasoning / NeurIPS 2025
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
Tianyi Bai, Yuxuan Fan, Jiantao Qiu, Fupeng Sun, Jiayi Song, Junlin Han, Zichen Liu, Conghui He, Wentao Zhang, Binhang Yuan
Visual Reasoning / NeurIPS 2025
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai, Zengjie Hu, Fupeng Sun, Jiantao Qiu, Yizhen Jiang, Guangxin He, Bohan Zeng, Conghui He, Binhang Yuan, Wentao Zhang
Data Selection / ACL 2025 Main
Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration
Tianyi Bai, Ling Yang, Zhen Hao Wong, Fupeng Sun, Jiahui Peng, Xinlin Zhuang, Chi Zhang, Lijun Wu, Jiantao Qiu, Wentao Zhang, Binhang Yuan, Conghui He
Synthetic Data / ICLR 2025 Spotlight
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Co-first author
Data-Centric AI / ACM Computing Surveys, major revision
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai, Hao Liang, Binwang Wan, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Conghui He, Binhang Yuan, Wentao Zhang
Experience
- Research Intern
I work on agent capabilities for Qwen models, including Browser Use Agent for Qwen3-Coder and Computer Use Agent for Qwen3.5/Qwen3.6. My work spans data construction, RL infrastructure, training, evaluation, and failure analysis.
- Research Assistant
I contribute to DataFlow, with a focus on code-data workflows, coding pipeline construction, data processing, and quality filtering for training-ready code data.
- Research Intern
I worked on data preparation and selection for LLM pretraining, including data management strategies for InternLM3-8B and Ray-based labeling pipelines for data selection.
- Research Assistant
I studied transfer learning for Bayesian optimization and hyperparameter tuning. This work led to a KDD 2022 paper on transfer-learning-based search space design.