Tianyi Bai
HKUST CSE Ph.D. / Agentic AI / Data-Centric AI
I am a Ph.D. student in Computer Science at HKUST, advised by Prof. Binhang Yuan. My research focuses on building capable computer-use and coding agents, with an emphasis on data-centric methods for improving their training, evaluation, and multimodal reasoning abilities. I am fortunate to be interning at Qwen, where I work with Binyuan Hui and Junyang Lin. I also collaborate with Prof. Wentao Zhang at PKU DCAI.

Representative Work
Computer Use Agent | Qwen3.5 / Qwen3.6
Leading Computer Use Agent capability work across RL infrastructure, annotation quality, data pipelines, training, evaluation, and bad-case analysis.
Coding Agent | Qwen3-Coder
Contributed to the Browser Use Agent module, including browser interaction data construction, capability improvement, training pipeline support, and evaluation.
Data Pipeline | DataFlow
Responsible for the code data pipeline, including code data processing, quality filtering, pipeline orchestration, and preparation of training-ready code data.
Selected Work

Coding Agent | Technical Report
Qwen3-Coder-Next Technical Report
Qwen Team

Multimodal Geometry | ICML 2026
Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
Haobo Lin, Tianyi Bai, Chen Chen, Jiajun Zhang, Bohan Zeng, Wentao Zhang, Binhang Yuan

Multimodal Reasoning | NeurIPS 2025
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
Tianyi Bai, Yuxuan Fan, Jiantao Qiu, Fupeng Sun, Jiayi Song, Junlin Han, Zichen Liu, Conghui He, Wentao Zhang, Binhang Yuan

Visual Reasoning | NeurIPS 2025
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai, Zengjie Hu, Fupeng Sun, Jiantao Qiu, Yizhen Jiang, Guangxin He, Bohan Zeng, Conghui He, Binhang Yuan, Wentao Zhang

Data Selection | ACL 2025 Main
Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration
Tianyi Bai, Ling Yang, Zhen Hao Wong, Fupeng Sun, Jiahui Peng, Xinlin Zhuang, Chi Zhang, Lijun Wu, Jiantao Qiu, Wentao Zhang, Binhang Yuan, Conghui He

Data-Centric AI | ACM Computing Surveys, major revision
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai, Hao Liang, Binwang Wan, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Conghui He, Binhang Yuan, Wentao Zhang
Experience
- Research Intern
I work on agent capabilities for Qwen models, including the Browser Use Agent for Qwen3-Coder and the Computer Use Agent for Qwen3.5/Qwen3.6. My work spans data construction, RL infrastructure, training, evaluation, and failure analysis.
- Research Assistant
I contribute to DataFlow, focusing on code data workflows: pipeline construction, data processing, and quality filtering to produce training-ready code data.
- Research Intern
I worked on data preparation and selection for LLM pretraining, including data management strategies for InternLM3-8B and Ray-based labeling pipelines for data selection.
- Research Assistant
I studied transfer learning for Bayesian optimization and hyperparameter tuning. This work led to a KDD 2022 paper on transfer-learning-based search space design.