[1] Chao Yu*, Akash Velu*, Eugene Vinitsky, Jiaxuan Gao, Yu Wang+, Alexandre Bayen+, Yi Wu+.
The Surprising Effectiveness of PPO in Cooperative Multi-agent Games. in Advances in Neural
Information Processing Systems (NeurIPS), 2022.
[2] Chao Yu, Zuxin Liu, Xin-Jun Liu, Fugui Xie, Yi Yang, Qi Wei, Fei Qiao. DS-SLAM: A semantic
visual SLAM towards dynamic environments. In International Conference on Intelligent Robots and
Systems (IROS), 2018.
[3] Shusheng Xu , Wei Fu, Jiaxuan Gao , Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu+, Yi Wu+. Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study. in International Conference on Machine Learning (ICML), 2024.
[4] Tonghe Zhang, Chao Yu+, Sichang Su, Yu Wang. ReinFlow: Fine-tuning Flow Matching Policy
with Online Reinforcement Learning. in Advances in Neural Information Processing Systems (NeurIPS) 2025.
[5] Zelai Xu, Chao Yu, Fei Fang, Yu Wang+, Yi Wu+. Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game. in International Conference on Machine Learning (ICML), 2024.
[6] Chao Yu*, Jiaxuan Gao*, Weilin Liu, Botian Xu, Hao Tang, Jiaqi Yang, Yu Wang, Yi Wu. Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased. in International Conference on Learning Representations (ICLR), 2023.
[7] Zhenggang Tang*, Chao Yu*, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Du, Yu Wang, Yi Wu. Discovering Diverse Multi-agent Strategic Behavior Via Reward Randomization. In
International Conference on Learning Representations (ICLR), 2021.
[8] Botian Xu, Feng Gao, Chao Yu+, Ruize Zhang, Yi Wu, Yu Wang+. OmniDrones: An Efficient
and Flexible Platform for Reinforcement Learning. in Drone Control. in IEEE Robotics and
Automation Letters (RAL), 2024.
[9] Jijia Liu*, Feng Gao*, Bingwen Wei, Xinlei Chen, Qingmin Liao, Yi Wu, Chao Yu+, Yu Wang+. What Can RL Bring to VLA Generalization? An Empirical Study. in Advances in Neural Information Processing Systems (NeurIPS), 2025.
[10] Jijia Liu, Feng Gao, Qingmin Liao, Chao Yu+, Yu Wang+. Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network. in International Conference on Machine Learning (ICML), 2025.
[11] Yixian Zhang*, Shu'ang Yu*, Tonghe Zhang, Mo Guang, Haojia Hui, Kaiwen Long, Yu Wang, Chao Yu+, Wenbo Ding+. SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling. in International Conference on Learning Representations (ICLR), 2026.
[12] Chao Yu, Xinyi Yang, Jiaxuan Gao, Jiayu Chen, Yunfei Li, Jijia Liu, Yunfei Xiang, Ruixin
Huang, Huazhong Yang, Yi Wu, Yu Wang. Asynchronous Multi-Agent Reinforcement Learning for
Efficient Real-time Multi-robot Cooperative Exploration. In International Conference on Autonomous
Agents and Multi-agent Systems (AAMAS), 2023.






