Publications

Recent Works

Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
Audrey Huang, Adam Block, Qinghua Liu, Nan Jiang, Akshay Krishnamurthy, Dylan J. Foster
International Conference on Machine Learning (ICML), 2025

Learning to Achieve Goals with Belief State Transformers
Edward S. Hu, Kwangjun Ahn, Qinghua Liu, Haoran Xu, Manan Tomar, Ada Langford, Dinesh Jayaraman, Alex Lamb, John Langford
International Conference on Learning Representations (ICLR), 2025

Partially Observable Reinforcement Learning

Partially Observable RL: Benign Structures and Simple Generic Algorithms (Survey Article)
Qinghua Liu, Chi Jin
Invited to Statistical Science (STS), 2025

Optimistic MLE – A Generic Model-based Algorithm for Partially Observable Sequential Decision Making
Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin
Symposium on Theory of Computing (STOC), 2023

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games
Qinghua Liu, Csaba Szepesvári, Chi Jin
Neural Information Processing Systems (NeurIPS), 2022
European Workshop on Reinforcement Learning, 2022 (Oral)

When Is Partially Observable Reinforcement Learning Not Scary?
Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin
Conference on Learning Theory (COLT), 2022

Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
(α-β order) Chi Jin, Sham M. Kakade, Akshay Krishnamurthy, Qinghua Liu
Neural Information Processing Systems (NeurIPS), 2020 (Spotlight) [Slides] [RL Theory Seminar]

Multi-Agent Reinforcement Learning

Breaking the Curse of Multiagency: Provably Efficient Decentralized MARL with Function Approximation
Yuanhao Wang*, Qinghua Liu*, Yu Bai+, Chi Jin+
Conference on Learning Theory (COLT), 2023

Policy Optimization for Markov Games: Unified Framework and Faster Convergence
Runyu Zhang*, Qinghua Liu*, Huan Wang, Caiming Xiong, Na Li, Yu Bai
Neural Information Processing Systems (NeurIPS), 2022

Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits
Qinghua Liu*, Yuanhao Wang*, Chi Jin
International Conference on Machine Learning (ICML), 2022 (Long oral)

V-Learning – A Simple, Efficient, Decentralized Algorithm for Multiagent RL
(α-β order) Chi Jin, Qinghua Liu, Yuanhao Wang, Tiancheng Yu
Best Paper in ICLR Workshop on Gamification and Multiagent Solutions, 2022
Mathematics of Operations Research, 2023

A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games
Zihan Ding, Dijia Su, Qinghua Liu, Chi Jin
arXiv preprint

The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
(α-β order) Chi Jin, Qinghua Liu, Tiancheng Yu
International Conference on Machine Learning (ICML), 2022
ICML Workshop on Reinforcement Learning Theory, 2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
International Conference on Machine Learning (ICML), 2021

Reinforcement Learning with Large State Spaces

Is RLHF More Difficult than Standard RL?
Yuanhao Wang, Qinghua Liu, Chi Jin
Neural Information Processing Systems (NeurIPS), 2023

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL
Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári
Neural Information Processing Systems (NeurIPS), 2023 (Spotlight)

Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
(α-β order) Chi Jin, Qinghua Liu, Sobhan Miryoosefi
Neural Information Processing Systems (NeurIPS), 2021 (Spotlight) [Slides] [RL Theory Seminar]

Provable Rich Observation Reinforcement Learning with Combinatorial Latent States
Dipendra Misra, Qinghua Liu, Chi Jin, John Langford
International Conference on Learning Representations (ICLR), 2021

Other Works

Context-lumpable Stochastic Bandits
Chung-Wei Lee, Qinghua Liu, Yasin Abbasi-Yadkori, Chi Jin, Tor Lattimore, Csaba Szepesvári
Neural Information Processing Systems (NeurIPS), 2023

Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization
Jianyu Wang, Qinghua Liu, Hao Liang, Gauri Joshi, H. Vincent Poor
Neural Information Processing Systems (NeurIPS), 2020
Longer version in IEEE Transactions on Signal Processing

A Tight Lower Bound for Uniformly Stable Algorithms
(α-β order) Qinghua Liu, Zhou Lu
arXiv preprint

Note: (α-β order) denotes alphabetical author ordering; the markers * and + denote equal contribution.