Yihao Feng

I am a Ph.D. student at UT Austin, where I work on Reinforcement Learning and Approximate Inference.

At UT Austin, I work with Prof. Qiang Liu. I also work closely with Lihong Li, Ziyang Tang, Hao Liu, and Prof. Jian Peng.

Email  /  Google Scholar  /  Github


I'm interested in statistical machine learning, especially Reinforcement Learning. Most of my work is about designing efficient algorithms for training and evaluation in RL, with the long-term goal of understanding Deep RL in a principled way.

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds
Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu.
In Submission, ICLR 2021

We propose an approach to constructing non-asymptotic confidence intervals for off-policy estimation.

Off-Policy Interval Estimation with Lipschitz Value Iteration
Ziyang Tang, Yihao Feng, Na Zhang, Jian Peng, Qiang Liu.
NeurIPS, 2020
[coming soon]

Tight value bounds for behavior-agnostic off-policy policy evaluation with Lipschitz value iteration.

Accountable Off-Policy Evaluation with Kernel Bellman Statistics
Yihao Feng, Tongzheng Ren, Ziyang Tang, Qiang Liu.
ICML, 2020
[arXiv]   [video]

Tight high-confidence bounds for behavior-agnostic off-policy policy evaluation.

Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang*, Yihao Feng*, Lihong Li, Denny Zhou, Qiang Liu. (Equal Contribution)
ICLR, 2020  (Spotlight)
[arXiv]   [openreview]

A short version was presented at the Optimization Foundations for Reinforcement Learning Workshop @ NeurIPS 2019 (Spotlight).

A doubly robust estimator based on the infinite-horizon density ratio and off-policy value estimation, and its connection with Lagrangian duality.

A Kernel Loss for Solving the Bellman Equation
Yihao Feng, Lihong Li, Qiang Liu.
NeurIPS, 2019
[arXiv]   [simons talk]   [slides]

A short version was presented at the Real-world Sequential Decision Making Workshop @ ICML 2019.

A new, simple loss for off-policy value function learning with flexible function approximators such as neural networks.

Shrinkage-based Bias-Variance Trade-off for Policy Optimization
Yihao Feng, Hao Liu, Jian Peng, Qiang Liu.
Deep Reinforcement Learning Workshop @ NeurIPS, 2018

An adaptive strategy for combining model-free and model-based policy gradients by utilizing Stein's paradox.

Action-dependent Control Variates for Policy Optimization via Stein's Identity
Hao Liu*, Yihao Feng*, Yi Mao, Denny Zhou, Jian Peng, Qiang Liu. (Equal Contribution)
ICLR, 2018
[arXiv]   [code]   [slides]

Oral presentation at the Deep Reinforcement Learning Symposium @ NIPS 2017.

A general action-dependent baseline function for on-policy policy optimization algorithms such as PPO and TRPO.

Learning to Draw Samples with Amortized Stein Variational Gradient Descent
Yihao Feng, Dilin Wang, Qiang Liu.
UAI, 2017
[arXiv]   [slides]

Training implicit models by following the direction of Stein variational gradient descent.

Learning to Sample Using Stein Discrepancy
Dilin Wang, Yihao Feng, Qiang Liu.
NIPS Workshop on Bayesian Deep Learning, 2016   (Oral Presentation)
[paper]   [talk]

Training deep generative models such as GANs by following the Stein variational gradient.

Two Methods for Wild Variational Inference
Qiang Liu, Yihao Feng.
NIPS Workshop on Bayesian Deep Learning, 2016

Training implicit models by minimizing the kernelized Stein discrepancy.

RShop: A Cloud-based Augmented Reality System for Shopping
C. Wang, Y. Feng, Q. Guo, Z. Li, K. Liu, Z. Tang, A. Tung, L. Wu, and Y. Zheng.
VLDB, 2017

Professional Service

Reviewer for ICML 2020, NeurIPS 2018-2020, ICLR 2019-2021.

Program committee, Optimization Foundations for Reinforcement Learning Workshop, NeurIPS 2019.

Teaching Assistant for CS395T: Learning Theory, UT Austin.

Website code from here.