Neil Zhu, Jianshu ID Not_GOD, founder & Chief Scientist of University AI, dedicated to advancing the worldwide adoption of artificial intelligence. He sets and executes UAI's mid- and long-term growth strategy and goals, and has led the team to grow rapidly into one of the most professional forces in the AI field.
As an industry leader, he and UAI founded TASA (China's earliest AI society), the DL Center (a global value network for deep learning knowledge), and AI Growth (an industry think tank and training program) in 2014, channeling a large amount of talent and resources into China's AI community. He has also organized or taken part in various international AI summits and events with wide influence, written 600,000 characters of high-quality AI technical content, and produced the translation of the world's first introductory deep learning book, Neural Networks and Deep Learning (《神经网络与深度学习》); this content has been widely reposted and serialized by professional media and vertical WeChat accounts. He has been invited by top domestic universities to design AI study plans and teach courses on the AI frontier, to positive reviews from students and faculty alike.
ICML 16 - All Accepted Papers
ICML 2016 - Papers related to reinforcement learning, listed below:
Inverse Optimal Control with Deep Networks via Policy Optimization
Chelsea Finn, UC Berkeley; Sergey Levine; Pieter Abbeel, UC Berkeley
http://arxiv.org/abs/1603.00448
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang, University of Michigan; Lihong Li, Microsoft
http://arxiv.org/abs/1511.03722
Smooth Imitation Learning
Hoang Le, Caltech; Andrew Kang; Yisong Yue, Caltech; Peter Carr
PAC Lower Bounds and Efficient Algorithms for The Max K-Armed Bandit Problem
Yahel David, Technion; Nahum Shimkin, Technion
Anytime Exploration for Multi-armed Bandits using Confidence Information
Kwang-Sung Jun, UW-Madison; Robert Nowak
The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks
Yingfei Wang, Princeton University; Chu Wang; Warren Powell
https://arxiv.org/abs/1510.02354
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
Junpei Komiyama, The University of Tokyo; Junya Honda, The University of Tokyo; Hiroshi Nakagawa, The University of Tokyo
https://arxiv.org/abs/1605.01677
Benchmarking Deep Reinforcement Learning for Continuous Control
Yan Duan, University of California, Berkeley; Xi Chen, University of California, Berkeley; Rein Houthooft, Ghent University; John Schulman, University of California, Berkeley; Pieter Abbeel, UC Berkeley
https://arxiv.org/abs/1604.06778
Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
Prashanth L.A., University of Maryland; Cheng Jie, University of Maryland – College Park; Michael Fu, University of Maryland – College Park; Steve Marcus, University of Maryland – College Park; Csaba Szepesvari, University of Alberta
http://arxiv.org/abs/1506.02632
An optimal algorithm for the Thresholding Bandit Problem
Andrea Locatelli, University of Potsdam; Maurilio Gutzeit, University of Potsdam; Alexandra Carpentier
Sequential decision making under uncertainty: Are most decisions easy?
Ozgur Simsek; Simon Algorta; Amit Kothiyal
Opponent Modeling in Deep Reinforcement Learning
He He; Jordan Boyd-Graber; Hal Daume, Maryland
Softened Approximate Policy Iteration for Markov Games
Julien Pérolat, Univ. Lille; Bilal Piot, Univ. Lille; Matthieu Geist; Bruno Scherrer; Olivier Pietquin, Univ. Lille
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih, Google DeepMind; Adria Puigdomenech Badia, Google DeepMind; Mehdi Mirza; Alex Graves, Google DeepMind; Timothy Lillicrap, Google DeepMind; Tim Harley, Google DeepMind; David Silver, Google DeepMind; Koray Kavukcuoglu, Google DeepMind
https://arxiv.org/abs/1602.01783
Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Google Inc.; Nando de Freitas, University of Oxford; Tom Schaul, Google Inc.; Matteo Hessel, Google DeepMind; Hado van Hasselt, Google DeepMind; Marc Lanctot, Google DeepMind
http://arxiv.org/abs/1511.06581
Differentially Private Policy Evaluation
Borja Balle, Lancaster University; Maziar Gomrokchi, McGill University; Doina Precup, McGill
https://arxiv.org/abs/1603.02010
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Philip Thomas, CMU; Emma Brunskill
https://arxiv.org/abs/1604.00923
Hierarchical Decision Making In Electricity Grid Management
Gal Dalal, Technion; Elad Gilboa, Technion; Shie Mannor, Technion
http://arxiv.org/abs/1603.01840
Generalization and Exploration via Randomized Value Functions
Ian Osband, Stanford; Benjamin Van Roy, Stanford; Zheng Wen, Adobe Research
https://arxiv.org/abs/1402.0635
Scalable Discrete Sampling as a Multi-Armed Bandit Problem
Yutian Chen, University of Cambridge; Zoubin Ghahramani, University of Cambridge
Abstract:
Drawing a sample from a discrete distribution is one of the building components for Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from the high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models, and propose an efficient approximate solution with a subsampling approach. We make a novel connection between the discrete sampling and Multi-Armed Bandits problems with a finite reward population and provide three algorithms with theoretical guarantees. Empirical evaluations show the robustness and efficiency of the approximate algorithms in both synthetic and real-world large-scale problems.
http://arxiv.org/abs/1506.09039
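The abstract above describes estimating expensive discrete-sampling weights from a subsample of the data rather than touching every data point. The snippet below is a minimal illustrative sketch of that general idea only, not the paper's bandit-based algorithm: categorical sampling via the Gumbel-max trick, with each log-weight (assumed to be a sum over many data points) estimated from a random subsample. The synthetic data, sizes (K, N, m), and function names are all assumptions for illustration.

```python
# Illustrative sketch (not the paper's algorithm): approximate categorical
# sampling when each unnormalized log-weight is an expensive sum over data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "large-scale" setting: K categories, N data points,
# per-point contributions f[k, n] to category k's log-weight.
K, N = 5, 100_000
f = rng.normal(scale=1e-3, size=(K, N))
f += np.linspace(-0.5, 0.5, K)[:, None] / N   # give the categories distinct means

def gumbel_max_sample(log_w, rng):
    """Draw one index from softmax(log_w) using the Gumbel-max trick."""
    g = rng.gumbel(size=log_w.shape)
    return int(np.argmax(log_w + g))

# Exact sampling: uses all N terms of every log-weight (O(K*N) work per draw).
exact_log_w = f.sum(axis=1)
exact_draw = gumbel_max_sample(exact_log_w, rng)

# Subsampled sampling: estimate each log-weight from m << N terms
# (the subsample sum rescaled by N/m is an unbiased estimate of the full sum).
m = 2_000
idx = rng.choice(N, size=m, replace=False)
approx_log_w = f[:, idx].sum(axis=1) * (N / m)
approx_draw = gumbel_max_sample(approx_log_w, rng)

print("exact draw:", exact_draw, " subsampled draw:", approx_draw)
```

The paper's contribution is to control the error of such subsampled estimates adaptively, by treating the candidate categories like arms of a multi-armed bandit with a finite reward population; the sketch above only shows the naive fixed-subsample variant.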
Model-Free Imitation Learning with Policy Optimization
Jonathan Ho, Stanford; Jayesh Gupta, Stanford University; Stefano Ermon
Improving the Efficiency of Deep Reinforcement Learning with Normalized Advantage Functions and Synthetic Experience
Shixiang Gu, University of Cambridge; Sergey Levine, Google; Timothy Lillicrap, Google DeepMind; Ilya Sutskever, OpenAI
http://arxiv.org/abs/1603.00748
Near Optimal Behavior via Approximate State Abstraction
David Abel, Brown University; David Hershkowitz, Brown University; Michael Littman
https://cs.brown.edu/~dabel/papers/abel_approx_abstraction.pdf
Model-Free Trajectory Optimization for Reinforcement Learning of Motor Skills
Riad Akrour, TU Darmstadt; Gerhard Neumann