Continuous control with deep reinforcement learning
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
(Submitted on 9 Sep 2015)
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
DeepMind extends the DQN approach, which had already performed well on game playing, to settings where the action space is high-dimensional and continuous. To avoid optimizing over actions at every step, the paper uses an actor-critic method based on the deterministic policy gradient, which achieves good results on more than 20 simulated physics tasks.
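The core update behind this actor-critic scheme fits in a few lines. Below is a minimal sketch, not the authors' implementation, assuming PyTorch; the network sizes, learning rates, and the sampled `batch` of transitions are placeholders, and the target networks and exploration noise used in the full algorithm are omitted for brevity.

```python
# Minimal sketch of a deterministic-policy-gradient actor-critic update
# (DDPG-style), assuming PyTorch. Dimensions and hyper-parameters are
# illustrative, not the paper's settings.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # placeholder dimensions

# Actor maps states to continuous actions; critic scores (state, action) pairs.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(batch, gamma=0.99):
    # `batch` is a tuple of float tensors sampled from a replay buffer.
    s, a, r, s_next, done = batch

    # Critic: regress Q(s, a) toward the one-step bootstrapped target.
    with torch.no_grad():
        q_next = critic(torch.cat([s_next, actor(s_next)], dim=-1))
        target = r + gamma * (1.0 - done) * q_next
    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's value of the actor's own action
    # (the deterministic policy gradient) -- no per-step search over actions.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

In the full algorithm the bootstrapped target is computed with slowly updated copies of the actor and critic (target networks), and exploration comes from adding noise to the actor's output; those details are left out of this sketch.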
Comments: 10 pages + supplementary
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1509.02971 [cs.LG]
(or arXiv:1509.02971v1 [cs.LG] for this version)