Nature papers
Mastering the game of Go without human knowledge
Nature 550, 7676 (2017). doi:10.1038/nature24270
Authors: David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis
URL: https://www.nature.com/nature/journal/v550/n7676/full/nature24270.html
Please download the PDF to read it.
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis: Nature 529(7587): 484-489 (2016)
Papers
Mastering the Game of Go without Human Knowledge
https://deepmind.com/documents/119/agz_unformatted_nature.pdf
Human-level control through deep reinforcement learning
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
Playing Atari with Deep Reinforcement Learning
https://www.cs.toronto.edu/%7Evmnih/docs/dqn.pdf
Prioritized experience replay
https://arxiv.org/pdf/1511.05952v2.pdf
Dueling DQN
https://arxiv.org/pdf/1511.06581v3.pdf
Deep Reinforcement Learning with Double Q-learning
https://arxiv.org/abs/1509.06461
Deep Q learning with NAF
https://arxiv.org/pdf/1603.00748v1.pdf
Deterministic policy gradient
http://jmlr.org/proceedings/papers/v32/silver14.pdf
Continuous control with deep reinforcement learning (DDPG)
https://arxiv.org/pdf/1509.02971v5.pdf
Asynchronous Methods for Deep Reinforcement Learning
https://arxiv.org/abs/1602.01783
Policy distillation
https://arxiv.org/abs/1511.06295
Control of Memory, Active Perception, and Action in Minecraft
https://arxiv.org/pdf/1605.09128v1.pdf
Unifying Count-Based Exploration and Intrinsic Motivation
https://arxiv.org/pdf/1606.01868v2.pdf
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
https://arxiv.org/pdf/1507.00814v3.pdf
Action-Conditional Video Prediction using Deep Networks in Atari Games
https://arxiv.org/pdf/1507.08750v2.pdf
Control of Memory, Active Perception, and Action in Minecraft
https://web.eecs.umich.edu/~baveja/Papers/ICML2016.pdf
PathNet
https://arxiv.org/pdf/1701.08734.pdf
Papers for NLP
Coarse-to-Fine Question Answering for Long Documents
https://homes.cs.washington.edu/~eunsol/papers/acl17eunsol.pdf
A Deep Reinforced Model for Abstractive Summarization
https://arxiv.org/pdf/1705.04304.pdf
Reinforcement Learning for Simultaneous Machine Translation
https://www.umiacs.umd.edu/~jbg/docs/2014_emnlp_simtrans.pdf
Dual Learning for Machine Translation
https://papers.nips.cc/paper/6469-dual-learning-for-machine-translation.pdf
Learning to Win by Reading Manuals in a Monte-Carlo Framework
http://people.csail.mit.edu/regina/my_papers/civ11.pdf
Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
http://people.csail.mit.edu/regina/my_papers/civ11.pdf
Deep Reinforcement Learning with a Natural Language Action Space
http://www.aclweb.org/anthology/P16-1153
Deep Reinforcement Learning for Dialogue Generation
https://arxiv.org/pdf/1606.01541.pdf
Reinforcement Learning for Mapping Instructions to Actions
http://people.csail.mit.edu/branavan/papers/acl2009.pdf
Language Understanding for Text-based Games using Deep Reinforcement Learning
https://arxiv.org/pdf/1506.08941.pdf
End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning
https://arxiv.org/pdf/1606.01269v1.pdf
End-to-End Reinforcement Learning of Dialogue Agents for Information Access
https://arxiv.org/pdf/1609.00777v1.pdf
Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
https://arxiv.org/pdf/1702.03274.pdf
Deep Reinforcement Learning for Mention-Ranking Coreference Models
https://arxiv.org/abs/1609.08667
Selected articles
Wikipedia: Reinforcement learning
https://en.wikipedia.org/wiki/Reinforcement_learning
Deep Reinforcement Learning: Pong from Pixels
http://karpathy.github.io/2016/05/31/rl/
CS294: Deep Reinforcement Learning
http://rll.berkeley.edu/deeprlcourse/
Reinforcement Learning Series, Part 1: Markov Decision Processes
http://www.algorithmdog.com/%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0-%E9%A9%AC%E5%B0%94%E7%A7%91%E5%A4%AB%E5%86%B3%E7%AD%96%E8%BF%87%E7%A8%8B
Reinforcement Learning Series, Part 9: Deep Q Network (DQN)
http://www.algorithmdog.com/drl
Reinforcement Learning Series, Part 3: Model-Free Policy Evaluation
http://www.algorithmdog.com/reinforcement-learning-model-free-evalution
[Notes] Reinforcement Learning and MDPs
http://www.cnblogs.com/mo-wang/p/4910855.html
Introduction to Reinforcement Learning, with Implementation Code
http://www.jianshu.com/p/165607eaa4f9
Deep Reinforcement Learning Series (2): Reinforcement Learning
http://blog.csdn.net/ikerpeng/article/details/53031551
Atari demo using a Deep Q-Network:
Nature paper on the Deep Q-Network (DQN):
http://www.nature.com/articles/nature14236
Lecture notes (PDF) used in David Silver's videos
https://pan.baidu.com/s/1nvqP7dB
What is reinforcement learning?
http://www.cnblogs.com/geniferology/p/what_is_reinforcement_learning.html
David Silver's paper on deterministic policy gradient (DPG)
http://www.jmlr.org/proceedings/papers/v32/silver14.pdf
Nature paper on AlphaGo:
http://www.nature.com/articles/nature16961
AlphaGo-related resources
http://deepmind.com/research/alphago/
What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?
https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
Deep Learning in a Nutshell: Reinforcement Learning
https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-reinforcement-learning/
Bellman equation
https://en.wikipedia.org/wiki/Bellman_equation
Reinforcement learning
https://en.wikipedia.org/wiki/Reinforcement_learning
Mastering the Game of Go without Human Knowledge
https://deepmind.com/documents/119/agz_unformatted_nature.pdf
Reinforcement Learning (RL) for Natural Language Processing (NLP)
https://github.com/adityathakker/awesome-rl-nlp
Video tutorials
Reinforcement learning tutorial (Morvan Zhou, 莫烦)
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/
Reinforcement learning course by David Silver
https://www.bilibili.com/video/av8912293/?from=search&seid=1166472326542614796
CS234: Reinforcement Learning
http://web.stanford.edu/class/cs234/index.html
What is reinforcement learning? (Reinforcement Learning)
https://www.youtube.com/watch?v=NVWBs7b3oGk
What is Q Learning? (Reinforcement Learning)
https://www.youtube.com/watch?v=HTZ5xn12AL4
Reinforcement learning, by Morvan Zhou (莫烦)
https://morvanzhou.github.io/tutorials/machine-learning/ML-intro/
David Silver's deep reinforcement learning course, Lecture 1: Introduction (Chinese subtitles)
https://www.bilibili.com/video/av9831889/
David Silver's open video course (YouTube)
https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT
David Silver's open video course (Bilibili)
http://www.bilibili.com/video/av9831889/?from=search&seid=17387316110198388304
Deep Reinforcement Learning
http://videolectures.net/rldm2015_silver_reinforcement_learning/
Tutorial
Reinforcement Learning for NLP
http://www.umiacs.umd.edu/~jbg/teaching/CSCI_7000/11a.pdf
ICML 2016, Deep Reinforcement Learning tutorial
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
DQN tutorial
https://medium.com/@awjuliani/simple-reinforcement-learning-with-tensorflow-part-4-deep-q-networks-and-beyond-8438a3e2b8df#.28wv34w3a
Code
OpenAI Gym
https://github.com/openai/gym
Google DeepMind team's Deep Q-Network (DQN) source code:
http://sites.google.com/a/deepmind.com/dqn/
ReinforcementLearningCode
https://github.com/halleanwoo/ReinforcementLearningCode
reinforcement-learning
https://github.com/dennybritz/reinforcement-learning
DQN
https://github.com/devsisters/DQN-tensorflow
DDPG
https://github.com/stevenpjg/ddpg-aigym
A3C01
https://github.com/miyosuda/async_deep_reinforce
A3C02
https://github.com/openai/universe-starter-agent