Chapter 9: On-policy Prediction with Approximation From this chapter, we move from tabu...
Chapter 7: n-step Bootstrapping n-step TD methods span a spectrum with MC methods at on...
Chapter 6: Temporal-Difference Learning Temporal-difference (TD) learning is a combinat...
Chapter 5: Monte Carlo Methods Monte Carlo (MC) methods are learning methods for estima...
Chapter 4: Dynamic Programming Dynamic programming computes optimal policies given a pe...
Chapter 3: Finite Markov Decision Processes Basic Definitions MDP is the most basic for...
Chapter 2: Multi-armed Bandits Multi-armed bandits can be seen as the simplest form of ...
Pointer Networks Oriol Vinyals, Meire Fortunato, Navdeep Jaitly Google, Berkeley NIPS 201...
Neural Computation of Decisions in Optimization Problems J. J. Hopfield, D. W. Tank Biol...
Attention, Learn to Solve Routing Problems Wouter Kool, Herke van Hoof, Max Welling Univ...
Machine Learning for Combinatorial Optimization 1 Introduction 1.1 Background Operation...
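A few days ago, a self-driving Tesla was involved in an accident, and the owner was killed. Artificial intelligence is hot right now, and so are driverless cars: everyone from internet giants to traditional automakers is building them. On the other hand, many researchers who actually work on the front lines of autonomous driving R&D...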
Author: Christopher Olah (OpenAI) Translator: 朱小虎 Xiaohu (Neil) Zhu (CSAGI / University AI) Original link: https:...