Different parameters need different learning rates
Adagrad
RMSProp: dynamic adjustment
Learning Rate Scheduling
Learning Rate Decay
Warm up
Summary