Prioritized Experience Replay
## Paper summary

- **this paper:** develops a framework that replays important transitions more frequently -> learns more efficiently
- **prior work:** samples experience transitions uniformly from a replay memory
- **evaluation:** DQN + PER outperforms DQN on 41 out of 49 Atari games

## Introduction

**issues with online RL:** (solution: experience replay)

1. strongly correlated updates that break the i.i.d. assumption
2. rapid forgetting of rare experiences that could be useful later

**key idea:** more frequently replay transitions with high expected learning progress, as measured by the magnitude of their temporal-difference (TD) error

**issues with prioritization:**

1. loss of diversity -> alleviated with stochastic prioritization
2. introduced bias -> corrected with importance sampling

## Prioritized Replay

**criterion:**

- ideal: the amount the RL agent can learn from a transition in its current state (expected learning progress) -> not directly accessible
- proxy: the magnitude of a transition's TD error $\delta_i$, which indicates roughly how far the value is from its next-step bootstrap estimate

**stochastic sampling:**

$$P(i)=\frac{p_i^\alpha}{\sum_k p_k^\alpha}$$

$p_i > 0$: priority of transition $i$; $0 \le \alpha \le 1$ determines how much prioritization is used ($\alpha = 0$ recovers uniform sampling).

*two variants:*

1. proportional prioritization: $p_i = |\delta_i| + \epsilon$, where $\epsilon$ is a small positive constant that keeps the sampling probability above zero when the TD error is zero
2. rank-based prioritization: $p_i = \frac{1}{\text{rank}(i)}$, where $\text{rank}(i)$ is the rank of transition $i$ when the replay memory is sorted by $|\delta_i|$; **more robust as it is insensitive to outliers**

**importance sampling:**

IS weights:

$$w_i = \left(\frac{1}{N} \cdot \frac{1}{P(i)}\right)^\beta$$

- $N$ is the size of the replay memory; $\beta$ controls how much of the bias is corrected ($\beta = 1$ fully compensates for the non-uniform probabilities $P(i)$)
- weights can be folded into the Q-learning update by using $w_i \delta_i$ instead of $\delta_i$
- weights are normalized by $\frac{1}{\max_i w_i}$, so they only scale the update downwards
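The sampling probabilities and IS weights above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function names `sampling_probabilities` and `importance_weights` are hypothetical, the default values of `alpha`, `beta`, and `eps` are illustrative, the batch is drawn with replacement, and the weights are normalized by the maximum over the sampled batch (an assumption; the maximum could also be taken over the whole memory).

```python
import random


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6, rank_based=False):
    """Return P(i) = p_i^alpha / sum_k p_k^alpha for every stored transition."""
    abs_errors = [abs(d) for d in td_errors]
    if rank_based:
        # Rank 1 goes to the largest |TD error|; p_i = 1 / rank(i).
        order = sorted(range(len(abs_errors)),
                       key=lambda i: abs_errors[i], reverse=True)
        priorities = [0.0] * len(abs_errors)
        for rank, i in enumerate(order, start=1):
            priorities[i] = 1.0 / rank
    else:
        # Proportional variant: p_i = |delta_i| + eps, so p_i is never zero.
        priorities = [e + eps for e in abs_errors]
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, indices, beta=0.4):
    """Return w_i = (1/N * 1/P(i))^beta for the sampled indices,
    divided by the largest weight in the batch (assumption: batch-level max)."""
    n = len(probs)
    raw = [(1.0 / (n * probs[i])) ** beta for i in indices]
    max_w = max(raw)
    return [w / max_w for w in raw]


# Toy replay memory holding only TD errors (illustrative values).
td_errors = [0.1, -2.0, 0.5, 0.0]
probs = sampling_probabilities(td_errors, alpha=0.6)
rank_probs = sampling_probabilities(td_errors, alpha=1.0, rank_based=True)
uniform_probs = sampling_probabilities(td_errors, alpha=0.0)

# Draw a batch of indices according to P(i), with replacement.
rng = random.Random(0)
batch = rng.choices(range(len(td_errors)), weights=probs, k=3)
weights = importance_weights(probs, batch, beta=0.4)

# In a Q-learning update, each sampled TD error would be scaled by its weight.
weighted_errors = [w * td_errors[i] for w, i in zip(weights, batch)]
```

With `alpha=0.0` every transition gets the same probability, which matches the uniform replay baseline; raising `alpha` shifts probability mass toward transitions with large TD error.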
Prioritized Experience Replay
Tom Schaul and John Quan and Ioannis Antonoglou and David Silver
arXiv e-Print archive - 2015 via Local arXiv
Keywords: cs.LG


Summary by Tianxiao Zhao 2 years ago