Enhanced Experience Replay Generation for Efficient Reinforcement Learning
Paper summary

- *issue:* RL on real systems suffers from sparse and slow data sampling;
- *solution:* pre-train the agent with synthetic data generated by the EGAN (enhanced GAN);
- *performance:* ~20% improvement in training time at the beginning of learning compared to no pre-training; ~5% improvement and smaller variations compared to GAN pre-training.

## Introduction

5G telecom systems must fulfill requirements for ultra-low latency, high robustness, quick response to changed capacity needs, and dynamic allocation of functionality.

*Problems:*

1. exploration has an impact on the service quality in real-time service systems;
2. sparse and slow data sampling leads to extended training duration.

## Enhanced GAN

**Formulas**

the training data for RL tasks:

$$x = [x_1, x_2] = [(s_t,a),(s_{t+1},r)]$$

the generated data:

$$G(z) = [G_1(z), G_2(z)] = [(s'_t,a'),(s'_{t+1},r')]$$

the value function for GAN:

$$V(D,G) = \mathbb{E}_{z \sim p_z(z)}[\log(1-D(G(z)))] + \lambda D_{KL}(P||Q)$$

where the regularization term $D_{KL}$ has the following form:

$$D_{KL}(P||Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$

**EGAN structure**

https://i.imgur.com/FhPxamJ.png

**Algorithm**

https://i.imgur.com/RzOGmNy.png

The enhancer is fed with the training data *D\_r(s\_t, a)* and *D\_r(s\_{t+1}, r)* and is trained by supervised learning. After the GAN generates synthetic data *D\_t(s\_t, a, s\_{t+1}, r)*, the enhancer is used to strengthen the dependency between *D\_t(s\_t, a)* and *D\_t(s\_{t+1}, r)* and to update the weights of the GAN.

## Results

Two lines of experiments were run on the CartPole environment with policy gradient (PG) agents:

1. one comparing the learning curves of agents with no pre-training, GAN pre-training and EGAN pre-training. => Result: EGAN > GAN > no pre-training
2. one comparing the learning curves of agents with EGAN pre-training for various numbers of episodes (500, 2000, 5000). => Result: 5000 > 2000 ≈ 500
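
To make the formulas in the Enhanced GAN section more concrete, here is a minimal Python sketch of two of them: the split of one transition into $x_1 = (s_t, a)$ and $x_2 = (s_{t+1}, r)$, and the discrete KL divergence used as the regularization term. The function names, the example transition, and the distributions `p` and `q` are hypothetical and chosen only for illustration; the summary does not specify how $P$ and $Q$ are built from the GAN output, so that step is left out.

```python
import math
from typing import List, Sequence, Tuple


def split_transition(s_t: Sequence[float], a: float,
                     s_next: Sequence[float], r: float
                     ) -> Tuple[List[float], List[float]]:
    """Pack one transition as x = [x_1, x_2] = [(s_t, a), (s_{t+1}, r)]."""
    x1 = list(s_t) + [a]      # x_1 = (s_t, a)
    x2 = list(s_next) + [r]   # x_2 = (s_{t+1}, r)
    return x1, x2


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)) for discrete P and Q.

    Terms with P(i) == 0 contribute 0 by convention; Q(i) must be
    positive wherever P(i) > 0.
    """
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i <= 0.0:
            raise ValueError("Q(i) must be positive wherever P(i) > 0")
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical CartPole-like transition: 4-dimensional state, one action value.
x1, x2 = split_transition([0.0, 0.1, 0.02, -0.1], 1.0,
                          [0.002, 0.3, 0.018, -0.4], 1.0)

# Illustrative distributions, not taken from the paper.
p = [0.5, 0.5]
q = [0.25, 0.75]
kl_pq = kl_divergence(p, q)   # positive, since P differs from Q
kl_pp = kl_divergence(p, p)   # zero, since the distributions are identical
```

The divergence is zero only when the two distributions match, which is why adding $\lambda D_{KL}(P||Q)$ to the value function penalizes a mismatch between them.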
Vincent Huang and Tobias Ley and Martha Vlachou-Konchylaki and Wenfeng Hu
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.AI

Summary by Tianxiao Zhao 5 months ago