This paper proposes three new reinforcement learning tasks that involve dealing with images.
- Task 1: An agent crawls across a hidden image, revealing a portion of it at each step. It must classify the image in as few steps as possible. For example, it might classify the image as a cat after choosing to travel across the ears.
- Task 2: The agent crawls across a visible image to sit on its target. For example, the target might be a cat in a scene of pets.
- Task 3: The agent plays an Atari game in which the background has been replaced with a distracting video.

These tasks are easy to construct, but solving them requires large-scale visual processing or attention, which typically requires deep networks. To address these new tasks, popular RL agents (PPO, A2C, and ACKTR) were augmented with a deep image processing network (ResNet-18), but they still performed poorly.
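To make the "easy to construct" point concrete, here is a minimal sketch of what a Task 1 style environment might look like. It is not the paper's implementation: the class name `HiddenImageEnv`, the window size, the action encoding (four moves followed by one action per class), and the reward values are all assumptions chosen for illustration.

```python
import numpy as np


class HiddenImageEnv:
    """Toy version of Task 1: reveal patches of a hidden image, then classify.

    Hypothetical sketch; action encoding and rewards are assumptions.
    Actions 0-3 move the agent (up, down, left, right); action 4 + k
    ends the episode by guessing class k.
    """

    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, image, label, num_classes, window=2, step_penalty=-0.1):
        self.image = np.asarray(image, dtype=np.float32)
        self.label = label
        self.num_classes = num_classes
        self.window = window              # side length of the revealed patch
        self.step_penalty = step_penalty  # assumed cost per move, to favor few steps
        self.reset()

    def reset(self):
        h, w = self.image.shape[:2]
        self.pos = (0, 0)
        self.mask = np.zeros((h, w), dtype=bool)
        self._reveal()
        return self._obs()

    def _reveal(self):
        r, c = self.pos
        self.mask[r:r + self.window, c:c + self.window] = True

    def _obs(self):
        # Pixels that have not been revealed yet are shown as zeros.
        mask = self.mask if self.image.ndim == 2 else self.mask[..., None]
        return np.where(mask, self.image, 0.0)

    def step(self, action):
        """Return (observation, reward, done)."""
        if action < 4:
            dr, dc = self.MOVES[action]
            h, w = self.image.shape[:2]
            # Move by one window and stay inside the image bounds.
            r = int(np.clip(self.pos[0] + dr * self.window, 0, h - self.window))
            c = int(np.clip(self.pos[1] + dc * self.window, 0, w - self.window))
            self.pos = (r, c)
            self._reveal()
            return self._obs(), self.step_penalty, False
        # Any action >= 4 is a classification guess that ends the episode.
        guess = action - 4
        reward = 1.0 if guess == self.label else -1.0
        return self._obs(), reward, True
```

The environment itself needs only a mask and a position. The difficulty the review describes lies in the agent, which has to integrate a partially revealed image over many steps before committing to a class.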