In humanity’s latest blunder to unwittingly help A.I. take over the world, machine learning researchers have now made 45 hours of video game replays accessible to A.I. programs. Compiled by a collaboration between Germany’s Aachen University and Microsoft’s Machine Intelligence and Perception group’s research teams, the Atari Grand Challenge Dataset gives A.I. systems access to new ways of learning and improving skills that will help them overthrow humanity, like controlling anti-air guns, tactical evasion, and maneuvering within complex and treacherous environments.
Okay, so they actually just got really good at Ms. Pacman, Space Invaders, Video Pinball, and lesser-known Atari 2600 games, like Qbert and Montezuma’s Revenge. While we may be safe from the robot revolution for a while yet, these games did prove useful for training computers to learn and hone their skills over time.
The masterminds behind the Atari Grand Challenge Dataset invited people to play those five games on their website and then collected data on how well the gamers were doing. By comparing how long it took people to progress or be rewarded within each game and their overall high scores, gamers were divided into groups based on skill level, rated from “novice” to “expert.” Each time they entered a command in the game, it was recorded along with their current score and the in-game environment at that moment.
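To make that recording scheme concrete, here is a minimal Python sketch of what one logged moment and a score-based skill grouping might look like. It is an illustration only: the `Step` fields, the `skill_group` helper, the score cutoffs, and the "intermediate" label are all assumptions made for this example, not the dataset's actual format or the researchers' grouping method (which also considered how quickly players progressed).

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Step:
    """One recorded moment of play (hypothetical layout, not the real schema)."""
    frame_id: int  # stands in for the in-game environment (the screen) at this moment
    action: int    # the command the player entered
    score: int     # the player's running score when the command was entered


def final_score(replay: List[Step]) -> int:
    """Score at the last recorded step, or 0 for an empty replay."""
    return replay[-1].score if replay else 0


def skill_group(replay: List[Step], cutoffs: Sequence[int], labels: Sequence[str]) -> str:
    """Bucket a replay by final score.

    `cutoffs` must be sorted ascending and `labels` must have one more entry
    than `cutoffs`. Both are illustrative assumptions.
    """
    return labels[bisect_right(cutoffs, final_score(replay))]


# Usage with made-up numbers.
replay = [Step(frame_id=0, action=3, score=0), Step(frame_id=1, action=1, score=700)]
group = skill_group(replay, cutoffs=[100, 500], labels=["novice", "intermediate", "expert"])
```

Keeping the action, score, and screen together at every step is what lets a learning program later ask, "in this situation, what did a strong player do?"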
These replays were then used to train A.I. to play the same games, ultimately speeding up a machine learning process called reinforcement learning. In a typical reinforcement learning experiment, computers learn to perform a task by repeatedly trying and failing in a digital environment until they finally refine their behavior. The researchers behind the Atari dataset noted that a number of these digital environments already exist, like Elon Musk-backed OpenAI’s Roboschool. However, there were no examples of people attempting these same tasks, even though it’s known that machines are adept at learning through imitation.
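For readers who want to see what that trial and error looks like, the sketch below uses tabular Q-learning, a textbook reinforcement learning method, on a made-up five-square corridor where the only reward sits at the far right. It is a toy under stated assumptions: the corridor, the function names, and the settings are invented for illustration, and it is not the method the researchers used on Atari games.

```python
import random

N_STATES = 5          # squares in the toy corridor (assumption for this example)
GOAL = N_STATES - 1   # reaching the rightmost square ends an episode
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right


def step(state, action_idx):
    """Apply an action; return (next_state, reward, done)."""
    nxt = min(max(state + ACTIONS[action_idx], 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by repeatedly trying, failing, and updating."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of each attempt
            # Explore at random sometimes (and whenever the values are tied);
            # otherwise pick the action currently believed to be better.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Nudge the estimate toward the reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Preferred action (0 = left, 1 = right) in each non-goal square."""
    return [0 if left > right else 1 for left, right in q[:GOAL]]


policy = greedy_policy(train())
```

Even this tiny task takes hundreds of attempts before the program reliably heads right, which hints at why Atari-scale games can demand millions of samples and why recordings of human play are appealing as a shortcut.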
“It’s important, because currently it is hard to apply reinforcement learning to real life because of its computational inefficiency,” wrote lead author Vitaly Kurin of Aachen University in an email to Inverse. He went on to explain the value of using human behavior, and how no one else has provided researchers with the resources to do this.
“But it’s more about that we give other people an opportunity to test their ideas on how to leverage human data for reinforcement learning purposes,” wrote Kurin.
When A.I. was trained with game replays from more skilled gamers, it ended up performing better as well, which suggests that machine learning could be expedited by bringing in some human experts. The A.I. ended up vastly outperforming people at pinball, roughly matching them at Qbert and Space Invaders, and struggling with Ms. Pacman and Montezuma’s Revenge, perhaps because the last two had the most dynamic environments.
However, Kurin explained that this is just one potential application for his dataset. His real goal was to provide other researchers with tools and information to further their own investigations and projects. In particular, he pointed out that replays from average gamers could help ongoing research into a new form of reinforcement learning from failure, in which machines learn from human error as well as human success.
Recent progress in Reinforcement Learning (RL), fueled by its combination with Deep Learning, has enabled impressive results in learning to interact with complex virtual environments, yet real-world applications of RL are still scarce. A key limitation is data efficiency, with current state-of-the-art approaches requiring millions of training samples. A promising way to tackle this problem is to augment RL with learning from human demonstrations. However, human demonstration data is not yet readily available. This hinders progress in this direction. The present work addresses this problem as follows. We (i) collect and describe a large dataset of human Atari 2600 replays, the largest and most diverse such dataset publicly released to date, (ii) illustrate an example use of this dataset by analyzing the relation between demonstration quality and imitation learning performance, and (iii) outline possible research directions that are opened up by our work.