An Atari-playing artificial intelligence created by researchers at the University of Freiburg in Germany has discovered a never-before-seen bug in the classic game Qbert. Using an inexplicable and seemingly random series of moves, the algorithm achieved an unprecedented high score in a matter of minutes.
The researchers explained how they trained their A.I. to achieve an impossible result rivaling James T. Kirk’s defeat of the Kobayashi Maru in a paper posted on the preprint site arXiv on February 24. Rather than employing a standard reinforcement learning approach, they used a lesser-known technique called evolutionary strategies.
As the name suggests, the method is loosely based on the Darwinian concept of natural selection. On a conceptual level, it makes sense to train algorithms with an evolutionary model. Like the process of evolution, machine learning works by iterating on a problem an immense number of times and making slight tweaks (to the algorithm, or in nature, to the species) until it discovers the best possible behaviors for its environment.
Evolutionary strategies take the analogy a step further by comparing slightly different versions of the algorithm to each other and selecting the more successful one. The winner is then randomly “mutated,” and the process repeats over and over, ideally resulting in an “evolved” algorithm that performs better than a wide range of closely related algorithms.
ES-trained algorithms learn via black-box methods, meaning the researchers don’t really know why the algorithm performs the way it does. The researchers didn’t make any manual alterations to the training system. Rather, they built it to measure the success of each attempt and select for the more successful strategies.
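The mutate-and-select loop described above can be sketched in a few lines. This is an illustrative toy only: the reward function, parameter count, and all names below are invented for the example, whereas the paper evolves the weights of a neural network and scores each candidate by its Atari game score. The optimizer treats the reward as a black box, exactly as described: it only ever sees the returned number.

```python
import random

def blackbox_reward(params):
    # Stand-in for an episode score. The optimizer never inspects this
    # function's internals; it only sees the number it returns.
    # (Illustrative choice: reward peaks at x = 3.0, y = -2.0.)
    x, y = params
    return -((x - 3.0) ** 2 + (y + 2.0) ** 2)

def evolution_strategy(reward_fn, dim=2, population=20, sigma=0.5,
                       generations=200, seed=0):
    rng = random.Random(seed)
    parent = [0.0] * dim  # arbitrary starting parameters
    for _ in range(generations):
        # Mutate: sample a population of slightly perturbed copies of the parent.
        offspring = [[p + rng.gauss(0.0, sigma) for p in parent]
                     for _ in range(population)]
        # Select: the highest-reward candidate (keeping the parent as a
        # fallback, so the score never regresses) becomes the next parent.
        parent = max(offspring + [parent], key=reward_fn)
    return parent

best = evolution_strategy(blackbox_reward)
print(best)  # should land close to (3.0, -2.0)
```

Note that nothing here computes a gradient or peeks inside the reward function; selection pressure alone drives the parameters toward higher scores, which is why the technique scales to rewards as opaque as a video game's final score.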
Just as DeepMind’s AlphaGo used completely unrecognizable moves when beating the world’s top-ranked human Go player in 2017, the Qbert-playing algorithm stumbled upon some alien solutions to the game. One of the successful, but strange, strategies was rather macabre. As the researchers describe it:
“The agent gathers some points at the beginning of the game and then stops showing interest in completing the level. Instead, it starts to bait an enemy that follows it to kill itself. Specifically, the agent learns that it can jump off the platform when the enemy is right next to it, because the enemy will follow: although the agent loses a life, killing the enemy yields enough points to gain an extra life again (Figure 7). The agent repeats this cycle of suicide and killing the opponent over and over again.”
But the most successful strategy relied upon a bug. In the researchers’ words:

“First, it completes the first level and then starts to jump from platform to platform in what seems to be a random manner. For a reason unknown to us, the game does not advance to the second round but the platforms start to blink and the agent quickly gains a huge amount of points.”
Curiously, the agent wasn’t able to exploit this bug every time. Out of 30 test runs, it achieved the high score only eight times — even the fittest algorithm doesn’t always win in the harsh world of Qbert.