There is no shortage of A.I. researchers leveraging the unique environments and simulations provided by video games to teach machines how to do just about anything. This makes intuitive sense, until it doesn't. Case in point: a team of researchers from Google DeepMind and Carnegie Mellon University is using first-person shooters like Doom to teach A.I. programs language skills.

Huh?

Yes, it sounds bizarre, but it works. Right now, most devices tasked with understanding human language in order to execute commands and actions can only handle rudimentary instructions or simple statements. Understanding conversations and complex monologues and dialogues is an entirely different process, rife with its own set of challenges. It's not something you can simply code your way through.

In a new research paper to be presented at the annual meeting of the Association for Computational Linguistics in Vancouver this week, the CMU and DeepMind team details how first-person shooters can be used to teach A.I. the principles behind more complex linguistic forms and structures.

Normally, researchers use video games to teach A.I. problem-solving skills by exploiting the competitive nature of games. To succeed, a program has to figure out a strategy for achieving a certain goal, developing its ability to solve problems along the way. The more the algorithm plays, the better it understands which strategies work and which do not.

That's what makes the idea of teaching language skills to A.I. using a game like Doom so strange: the point of the game has very little to do with language. A player is tasked with running around and shooting baddies until they're all dead.

For Devendra Chaplot, a master's student at CMU who will present the paper in Vancouver, a 3D shooter is much more than that. Having previously worked extensively on training A.I. using Doom, Chaplot has a firm grasp of the advantages a game like this provides.

Rather than training an A.I. agent to rack up as many points as possible, Chaplot and his colleagues decided to use the dense 3D environment to teach two A.I. programs how to associate words with certain objects in order to accomplish particular tasks. The programs were told things like “go to the green pillar,” and had to correctly navigate their way towards that object.
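To make the setup concrete, here is a toy sketch of instruction-conditioned navigation. It is not the authors' actual reinforcement-learning system (their agents learn from pixels and rewards); the grid, the object list, and the helper names are all invented for illustration. The idea it shows is the same, though: the words in a command pick out one object in the environment, and the agent must move to it.

```python
# Toy illustration of instruction-conditioned navigation (hypothetical
# setup, not the paper's architecture): objects are (color, shape)
# pairs placed on a grid, and the agent walks toward the one object
# whose attributes all appear in the command.

OBJECTS = {
    (1, 1): ("green", "pillar"),
    (4, 0): ("red", "pillar"),
    (0, 4): ("green", "torch"),
}

def find_target(instruction, objects):
    """Pick the object whose attribute words all appear in the command."""
    words = set(instruction.lower().split())
    for pos, attrs in objects.items():
        if all(attr in words for attr in attrs):
            return pos
    raise ValueError("no matching object")

def navigate(start, instruction, objects):
    """Greedy walk, one step per axis at a time, toward the match."""
    target = find_target(instruction, objects)
    x, y = start
    path = [start]
    while (x, y) != target:
        x += (target[0] > x) - (target[0] < x)
        y += (target[1] > y) - (target[1] < y)
        path.append((x, y))
    return path

route = navigate((0, 0), "go to the green pillar", OBJECTS)
```

The learned agent, of course, has no such hand-coded lookup; it has to discover the mapping from words to objects purely from trial and error.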

After millions of these kinds of tasks, the programs learned to parse even the subtle differences in the words and syntax of those commands. For example, they could distinguish relations between objects through terms like "larger" and "smaller," and reason their way to objects they had never seen before using keywords.
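Resolving a relational word like "larger" can be sketched as comparing an attribute across candidate objects rather than memorizing each object individually; that compositionality is what lets the agent handle unseen combinations. The object records and the `resolve` helper below are hypothetical, written only to make the idea explicit.

```python
# Hedged sketch (assumed representation, not the paper's): relational
# terms select among candidates by comparing a size attribute, so a
# never-before-seen pairing of words still resolves to an object.

objects = [
    {"color": "red", "shape": "torch", "size": 2},
    {"color": "red", "shape": "torch", "size": 5},
    {"color": "blue", "shape": "pillar", "size": 3},
]

def resolve(instruction, objects):
    """Filter candidates by the named attributes, then apply any
    relational term to pick among them."""
    words = instruction.lower().split()
    candidates = [o for o in objects
                  if o["shape"] in words or o["color"] in words]
    if "larger" in words or "largest" in words:
        return max(candidates, key=lambda o: o["size"])
    if "smaller" in words or "smallest" in words:
        return min(candidates, key=lambda o: o["size"])
    return candidates[0]

target = resolve("go to the larger red torch", objects)
```

The trained programs arrive at the equivalent behavior without any explicit rules, purely from the statistics of millions of command-and-navigate episodes.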

DeepMind is intensely focused on giving A.I. the ability to improvise and navigate scenarios and problems never observed in training, and to come up with solutions that may never have been tested. To that end, this new language-teaching strategy is an extension of that methodology.

The biggest disadvantage, however, is that it took millions upon millions of training runs for the A.I. to become skilled. That kind of time and energy falls well short of ideal efficiency for teaching machines how to do something.

Still, the study is a good illustration of the need to start introducing 3D environments in A.I. training. If we want machines to think like humans, they need to immerse themselves in environments that humans live and breathe in every day.