Google DeepMind Has Given A.I. Vision and Imagination

Picture a refrigerator. Now zoom out. You’re probably also picturing a sink, a table, perhaps a few chairs, and a microwave all in its vicinity. We didn’t actually mention any of that stuff, but your brain knows that since fridges are usually found in the kitchen, those other appliances are likely nearby. When the mental image comes up short, our brains use such assumptions to fill in the gaps. Our imaginations aren’t just for daydreams; they’re one of the main ways humans make sense of a world where we’re not always given all the information we need or want.

It’s part of why we humans are able to see, think, and act on what’s going on around us. We’ve learned not to stand on tables and to sit in chairs through observation and interaction, two things that come naturally to us. But this ability to learn by watching, which comes so easily to us, is actually incredibly difficult for computers to imitate.

GIF of GQN agent “imagining” new viewpoints in rooms with multiple objects.


Computer scientists have been trying to teach artificial intelligence how to see and process images for more than 50 years, a process that usually involves gathering massive datasets and labeling them for a computer to digest. But in a new paper published in the journal Science on Thursday, a team of researchers at Google’s DeepMind describes how they’ve created a Generative Query Network (GQN), an A.I. that can see and think like a human.

“Our ability to learn about the world by simply looking at it is simply incredible. One of the biggest open problems in A.I. is figuring out what is necessary to allow computers to do the same,” research scientist S. M. Ali Eslami tells Inverse in a written statement. “In this work, we train a neural network to predict what a scene might look like from new viewpoints.”

In other words, this human-like computer is capable of being trained to make inferences, possibly the first step towards machines that can autonomously learn about the world. By using what Dr. Eslami calls “something akin to imagination,” DeepMind’s GQN is able to construct a three-dimensional virtual environment from only a few two-dimensional pictures. This is much like how the human brain fills in gaps in information by assuming what’s missing.

GIF of GQN agent operating in partially observed maze environments.


This A.I. was brought to life by a two-part system. The “representation network” translates the sample images into code the computer can understand. From there, the “generation network” creates everything else that isn’t shown in the initial images.
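For readers who think in code, the two-part split can be sketched in a few lines of Python. This is a toy illustration only, not DeepMind’s implementation: the weights are random and untrained, the “images” are short lists of numbers, and every name and size here (`representation_network`, `generation_network`, `IMAGE_SIZE`, and so on) is an assumption made for the example. The one detail borrowed from the general idea is that the codes for each observed picture are combined into a single scene summary, done here by simple addition.

```python
import math
import random

IMAGE_SIZE = 16  # flattened toy "image" of 16 pixel values (assumption)
VIEW_SIZE = 3    # toy camera pose, e.g. x, y, and heading (assumption)
REPR_SIZE = 8    # length of the scene summary vector (assumption)


def make_linear(n_in, n_out, rng):
    """Random, untrained weights standing in for a learned layer."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, vec):
    """One dense layer with a tanh squashing function."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


rng = random.Random(0)
repr_weights = make_linear(IMAGE_SIZE + VIEW_SIZE, REPR_SIZE, rng)
gen_weights = make_linear(REPR_SIZE + VIEW_SIZE, IMAGE_SIZE, rng)


def representation_network(observations):
    """Encode each (image, viewpoint) pair and add the codes into one scene vector."""
    scene = [0.0] * REPR_SIZE
    for image, viewpoint in observations:
        code = apply_linear(repr_weights, image + viewpoint)
        scene = [s + c for s, c in zip(scene, code)]
    return scene


def generation_network(scene, query_viewpoint):
    """Produce a toy image for a viewpoint that was never observed."""
    return apply_linear(gen_weights, scene + query_viewpoint)


# Two sample pictures of the same scene, each tagged with where the camera stood.
observations = [
    ([rng.random() for _ in range(IMAGE_SIZE)], [0.0, 0.0, 0.0]),
    ([rng.random() for _ in range(IMAGE_SIZE)], [1.0, 0.0, 1.57]),
]

scene = representation_network(observations)
predicted = generation_network(scene, [0.5, 0.5, 0.78])
print(len(scene), len(predicted))
```

Because the untrained weights are random, the output here is meaningless noise; the point is only the shape of the pipeline, in which a handful of pictures go in, one compact summary comes out, and that summary plus a new camera position yields a predicted picture.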

“It was not at all clear that a neural network could ever learn to create images in such a precise and controlled manner,” says Dr. Eslami. “However, we found that sufficiently deep networks can learn about perspective, occlusion, and lighting, without any human engineering. This was a super surprising finding.”

GIF of GQN agent performing the Shepard-Metzler object rotation task.


This work is still in its development stages and will require more data and faster hardware before it is ready to be deployed in the real world, but its implications should not be overlooked. Instead of spending months priming data to be used in A.I. training, researchers can now use rudimentary data to not only teach A.I. things, but teach it how to figure things out on its own.

Further down the line, this could lead to virtual or robotic assistants that could not only serve our needs, but anticipate them. Late night? DeepMind’s there with an extra shot of espresso in your morning coffee. Had a bad day and need a plate full of nachos? No need to log into Seamless; they’ll be ready and waiting for you when you get home.

In short, the new breakthrough effectively makes artificial intelligence significantly more intelligent.
