This AI turns text into surreal, suggestion-driven art

The GPT-3-powered bot called "DALL·E" is all about playing around with words and images.


A neural network dubbed DALL·E takes your words and attempts to turn them into art. Launched by OpenAI, the network is trained on a dataset of text-image pairs and then attempts to transform the captions it's given into images. The name itself is an amusing portmanteau of the surrealist Salvador Dalí and Pixar's WALL·E, and the results are a suitable mixture of cartoonish cuteness and surrealism.

If you check the neural network out, you will notice a whole gamut of odd captions, including "an illustration of a baby daikon radish in a tutu walking a dog." It sounds like an animator's fever dream, but DALL·E manages to turn that outlandish description into a coherent image.


Other captions are simpler and easier to transform into images. For example, DALL·E is capable of turning "storefront that has the word OpenAI on it" into a regular-looking storefront bearing those exact words. But other captions require DALL·E to get a little creative.

"The exact same cat on the top as a sketch on the bottom" would be tough for a neural network to render as an image, but with the right amount of training, DALL·E manages to do just that, a notably successful feat of machine learning. In the past, though, we've seen artificial intelligence bots turn captions into objectively creepy renderings, like this one.

Experiment with DALL·E — A neural network like this one relies on a huge dataset to teach itself visual generation tasks: it's an exercise in understanding language concepts and manipulating them visually. So if you want, you can give DALL·E a prompt and see what it does with your caption (which you'll assemble from a set of words provided by OpenAI).

I played around with the network by tweaking this caption: "A cube made of noodles. A cube with the texture of noodles." The caption previously used "porcupine" in place of "noodles." By mapping the texture of the given descriptor onto the shape, DALL·E managed to create a host of noodle cubes.
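The experiment above boils down to filling a fixed caption template with different texture descriptors. As a rough illustration of that idea (the `make_caption` helper and the descriptor list are hypothetical, written for this sketch rather than taken from OpenAI's actual demo):

```python
# Hypothetical sketch of the fill-in-the-blank captions used to probe DALL·E.
# The template and descriptors mirror the article's examples; the helper
# itself is illustrative, not part of OpenAI's interactive demo.

TEMPLATE = "A cube made of {d}. A cube with the texture of {d}."

def make_caption(descriptor: str) -> str:
    """Fill the caption template with a single texture descriptor."""
    return TEMPLATE.format(d=descriptor)

# Swapping descriptors, as the article does with "porcupine" -> "noodles":
for d in ["porcupine", "noodles", "orchid flower", "toothpaste"]:
    print(make_caption(d))
```

Each filled-in caption would then be handed to the model as a prompt; only the descriptor changes, which makes it easy to see how the network maps one texture versus another onto the same shape.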


Its creators say that the network gets better with more tries. So I replaced "noodles" with "orchid flower" and received these images. Then I tried "toothpaste"... and the list goes on.


What's impressive about DALL·E is that it handles multiple shapes, attributes, colors, and dimensions all at once. So a simple sentence like "a plate sitting on a large black counter" requires DALL·E to compose the spatial, textural, and visual aspects of each word correctly. Neural networks like this one have the potential to transform how artificial intelligence interprets language, delivering better results with each iteration and potentially widening their use cases down the line.