Need an illustration of a wise cat meditating in the Himalayas searching for enlightenment but don’t have any oil paints? Want some art of robot dinosaurs versus monster trucks in a colosseum fight — in a pinch? You’re in luck.
On Wednesday, OpenAI unveiled the DALL·E 2 system, which can generate artistic images from a phrase. According to the OpenAI blog, the name DALL·E is “a portmanteau of the artist Salvador Dalí and Pixar’s WALL·E.” The newly released second version expands upon the original DALL·E, which debuted in January 2021. It features higher-resolution images (1024 x 1024 pixels compared to 256 x 256 pixels) and lower latency. Ultimately, OpenAI plans to offer the tool to artists and others.
How does it work? — A first neural network reads the text input and picks out the key features that an image should include. A raccoon should have ears, for example, and if someone is sad, the corners of their mouth may point down. Then a diffusion model, which is a second neural network, uses those key features to create an image.
It’s not in the wild — DALL·E 2 is not publicly available, but you can join a waitlist. Even if you get access, you can’t use it to generate just anything. To protect against abuse, DALL·E has some built-in safeguards: for one, it can’t generate any recognizable faces based on a name, so you can’t generate incriminating photos of your archenemies. It is also designed to create G-rated images only: no porn, no obscenities, no hate symbols. In addition, OpenAI does not share the technology with the general public. Despite the name, OpenAI is not so open after all.
According to The New York Times, OpenAI “puts a watermark in the corner of each image it generates. And though the lab plans on opening the system to testers this week, the group will be small.”
This time, you can edit — Unlike the first DALL·E, this version allows users to edit existing images. Want to add the Eiffel Tower to the background of your photo? No need to fire up Photoshop. With DALL·E 2, users can both add and remove items from photos, and the model can adjust for changes in shadows and lighting. Users can also generate images that look similar to an existing image, or blend two existing images to get a result with elements of both.
In the past several years, text-to-image technology has exploded along with the field of AI. One similar application is Dream from Wombo, which you can try for yourself: generate a trippy-looking image based on a text input.