OpenAI’s DALL-E is “GPT-3 for images”

OpenAI’s latest piece of mind-boggling deep learning wizardry, DALL-E (Dali meets WALL-E, geddit?), is able to create illustrations, photos, renders or any other visual you care to mention just from a description that you provide. For more detail about DALL-E, try this very readable description from Dale Markowitz on The Next Web.

GPT-3, for those who are still unfamiliar with it, is able to generate pages of text on any subject you feed to it. Some attempts result in gibberish, while others look like they have been written by a human.

As OpenAI explains: “GPT-3 showed that language can be used to instruct a large neural network to perform a variety of text generation tasks. Image GPT showed that the same type of neural network can also be used to generate images with high fidelity. We extend these findings to show that manipulating visual concepts through language is now within reach.”


DALL-E is robust in its handling of prompts: it rarely fails completely, and it often produces unexpected but pleasingly off-the-wall results. Another new system from OpenAI, CLIP, is used in conjunction with DALL-E to understand and rank the images it generates.
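The announcement does not detail CLIP's internals, but the ranking idea can be sketched in miniature: embed a text prompt and each candidate image into a shared vector space, then sort images by cosine similarity to the prompt. Everything below is an illustrative assumption, not OpenAI's code; the toy embedding vectors stand in for what CLIP's real text and image encoders would produce.

```python
# Illustrative sketch only -- not CLIP itself. It shows CLIP-style ranking:
# given a prompt embedding and candidate image embeddings in a shared space,
# rank the images by cosine similarity to the prompt.
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def rank_images(prompt_embedding, image_embeddings):
    """Return image indices sorted from best to worst match."""
    scores = [cosine_similarity(prompt_embedding, e) for e in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy embeddings -- in reality, learned encoders would produce these
# from the prompt text and the generated image pixels.
prompt = [1.0, 0.0, 0.5]
images = [
    [0.9, 0.1, 0.4],   # fairly close to the prompt
    [0.0, 1.0, 0.0],   # unrelated
    [1.0, 0.0, 0.6],   # closest match
]
print(rank_images(prompt, images))  # best-matching image index first
```

The point of the sketch is the shared embedding space: because text and images map into the same vector space, a simple geometric similarity suffices to pick which generated images best fit the description.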

OpenAI also acknowledges the questions its new system raises: “In the future, we plan to analyze how models like DALL·E relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer term ethical challenges implied by this technology.”


While fascinating, powerful and thought-provoking, DALL-E’s outputs are not yet ‘production-ready’ without modification, but the direction of travel is obvious and the destination is not far off.

Source: TechCrunch