OpenAI's GPT-3 bridges language-image gap

Researchers at OpenAI have trained a machine learning model on pixel sequences so that it can generate coherent images. This is another step toward bridging the gap between computer vision and language understanding techniques.
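To make "fed with pixel sequences" concrete, here is a minimal sketch of the general idea, not OpenAI's actual pipeline. It assumes a tiny grayscale image, row-by-row (raster) ordering, and next-pixel prediction as the training target. The function names and pixel values are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def image_to_sequence(image: List[List[int]]) -> List[int]:
    """Flatten a 2D grid of pixel values into a 1D sequence, row by row."""
    return [pixel for row in image for pixel in row]


def next_pixel_pairs(sequence: List[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs: each pixel is predicted from the pixels before it."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# A toy 2x3 grayscale "image" (made-up values, not real data).
toy_image = [
    [0, 128, 255],
    [64, 32, 16],
]

sequence = image_to_sequence(toy_image)
pairs = next_pixel_pairs(sequence)

for context, target in pairs:
    print(f"given {context} -> predict {target}")
```

Once an image is a flat sequence like this, a language-style model can be trained on it in the same way it would be trained on a sequence of word tokens.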

Language models such as Google’s BERT, Facebook’s RoBERTa, and OpenAI’s GPT-3 have made great strides with language tasks, but this kind of model has so far been less successful when applied to generating or classifying images. The new model learns characteristics such as object appearances and categories without any hand-coded knowledge.

Writing on VentureBeat, Kyle Wiggers reports that OpenAI trained three versions of image-generating GPT-2 models:

iGPT-S – 76 million parameters
iGPT-M – 455 million parameters
iGPT-L – 1.4 billion parameters

plus another, much bigger version

iGPT-XL – 6.8 billion parameters

The results show that image feature quality increased sharply with layer depth before decreasing mildly. The researchers also found that both increasing the scale of their models and training for more iterations resulted in better image quality.
