OpenAI launches its most powerful native image model: accurate infographics, multimodal input, highly realistic quality, built into GPT-4o

AI giant OpenAI today launched its most advanced image generation model to date. It is built into GPT-4o, so users can generate and edit images directly in ChatGPT without switching to DALL-E.

The company said the feature is available now to Pro subscribers ($200 per month) and will roll out gradually to Plus, Team, and free users, as well as to the Sora platform and to developers through the API. Enterprise and education users will get access soon.

GPT-4o Image Generation Brings Higher Accuracy and Utility

The new feature runs on the GPT-4o model, which replaces DALL-E 3 and generates and edits images natively. The company claims the output is lifelike enough to be difficult to distinguish from real photographs, with a level of detail that surpasses competitors such as Midjourney.

Unlike traditional diffusion models, which generate an entire image at once, GPT-4o uses an autoregressive technique: it builds the image step by step, left to right and top to bottom, much like writing text. Research lead Gabriel Goh told The Verge that this approach significantly improves text rendering and "binding", meaning the model's ability to keep each object's attributes attached to the right object. As a result, the model follows instructions more closely and can handle complex prompts involving 10 to 20 objects, whereas competing models are said to manage only 5 to 8.
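To make the generation order concrete, here is a toy Python sketch of producing a grid one cell at a time in raster order, with each cell conditioned on everything generated before it. It illustrates only the ordering described above. OpenAI has not published implementation details, so `toy_next_token` is a hypothetical stand-in for a real model, and the grid of small integers is an assumed placeholder for image tokens.

```python
import random


def raster_order(height, width):
    """Yield (row, col) positions top to bottom, left to right within each row."""
    for row in range(height):
        for col in range(width):
            yield row, col


def toy_next_token(context, vocab_size, rng):
    """Hypothetical stand-in for a model: choose the next token from the context.

    A real autoregressive model would compute a probability distribution
    conditioned on `context`. This toy just repeats the previous token
    half of the time and picks a random token otherwise.
    """
    if context and rng.random() < 0.5:
        return context[-1]
    return rng.randrange(vocab_size)


def generate_grid(height, width, vocab_size=4, seed=0):
    """Fill a height x width grid one cell at a time in raster order."""
    rng = random.Random(seed)
    tokens = []  # everything generated so far, in generation order
    grid = [[None] * width for _ in range(height)]
    for row, col in raster_order(height, width):
        token = toy_next_token(tokens, vocab_size, rng)
        tokens.append(token)
        grid[row][col] = token
    return grid


if __name__ == "__main__":
    for line in generate_grid(3, 6):
        print(" ".join(str(t) for t in line))
```

A diffusion model, by contrast, would start from noise over the whole grid and refine every cell together over several passes.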

GPT-4o also improves on earlier image generation in several respects:

  • More Precise Text Rendering and Integration: Earlier models often struggled to produce legible, correctly positioned text. GPT-4o can place text accurately within an image and draw on the model's broad knowledge, making it better suited to quickly creating infographics, presentations, or posters.

  • Multi-Turn Image Generation: Users can edit an image with a single sentence, adjust the aspect ratio, specify exact colors with hexadecimal color codes, or request background removal. Because the model draws on the chat history, images can be refined interactively while staying consistent across generations.

  • Multi-Modal Input and Output (Text, Images): GPT-4o can analyze user-uploaded images and use their details as context to guide image generation.

  • Diverse Style Transformation: From hand-drawn sketches to high-resolution photorealism, the model can create and convert between styles to suit different needs.
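For readers unfamiliar with the hexadecimal color codes mentioned in the list above, the following Python sketch shows how a code such as #1E90FF maps to red, green, and blue values. The helper name `hex_to_rgb` is made up for illustration and is not part of any OpenAI interface. In ChatGPT the code is simply typed into the prompt as text.

```python
def hex_to_rgb(code):
    """Convert a hex color code like '#1E90FF' or '#0f0' to an (r, g, b) tuple."""
    value = code.lstrip("#")
    if len(value) == 3:
        # Shorthand form: each digit is doubled, so '0f0' becomes '00ff00'.
        value = "".join(ch * 2 for ch in value)
    if len(value) != 6:
        raise ValueError(f"expected 3 or 6 hex digits, got {code!r}")
    # Each pair of hex digits encodes one channel in the range 0-255.
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


if __name__ == "__main__":
    print(hex_to_rgb("#1E90FF"))  # (30, 144, 255)
```

A hex code pins down one exact color, whereas a word like "blue" leaves the shade open to interpretation.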

Although generation is slightly slower than with DALL-E 3, OpenAI says the quality improvement is worth the wait. Demo examples include multi-panel comics with highly consistent characters, logos, informational posters, and restaurant menu designs, showcasing the model's commercial potential.

OpenAI CEO Sam Altman excitedly stated during the livestream: "The image quality is stunning; I can hardly believe they're from AI! This is a new peak of creative freedom."

Product lead Jackie Shannon said: "GPT-4o has extensive world knowledge. Users only need to simply describe something like 'Newton's prism experiment' to obtain precisely annotated scientific illustrations." These features elevate ChatGPT from a text tool to an all-round creative platform.
