DeepSeek really seems to have kicked the whole large-model scene into a higher gear.
Late last night, OpenAI rushed out its latest reasoning model, the o3-mini series.
It comes in three reasoning-effort levels: low, medium, and high.
o3-mini and o3-mini-high have already been launched:
According to OpenAI, the o3 series of models aims to push the boundaries of low-cost reasoning.
ChatGPT Plus, Team, and Pro users can access OpenAI o3-mini starting today, with Enterprise access opening in a week.
Free users can also try o3-mini by selecting the "Search+Reason" option in ChatGPT.
Perhaps pushed by DeepSeek, this is the first time OpenAI has made a reasoning model available to free users.
Even in the Reddit "Ask Me Anything" session that followed, CEO Altman offered a rare bit of public reflection:
On the question of open-sourcing model weights, (in my personal opinion) we have been on the wrong side of history.
Meanwhile, within just a few hours, netizens had already begun testing it like crazy...
Optimized for STEM reasoning, but still far pricier than DeepSeek-R1
Let's first take a look at what the technical report says.
At the end of last year, OpenAI previewed o3-mini, once again pushing the capability ceiling of small models (while matching o1-mini on cost and low latency).
At the time, CEO Altman said the official version would ship in January this year, and right at the deadline, the official o3-mini has finally landed.
Overall, like the previous-generation o1-mini, it is optimized for STEM (science, technology, engineering, and mathematics), continuing the mini series' "small but beautiful" approach.
Even o3-mini (medium) not only performs on par with the o1 series on math and coding, but also responds faster.
Human expert evaluations show that o3-mini produces more accurate and clearer answers than o1-mini most of the time, with a 56% preference rate, and its rate of major errors on difficult real-world questions is 39% lower.
On math, o3-mini at low reasoning effort already matches o1-mini; at medium effort it is comparable to the full-strength o1; and with the effort dialed up to high, its performance surpasses the entire o1 series.
On FrontierMath, a set of hard problems written by more than 60 top mathematicians, o3-mini at high effort also shows a clear improvement over the o1 series.
OpenAI even specifically noted that, when used with a Python tool, o3-mini (high) solved more than 32% of the problems on the first attempt, including more than 28% of the challenging T3-tier problems.
On science, for PhD-level physics and chemistry problems, o3-mini at low effort already pulls ahead of o1-mini.
And on coding, a key capability, o3-mini leads the o1 series at every effort level.
Judging from LiveBench results, o3-mini's advantage keeps widening as the reasoning effort is turned up.
It is also worth noting that even while taking the lead, o3-mini responds faster, with an average response time of 7.7 seconds, a 24% improvement over o1-mini's 10.16 seconds.
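For reference, the 24% figure follows directly from those two averages:

$$\frac{10.16 - 7.7}{10.16} \approx 0.242 \approx 24\%$$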
Finally, on safety, o3-mini clearly outperforms GPT-4o across multiple safety evaluations.
On price, compared with DeepSeek-R1's $0.14/$0.55 per million tokens for input/output, o3-mini is still steep.
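For developers, the low/medium/high variants mentioned above map onto a reasoning-effort setting in the API. Below is a minimal sketch of such a call, assuming the official OpenAI Python SDK, the `o3-mini` model name, an `OPENAI_API_KEY` set in the environment, and the `reasoning_effort` parameter documented for o-series models; it is an illustration, not an official example.

```python
# Minimal sketch: calling o3-mini at different reasoning-effort levels.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment;
# the reasoning_effort parameter is as documented for o-series models.
from openai import OpenAI

client = OpenAI()

for effort in ("low", "medium", "high"):
    response = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort=effort,  # trades latency and cost for deeper reasoning
        messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    )
    print(f"[{effort}] {response.choices[0].message.content[:120]}")
```

Higher effort levels generally consume more reasoning tokens, which is exactly where the per-token prices above start to matter.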
According to netizen reviews, DeepSeek-R1 remains the king of cost-effectiveness: faster, better, and cheaper.
By the way, OpenAI has, as usual, published the list of people behind o3-mini. This time the effort is led by Altman himself, with Carpus Chang and Kristen Ying as research program managers (the list also includes plenty of familiar names such as Hongyu Ren and Shengjia Zhao).
Netizens are testing it like crazy
As mentioned above, netizens have already started testing it like crazy.
Judging from the reviews, though, opinions on o3-mini's performance are mixed.
For example, on the task of implementing "a ball bouncing inside a four-dimensional object" in Python, one user thinks o3-mini is the best LLM:
The result looks like this:
Another netizen then tried the same task with DeepSeek and, judging by the result, thinks o3-mini comes out slightly ahead:
A more direct comparison, a ball bouncing inside a rotating hexagon while subject to gravity and friction, makes the difference between o3-mini and DeepSeek-R1 more obvious (a rough sketch of what this prompt entails follows below):
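To make it concrete what this prompt is actually asking a model to produce, here is a minimal, self-contained sketch of the rotating-hexagon task in Python with pygame. It is not the output of o3-mini or DeepSeek-R1, and the constants (spin rate, restitution, friction) are arbitrary choices for illustration.

```python
# Minimal sketch (not any model's actual output): a ball bouncing inside a
# rotating hexagon under gravity and friction, using pygame (`pip install pygame`).
import math
import pygame

W, H = 600, 600
CENTER = pygame.Vector2(W / 2, H / 2)
HEX_RADIUS = 220
BALL_RADIUS = 12
GRAVITY = pygame.Vector2(0, 900)      # px/s^2, y points down on screen
RESTITUTION = 0.85                    # fraction of normal velocity kept per bounce
FRICTION = 0.98                       # tangential damping applied at each bounce
OMEGA = math.radians(40)              # hexagon angular speed, rad/s

def hexagon_points(angle):
    """Vertices of the regular hexagon rotated by `angle`."""
    return [CENTER + HEX_RADIUS * pygame.Vector2(math.cos(angle + i * math.pi / 3),
                                                 math.sin(angle + i * math.pi / 3))
            for i in range(6)]

def collide_with_edge(pos, vel, a, b):
    """Bounce the ball off segment a-b if it penetrates, accounting for wall motion."""
    ab = b - a
    t = max(0.0, min(1.0, (pos - a).dot(ab) / ab.length_squared()))
    closest = a + t * ab
    offset = pos - closest
    dist = offset.length()
    if dist == 0 or dist >= BALL_RADIUS:
        return pos, vel
    normal = offset / dist
    # Velocity of the wall at the contact point due to rotation about CENTER.
    r = closest - CENTER
    wall_vel = OMEGA * pygame.Vector2(-r.y, r.x)
    rel = vel - wall_vel
    if rel.dot(normal) < 0:                      # moving into the wall
        vn = rel.dot(normal) * normal            # normal component
        vt = rel - vn                            # tangential component
        rel = -RESTITUTION * vn + FRICTION * vt  # bounce + friction
        vel = rel + wall_vel
    pos = closest + normal * BALL_RADIUS         # push the ball back out of the wall
    return pos, vel

def main():
    pygame.init()
    screen = pygame.display.set_mode((W, H))
    clock = pygame.time.Clock()
    pos = pygame.Vector2(CENTER.x, CENTER.y - 100)
    vel = pygame.Vector2(180, 0)
    angle = 0.0
    running = True
    while running:
        dt = clock.tick(60) / 1000.0
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
        angle += OMEGA * dt
        vel += GRAVITY * dt
        pos += vel * dt
        pts = hexagon_points(angle)
        for i in range(6):
            pos, vel = collide_with_edge(pos, vel, pts[i], pts[(i + 1) % 6])
        screen.fill((20, 20, 30))
        pygame.draw.polygon(screen, (200, 200, 220), pts, width=3)
        pygame.draw.circle(screen, (240, 200, 60), pos, BALL_RADIUS)
        pygame.display.flip()
    pygame.quit()

if __name__ == "__main__":
    main()
```

The fiddly part, and where model outputs tend to differ, is computing the bounce relative to the moving wall so that the hexagon's rotation actually transfers momentum to the ball.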
It also handles more complex tasks, such as creating 100 yellow balls bouncing inside a sphere:
Someone also had o3-mini design a game with two competing Pac-Men:
Beyond DeepSeek, netizens have also compared o1 and o3-mini, for example by having them generate a massive, awe-inspiring, epic floating city.
One netizen posed a tricky question that almost every large model gets wrong, and to his surprise, o3-mini got it right:
The well-known podcast host Lex Fridman, however, summed up o3-mini this way:
OpenAI o3-mini is a good model, but DeepSeek R1 has similar performance, lower price, and reveals its reasoning process.
Better models will emerge (can't wait for o3-pro), but the "DeepSeek moment" is real. I think it will be remembered five years from now as a turning point in tech history.
One More Thing
Just a few hours after o3-mini's launch, Altman himself joined his team for the Reddit "Ask Me Anything" session.
With DeepSeek's open-source release having recently shaken up the AI world, Altman offered a rare public reflection:
On the question of open-sourcing model weights, (in my personal opinion) we have been on the wrong side of history.
He even admitted that OpenAI's lead will no longer be as large as it used to be.
DeepSeek is indeed an excellent model, and we will keep developing better models, but our lead will be smaller than before.
The session also revealed some of OpenAI's future plans.
For example, advanced voice mode will be updated soon, and OpenAI will call it GPT-5 rather than GPT-5o, though there is no concrete timetable yet.
In addition, the reasoning models will gain support for calling more tools.
Finally, the full-strength o3 also came up, but it still seems to be a fair way off...