OpenAI releases GPT-5, its most powerful model yet: performance significantly improved, Microsoft takes the lead in integrating it

Author: Li Dan

Source: Wall Street Journal

OpenAI's most anticipated product of the year is here.

On Thursday, August 7th, Eastern Time, OpenAI announced the release of its next-generation flagship artificial intelligence (AI) model, GPT-5. It is OpenAI's first "unified" AI system, combining the reasoning capabilities of the o-series models with the fast responses of the GPT-series models.

OpenAI CEO Sam Altman spoke highly of GPT-5 at the launch event, calling it "the best model in the world" and a "major upgrade" over previous models. He also said that its release marked an "important step" for OpenAI on the road to artificial general intelligence (AGI).

OpenAI announced that GPT-5 has achieved outstanding performance across multiple benchmarks, reaching state-of-the-art levels in areas such as programming, mathematics, and health. GPT-5 achieved an accuracy of 74.9% on the SWE-bench Verified coding benchmark, slightly surpassing Anthropic's new model, Claude Opus 4.1, released on Tuesday. GPT-5 also hallucinates far less, with a factual error rate of only 4.8%, well below the 20.6% of its predecessor, GPT-4o.

Starting this Thursday, GPT-5 will be available as the default model to all free ChatGPT users and paid users of Plus, Pro, and Team subscriptions, and will be launched on Enterprise and Edu paid plans within a week.

As with GPT-4o, the difference between the free and paid versions of GPT-5 lies in usage limits. Plus users get higher limits, while Pro users receive unlimited usage and the enhanced version, GPT-5 Pro. For free users, full reasoning functionality may take several days to become fully available. Once free users reach their GPT-5 usage limits, OpenAI will switch them to the smaller model, GPT-5 mini.

OpenAI also said on Wednesday that it will provide its ChatGPT product to U.S. federal government agencies for a nominal fee of $1 per year. Specifically, it will provide an enterprise version of ChatGPT, which includes enhanced security and privacy features.

Right after OpenAI officially announced GPT-5, Microsoft said that starting Thursday it would integrate GPT-5 into a broad range of its products, including Microsoft 365 Copilot, Copilot, GitHub Copilot, and the Azure AI Foundry platform, so that its enterprise and consumer users can immediately try GPT-5's advanced reasoning and programming capabilities.

GPT-5 has three major advantages in programming, creative writing, and health

OpenAI’s GPT-5 announcement begins by stating that GPT-5 is OpenAI’s “smartest, fastest, and most practical model, with built-in thinking capabilities that put expert-level intelligence within everyone’s reach.”

According to OpenAI, GPT-5, its "most powerful model", has achieved significant improvements in three key areas.

First, programming capabilities. GPT-5 is OpenAI's most powerful coding model to date, excelling at complex front-end generation and debugging large codebases. It can create beautiful, responsive websites, apps, and games with a single prompt. Early testers have noticed improvements in design choices like spacing, typography, and white space.

On SWE-bench Verified, a benchmark that draws real-world coding tasks from GitHub, GPT-5 with thinking enabled achieved an accuracy of 74.9% on the first attempt, higher than the 69.1% of OpenAI's reasoning model o3 and the 30.8% of GPT-4o.

Commentators noted that this puts GPT-5 slightly ahead of Anthropic's Claude Opus 4.1, released on Tuesday, which scored 74.5% on SWE-bench Verified, and well ahead of Google DeepMind's Gemini 2.5 Pro, which scored 59.6%.

However, on Humanity's Last Exam, a test that measures expert-level performance in mathematics, the humanities, and the natural sciences, GPT-5 Pro, an enhanced version of GPT-5 with extended reasoning, scored 42% when using tools. This was slightly lower than the 44.4% scored by xAI's Grok 4 Heavy.

Altman said GPT-5 is particularly good at generating entire software apps on demand, a practice known as "vibe coding": using AI to generate functional code from natural language prompts, thereby speeding up development.

As an example, OpenAI researchers asked GPT-5 to create a web app to help English-speaking users learn French. The app had to have an engaging theme and include flashcards, quizzes, the classic Snake game, and a way to track daily learning progress.

The researchers submitted the same prompt in two GPT-5 windows, and after a few minutes, two different apps were generated. An OpenAI lead said that these apps "have some flaws," but users can adjust the AI-generated software to their personal preferences, such as changing the background or adding more tabs.

In creative writing, GPT-5 is capable of handling complex writing tasks, such as unrhymed iambic pentameter or naturally flowing free verse. Nick Turley, OpenAI's Vice President of ChatGPT, stated that GPT-5 demonstrated "better taste" and more natural responses on creative tasks.

Health advice is the third key area of improvement.

GPT-5 can more actively flag potential health issues and help users interpret medical results, although OpenAI emphasizes that ChatGPT is not a replacement for medical professionals.

In a test called HealthBench Hard Hallucinations, GPT-5 with thinking had a hallucination rate of only 1.6%, significantly lower than GPT-4o and o3, which had rates of 15.8% and 12.9%, respectively.

New safety training approach, significantly lower likelihood of hallucinations

OpenAI claims that GPT-5 is more reliable and practical than previous models. It can answer real-world questions more accurately and is significantly less likely to hallucinate.

With web search enabled on anonymized prompts representative of ChatGPT production traffic, GPT-5 responses were approximately 45% less likely to contain factual errors than GPT-4o responses. With thinking enabled, GPT-5 responses were approximately 80% less likely to contain factual errors than o3 responses. According to OpenAI's figures, GPT-5 with thinking had an error rate of only 4.8%, compared to 20.6% for GPT-4o and 22% for o3.
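The relative reduction against o3 can be checked from the absolute error rates. The following is a minimal Python sketch, assuming the 4.8% figure is the rate behind the "approximately 80% less likely than o3" claim. The helper name `relative_reduction` is illustrative only, and the rate behind the 45% figure is not given in this article, so it is not checked here.

```python
def relative_reduction(baseline_pct: float, new_pct: float) -> float:
    """Return the percentage drop from baseline_pct to new_pct."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_pct - new_pct) / baseline_pct * 100.0


# Error rates reported in the article (percent of responses with factual errors).
GPT5_RATE = 4.8
O3_RATE = 22.0
GPT4O_RATE = 20.6

vs_o3 = relative_reduction(O3_RATE, GPT5_RATE)
vs_gpt4o = relative_reduction(GPT4O_RATE, GPT5_RATE)

print(f"vs o3:     {vs_o3:.1f}% fewer responses with errors")
print(f"vs GPT-4o: {vs_gpt4o:.1f}% fewer responses with errors")
```

The comparison with o3 comes out at about 78%, in line with the "approximately 80%" claim.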

OpenAI also announced that it has introduced a new form of safety training for GPT-5, called safe completions. This teaches the model to give the most helpful answer possible within safety boundaries. Sometimes, this may mean partially answering a user's question or only providing a high-level response.

If a refusal is necessary, GPT-5 is trained to tell the user transparently why it is refusing and to offer a safe alternative.

OpenAI says that in both controlled experiments and its production models, the safe completions approach proved more nuanced: it handles dual-use questions better, is more robust to ambiguous intent, and reduces unnecessary over-refusals.

“GPT-5 has been trained to recognize when a task is impossible, avoid guesswork, and explain limitations more clearly than previous models, which reduces unfounded assertions,” said Michelle Pokrass, OpenAI’s head of post-training.

Four optional preset personalities for ChatGPT

OpenAI claims that GPT-5 is better at following instructions, and its ability to follow custom instructions has improved accordingly. OpenAI will launch a research preview of four preset personalities for all ChatGPT users.

The initial four personality options - Cynic, Robot, Listener and Nerd - are all optional, and users can adjust them at any time in the settings so that ChatGPT matches their communication style.

Initially available for text chats, the four personalities will later be extended to voice chats. They let users customize how they interact with ChatGPT without having to write custom prompts, whether they want it concise and professional, attentive and supportive, or even a touch sarcastic.

OpenAI says all of these new personalities met or exceeded its internal evaluation criteria for reducing sycophancy.

Altman hails a historic breakthrough and says going back to GPT-4 felt terrible

At a briefing on Thursday, Altman spoke highly of GPT-5, positioning it as an important milestone on the road to AGI. He said:

“At any earlier time in history, something like GPT-5 would have been unimaginable.” “This is the first time it feels like I’m talking to an expert in any field.”

Altman even went so far as to praise GPT-5 by slamming GPT-4 during the briefing. He said:

“I tried using GPT-4 again, but it was terrible.”

GPT-5 uses a unified system architecture with a real-time router that automatically decides whether to respond quickly or engage in deep "thinking" based on the conversation type, complexity, and tool requirements. This eliminates the need for users to select appropriate settings, making ChatGPT easier to use.
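The routing idea can be illustrated with a toy sketch. The article does not describe the router's actual logic, so everything below (the `Request` fields, the `route` function, the word-count threshold, and the set of type labels) is a hypothetical illustration of choosing between a fast path and a thinking path from conversation type, complexity, and tool requirements. It is not OpenAI's implementation.

```python
from dataclasses import dataclass


@dataclass
class Request:
    text: str
    conversation_type: str  # hypothetical labels, e.g. "chat", "coding", "math"
    needs_tools: bool       # whether the request requires tool calls such as web search


# Hypothetical set of conversation types that usually benefit from deliberate reasoning.
REASONING_HEAVY_TYPES = {"coding", "math", "analysis"}


def estimate_complexity(text: str) -> int:
    """Crude stand-in for a learned complexity estimate: counts words."""
    return len(text.split())


def route(request: Request, word_threshold: int = 40) -> str:
    """Return "thinking" for requests that look hard, otherwise "fast"."""
    if request.conversation_type in REASONING_HEAVY_TYPES:
        return "thinking"
    if request.needs_tools:
        return "thinking"
    if estimate_complexity(request.text) > word_threshold:
        return "thinking"
    return "fast"


print(route(Request("What's the capital of France?", "chat", False)))
print(route(Request("Find the bug in this parser", "coding", False)))
```

A production router would presumably rely on learned signals rather than fixed rules. The sketch only shows how the three inputs named above could feed a single decision, so that the user never has to pick a mode.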

In internal benchmark tests of economically valuable tasks spanning more than 40 occupations, including law, logistics, sales, and engineering, GPT-5 in reasoning mode performed at or above expert level in about half of the cases. OpenAI VP Nick Turley stated, "The model feels really good."

Altman likened using GPT-5 to having a team of PhD-level experts at your fingertips. He added, "In many new ways, people are limited by their ideas, not by the ability to actually execute them."

Microsoft integrates GPT-5 across its products to seize the initiative

On the day of GPT-5's release, Microsoft announced its integration into a wide range of its product lines. For enterprise applications, Microsoft 365 Copilot will leverage GPT-5 to better handle complex problems, maintain focus during long conversations, and understand user context. Enterprise users can use reasoning capabilities to process emails, documents, and files.

For consumers, Microsoft Copilot’s new Smart mode will leverage GPT-5 to help users find the best solutions. Users can try GPT-5 for free at copilot.microsoft.com or through the Copilot app on Windows, Mac, Android, and iOS devices.

Developers will have access to GPT-5 support through GitHub Copilot and Visual Studio Code for writing, testing, and deploying code. All GPT-5 models will be available on the Azure AI Foundry platform, along with an AI-powered model router that selects the optimal model based on the complexity, performance requirements, and cost efficiency of each task.

Microsoft's AI Red Team tested the GPT-5 reasoning model under strict safety protocols. The results showed that, compared with OpenAI's previous models, it had one of the strongest AI safety profiles against various attack modes such as malware generation and fraud automation.