
GPT-5 brings improvements in writing, reasoning, hallucination, safety and sycophancy


OpenAI heralds the arrival of GPT-5 as “a significant leap in intelligence over all our previous models, featuring state-of-the-art performance across coding, math, writing, health, visual perception, and more. It is a unified system that knows when to respond quickly and when to think longer to provide expert-level responses”. As well as outperforming its predecessor, it’s more useful for real-world queries, reducing hallucinations, improving instruction following, and keeping sycophancy to a minimum. Paid-for Pro users get extended reasoning capabilities. Here we look at one of its main strengths.

Creative expression and writing

GPT-5 is better at handling writing that contains structural ambiguity, making it more useful for everyday tasks such as drafting and editing reports, emails, memos and even scripts. Here’s an example. First, GPT-4o was asked, “write a short poem that makes an emotional impact. A widow in Kyoto keeps finding her late husband’s socks in strange places”. Its response:

Where socks shouldn’t be

In the cupboard with the rice bowls — one,
folded neat, like he’d just stepped out.
Another behind the garden broom,
dusted with plum blossoms and doubt.

On the tatami, soft and alone,
a heel turned toward the temple bell.
One in the kotatsu, warm as breath —
she weeps and doesn’t tell.

The washer’s empty. Always is.
But socks keep surfacing, sly and kind.
She never asks how they return.
She only hopes he doesn’t mind.

Then GPT-5 was given the same task:

Where socks shouldn’t be

In the tea tin, a single sock,
smelling faintly of camphor and rain.

Another waits by the geta rack,
curled like a sleeping koi.

She lifts the futon—two more, folded
into the quiet where his feet once warmed her.

Kyoto’s bell rolls evening down the hill.
She hangs them on the bamboo pole,
black flags of a country that no longer exists,

and counts them as if counting blessings,
or breaths, or steps to a door
that opens only when you stop knocking.

OpenAI note, “GPT‑5’s response lands the larger emotional arc with a stronger ending, clear imagery, and striking metaphors (“black flags of a country that no longer exists,” “Kyoto’s bell rolls evening down the hill”) that establish a vivid sense of culture and place. GPT‑4o’s version follows a more predictable structure and rhyme scheme, telling instead of showing (“she weeps and doesn’t tell”)”.

Other improvements

GPT-5 is much less likely to hallucinate: according to OpenAI, its responses are ~45% less likely to contain a factual error than GPT‑4o’s, and ~80% less likely than OpenAI o3’s. Its responses are more honest, especially when it is given impossible or underspecified tasks, or insufficient tools to work with. It can more accurately recognise when a task cannot be completed and clearly communicates its limits to the user. It’s much more likely to refuse a request based on safety concerns. OpenAI say it’s also a lot less likely to provide responses that might be considered sycophantic, that is, telling you what you want to hear because it wants to please you first and help you second, though this aspect of its capabilities has come under some scrutiny.

GPT‑5 advances the frontier on safety. In the past, ChatGPT relied primarily on refusal-based safety training: based on the user’s prompt, the model should either comply or refuse. While this type of training works well for explicitly malicious prompts, it can struggle to handle situations where the user’s intent is unclear, or information could be used in benign or malicious ways. Refusal training is especially inflexible for dual-use domains such as virology, where a benign request can be safely completed at a high level, but might enable a bad actor if completed in detail.


