OpenAI Releases GPT-5.4 With Enhanced Reasoning, Coding, and Computer-Use Features
OpenAI has unveiled GPT-5.4, touted as its most capable and efficient frontier model for professional tasks, now available in ChatGPT (as GPT-5.4 Thinking), in the API, and in Codex.
This release integrates advances in reasoning, coding capabilities inherited from GPT-5.3-Codex, and agentic workflows, enabling precise handling of complex work such as spreadsheets and documents with fewer iterations.
GPT-5.4 excels at knowledge work, achieving an 83.0% win-or-tie rate against industry professionals on the GDPval benchmark across 44 occupations, up from GPT-5.2’s 70.9%.
It scores 87.3% on internal spreadsheet-modeling tasks pitched at junior analysts, up from 68.4%, and its presentations are preferred 68% of the time, largely for better aesthetics. Computer-use features let the model operate software natively via screenshots and mouse/keyboard actions, reaching 75.0% on OSWorld-Verified and exceeding the human baseline of 72.4%.
Coding performance matches GPT-5.3-Codex on SWE-Bench Pro at 57.7% while offering lower latency, and a /fast mode boosts token velocity 1.5x. Tool search cuts token usage by 47% in large tool ecosystems, improving agent efficiency on benchmarks such as Toolathlon (54.6%). A context window of up to 1M tokens supports long-horizon planning.
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| GDPval | 83.0% | 70.9% | 70.9% |
| SWE-Bench Pro | 57.7% | 56.8% | 55.6% |
| OSWorld-Verified | 75.0% | 74.0% | 47.3% |
| BrowseComp | 82.7% | 77.3% | 65.8% |
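For readers comparing columns, the table’s figures can be turned into per-benchmark gains with a short script. This is an illustrative sketch: the scores are transcribed from the table above, and the dictionary layout is purely a presentational choice, not anything OpenAI publishes.

```python
# Benchmark scores (percent), transcribed from the table above.
scores = {
    "GDPval":           {"GPT-5.4": 83.0, "GPT-5.3-Codex": 70.9, "GPT-5.2": 70.9},
    "SWE-Bench Pro":    {"GPT-5.4": 57.7, "GPT-5.3-Codex": 56.8, "GPT-5.2": 55.6},
    "OSWorld-Verified": {"GPT-5.4": 75.0, "GPT-5.3-Codex": 74.0, "GPT-5.2": 47.3},
    "BrowseComp":       {"GPT-5.4": 82.7, "GPT-5.3-Codex": 77.3, "GPT-5.2": 65.8},
}

def delta(benchmark: str, baseline: str = "GPT-5.2") -> float:
    """Percentage-point gain of GPT-5.4 over a baseline model."""
    row = scores[benchmark]
    return round(row["GPT-5.4"] - row[baseline], 1)

for name in scores:
    print(f"{name}: +{delta(name)} pts over GPT-5.2")
```

Computing the deltas this way makes the pattern visible at a glance: the largest jump over GPT-5.2 is on OSWorld-Verified (+27.7 points), while the gain over GPT-5.3-Codex on SWE-Bench Pro is under one point.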
These gains stem from token-efficient reasoning, with lower token usage than GPT-5.2, and improved vision (81.2% on MMMU-Pro).
Classified as “High cyber capability” under OpenAI’s Preparedness Framework, GPT-5.4 can automate vulnerability discovery and end-to-end attacks, passing 73.33% of Cyber Range scenarios (such as Azure SSRF and binary exploitation), slightly below GPT-5.3-Codex’s 80% but above GPT-5.2.
It fails at EDR evasion, firewall evasion, token leakage, and CA/DNS hijacking, yet excels at professional-level CTF challenges and shows strong consistency on CVE-Bench.
Safeguards include an expanded cyber safety stack: ZDR asynchronous blocking, monitoring, trusted-access controls, and account-level thresholds for high-risk content.
According to OpenAI, low chain-of-thought controllability (0.3% on long chains) aids monitoring and helps resist obfuscation. Hallucinations drop by 33% at the claim level and 18% at the response level. Experts warn of dual-use risks, such as automating flaw discovery without human probing.
GPT-5.4 is rolling out to ChatGPT Plus, Team, and Pro, and in the API as gpt-5.4 (with a Pro variant); GPT-5.2 remains available as a legacy model until June 2026. Pricing is $2.50 per million input tokens and $15 per million output tokens, with Pro at $30/$180. A new Excel add-in plus spreadsheet and presentation skills strengthen professional use.
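Those per-million-token rates make request costs easy to estimate. A minimal sketch follows; note that `gpt-5.4-pro` is an assumed name for the Pro variant (the article gives only its prices, not its API identifier):

```python
# Per-million-token rates in USD, as listed in the announcement.
# "gpt-5.4-pro" is a hypothetical name for the Pro variant.
PRICING = {
    "gpt-5.4":     {"input": 2.50,  "output": 15.00},
    "gpt-5.4-pro": {"input": 30.00, "output": 180.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request at the listed rates."""
    rates = PRICING[model]
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
    return round(cost, 6)

# e.g. a 50k-token prompt with a 2k-token reply on the standard model:
print(estimate_cost("gpt-5.4", 50_000, 2_000))
```

At these rates, a 50k-in/2k-out request on the standard model costs about $0.16, while the same request on the Pro tier would run roughly $1.86, so routing long contexts to the cheaper model matters for high-volume agent workloads.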
Mercor’s CEO praises the model’s leaderboard-topping APEX-Agents results on slide decks and models. Harvey reports 91% on BigLaw Bench for legal documents. Mainstay reports 95-100% success on property portals at 3x the speed. A Cursor VP highlights the model’s natural assertiveness in coding.
Site: cybersecuritypath.com