OpenAI Releases GPT-5.4 With Enhanced Reasoning, Coding, and Computer-Use Features
OpenAI has unveiled GPT-5.4, touted as its most capable and efficient frontier model for professional tasks, now available in ChatGPT (as GPT-5.4 Thinking), in the API, and in Codex.
This release integrates advances in reasoning, coding capabilities inherited from GPT-5.3-Codex, and agentic workflows, enabling precise handling of complex work such as spreadsheets and documents with fewer iterations.
GPT-5.4 excels at knowledge work, achieving an 83.0% win-or-tie rate against industry professionals on the GDPval benchmark across 44 occupations, up from GPT-5.2’s 70.9%.
It scores 87.3% on internal spreadsheet-modeling tasks pitched at junior analysts, up from 68.4%, and its presentations are preferred 68% of the time, largely for better aesthetics. Computer-use features let the model operate software natively via screenshots and mouse/keyboard actions, reaching 75.0% on OSWorld-Verified and exceeding the human baseline of 72.4%.
Coding performance matches GPT-5.3-Codex on SWE-Bench Pro at 57.7% while offering lower latency, and a /fast mode boosts token velocity 1.5x. Tool search cuts token usage by 47% in large tool ecosystems, improving agent efficiency on benchmarks such as Toolathlon (54.6%). A context window of up to 1M tokens supports long-horizon planning.
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| GDPval | 83.0% | 70.9% | 70.9% |
| SWE-Bench Pro | 57.7% | 56.8% | 55.6% |
| OSWorld-Verified | 75.0% | 74.0% | 47.3% |
| BrowseComp | 82.7% | 77.3% | 65.8% |
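For readers comparing columns, the table’s figures can be turned into per-benchmark gains with a short script. This is an illustrative sketch: the scores are transcribed from the table above, and the dictionary layout is purely a presentational choice, not anything OpenAI publishes.

```python
# Benchmark scores (percent), transcribed from the table above.
scores = {
    "GDPval":           {"GPT-5.4": 83.0, "GPT-5.3-Codex": 70.9, "GPT-5.2": 70.9},
    "SWE-Bench Pro":    {"GPT-5.4": 57.7, "GPT-5.3-Codex": 56.8, "GPT-5.2": 55.6},
    "OSWorld-Verified": {"GPT-5.4": 75.0, "GPT-5.3-Codex": 74.0, "GPT-5.2": 47.3},
    "BrowseComp":       {"GPT-5.4": 82.7, "GPT-5.3-Codex": 77.3, "GPT-5.2": 65.8},
}

def delta(benchmark: str, baseline: str = "GPT-5.2") -> float:
    """Percentage-point gain of GPT-5.4 over a baseline model."""
    row = scores[benchmark]
    return round(row["GPT-5.4"] - row[baseline], 1)

for name in scores:
    print(f"{name}: +{delta(name)} pts over GPT-5.2")
```

Computing the deltas this way makes the pattern visible at a glance: the largest jump over GPT-5.2 is on OSWorld-Verified (+27.7 points), while the gain over GPT-5.3-Codex on SWE-Bench Pro is under one point.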
These gains stem from token-efficient reasoning, with lower token usage than GPT-5.2, and improved vision (81.2% on MMMU-Pro).
Classified as “High cyber capability” under OpenAI’s Preparedness Framework, GPT-5.4 can automate vulnerability discovery and end-to-end attacks, passing 73.33% of Cyber Range scenarios (such as Azure SSRF and binary exploitation), slightly below GPT-5.3-Codex’s 80% but above GPT-5.2.
It fails at EDR evasion, firewall evasion, token leakage, and CA/DNS hijacking, yet excels at professional-level CTF challenges and shows strong consistency on CVE-Bench.
Safeguards include an expanded cyber safety stack: ZDR asynchronous blocking, monitoring, trusted-access controls, and account-level thresholds for high-risk content.
According to OpenAI, low chain-of-thought controllability (0.3% on long chains) aids monitoring and helps resist obfuscation. Hallucinations drop by 33% at the claim level and 18% at the response level. Experts warn of dual-use risks, such as automating flaw discovery without human probing.
GPT-5.4 is rolling out to ChatGPT Plus, Team, and Pro, and in the API as gpt-5.4 (with a Pro variant); GPT-5.2 remains available as a legacy model until June 2026. Pricing is $2.50 per million input tokens and $15 per million output tokens, with Pro at $30/$180. A new Excel add-in plus spreadsheet and presentation skills strengthen professional use.
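Those per-million-token rates make request costs easy to estimate. A minimal sketch follows; note that `gpt-5.4-pro` is an assumed name for the Pro variant (the article gives only its prices, not its API identifier):

```python
# Per-million-token rates in USD, as listed in the announcement.
# "gpt-5.4-pro" is a hypothetical name for the Pro variant.
PRICING = {
    "gpt-5.4":     {"input": 2.50,  "output": 15.00},
    "gpt-5.4-pro": {"input": 30.00, "output": 180.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request at the listed rates."""
    rates = PRICING[model]
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
    return round(cost, 6)

# e.g. a 50k-token prompt with a 2k-token reply on the standard model:
print(estimate_cost("gpt-5.4", 50_000, 2_000))
```

At these rates, a 50k-in/2k-out request on the standard model costs about $0.16, while the same request on the Pro tier would run roughly $1.86, so routing long contexts to the cheaper model matters for high-volume agent workloads.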
Mercor’s CEO praises the model’s leaderboard-topping APEX-Agents results on slide decks and models. Harvey reports 91% on BigLaw Bench for legal documents. Mainstay reports 95-100% success on property portals at 3x the speed. A Cursor VP highlights the model’s natural assertiveness in coding.
Site: cybersecuritypath.com