OpenAI Dominates AI Speed Rankings, Outpacing Competitors with Record-Breaking Token Performance

OpenAI holds the top two spots on the speed chart: GPT-OSS 20B (HIGH) leads with 306 tokens per second, and the second-place entry, also listed as GPT-OSS 20B (HIGH), delivers 239 tokens per second. The result underscores speed as a critical budgetary driver for AI deployments.
Gemini 3.1 Pro Preview and Nova 2.0 Pro Preview show continued competition from Google and AWS. The models emphasize different goals (speed for some, quality for others), which reinforces enterprise ecosystem considerations.
The mid-pack is tightly clustered, and Chinese models such as Qwen3.7 Max are genuinely competitive, which indicates that speed must be balanced against output quality to maximize enterprise ROI.
In the crowded middle tier, OpenAI GPT-5.4 Mini and NVIDIA Nemotron 3 Super illustrate a push to balance speed and efficiency in support of scalable deployments.
Third through fifth places go to Google Gemini 3.5 Flash, Alibaba Qwen3.7 Max, and xAI Grok 4.3, highlighting strong performance across major labs and notable non-US entrants.
Overview: A June 2026 benchmark ranks the fastest AI models, with speed as a primary differentiator and OpenAI maintaining clear leadership at the top.

Summary based on 1 source

OfficeChai • Jun 9, 2026