Together AI Achieves Major AI Inference Breakthroughs with MiniMax-M3, Expands Global Reach with Pax8 Partnership
June 13, 2026
Together AI, an AI infrastructure provider focused on high-performance inference, is the subject of a weekly recap covering platform, performance, and compliance developments aimed at AI agents, multimodal workloads, and latency-sensitive voice applications.
The MiniMax-M3 model demonstrated major inference efficiency gains, delivering 81 to 125 percent throughput improvements across common agent-like traffic patterns thanks to architectural and kernel-level optimizations. The model also offers a 1 million-token context window and native multimodality.
Together AI supported Deep Cogito’s large-scale open-weight reasoning models (ranging from 8B to 671B parameters), enabling sub-500-millisecond time to first token at over 1,000 requests per minute with 99.9% uptime and making quantized variants available in under two weeks.
A Pax8 partnership was announced to distribute Together AI’s AI Native Cloud globally through a channel targeting small and mid-sized businesses, expanding access to enterprise-grade AI infrastructure and 200+ open-source models.
Support was added for NVIDIA Nemotron 3 Ultra and Nemotron 3.5 ASR models on Together AI's AI Native Cloud, broadening options for AI agents and multilingual voice applications via a serverless inference stack optimized for high throughput and low latency.
Infrastructure placement matters: colocated components add around 5 milliseconds of latency, versus about 75 milliseconds when models run in a separate data center, which highlights the impact of network design and hardware proximity on real-time use cases.
TipRanks MCP for Agents delivers institutional-grade market data into Claude, ChatGPT, Cursor, and other MCP-compatible tools, supporting personal research, portfolio monitoring, and AI-assisted investment workflows.
Together AI achieved ISO 27001:2022 certification, reinforcing its security posture alongside SOC 2 compliance for global platform infrastructure and key facilities serving enterprise and regulated industries.
Latency optimization for LLM-based voice agents targets 200–300 milliseconds time to first token, with user satisfaction declining if delays exceed 500 milliseconds and calls potentially abandoned after one second.
Overall, the week shows Together AI pushing low-latency, cost-efficient inference, broader model coverage, stronger security, and expanded go-to-market channels to strengthen competitiveness and potential revenue growth.
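The latency figures quoted above (a 200–300 millisecond time-to-first-token target, a 500 millisecond satisfaction threshold, a one-second abandonment point, and 5 versus 75 milliseconds for colocated versus cross-data-center placement) can be combined in a rough budget check. The following is a minimal sketch, not anything published by Together AI: the function names, the category labels, and the 250 millisecond model time are hypothetical, and it assumes the placement latency is paid once on the way to the first token.

```python
# Thresholds quoted in the recap, in milliseconds.
TTFT_TARGET_MAX_MS = 300      # upper end of the 200-300 ms target
SATISFACTION_LIMIT_MS = 500   # satisfaction declines beyond this
ABANDON_LIMIT_MS = 1000       # calls may be abandoned beyond this

# Placement latencies quoted in the recap, in milliseconds.
COLOCATED_HOP_MS = 5
CROSS_DC_HOP_MS = 75


def classify_ttft(ttft_ms: float) -> str:
    """Map a time-to-first-token value onto the recap's thresholds.

    The labels are illustrative names, not terms from the source.
    """
    if ttft_ms <= TTFT_TARGET_MAX_MS:
        return "on target"
    if ttft_ms <= SATISFACTION_LIMIT_MS:
        return "above target"
    if ttft_ms <= ABANDON_LIMIT_MS:
        return "satisfaction declining"
    return "abandonment risk"


def hop_share(hop_ms: float, budget_ms: float = TTFT_TARGET_MAX_MS) -> float:
    """Fraction of the latency budget consumed by the network hop alone."""
    return hop_ms / budget_ms


if __name__ == "__main__":
    model_ms = 250  # hypothetical model-side time to first token
    for name, hop in (("colocated", COLOCATED_HOP_MS),
                      ("separate data center", CROSS_DC_HOP_MS)):
        total = model_ms + hop
        print(f"{name}: {total} ms, {classify_ttft(total)}, "
              f"hop uses {hop_share(hop):.1%} of a 300 ms budget")
```

Under these assumptions, the cross-data-center hop alone takes a quarter of a 300 millisecond budget, which is enough to push a model that would otherwise meet the target past it.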
Summary based on 1 source
Source

Tipranks • Jun 13, 2026
Together AI – Weekly Recap