Together AI Achieves Major AI Inference Breakthroughs with MiniMax-M3, Expands Global Reach with Pax8 Partnership

June 13, 2026
  • Together AI is profiled as an AI infrastructure provider focused on high-performance inference, with a weekly recap highlighting platform, performance, and compliance developments aimed at AI agents, multimodal workloads, and latency-sensitive voice applications.

  • The MiniMax-M3 model demonstrated major inference efficiency gains, delivering 81 to 125 percent higher throughput across common agent-like traffic patterns through architectural and kernel-level optimizations. The model also offers a 1 million-token context window and native multimodality.

  • Together AI supported Deep Cogito’s large-scale open-weight reasoning models (ranging from 8B to 671B parameters), enabling sub-500 millisecond time-to-first-token at over 1,000 requests per minute with 99.9% uptime; quantized variants became available in under two weeks.

  • A Pax8 partnership was announced to distribute Together AI’s AI Native Cloud globally through a channel targeting small and mid-sized businesses, expanding access to enterprise-grade AI infrastructure and 200+ open-source models.

  • Support was added for NVIDIA Nemotron 3 Ultra and Nemotron 3.5 ASR models on Together AI's AI Native Cloud, broadening options for AI agents and multilingual voice applications via a serverless inference stack optimized for high throughput and low latency.

  • Infrastructure placement matters: colocated components add around 5 milliseconds of latency, versus about 75 milliseconds when models run in a separate data center, highlighting the impact of network design and hardware proximity for real-time use cases.

  • TipRanks MCP for Agents delivers institutional-grade market data into Claude, ChatGPT, Cursor, and other MCP-compatible tools, supporting personal research, portfolio monitoring, and AI-assisted investment workflows.

  • Together AI achieved ISO 27001:2022 certification, reinforcing its security posture alongside SOC 2 compliance for global platform infrastructure and key facilities serving enterprise and regulated industries.

  • Latency optimization for LLM-based voice agents targets 200–300 milliseconds time to first token, with user satisfaction declining if delays exceed 500 milliseconds and calls potentially abandoned after one second.

  • Overall, the week shows Together AI pushing low-latency, cost-efficient inference, broader model coverage, stronger security, and expanded go-to-market channels to strengthen competitiveness and potential revenue growth.
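
To put the latency figures above in proportion, here is a minimal sketch that compares the two network placements against the voice-agent time-to-first-token thresholds quoted in the recap. It assumes the network hop counts directly against the time-to-first-token budget, which the recap does not state; the function names and the tier labels ("acceptable", "degraded", "abandon risk") are hypothetical and chosen only for illustration.

```python
# Thresholds quoted in the recap for LLM-based voice agents (milliseconds).
TARGET_TTFT_MS = 300         # upper end of the 200-300 ms target
SATISFACTION_LIMIT_MS = 500  # satisfaction declines beyond this
ABANDON_LIMIT_MS = 1000      # calls may be abandoned beyond this


def classify_ttft(ttft_ms: float) -> str:
    """Map a time-to-first-token measurement onto the recap's thresholds.

    The tier names are illustrative labels, not terms from the source.
    """
    if ttft_ms <= TARGET_TTFT_MS:
        return "on target"
    if ttft_ms <= SATISFACTION_LIMIT_MS:
        return "acceptable"
    if ttft_ms <= ABANDON_LIMIT_MS:
        return "degraded"
    return "abandon risk"


def network_share(network_ms: float, budget_ms: float = TARGET_TTFT_MS) -> float:
    """Fraction of the TTFT budget used by one network hop.

    Assumption: the hop latency adds directly to time to first token.
    """
    return network_ms / budget_ms


if __name__ == "__main__":
    # Placement figures from the recap: ~5 ms colocated, ~75 ms cross-data-center.
    for label, hop_ms in [("colocated", 5), ("separate data center", 75)]:
        share = network_share(hop_ms)
        print(f"{label}: {share:.1%} of a {TARGET_TTFT_MS} ms budget")
```

Under these assumptions, the cross-data-center hop alone would use a quarter of a 300 millisecond budget, while the colocated hop would use under 2 percent.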

Summary based on 1 source



Source

Together AI – Weekly Recap

Tipranks • Jun 13, 2026

