Microsoft has officially unveiled three new AI models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. They mark a significant expansion of its in-house AI capabilities. The models are now available through Microsoft Foundry and the MAI Playground, giving developers immediate access to faster and more cost-efficient AI tools for speech recognition, voice generation, and image generation.

As detailed in a new Microsoft blog post, the company is positioning these MAI models as “world-class” alternatives that outperform competitors on both speed and price-performance. The move signals Microsoft’s growing ambition to reduce reliance on third-party AI providers while strengthening its own ecosystem across Copilot, Azure, and enterprise tools.
MAI-Transcribe-1 Leads in Speech AI

MAI-Transcribe-1 is Microsoft’s latest speech-to-text model, and it’s designed to handle real-world audio, not just clean recordings. It supports 25 of the most widely used languages and ranks among the most accurate models on the FLEURS benchmark.
Key highlights include:
- Up to 2.5× faster transcription than Azure’s previous fast models
- Lower word error rate than competing models like Whisper-large-v3 and Gemini 3.1 Flash
- Optimized for noisy, real-world environments
Microsoft is also pricing it aggressively, at just $0.36 per hour, making it one of the most affordable enterprise-grade transcription tools available.
MAI-Voice-1 Brings Realistic AI Speech

MAI-Voice-1 focuses on high-quality voice generation, with an emphasis on emotional nuance and preserving a speaker’s vocal identity. In simple terms, it sounds more human and stays consistent even across longer audio outputs.
What stands out:
- Generates 60 seconds of audio in just 1 second
- Supports custom voice creation using only a few seconds of sample audio
- Designed for AI agents, assistants, and content creation tools
This model is already being integrated into experiences like Copilot Audio Expressions and Copilot Podcasts, hinting at Microsoft’s broader plans for AI-driven media.
MAI-Image-2 Speeds Up AI Creativity

On the visual side, MAI-Image-2 is Microsoft’s newest image generation model, built for designers, marketers, and content creators who need both speed and realism.
Improvements include:
- At least 2× faster image generation compared to previous models
- More realistic lighting and texture, plus more accurate skin tones
- Improved rendering of in-image text for diagrams and layouts
Major companies are already taking notice. WPP, one of the world’s largest marketing firms, is using MAI-Image-2 to create campaign-ready visuals at scale, calling it a “game-changer” for creative workflows.
Built for Foundry and Enterprise Use
All three models are deeply integrated into Microsoft Foundry, the company’s AI development platform. Developers get built-in tools for governance, safety, and compliance—key requirements for enterprise adoption.
Pricing is also designed to undercut competitors:
- MAI-Transcribe-1: $0.36 per hour
- MAI-Voice-1: $22 per 1 million characters
- MAI-Image-2: $5 per 1 million text tokens and $33 per 1 million image tokens
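To put these rates in perspective, here is a rough back-of-the-envelope cost estimator in Python. It is only a sketch, assuming the prices listed above are flat, linear list prices with no free tier, regional differences, or volume discounts. The function names are hypothetical and are not part of any Microsoft SDK.

```python
from decimal import Decimal

# Per-unit list prices as quoted in this article (USD).
# Assumption: flat, linear billing; actual invoices may differ by region, tier, or discounts.
TRANSCRIBE_PER_HOUR = Decimal("0.36")
VOICE_PER_MILLION_CHARS = Decimal("22")
IMAGE_PER_MILLION_TEXT_TOKENS = Decimal("5")
IMAGE_PER_MILLION_IMAGE_TOKENS = Decimal("33")

MILLION = Decimal(1_000_000)


def transcribe_cost(audio_hours: float) -> Decimal:
    """Hypothetical helper: estimated MAI-Transcribe-1 cost for a number of audio hours."""
    return Decimal(str(audio_hours)) * TRANSCRIBE_PER_HOUR


def voice_cost(characters: int) -> Decimal:
    """Hypothetical helper: estimated MAI-Voice-1 cost for a number of input characters."""
    return Decimal(characters) / MILLION * VOICE_PER_MILLION_CHARS


def image_cost(text_tokens: int, image_tokens: int) -> Decimal:
    """Hypothetical helper: estimated MAI-Image-2 cost, pricing text and image tokens separately."""
    return (Decimal(text_tokens) / MILLION * IMAGE_PER_MILLION_TEXT_TOKENS
            + Decimal(image_tokens) / MILLION * IMAGE_PER_MILLION_IMAGE_TOKENS)


if __name__ == "__main__":
    # Hypothetical monthly workload, for illustration only.
    total = (transcribe_cost(100)
             + voice_cost(500_000)
             + image_cost(200_000, 1_000_000))
    print(f"Estimated total: ${total:.2f}")
```

For the made-up workload above (100 hours of audio, 500,000 characters of synthesized speech, 200,000 text tokens and 1 million image tokens), the script prints an estimated total of $81.00. Decimal is used instead of float so that currency amounts don't pick up binary rounding errors.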
Why This Matters
This launch isn’t just about new features—it’s about control. Microsoft is clearly investing in its own AI stack to compete more directly with OpenAI, Google, and others while lowering costs for developers using its ecosystem.
With tighter integration into Copilot, Azure, and Microsoft 365, these MAI models could quickly become the backbone of future AI experiences across both consumer and enterprise products.
If performance holds up at scale, Microsoft may have just taken a major step toward owning more of the AI pipeline—from model to product.