Microsoft Launches MAI-Transcribe-1, Voice-1, and Image-2: Faster, Cheaper AI Models Hit Foundry

Written by Dave W. Shanahan

April 2, 2026

Microsoft has officially revealed three new AI models—MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2—marking a significant expansion of its in-house AI capabilities. These models are now available through Microsoft Foundry and the MAI Playground, giving developers immediate access to faster and more cost-efficient AI tools across speech, voice, and image generation.

As detailed in a new Microsoft blog post, the company is positioning these MAI models as “world-class” alternatives that outperform competitors on both speed and price-performance. The move signals Microsoft’s growing ambition to reduce reliance on third-party AI providers while strengthening its own ecosystem across Copilot, Azure, and enterprise tools.

MAI-Transcribe-1 Leads in Speech AI

MAI-Transcribe-1 is Microsoft’s latest speech-to-text model, designed to handle real-world audio rather than only clean recordings. It supports the 25 most-used languages and ranks among the most accurate models on the FLEURS benchmark.

Key highlights include:

  • Up to 2.5× faster transcription than Azure’s previous fast models
  • Lower word error rate than competing models like Whisper-large-v3 and Gemini 3.1 Flash
  • Optimized for noisy, real-world environments

Microsoft is also pricing it aggressively at just $0.36 per hour, making it one of the most affordable enterprise-grade transcription tools available.

MAI-Voice-1 Brings Realistic AI Speech

MAI-Voice-1 focuses on high-quality voice generation, with an emphasis on emotional nuance and identity preservation. In simple terms, it sounds more human—and stays consistent even in longer audio outputs.

What stands out:

  • Generates 60 seconds of audio in just 1 second
  • Supports custom voice creation using only a few seconds of sample audio
  • Designed for AI agents, assistants, and content creation tools
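
To put the speed figure in perspective, here is a rough sketch that turns "60 seconds of audio in 1 second" into an estimated generation time for longer clips. It assumes the quoted rate holds linearly for any clip length, which the article does not confirm; the function names are illustrative and not part of any Microsoft API.

```python
# Rate quoted in the article: 60 seconds of audio per 1 second of generation.
AUDIO_SECONDS_PER_COMPUTE_SECOND = 60.0


def estimated_generation_seconds(audio_seconds: float) -> float:
    """Estimate generation time, ASSUMING the quoted rate scales linearly."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / AUDIO_SECONDS_PER_COMPUTE_SECOND


def real_time_factor() -> float:
    """Compute seconds needed per second of audio (lower is faster)."""
    return 1.0 / AUDIO_SECONDS_PER_COMPUTE_SECOND


if __name__ == "__main__":
    # A 30-minute podcast episode is 1,800 seconds of audio.
    print(estimated_generation_seconds(30 * 60))
    print(round(real_time_factor(), 4))
```

Under that assumption, a 30-minute episode would take roughly half a minute to generate. Real-world latency will also depend on hardware, queueing, and network overhead.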

This model is already being integrated into experiences like Copilot Audio Expressions and Copilot Podcasts, hinting at Microsoft’s broader plans for AI-driven media.

MAI-Image-2 Speeds Up AI Creativity

On the visual side, MAI-Image-2 is Microsoft’s newest image generation model, built for designers, marketers, and content creators who need both speed and realism.

Improvements include:

  • At least 2× faster image generation compared to previous models
  • Enhanced lighting, texture, and accurate skin tones
  • Improved rendering of in-image text for diagrams and layouts

Major companies are already taking notice. WPP, one of the world’s largest marketing firms, is using MAI-Image-2 to create campaign-ready visuals at scale, calling it a “game-changer” for creative workflows.

Built for Foundry and Enterprise Use

All three models are deeply integrated into Microsoft Foundry, the company’s AI development platform. Developers get built-in tools for governance, safety, and compliance—key requirements for enterprise adoption.

Pricing is also designed to undercut competitors:

  • MAI-Transcribe-1: $0.36 per hour
  • MAI-Voice-1: $22 per 1 million characters
  • MAI-Image-2: $5 per 1 million text tokens and $33 per 1 million image tokens
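
For budgeting, the list prices above translate into a simple estimator. This is a minimal sketch: the rates come from the article, while the function names and sample workloads are hypothetical, and it ignores any volume discounts, free tiers, or regional pricing that may apply.

```python
# List prices quoted in the article, in US dollars.
TRANSCRIBE_USD_PER_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_TEXT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_IMAGE_TOKENS = 33.0


def transcribe_cost(audio_hours: float) -> float:
    """Cost of transcribing the given number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Cost of synthesizing speech for the given number of characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_tokens: int, image_tokens: int) -> float:
    """Cost of image generation, billed on text and image tokens."""
    return (
        text_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_TEXT_TOKENS
        + image_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_IMAGE_TOKENS
    )


if __name__ == "__main__":
    # Hypothetical monthly workload, for illustration only.
    print(f"Transcription, 100 hours: ${transcribe_cost(100):.2f}")
    print(f"Voice, 500k characters:   ${voice_cost(500_000):.2f}")
    print(f"Images, 1M text + 2M image tokens: ${image_cost(1_000_000, 2_000_000):.2f}")
```

At these rates, 100 hours of transcription comes to about $36 and 500,000 characters of speech to about $11.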

Why This Matters

This launch isn’t just about new features—it’s about control. Microsoft is clearly investing in its own AI stack to compete more directly with OpenAI, Google, and others while lowering costs for developers using its ecosystem.

With tighter integration into Copilot, Azure, and Microsoft 365, these MAI models could quickly become the backbone of future AI experiences across both consumer and enterprise products.

If performance holds up at scale, Microsoft may have just taken a major step toward owning more of the AI pipeline—from model to product.


I'm Dave W. Shanahan, a Microsoft enthusiast with a passion for Windows, Xbox, Microsoft 365 Copilot, Azure, and more. I started MSFTNewsNow.com to keep the world updated on Microsoft news. I'm based in Massachusetts, and you can email me at davewshanahan@gmail.com.