Microsoft AI, the research division of the tech giant, announced the release of three new foundational artificial intelligence models on Thursday. The models, capable of generating text, voice, and images, represent a significant step in Microsoft's strategy to develop its own comprehensive suite of multimodal AI tools and compete more directly with rivals like Google and OpenAI.
The announcement was made in a company press release and a blog post by Mustafa Suleyman, the CEO of Microsoft AI. The models are now available on Microsoft Foundry, the company's AI development platform, with some also accessible through the newly launched MAI Playground testing software.
Three Models for a Multimodal Push
The newly released models are MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. According to Microsoft, MAI-Transcribe-1 can transcribe speech across 25 languages into text and is 2.5 times faster than the company's existing Azure Fast service. MAI-Voice-1 is an audio-generating model that can produce 60 seconds of audio in one second and supports the creation of custom voices.
MAI-Image-2, an image-generating model, was first made available on MAI Playground on March 19. Its release on Foundry, alongside the other two models, marks the full commercial availability of all three. The development was led by Microsoft's MAI Superintelligence team, a research unit formed in November 2025 and headed by Suleyman.
Human-Centric AI and Competitive Pricing
In his blog post, Suleyman outlined the philosophy behind the new releases. "At Microsoft AI, we’re building Humanist AI. We have a distinct view when creating our AI models — putting humans at the center, optimizing for how people actually communicate, training for practical use," he wrote, adding that more models would follow.
A key competitive angle highlighted by Microsoft is pricing. In a crowded large language model (LLM) market, the company positions these models as more cost-effective than offerings from Google and OpenAI. MAI-Transcribe-1 starts at $0.36 per hour of audio, MAI-Voice-1 at $22 per 1 million characters, and MAI-Image-2 at $5 per 1 million tokens of text input and $33 per 1 million tokens of image output.
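To put those starting prices in concrete terms, the short Python sketch below estimates what a given workload might cost. It is an illustration only: the function and constant names are hypothetical, the rates are simply the figures quoted above, and it assumes straightforward linear billing with no tiers, minimums, or discounts, which Microsoft's actual pricing may not follow.

```python
# Rough cost estimator based on the starting prices quoted in the article.
# Assumption: billing is linear in usage; real pricing may include tiers or minimums.

TRANSCRIBE_USD_PER_HOUR = 0.36               # MAI-Transcribe-1, per hour of audio
VOICE_USD_PER_MILLION_CHARS = 22.0           # MAI-Voice-1, per 1 million characters
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0     # MAI-Image-2, text input
IMAGE_USD_PER_MILLION_OUTPUT_TOKENS = 33.0   # MAI-Image-2, image output


def transcribe_cost(audio_hours: float) -> float:
    """Estimated cost in USD of transcribing the given hours of audio."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated cost in USD of synthesizing speech for the given character count."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated cost in USD of an image workload, split by input and output tokens."""
    input_cost = text_input_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_INPUT_TOKENS
    output_cost = image_output_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_OUTPUT_TOKENS
    return input_cost + output_cost


if __name__ == "__main__":
    print(f"100 hours of transcription: ${transcribe_cost(100):.2f}")        # $36.00
    print(f"500,000 characters of speech: ${voice_cost(500_000):.2f}")       # $11.00
    print(f"1M input + 1M output image tokens: ${image_cost(1_000_000, 1_000_000):.2f}")  # $38.00
```

Under these assumptions, 100 hours of transcription would come to about $36, and half a million characters of generated speech to about $11.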
Maintaining the OpenAI Partnership
Despite this push for in-house model development, Suleyman reaffirmed Microsoft's commitment to its long-standing partnership with OpenAI. In an interview with VentureBeat, he noted that a recent renegotiation of their partnership terms has, in fact, enabled Microsoft to pursue this independent superintelligence research more aggressively.
Microsoft has invested over $13 billion in OpenAI and integrates OpenAI's models across its products. The company takes a similar dual approach with semiconductors, both producing its own chips and purchasing from external suppliers.