DeepL, the German company renowned for its AI-powered text translation services, has announced the launch of a comprehensive voice-to-voice translation suite. The new product, unveiled today, is designed for real-time use in business meetings, mobile conversations, and customer service scenarios. CEO Jarek Kutylowski stated the move into voice was a "natural step" for the company following years of refining its text translation technology.

The launch positions DeepL in direct competition with several well-funded AI startups specialising in speech synthesis and real-time translation. The company is also releasing an API that lets external developers build custom applications, such as call-centre tools, on top of DeepL's translation engine.

Balancing Speed and Accuracy

Kutylowski identified the core technical challenge as striking a balance between reducing latency – the delay between speech and translated audio – and maintaining high accuracy. The current system operates by converting speech to text, applying DeepL's translation models, and then converting the text back to speech. However, the company aims to develop an end-to-end model that bypasses the text step entirely.
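To make the latency trade-off concrete, the three-stage flow described above can be sketched in a few lines of Python. This is an illustrative sketch only, not DeepL's implementation or API: the function names (run_cascade, fake_recognise, fake_translate, fake_synthesise) are hypothetical stand-ins, and the toy stages simply pass bytes and strings around so the example runs without any speech libraries. The point it shows is that in a cascaded design each stage must finish before the next begins, so the delays add up.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StageTiming:
    """Wall-clock time spent in one stage of the pipeline."""
    name: str
    seconds: float


def run_cascade(
    audio: bytes,
    recognise: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesise: Callable[[str], bytes],
) -> Tuple[bytes, List[StageTiming]]:
    """Run speech -> text -> translated text -> speech, timing each stage."""
    timings: List[StageTiming] = []

    def timed(name, fn, arg):
        start = time.perf_counter()
        result = fn(arg)
        timings.append(StageTiming(name, time.perf_counter() - start))
        return result

    text = timed("speech_to_text", recognise, audio)
    translated = timed("translate", translate, text)
    out_audio = timed("text_to_speech", synthesise, translated)
    return out_audio, timings


# Hypothetical stand-ins for real models (assumptions, not a real API).
def fake_recognise(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the bytes are the transcript


def fake_translate(text: str) -> str:
    glossary = {"guten morgen": "good morning"}  # toy lookup table
    return glossary.get(text.lower(), text)


def fake_synthesise(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the bytes are audio


if __name__ == "__main__":
    audio_out, stage_timings = run_cascade(
        b"Guten Morgen", fake_recognise, fake_translate, fake_synthesise
    )
    total = sum(t.seconds for t in stage_timings)
    for t in stage_timings:
        print(f"{t.name}: {t.seconds:.6f}s")
    print(f"total latency: {total:.6f}s")
```

Because the total delay in this sketch is the sum of three sequential stages, removing the intermediate text step, as an end-to-end model would, is one plausible way to cut latency.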

"We thought there wasn’t a great product for real-time voice translation," Kutylowski told TechCrunch. He emphasised that DeepL's extensive experience in text translation gives it a significant edge in translation quality, a claim it will test in a newly competitive market.

Product Suite and Early Access

The initial offerings include add-ons for major collaboration platforms such as Zoom and Microsoft Teams. Meeting participants can choose to hear real-time translated audio or read translated subtitles. These add-ons are currently in early access, with organisations invited to join a waitlist.

DeepL also launched a product for one-on-one mobile and web conversations, usable both in person and remotely. A separate feature facilitates group discussions in settings like training workshops, where participants can join a translated session via a QR code. The company claims its AI can learn and adapt to custom vocabulary, including industry-specific jargon and proper names.

Market Context and Competition

DeepL enters a market with established players focusing on niche applications. Competitor Sanas, which raised $65 million last year, uses AI to modify call centre agents' accents in real time. Camb.AI, based in Dubai, specialises in dubbing and localising video content for media companies.

More direct competition comes from Palabra, backed by Reddit co-founder Alexis Ohanian’s venture firm. Palabra is building a real-time translation engine designed to preserve both meaning and the speaker's original vocal characteristics.

Kutylowski framed the launch within broader trends in customer service, suggesting AI translation layers will help companies provide support in languages where hiring qualified human staff is difficult or prohibitively expensive.