Nvidia Corporation on Friday announced the release of a powerful new dataset and accompanying AI models aimed at advancing speech recognition and translation across 25 European languages, Azernews reports, citing foreign media.
The open-source dataset, named Granary, includes around 1 million hours of multilingual audio, making it one of the largest speech corpora available for European languages.
Alongside the dataset, Nvidia introduced two AI models:
NVIDIA Canary-1b-v2 – trained on the Granary dataset and optimized for transcribing European languages, and
NVIDIA Parakeet-tdt-0.6b-v3 – designed for real-time transcription, supporting all languages included in Granary.
“These tools will help developers scale AI applications globally, providing fast and accurate speech capabilities for real-world use cases like multilingual chatbots, voice-based customer service agents, and near-instant translation tools,” the company said in a press release.