Cohere announced the release of Transcribe on March 26, 2026, introducing a new state-of-the-art automatic speech recognition model. The open-source system targets high-accuracy transcription for enterprise applications across 14 supported languages, aiming to reduce word error rates while remaining ready for real-world production deployment.
Key Details
The architecture is a 2-billion-parameter Conformer encoder-decoder trained from scratch on diverse audio. Unlike releases positioned purely as research artifacts, Cohere emphasizes deployability and serving efficiency, including practical GPU utilization. Users can access the weights via Hugging Face or through the secure Model Vault inference platform.
Benchmarking data places Cohere Transcribe at the top of the Hugging Face Open ASR Leaderboard with an average word error rate of 5.42%. This performance surpasses closed-source alternatives, including ElevenLabs Scribe v2 and OpenAI Whisper Large v3. In testing, the model proved robust across diverse accents and meeting-room acoustics.
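For readers unfamiliar with the metric behind the leaderboard average, word error rate is the word-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The sketch below is a generic, minimal implementation of that definition, not Cohere's evaluation harness:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown box"))  # 0.25 (one substitution in four words)
```

A 5.42% average WER means roughly one word-level error in every eighteen reference words, averaged across the leaderboard's test sets.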
Earlier models on the market often traded accuracy for speed or demanded significant computational resources. Cohere claims this release extends the Pareto frontier, delivering high accuracy without sacrificing throughput. Human preference evaluations further confirm that the model preserves meaning and avoids hallucination in realistic scenarios.
"We’re genuinely impressed with what Cohere has built with Transcribe. The speed is exceptional — turning minutes of audio into usable transcripts in seconds," said Paige Dickie, Vice-President of Radical Ventures. Her team tested the model for everyday speech handling and reported a smooth integration experience.
What This Means
Cohere plans deeper integration of Transcribe with North, its AI agent orchestration platform for enterprise workflows. Planned updates will evolve the system from a transcription tool into a broader foundation for enterprise speech intelligence. Production deployment options include dedicated Model Vault instances for low-latency private-cloud inference.
This launch signals a shift toward open-weights models capable of competing with proprietary industry standards in the global market. Enterprises seeking cost-effective speech automation now have a viable alternative to expensive API calls from major providers. The Apache 2.0 license ensures full infrastructure control for organizations requiring strict data privacy standards.
Developers can download the model immediately to run it locally or deploy it in edge environments on supported hardware. A free API tier allows low-setup experimentation, subject to rate limits, before moving to dedicated infrastructure. Documentation provides detailed guidance for teams integrating the model into existing business systems.
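When experimenting against a rate-limited free tier, client code typically needs a retry policy. The following is a generic exponential-backoff sketch, assuming only that the client surfaces rate-limit responses as an exception; the error class and call shape here are hypothetical, not Cohere's actual SDK:

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a client's HTTP 429 (rate limit) error."""


def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Invoke `call()`, retrying on RateLimitError with exponential backoff.

    Waits base_delay * 2**attempt seconds between attempts (1s, 2s, 4s, ...);
    re-raises after max_retries failed attempts. `sleep` is injectable for testing.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Usage sketch: `transcribe_chunk` would be whatever rate-limited API call
# your integration makes; here it is simulated as failing twice, then succeeding.
_attempts = {"n": 0}

def transcribe_chunk():
    _attempts["n"] += 1
    if _attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "transcript text"

result = with_backoff(transcribe_chunk, sleep=lambda s: None)
print(result)  # "transcript text" after two retries
```

Once traffic moves to a dedicated instance, the same wrapper can stay in place with a higher `base_delay` or simply stop firing, since rate-limit responses should become rare.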