Whisper Large v2

OpenAI's Whisper Large-v2 transcribes audio with high accuracy across 99+ languages and can translate speech directly to English. It can be used with the base weights or with a fine-tuned checkpoint.

When to use:

Transcribing meetings, interviews, or voice recordings
Multilingual content requiring automatic language detection
Translating non-English audio to English text
Generating subtitles with word-level timestamps

Input: Audio file (WAV, MP3, etc.) + optional fine-tuned checkpoint
Output: Transcribed text and word-level timestamps
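
The word-level timestamps in the output can be grouped into subtitle cues. The following is a minimal Python sketch, assuming the timestamps are available as a list of dicts with word, start, and end keys (times in seconds). That shape, and the function names format_srt_time and words_to_srt, are illustrative assumptions rather than the documented output schema.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group word timestamps into SRT cues of at most max_words words.

    `words` is assumed to be a list of dicts such as
    {"word": "Hello", "start": 0.0, "end": 0.5} (hypothetical shape).
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = " ".join(w["word"].strip() for w in chunk)
        # Each cue: index, time range, text, then a blank separator line.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Splitting on a fixed word count keeps the sketch short; a real subtitle pipeline would more likely break cues on pauses or punctuation.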

Model Settings

Sampling Rate (default: 16000): Audio sampling rate in Hz. Must match the audio file's actual sampling rate.

16000: Standard for speech - required by Whisper
Resample audio before inference if it differs from 16000 Hz
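
Before inference, it can help to confirm that a file really is sampled at 16000 Hz. The sketch below covers WAV files only, using Python's standard wave module to read the rate from the file header. The helper names wav_sample_rate and needs_resampling are hypothetical. Resampling itself is left to an audio tool or library of your choice.

```python
import wave

EXPECTED_RATE = 16000  # Hz, the rate this page says Whisper requires


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resampling(path: str, expected_rate: int = EXPECTED_RATE) -> bool:
    """Return True if the file's rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

A file for which needs_resampling returns True should be converted to 16000 Hz before it is passed to the model.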

Inference Settings

Language (default: en): Language code for transcription (e.g., en, fr, de, zh, es).

Set to the audio's language for best accuracy
Leave as en if the audio is English

Task (default: transcribe, options: transcribe / translate): The operation to perform on the audio.

transcribe: Output text in the original language of the audio
translate: Translate the audio to English regardless of source language
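
The settings above and their defaults can be summarized in code. This is a sketch under stated assumptions: InferenceSettings is a hypothetical container written for illustration, not part of any Whisper API, and it simply encodes the defaults and constraints described on this page.

```python
from dataclasses import dataclass

VALID_TASKS = ("transcribe", "translate")


@dataclass(frozen=True)
class InferenceSettings:
    """Hypothetical container mirroring the settings described above."""
    language: str = "en"        # language code of the audio
    task: str = "transcribe"    # "transcribe" or "translate"
    sampling_rate: int = 16000  # Hz

    def __post_init__(self):
        if self.task not in VALID_TASKS:
            raise ValueError(f"task must be one of {VALID_TASKS}, got {self.task!r}")
        if self.sampling_rate != 16000:
            raise ValueError("audio must be resampled to 16000 Hz first")
        if not self.language:
            raise ValueError("language code must be non-empty")

    def output_language(self) -> str:
        """Language of the produced text: English when translating."""
        return "en" if self.task == "translate" else self.language
```

For example, InferenceSettings(language="de", task="translate") describes German audio whose output text is English, while the defaults describe English audio transcribed as English.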
