Whisper Large v2
Multilingual speech recognition and translation with word-level timestamps
OpenAI's Whisper Large-v2 transcribes audio with high accuracy across 99+ languages and can translate speech directly to English. It can be used with the base weights or with a fine-tuned checkpoint.
When to use:
- Transcribing meetings, interviews, or voice recordings
- Multilingual content requiring automatic language detection
- Translating non-English audio to English text
- Generating subtitles with word-level timestamps
Input: Audio file (WAV, MP3, etc.) + optional fine-tuned checkpoint
Output: Transcribed text and word-level timestamps
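For the subtitle use case, the word-level timestamps can be grouped into subtitle cues. The sketch below is only an illustration: it assumes each word arrives as a dictionary with "word", "start", and "end" keys (times in seconds), which is the shape the open-source openai-whisper package uses and may differ from what this model actually returns. The helper names format_srt_time and words_to_srt and the max_words limit are hypothetical.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group word entries into SRT cues of at most max_words words each.

    Assumed input shape (not confirmed by this page):
    [{"word": "Hello", "start": 0.0, "end": 0.4}, ...]
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"].strip() for w in chunk)
        start = format_srt_time(chunk[0]["start"])  # first word opens the cue
        end = format_srt_time(chunk[-1]["end"])     # last word closes the cue
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 0.9},
    ]
    print(words_to_srt(demo))
```

Grouping by a fixed word count keeps the example short; splitting on pauses or punctuation would likely produce more readable subtitles.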
Model Settings
Sampling Rate (default: 16000)
Audio sampling rate in Hz. This must match the audio file's actual sampling rate.
- 16000: Standard for speech, and the rate Whisper requires
- If the audio's rate differs from 16000 Hz, resample it before inference
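As an illustration of the resampling step, here is a minimal sketch that converts a mono signal to 16000 Hz with plain linear interpolation. The function name resample_linear is hypothetical, and the method is deliberately simple: it applies no anti-aliasing filter, so for real audio a dedicated resampler from an audio library is the better choice.

```python
def resample_linear(samples, orig_rate, target_rate=16000):
    """Resample a mono signal (sequence of floats) by linear interpolation.

    Simplified sketch: no low-pass filtering is applied, so downsampling
    can introduce aliasing.
    """
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if orig_rate == target_rate or len(samples) == 0:
        return list(samples)

    n_out = max(1, round(len(samples) * target_rate / orig_rate))
    step = orig_rate / target_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        # Past the final input sample there is nothing to interpolate toward.
        frac = pos - lo if lo < last else 0.0
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    one_second_48k = [float(i) for i in range(48000)]
    resampled = resample_linear(one_second_48k, orig_rate=48000)
    print(len(resampled))  # one second of audio at 16000 Hz
```

One second of 48000 Hz audio becomes 16000 samples, so the duration is preserved while the rate changes.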
Inference Settings
Language (default: en)
Language code for transcription (e.g., en, fr, de, zh, es).
- Set to the audio's language for best accuracy
- Leave as en if the audio is English
Task (default: transcribe, options: transcribe / translate)
What to do with the audio.
- transcribe: Output text in the original language of the audio
- translate: Translate the audio to English regardless of source language
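The Language and Task settings can be pictured as keyword arguments to a transcription call. The sketch below assumes the open-source openai-whisper package, whose transcribe method accepts language, task, and word_timestamps; this page does not say which backend it uses, so treat the mapping as an assumption. The helper names build_decode_options and transcribe_file are hypothetical.

```python
VALID_TASKS = ("transcribe", "translate")


def build_decode_options(language="en", task="transcribe", word_timestamps=True):
    """Validate the settings and return keyword arguments for transcription."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    if not language:
        raise ValueError("language must be a non-empty code such as 'en' or 'fr'")
    return {
        "language": language.lower(),
        "task": task,
        "word_timestamps": word_timestamps,
    }


def transcribe_file(audio_path, **settings):
    """Run Whisper Large-v2 on one file (assumes openai-whisper is installed)."""
    import whisper  # imported lazily so the validation helper works without it

    model = whisper.load_model("large-v2")
    return model.transcribe(audio_path, **build_decode_options(**settings))


if __name__ == "__main__":
    # French audio, English output text.
    print(build_decode_options(language="fr", task="translate"))
```

With task set to translate, the language code still describes the source audio; the output text is English.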