Whisper and its many forms

Hello there, sorry for my late reply. The workbook is currently under maintenance; I'll let you know when it's back.
 
If it says torchcodec is missing even though it's in the installation list in step 1, then step 1 must have an error that needs fixing.
 
I took mei2's Colab code and made FasterWhisperWithVAD_pro, also with checkpoint-resume functionality. It sacrifices some quality for significantly faster speed: 4-5x faster than normal Whisper at a similar beam_size. I find large-v2 gives better results overall on normal Whisper, but that's not the case on Faster Whisper; use large-v3 when using Faster Whisper.

Gemini Pro helped me write a FasterWhisper-with-VAD Python script, "faster_whisper_local.py", which can run on your local PC if you have a CUDA GPU and PyTorch installed. Tested on my RTX 3070. Silero VAD gives better results than faster-whisper's built-in VAD, which is why I use it. The transcription results won't beat normal Whisper, but it's more efficient, and it can help you get the most complete subtitle.

My workflow for the most complete subtitle: run the normal WhisperWithVAD_pro first, then on my local PC run different beam sizes on the same audio file, then merge the non-overlapping timecodes. To use the script, put config.yaml in the same directory as faster_whisper_local.py. For example, to run two profiles:
python faster_whisper_local.py --profiles beam_16 beam_5_wide
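The post doesn't show the script's internals or the config.yaml layout, but the --profiles flag above suggests something like the following sketch. Everything here is an assumption for illustration: the profile names, the settings inside each profile, and the select_profiles helper are hypothetical stand-ins, with a plain dict in place of the parsed config.yaml.

```python
import argparse

# Hypothetical stand-in for the parsed config.yaml; the real file's
# layout and settings are not shown in the post.
CONFIG = {
    "profiles": {
        "beam_16": {"beam_size": 16, "best_of": 5},
        "beam_5_wide": {"beam_size": 5, "best_of": 10},
    }
}

def select_profiles(argv, config=CONFIG):
    """Return the settings for each profile named on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--profiles", nargs="+", required=True)
    args = parser.parse_args(argv)
    known = config["profiles"]
    missing = [p for p in args.profiles if p not in known]
    if missing:
        # reject names that don't exist in the config
        parser.error(f"unknown profile(s): {', '.join(missing)}")
    return {p: known[p] for p in args.profiles}

# Mirrors the command above: each named profile maps to one
# transcription pass with its own decoding settings.
chosen = select_profiles(["--profiles", "beam_16", "beam_5_wide"])
```

Running several profiles over the same audio only pays off because the passes disagree: each beam size misses different segments, which is what makes the merge step below the audio worthwhile.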

A higher beam size produces more accurate results, but it actually captures fewer subtitle lines: when the decoder isn't confident, it tends to decide the segment is silence.

I also have a Python script that finds non-overlapping timecodes between two SRT files and merges them.
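The merge script itself isn't shared, but the idea can be sketched with the stdlib only: keep every cue from the primary SRT, and add a cue from the secondary SRT only if its timecode range overlaps none of the primary cues. The function names and the exact overlap rule are my assumptions, not the author's code.

```python
import re
from datetime import timedelta

TIME_RE = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def parse_time(s):
    """Turn an SRT timecode like '00:01:02,500' into a timedelta."""
    h, m, sec, ms = map(int, TIME_RE.match(s).groups())
    return timedelta(hours=h, minutes=m, seconds=sec, milliseconds=ms)

def parse_srt(text):
    """Return a list of (start, end, text) cues from SRT content."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        # line 0 is the cue index, line 1 is "start --> end"
        start_s, _, end_s = lines[1].partition(" --> ")
        cues.append((parse_time(start_s), parse_time(end_s),
                     "\n".join(lines[2:])))
    return cues

def merge_non_overlapping(primary, secondary):
    """Keep all primary cues; add secondary cues that overlap none of them."""
    merged = list(primary)
    for start, end, text in secondary:
        if all(end <= p_start or start >= p_end
               for p_start, p_end, _ in primary):
            merged.append((start, end, text))
    merged.sort(key=lambda c: c[0])
    return merged
```

A secondary cue that even partially overlaps a primary one is dropped, on the assumption that the primary transcription is the trusted pass and the secondary pass only fills gaps.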

Hi there, how do I install this? The Google Doc has some downloads for Whisper, but when I download and open them (with Chrome) I just get a bunch of code.
Thanks in advance!