I took mei2's Colab code and made a FasterWhisperWithVAD_pro, with added checkpoint-resume functionality. It sacrifices some quality for significantly faster speed: roughly 4-5x faster than normal Whisper at a similar beam_size. I find large-v2 gives better results overall with normal Whisper, but that doesn't hold for Faster Whisper; use large-v3 when running Faster Whisper.
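The checkpoint-resume idea can be sketched roughly like this (the filename, JSON layout, and helper names here are my own illustration, not the actual script's internals): finished chunks are recorded to disk, so a crashed run picks up where it left off.

```python
import json
import os

CHECKPOINT_FILE = "transcribe_checkpoint.json"  # hypothetical filename

def load_checkpoint():
    """Return the index of the last fully transcribed chunk, or -1 if none."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_done"]
    return -1

def save_checkpoint(idx):
    """Record that chunk `idx` finished, so a crash can resume after it."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_done": idx}, f)

def transcribe_all(chunks, transcribe_fn):
    """Skip chunks already done in a previous run; checkpoint after each."""
    done = load_checkpoint()
    results = []
    for i, chunk in enumerate(chunks):
        if i <= done:
            continue  # already transcribed in an earlier run
        results.append(transcribe_fn(chunk))
        save_checkpoint(i)
    return results
```

On a fresh run every chunk is processed; on a restart, only chunks after the recorded index are.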
Gemini Pro helped me write a FasterWhisper-with-VAD Python script, "faster_whisper_local.py", which can run on your local PC if you have a CUDA GPU and PyTorch installed. Tested on my RTX 3070. Silero VAD gives better results than faster-whisper's built-in VAD, which is why I use it. The transcription results won't beat normal Whisper, but it's more efficient, and it can help you get the most complete subtitle.
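Silero VAD reports speech timestamps as sample offsets; before feeding windows to faster-whisper, a script like this typically converts them to seconds and bridges short pauses. A minimal sketch of that step, assuming Silero-style `{"start", "end"}` dicts in samples (the `max_gap` threshold and function name are my own illustration):

```python
def merge_speech_segments(timestamps, sample_rate=16000, max_gap=0.5):
    """Convert {'start','end'} sample offsets to (start_s, end_s) tuples,
    merging segments separated by less than `max_gap` seconds."""
    merged = []
    for ts in timestamps:
        start = ts["start"] / sample_rate
        end = ts["end"] / sample_rate
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = end  # extend previous segment across the short gap
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]
```

Each resulting window can then be transcribed independently, which is also what makes checkpointing per chunk straightforward.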
My workflow for the most complete subtitle is to run the normal WhisperWithVAD_pro first, then run different beam sizes on the same audio file on my local PC, and finally merge the non-overlapping timecodes. To use the script, put config.yaml in the same directory as faster_whisper_local.py. For example, to run 2 profiles:
python faster_whisper_local.py --profiles beam_16 beam_5_wide
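I can't vouch for the exact schema the script expects, but a plausible config.yaml shape for two named profiles like the ones above might look like this (all keys here are my own guess; match them to whatever faster_whisper_local.py actually reads):

```yaml
# hypothetical layout -- adjust to the real schema
profiles:
  beam_16:
    model: large-v3
    beam_size: 16
  beam_5_wide:
    model: large-v3
    beam_size: 5
    best_of: 5
```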
A higher beam size produces more accurate results, but it actually captures fewer subtitle lines: when the model isn't confident, it tends to decide the segment is silence.
I also have a Python script that finds the non-overlapping timecodes between two SRT files and merges them.
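The merge idea can be sketched like this (a stand-alone illustration, not the author's actual script): parse each SRT into (start, end, text) tuples, keep everything from the primary file, and pull in only those entries from the secondary file whose time ranges overlap nothing in the primary.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def to_seconds(ts):
    """Convert an SRT timestamp like '00:01:02,500' to seconds."""
    h, m, s, ms = TIME_RE.match(ts.strip()).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, subtitle_text) tuples."""
    entries = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start_s, end_s = lines[1].split(" --> ")
        entries.append((to_seconds(start_s), to_seconds(end_s), "\n".join(lines[2:])))
    return entries

def merge_non_overlapping(primary, secondary):
    """Keep all of `primary`; add `secondary` entries overlapping none of them."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]
    extra = [e for e in secondary if not any(overlaps(e, p) for p in primary)]
    return sorted(primary + extra, key=lambda e: e[0])
```

The result would still need renumbering and re-serializing to SRT, but this is the core of the non-overlap test.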
drive.google.com