Whisper (OpenAI) - Automatic English Subtitles for Any Film in Any Language - An Intro & Guide to Subtitling JAV

I am trying to transcribe NEO-017.mp4. Is it normal to wait longer than an hour?
Performance depends on your hardware and the specific software you're using (there are tons of Whisper variations with different models), so that's impossible to answer unless you share that information first.
 
Edited: It depends on what pipeline you're using. NEO-017.mp4 is 144 minutes long. As SamKook said, it is impossible to say for sure. But if I had to guess based on the T4 Colab environment, I would say:

Fidelity | Aggressive : 40-50 min
Balanced | Aggressive : 15-20 min

Plus it usually takes 5-10 min to do the install and load.
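
For files of other lengths, the guesses above can be turned into a rough calculator. This is only a sketch: it assumes processing time scales linearly with video length, which is an assumption and not something measured, and the names (`ESTIMATES`, `estimate_total_minutes`) are made up for illustration. The numbers are just the T4 Colab guesses quoted above.

```python
# Rough guesses quoted above for a 144-minute file on a Colab T4.
REFERENCE_MINUTES = 144
ESTIMATES = {
    "fidelity/aggressive": (40, 50),
    "balanced/aggressive": (15, 20),
}
SETUP_MINUTES = (5, 10)  # install and model load


def estimate_total_minutes(video_minutes, pipeline):
    """Return a (low, high) wall-clock estimate in minutes.

    Assumes time scales linearly with video length (an assumption).
    """
    low, high = ESTIMATES[pipeline]
    scale = video_minutes / REFERENCE_MINUTES
    return (low * scale + SETUP_MINUTES[0], high * scale + SETUP_MINUTES[1])


print(estimate_total_minutes(144, "fidelity/aggressive"))  # (45.0, 60.0)
```

So for a 144-minute file on Fidelity, about an hour including setup would be the upper end of the guess; much longer than that on comparable hardware may be worth a second look.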
 

Hi,

I tested it with 4 different movies and everything works well. It took around 43 min on Fidelity/Aggressive for each movie. Thank you for the fix, @mei2.

Note: During downtime when nobody is talking, Whisper AI generates a full Japanese recipe, ahaha. Example: "I used the leftover rice and mixed it with soy sauce and ginger", and more sentences of that type.

Am I using the wrong setting, and am I the only one who gets these hallucinations from Whisper? Not a big deal tbh, I'm just wondering.

-Besh
 

The aggressive sensitivity does carry more risk of hallucination. The floodgates are open with Aggressive, and I actually widened them even more for the latest release (I think I went too far). I suggest you use Auditok scene detection and the TEN VAD segmenter with the aggressive sensitivity.
 
I got a bread recipe once, but the most common hallucinations I'm seeing are "Please subscribe to my Channel! Follow me on Twitter!"
 

Hmm, the sanitization filters should have caught those two hallucinations. Do you get those in Japanese transcription or in direct-to-English?
 
I'm using the Direct To English transcription mode.
I ran the same video through the three different Balanced modes. The video has a pretty long music interlude at the start (over 2 minutes), and then the first dialogue is a phone call (you only hear the voice on the phone, and it's purposefully fuzzy to indicate it's a phone call).

During the interlude (no dialogue), on Aggressive I got a long, weird set of doll-making instructions or something, 13 lines long! "Put the pins in and attach the head part to the body", etc. It goes on for a while, lol. But it also captured the fuzzy phone conversation most accurately.

On Balanced I got "Please subscribe to our channel and follow us on Twitter", and "Please subscribe to our channel" and "Please subscribe to my channel and give it a high rating." It also captured the phone conversation somewhat.

On Conservative I got "Please subscribe to my channel!" just twice, but it also completely missed the phone conversation.

Seems like this video would be a good test of your Ensemble mode. Possibly running Aggressive and then Balanced would catch the phone conversation and eliminate the weird doll-making instructions. However, I can't get it to translate in Ensemble mode (I can't get Local LLM to work, and don't have a subscription to any of the services yet).

Edit: Hey, just checked your site. You are up to 1.8.10? I'm still on 1.8.8 on the Mac; I'll update.
 

Yes, please send me the link to the video or the ID.

My sanitization filters for direct-to-English are not as strong as those for Japanese. If you keep track of the hallucination phrases, please send them to me and I'll add them to the filter.
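
For anyone who wants to clean up an existing subtitle file in the meantime, a phrase-based filter can be sketched in a few lines. To be clear, this is not the actual sanitization filter in the tool, just an illustration of the idea: the pattern list only contains phrases reported in this thread, the `(start, end, text)` tuple layout and the function names are assumptions, and the phone-call line in the sample data is invented.

```python
import re

# Phrases reported in this thread; extend the list as new ones turn up.
HALLUCINATION_PATTERNS = [
    r"please subscribe to (my|our) channel",
    r"follow (me|us) on twitter",
    r"give it a high rating",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in HALLUCINATION_PATTERNS]


def is_hallucination(text):
    """True if the subtitle text contains a known hallucination phrase."""
    return any(p.search(text) for p in _COMPILED)


def filter_subtitles(entries):
    """Drop entries whose text matches; entries are (start, end, text)."""
    return [e for e in entries if not is_hallucination(e[2])]


subs = [
    (0.0, 4.0, "Please subscribe to my Channel! Follow me on Twitter!"),
    (130.5, 133.0, "Hello? Can you hear me?"),  # invented sample line
    (140.0, 142.0, "Please subscribe to our channel and give it a high rating."),
]
print(filter_subtitles(subs))
```

A fixed phrase list like this only catches the repeating stock lines. One-off hallucinations such as the recipe or the doll-making instructions would slip through, and a line of real dialogue that happens to contain one of the phrases would be dropped too.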
 
Done!
"Please subscribe to my channel!" is really common with all three Balanced modes, whenever there is dead space or just music.
When I come across more, I'll send them to you either here or on your GitHub.
 
It would really be a good idea to have a collection of movies listing the settings that work best for each, or a place where we can contribute our best results, if you know what I mean :).