Whisper and its many forms

Is anyone else having a problem with the runtime constantly dropping? It usually takes over an hour (it used to be 30 minutes) to do a sub. Sometimes it's so bad that I can't even upload a file, since it keeps losing the connection. I'm using VADPro.
 
If you're comparing VADPro with the other VADs, then yes, that's normal, since it has different defaults that should produce a better result, at the price of taking longer.

If you're comparing to VADPro from some time ago, I don't use it often enough to tell if anything changed.

It's also possible you got a different GPU type, since there are a few that can be assigned to you and they have different performance.
 
The only way I know is to disconnect, then reconnect to a different session and pray. But every time I've used the Colab (as a free user; paid may be different), I've only ever gotten a T4, so I don't know how likely it is that you get a different one.
 
Hello all,

FYI, an updated version of DeepSeek V3 has been released on the website and API.

I used the new version with my go-to test subtitle, DDB-271, transcribed using WhisperWithVAD Pro, and I think the updated version does a better translation.

I've attached a zip with the two translated SRTs and the transcribed SRT. One was translated using the original DeepSeek V3 and one using the updated V3-0324. Neither translated version has any manual editing or cleanup.

Here is the changelog:
 

Attachments

Hello all, I'm using WhisperWithVAD_PRO and getting the following error:
"ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject"

It feels like the same issue as when they changed the version of torch.
 
Hello all, I'm using WhisperWithVAD_PRO and getting the following error:
"ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject"

It feels like the same issue as when they changed the version of torch.

As a workaround for now, you can comment out (add a # at the beginning of the line to disable it) the spleeter installation line in the "setup whisper" section (click "show code" to edit), so it looks like this:
Disable_spleeter.jpg

You need to run the code (press the play button) after you've done this. If you've already run the code, you'll need to run it again.

Spleeter is used to separate the voice from the background noise.
It's not used by default, so unless you set "source_separation" to True (meaning you check the checkbox next to it), it won't change anything (and if you do enable it after disabling the spleeter install, it will give an error).
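For reference, the edited install cell would look something like this (a sketch, not the notebook's exact contents; in Colab the line starts with a ! prefix):

```shell
# "setup whisper" cell, with the spleeter install disabled by a leading #:
# pip install spleeter
```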
 
As a workaround for now, you can comment out (add a # at the beginning of the line to disable it) the spleeter installation line in the "setup whisper" section (click "show code" to edit), so it looks like this:
View attachment 3650116

You need to run the code (press the play button) after you've done this. If you've already run the code, you'll need to run it again.

Spleeter is used to separate the voice from the background noise.
It's not used by default, so unless you set "source_separation" to True (meaning you check the checkbox next to it), it won't change anything (and if you do enable it after disabling the spleeter install, it will give an error).

Upgrading numpy, pandas, and tensorflow also seems to resolve the error. I'm not sure it's actually necessary to upgrade all of them.

Like so:
1743779437715.png
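In notebook form, the upgrade is just a single pip line (a sketch; run it in a Colab cell with a ! prefix, and I haven't verified whether specific version pins are needed):

```shell
pip install --upgrade numpy pandas tensorflow
```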
 
SamKooK and Novus.Toto, thank you so much. Both solutions worked, really appreciate the prompt replies and visual aids!
 
Upgrading numpy, pandas, and tensorflow also seems to resolve the error. I'm not sure it's actually necessary to upgrade all of them.

Like so:
View attachment 3650292
I tried this but got this error. I'm not a computer whiz. Any ideas?

NameError Traceback (most recent call last)
<ipython-input-3-754953b95a0a> in <cell line: 0>()
     35 out_path = os.path.splitext(audio_path)[0] + ".srt"
     36 out_path_pre = os.path.splitext(audio_path)[0] + "_Untranslated.srt"
---> 37 if source_separation:
     38     print("Separating vocals...")
     39     get_ipython().system('ffprobe -i "{audio_path}" -show_entries format=duration -v quiet -of csv="p=0" > input_length')

NameError: name 'source_separation' is not defined
 
That error is because the script can't see the variables from the first code block; you just have to execute that block, at the top of the page, first.
 
With the extra step of restarting the session, you can avoid the "numpy.dtype size changed" error without needing to upgrade those packages. The workflow is: mount Google Drive, run setup, click Runtime > Restart session, set parameters, then run Whisper.

In WhisperPro, I use an initial_prompt = "これは日本のアダルトビデオです。喘ぎ声、卑猥な表現、スラング(イク、中出し、気持ちいい、お願い、イッちゃうなど)を正確に文字起こししてください。" It translates to: This is a Japanese adult video. Please accurately transcribe the moans, obscene expressions, and slang (such as iku, nakadashi, kimochiii, onegai, icchau, etc.). I do find that it improves the transcription.

Updated! I got AI to refine the prompt. Now I use initial_prompt = "これは日本AV。登場人物の会話、喘ぎ声、卑猥なスラングや独り言を、一言一句省略せずに全て正確に文字起こししてください。特に次の言葉や類似表現に注意してください:ああん、イク、イッちゃう、中出し、気持ちいい、お願い、ダメ、もっと、やばい、突いて、んっ、ふぅ、ちんちん、マンコ、ザーメンなど" It translates to: This is a Japanese AV. Transcribe all of the characters' dialogue, moans, obscene slang, and muttering accurately, word for word, without omitting anything. Pay particular attention to the following words and similar expressions: aan, iku, icchau, nakadashi, kimochiii, onegai, dame, motto, yabai, tsuite, n, fuu, chinchin, manko, zaamen, etc.
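For anyone wiring this up themselves: the prompt is just a string handed to the transcription call. A minimal sketch (a shortened prompt for illustration; the transcribe line is a comment because model setup is notebook-specific, though openai-whisper's transcribe() does accept an initial_prompt argument):

```python
# initial_prompt biases Whisper's decoder toward the listed vocabulary.
# Shortened illustration of the prompt above:
initial_prompt = "これは日本AV。会話、喘ぎ声、スラングを一言一句正確に文字起こししてください。"

# In the notebook this string is forwarded to the transcription call, e.g.:
# result = model.transcribe(audio_path, initial_prompt=initial_prompt)
print("prompt length:", len(initial_prompt))
```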
 
Gemini Pro helped me add checkpoint-resume functionality to Mei2's WhisperWithVAD_pro Colab code. Now it saves a checkpoint file and a partial SRT periodically. Ever been frustrated when you reached the time limit and all progress was lost? If you crash from memory issues, reach the time limit (probably a little over 4 hours per day), or close your tab, don't worry. Use a different Google account, or just wait until the next day, and click Run All; if it sees those two files in the WhisperJAV folder, it will resume. Check it out and see what settings I like to use. Note that I removed the Spleeter and DeepL functionality, which I didn't use.

Edit: It needs an audio_folder path instead of a single audio file path. It looks for every m4a, wav, flac, or mp3 in the audio_folder "/content/drive/My Drive/WhisperJAV"

Edit2: I reuploaded it. I had incorrectly loaded the VAD and Whisper models inside the loop, which caused out-of-memory issues for the next audio file. I moved the model loading outside the loop and added memory cleanup after each file completes.
 

Attachments

Gemini Pro helped me add checkpoint-resume functionality to Mei2's WhisperWithVAD_pro Colab code. Now it saves a checkpoint file and a partial SRT periodically. Ever been frustrated when you reached the time limit and all progress was lost? If you crash from memory issues, reach the time limit (probably a little over 4 hours per day), or close your tab, don't worry. Use a different Google account, or just wait until the next day, and click Run All; if it sees those two files in the WhisperJAV folder, it will resume. Check it out and see what settings I like to use. Note that I removed the Spleeter and DeepL functionality, which I didn't use.
So does that mean I don’t need to worry if Colab crashes or restarts? It’s really frustrating when it hits the daily usage limit and I have to start over just to get different results. Does it save the project automatically?
 
So does that mean I don’t need to worry if Colab crashes or restarts? It’s really frustrating when it hits the daily usage limit and I have to start over just to get different results. Does it save the project automatically?
Yes, all results are saved in partial.srt and ckpt.json, found inside /WhisperJAV. Don't move or edit those files while it's running.
 
Yes, all results are saved in partial.srt and ckpt.json, found inside /WhisperJAV. Don't move or edit those files while it's running.
Thank you!