Whisper and its many forms

Is anyone else having a problem with the runtime constantly dropping? It usually takes over an hour (it used to be 30 minutes) to do a sub. Sometimes it's so bad that I can't even upload a file, since it keeps losing the connection. I'm using VADPro.
 
If you're comparing VADPro with the other VADs, then yes, that's normal, since it has different defaults that should produce a better result, at the price of taking longer.

If you're comparing to VADPro from some time ago, I don't use it often enough to tell if anything changed.

It's also possible you got a different GPU type, since there are a few that can be assigned to you and they have different performance.
 
The only way I know is to disconnect, then reconnect to a different session and pray. But every time I've used the Colab (as a free user; paid may be different), I've only ever gotten a T4, so I don't know how likely it is that you get a different one.
 
Hello all,

FYI, an updated version of DeepSeek V3 has been released on the website and API.

I used the new version with my go-to test subtitle, DDB-271, transcribed using WhisperWithVAD Pro, and I think the updated version does a better translation.

I've attached a zip with the two translated SRTs and the transcribed SRT. One was translated using the original DeepSeek V3 and one using the updated V3-0324. Neither translated version has any manual editing or cleanup.

Here is the changelog:
 

Attachments

Hello all, I'm using WhisperWithVAD_PRO and getting the following error:
"ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject"

It feels like the same issue as when they changed the version of torch.
 
Hello all, I'm using WhisperWithVAD_PRO and getting the following error:
"ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject"

It feels like the same issue as when they changed the version of torch.

As a workaround for now, you can comment out (add a # at the beginning of the line to disable it) the spleeter installation line in the "setup whisper" section (click "show code" to edit), so it looks like this:
Disable_spleeter.jpg

You need to run the code (press the play button) after you've done this. If you've already run the code, you'll need to run it again.

Spleeter is used to separate the voice from the background noise.
It's not used by default, so unless you set "source_separation" to True (meaning you check the checkbox next to it), it won't change anything (and if you do enable it after disabling the spleeter install, it will give an error).
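For reference, the edited install cell would look something like this (a sketch, not the notebook's exact contents; in Colab the line starts with a ! prefix):

```shell
# "setup whisper" cell, with the spleeter install disabled by a leading #:
# pip install spleeter
```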
 
As a workaround for now, you can comment out (add a # at the beginning of the line to disable it) the spleeter installation line in the "setup whisper" section (click "show code" to edit), so it looks like this:
View attachment 3650116

You need to run the code (press the play button) after you've done this. If you've already run the code, you'll need to run it again.

Spleeter is used to separate the voice from the background noise.
It's not used by default, so unless you set "source_separation" to True (meaning you check the checkbox next to it), it won't change anything (and if you do enable it after disabling the spleeter install, it will give an error).

Upgrading numpy, pandas, and tensorflow also seems to resolve the error. I'm not sure it's actually necessary to upgrade all of them.

Like so:
1743779437715.png
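In notebook form, the upgrade is just a single pip line (a sketch; run it in a Colab cell with a ! prefix, and I haven't verified whether specific version pins are needed):

```shell
pip install --upgrade numpy pandas tensorflow
```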
 
SamKooK and Novus.Toto, thank you so much. Both solutions worked, really appreciate the prompt replies and visual aids!
 
Upgrading numpy, pandas, and tensorflow also seems to resolve the error. I'm not sure it's actually necessary to upgrade all of them.

Like so:
View attachment 3650292
I tried this but got this error. I'm not a computer whiz. Any ideas?

NameError Traceback (most recent call last)
<ipython-input-3-754953b95a0a> in <cell line: 0>()
     35 out_path = os.path.splitext(audio_path)[0] + ".srt"
     36 out_path_pre = os.path.splitext(audio_path)[0] + "_Untranslated.srt"
---> 37 if source_separation:
     38     print("Separating vocals...")
     39     get_ipython().system('ffprobe -i "{audio_path}" -show_entries format=duration -v quiet -of csv="p=0" > input_length')

NameError: name 'source_separation' is not defined
 
That error is because the script can't see the variables from the first code block; you just have to execute that block, at the top of the page, first.
 
With the extra step of restarting the session, you can avoid the "numpy.dtype size changed" error without needing to upgrade those packages. The workflow is: mount Google Drive, run setup, click Runtime > Restart session, set parameters, then run Whisper.

In WhisperPro, I use an initial_prompt = "これは日本のアダルトビデオです。喘ぎ声、卑猥な表現、スラング(イク、中出し、気持ちいい、お願い、イッちゃうなど)を正確に文字起こししてください。" It translates to: This is a Japanese adult video. Please accurately transcribe the moans, obscene expressions, and slang (such as iku, nakadashi, kimochiii, onegai, icchau, etc.). I do find that it improves the transcription.

Updated! I got AI to refine the prompt. Now I use initial_prompt = "これは日本AV。登場人物の会話、喘ぎ声、卑猥なスラングや独り言を、一言一句省略せずに全て正確に文字起こししてください。特に次の言葉や類似表現に注意してください:ああん、イク、イッちゃう、中出し、気持ちいい、お願い、ダメ、もっと、やばい、突いて、んっ、ふぅ、ちんちん、マンコ、ザーメンなど" It translates to: This is a Japanese AV. Transcribe all of the characters' dialogue, moans, obscene slang, and muttering accurately, word for word, without omitting anything. Pay particular attention to the following words and similar expressions: aan, iku, icchau, nakadashi, kimochiii, onegai, dame, motto, yabai, tsuite, n, fuu, chinchin, manko, zaamen, etc.
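For anyone wiring this up themselves: the prompt is just a string handed to the transcription call. A minimal sketch (a shortened prompt for illustration; the transcribe line is a comment because model setup is notebook-specific, though openai-whisper's transcribe() does accept an initial_prompt argument):

```python
# initial_prompt biases Whisper's decoder toward the listed vocabulary.
# Shortened illustration of the prompt above:
initial_prompt = "これは日本AV。会話、喘ぎ声、スラングを一言一句正確に文字起こししてください。"

# In the notebook this string is forwarded to the transcription call, e.g.:
# result = model.transcribe(audio_path, initial_prompt=initial_prompt)
print("prompt length:", len(initial_prompt))
```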
 
Gemini Pro helped me add checkpoint-resume functionality to Mei2's WhisperWithVAD_pro Colab code. Now it saves a checkpoint file and a partial SRT periodically. Ever been frustrated when you reached the time limit and all progress was lost? If you crash from memory issues, reach the time limit (probably a little over 4 hours per day), or close your tab, don't worry. Use a different Google account, or just wait until the next day, and click Run All; if it sees those two files in the WhisperJAV folder, it will resume. Check it out and see what settings I like to use. Note that I removed the Spleeter and DeepL functionality, which I didn't use.

Edit: It needs an audio_folder path instead of a single audio file path. It looks for every m4a, wav, flac, or mp3 in the audio_folder "/content/drive/My Drive/WhisperJAV"

Edit2: I reuploaded it. I had incorrectly loaded the VAD and Whisper models inside the loop, which caused out-of-memory issues for the next audio file. I moved the model loading outside the loop and added memory cleanup after each file completes.
 

Attachments

Gemini Pro helped me add checkpoint-resume functionality to Mei2's WhisperWithVAD_pro Colab code. Now it saves a checkpoint file and a partial SRT periodically. Ever been frustrated when you reached the time limit and all progress was lost? If you crash from memory issues, reach the time limit (probably a little over 4 hours per day), or close your tab, don't worry. Use a different Google account, or just wait until the next day, and click Run All; if it sees those two files in the WhisperJAV folder, it will resume. Check it out and see what settings I like to use. Note that I removed the Spleeter and DeepL functionality, which I didn't use.
So does that mean I don’t need to worry if Colab crashes or restarts? It’s really frustrating when it hits the daily usage limit and I have to start over just to get different results. Does it save the project automatically?
 
So does that mean I don’t need to worry if Colab crashes or restarts? It’s really frustrating when it hits the daily usage limit and I have to start over just to get different results. Does it save the project automatically?
Yes, all results are saved in partial.srt and ckpt.json, found inside /WhisperJAV. Don't move or edit those files while it's running.
 
Yes, all results are saved in partial.srt and ckpt.json, found inside /WhisperJAV. Don't move or edit those files while it's running.
Thank you!