The Akiba-online English Sub Project★NOT A SUB REQUEST THREAD★

There is a difference between a warning and an error.

In this case, it might become a problem when you upgrade Numba to 0.59.0, but as long as you don't do that, it can be safely ignored.
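The distinction can be demonstrated with Python's standard warnings machinery (this is a generic illustration, not Numba-specific; the `transcode` function is a made-up stand-in for whatever emits the deprecation notice):

```python
import warnings

def transcode():
    # A warning is advisory: execution continues and the function still returns.
    warnings.warn("this code path is deprecated", DeprecationWarning)
    return "ok"

# By default the call succeeds; the warning is only reported, not raised.
with warnings.catch_warnings():
    warnings.simplefilter("always")
    result = transcode()
print(result)  # ok

# Only when warnings are escalated to errors does the same call actually fail.
escalated = False
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        transcode()
    except DeprecationWarning:
        escalated = True
print(escalated)  # True
```

In other words, a deprecation warning tells you what will break in a future version; until you upgrade, the code keeps working.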
 
As always, thank you for your help!
 
Hi all, I was wondering if one can use 18+ hentai for fine-tuning or training Whisper for JAV.
Would anyone know where one can find Hentai with original sound and Japanese caption?
Something like this but with Japanese caption: https://www.analdin.xxx/videos/493256/sexually-attractive-anime-harlot-hot-adult-clip/?asgtbndr=1
Would that be an idea?
Be aware that some, if not most, hentai captions/subs (just like JAV subs) are or were done by Chinese subbers. Appreciate them for their efforts, and those subs aren't bad, but if you know some Japanese, it's glaringly not Japanese in many cases.
 
I'm looking for a little help tweaking my Whisper .bat and post-processing Python script. I am in the process of generating thumbs for about 20K videos to upload to Akiba. My specs are: i9-14900K / NVIDIA 4080 Super (16 GB VRAM) / 96 GB DDR5 RAM / Windows 11 Pro.
Part of my .py file's post-processing regimen is to remove hallucinations, but I'm now finding sections where a hallucination starts while the video is clearly dialogue: once Whisper has started hallucinating, it just breezes past the dialogue and keeps producing hallucinations, which then get removed during the Python script stage. Attached are both my .bat and .py files. If anyone can offer suggestions on how to configure it better for Japanese-to-English translation, that would be super keen-o.
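For what it's worth, the classic signature of a Whisper hallucination loop is the same caption repeated many times in a row, and a post-processing pass can collapse such runs. A minimal sketch (the `max_run=3` threshold is an arbitrary assumption; a real clean_subs.py would likely use additional rules):

```python
def collapse_repeats(lines, max_run=3):
    """Drop subtitle texts repeated more than max_run times in a row,
    a typical signature of a Whisper hallucination loop."""
    out = []
    run_text, run_count = None, 0
    for text in lines:
        if text == run_text:
            run_count += 1
        else:
            run_text, run_count = text, 1
        if run_count <= max_run:
            out.append(text)
    return out

# "Thanks for watching!" repeated 10 times is almost certainly a hallucination.
subs = ["Hello."] + ["Thanks for watching!"] * 10 + ["Goodbye."]
print(collapse_repeats(subs))
```

This only trims the repeats; it cannot recover the dialogue Whisper skipped while looping, which is why re-transcribing the affected region (or using a VAD) is still needed.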
 



I definitely envy your rig!

It looks like you're using "vanilla" Whisper and mainly relying on "no_speech_threshold" to reduce hallucination. You can get better results by using a VAD. If you haven't yet, I'd suggest taking a look at WhisperPRO; it has one of the best implementations of VAD I have seen [courtesy of Anon_entity]. Alternatively, StandaloneWhisper does a very good job in a Windows environment.
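The idea behind a VAD is to feed Whisper only the regions that actually contain speech, so long silences never get a chance to trigger hallucination. Production VADs (e.g. the Silero model that faster-whisper's `vad_filter=True` uses) are neural, but a toy energy gate shows the principle; the frame length and threshold below are arbitrary assumptions:

```python
def energy_gate(samples, frame_len=400, threshold=0.01):
    """Toy VAD: flag a frame as speech when its mean absolute amplitude
    exceeds a threshold. Real VADs are model-based, but the principle is
    the same: only speech-flagged regions are handed to the transcriber."""
    flags = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)
        flags.append(energy > threshold)
    return flags

# One loud "speech" frame followed by one near-silent frame.
signal = [0.5, -0.4] * 200 + [0.001] * 400
print(energy_gate(signal))  # [True, False]
```

A pure energy gate misfires on background music and breathing, which is exactly why the model-based VADs mentioned above give better results.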
 
Thanks for responding, Mei. I have tried to implement VAD but had some weird effects from it. I have recently been trying to get faster-whisper working with the large-v3 model, but what a nightmare; I have bent over backwards trying to make it work. I haven't looked at the Pro version because I've been told that these hallucinations, which are the main problem, still manifest without manual editing, which, for me, defeats the point of the software. I'll keep plugging along. I have several hundred subs already created, so I'll likely wait until I have a thousand or so and then upload them for the Akiba kids. Thanks again, Mei.
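One common cause of "pip says it's installed but the .bat can't use it" is having more than one Python on the machine, so pip installs into a different interpreter than the one the .bat launches. A sketch of a clean setup from a Windows command prompt (the `whisper-env` name is my own; adjust paths to taste):

```shell
:: Create a dedicated venv so pip and python are guaranteed to match.
python -m venv whisper-env
call whisper-env\Scripts\activate.bat
pip install faster-whisper
:: Verify from the exact interpreter the .bat will use:
python -c "import faster_whisper; print(faster_whisper.__version__)"
```

If the .bat file then calls `whisper-env\Scripts\python.exe` explicitly instead of a bare `python`, it can't pick up the wrong install.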
 
I went back to large-v2; it seems to work better for me than v3. I should move to some form of faster-whisper, as even with a fast computer the regular version still takes quite a lot of time.
 
That is me nodding emphatically. I fluctuate back and forth between v2 and v3. Originally DeepSeek said that, between the two, v2 was better specifically for Japanese-to-English. Recently, though, DeepSeek says no, v3 is better for Japanese. Lately "he" has been telling me to switch to faster-whisper with v3, but I have had just a nightmare trying to get my .bat file working (posted earlier in a zip attachment). Pip says it is installed, but I have tried about a hundred billion permutations of commands in my .bat file and no go; I cannot get faster-whisper to work.

The thing about v2 versus v3 is that v3 does seem to pick up dialogue that v2 misses, but on the other hand both of them are really flawed and often create gibberish or hallucinations. My mindset has been that my main focus in seeking the best possible outcome is not for me but for Akiba. I could live with the slightly less accurate translations of v2, and in the worst case, if I ever encountered a specific file where I really had to have the best possible translation, I could run it through v3. If you've looked at my clean_subs.py, you will see that I have a few word replacements, like noodles --> Cum and Juice --> Sperm; eventually I will add to that to create better English output.

But back to the point: my objective is to upload these all to Akiba, so I use v3 in the hope that the results will be the highest possible quality. That said, switching to v2 would definitely save a lot of time; with about 20K files, this process is going to take many months to complete. You've got me thinking now, Electromog. I was just about to do the NOD/Tokyo Gal- files. I think I'll run them at v2 and see how they go; they have a lot of interview dialogue, so it would be a good test. Thanks for your remarks.
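The word-replacement step described above can be sketched as a small table-driven pass (the two pairs are from the post; the case-insensitive whole-word matching is my own assumption about how clean_subs.py might want to behave, so that "juicy" isn't mangled):

```python
import re

# Replacement pairs as described for clean_subs.py.
REPLACEMENTS = {"noodles": "cum", "juice": "sperm"}

def apply_replacements(text, table=REPLACEMENTS):
    """Case-insensitive whole-word substitution: 'Juice' and 'juice'
    both match, but substrings like 'juicy' are left alone."""
    for src, dst in table.items():
        text = re.sub(rf"\b{re.escape(src)}\b", dst, text, flags=re.IGNORECASE)
    return text

print(apply_replacements("Her Juice tastes like noodles."))
# Her sperm tastes like cum.
```

Keeping the table as a dict makes it easy to grow the list over time without touching the substitution logic.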
 
I'm looking for a little help tweaking my Whisper .bat and post-processing Python script. I am in the process of generating thumbs for about 20K videos to upload to Akiba. My specs are: i9-14900K / NVIDIA 4080 Super (16 GB VRAM) / 96 GB DDR5 RAM / Windows 11 Pro.
Part of my .py file's post-processing regimen is to remove hallucinations, but I'm now finding sections where a hallucination starts while the video is clearly dialogue: once Whisper has started hallucinating, it just breezes past the dialogue and keeps producing hallucinations, which then get removed during the Python script stage. Attached are both my .bat and .py files. If anyone can offer suggestions on how to configure it better for Japanese-to-English translation, that would be super keen-o.
Bit of an old post, but I'll give my input.

If a model starts hallucinating a lot, it's just random. I find it happens maybe once every 50-100 transcriptions. Redo the transcription and you'll find that suddenly those lines are gone. Coincidentally, it happened to me today as well: it hallucinated about 100 lines in a row, I redid the transcription, and poof, they're gone.

It's also very easy to detect through scripting: it's almost always text between parentheses.
Have your Python script flag the file either when X lines with parentheses are found in total, or when X such lines are found in a row.
I personally let the translation LLM remove the hallucinations. I tell it that the input is a Whisper transcription, describe some common mistakes Whisper makes, and ask it to remove any text within parentheses. It does cost a bit more in tokens, of course. If I notice it's omitting a lot of lines in the translation, it's usually because there were a lot of hallucinations, and I'll just transcribe it again.
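The "X parenthesized lines in a row" detection described above can be sketched like this (the `min_run=3` cutoff is an arbitrary assumption to tune; the regex also matches full-width Japanese parentheses, which Whisper sometimes emits):

```python
import re

# A line that consists entirely of (possibly full-width) parenthesized text.
PAREN_LINE = re.compile(r"^\s*[(\uff08].*[)\uff09]\s*$")

def flag_hallucinated_runs(lines, min_run=3):
    """Return indices of lines inside a run of >= min_run consecutive
    fully-parenthesized lines, the hallucination pattern described above."""
    flagged, run = [], []
    for i, line in enumerate(lines + [""]):  # sentinel flushes the last run
        if PAREN_LINE.match(line):
            run.append(i)
        else:
            if len(run) >= min_run:
                flagged.extend(run)
            run = []
    return flagged

subs = ["Hello.", "(music)", "(music)", "(applause)", "Real dialogue."]
print(flag_hallucinated_runs(subs))  # [1, 2, 3]
```

Flagging rather than deleting outright lets the script decide between stripping the lines and queueing the file for re-transcription.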