I stripped the audio into an mp3 file as normal from a reduced-mosaic version of WAAA-501 and created this sub with WhisperJAV 0.7. Although some dialogue is missing, it looks like a pretty normal Whisper sub to me. Attached is the "raw" sub with no editing.
Well, I just spent the last couple of hours goofing around with WAAA-501. It is very, very weird. It is almost like there's a second audio track encoded into the file and Whisper is decoding that. I tried V2 and V3, and I tried upscaling the video to see if the re-encoding might resolve the issue. I'm flummoxed! The only thing I can suggest is to have someone here manually create a subtitle file. There was a cat doing it a few months ago, but I cannot remember his name. I wish I could have found the solution, but no dice here.
There's a lot of dialogue in the first half hour; you have 3 lines in the first half hour.
PS: After watching the video a bit, I noticed some of the text has a very short duration and can't be read in real time while watching the movie with the sub. I suggest increasing the minimum duration limit to at least 2 seconds.
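The PS suggestion can be applied as a post-processing pass over the parsed cues. The sketch below is a hypothetical helper, not a WhisperJAV setting: `Cue`, `enforce_min_duration` and the 50 ms `gap` are illustrative names and values. SRT parsing is left out, and cues are assumed to be sorted by start time. Each short cue is stretched toward 2 seconds but never into the next cue.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def enforce_min_duration(cues: list[Cue], min_dur: float = 2.0, gap: float = 0.05) -> list[Cue]:
    """Stretch short cues to at least `min_dur` seconds without overlapping the next cue.

    A cue may stay shorter than `min_dur` if the following line starts too soon;
    it is then extended only up to `gap` seconds before that line.
    """
    fixed: list[Cue] = []
    for i, cue in enumerate(cues):
        end = max(cue.end, cue.start + min_dur)
        if i + 1 < len(cues):
            # Stop just before the next cue begins.
            end = min(end, cues[i + 1].start - gap)
        # Never make a cue shorter than it originally was.
        end = max(end, cue.end)
        fixed.append(Cue(cue.start, end, cue.text))
    return fixed


cues = [Cue(1.0, 1.4, "A"), Cue(2.0, 2.5, "B"), Cue(10.0, 10.3, "C")]
for c in enforce_min_duration(cues):
    print(c.start, c.end, c.text)
```

The first cue can only grow to 1.95 s because "B" starts at 2.0 s. The other two get the full 2 seconds.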
I am after the most perfect translation that I can find. I'll add the word timestamps arg to my batch, but could you please clarify V2 vs. V3 for me? If I can get away with V2 and speed up the process, that would be super-keano. Thanks.
You are right... there is definitely something squirrely about WAAA-501! I think I have wasted enough time on it too! lol
The file is only 10 KB as well; I got 18 KB worth of text out of it. That's how I immediately noticed something was off. I've just downloaded the 4K version to see if anything changes, but I doubt it. It's in the queue and will take a bit.
I've tried different VADs on it (with super low VAD sensitivity) and also a different voice extraction method; it didn't change much. I'll try again on the 4K version.
Maybe the JAV studios have found a way to screw Whisper subs!?
You're right. There seems to be something very weird about its audio encoding. When the audio is converted to mono 16 kHz, it loses almost all of its waveform (see below). That is the cause of the subtitling problem, because Whisper uses mono 16 kHz WAV audio.
Original waveform:
View attachment 3671857
After mono resampling:
View attachment 3671856
I am now trying to find a workaround. Will update later.
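The mono-downmix loss described above can be quantified by comparing the level of a plain (L + R) / 2 mix with the level of the louder channel. This is a minimal sketch under several assumptions: the channels are already loaded as float NumPy arrays (loading is not shown), the simple average only approximates whatever converter Whisper's pipeline actually uses, and `downmix_loss_db` is a hypothetical helper name. A synthetic sine stands in for speech.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def downmix_loss_db(left: np.ndarray, right: np.ndarray) -> float:
    """How many dB quieter a plain (L + R) / 2 mono mix is than the louder channel.

    About 0 dB: the channels add up normally.
    Tens of dB or more: the channels largely cancel, e.g. one is polarity-inverted.
    """
    eps = 1e-12  # keeps log10 finite when the mix is exactly silent
    mono = 0.5 * (left + right)
    loudest = max(rms(left), rms(right))
    return float(20.0 * np.log10((loudest + eps) / (rms(mono) + eps)))


sr = 16_000
t = np.arange(sr) / sr
voice = 0.3 * np.sin(2 * np.pi * 220 * t)  # stand-in for one second of speech
print(round(downmix_loss_db(voice, voice), 1))  # normal stereo: 0.0
print(downmix_loss_db(voice, -voice) > 60)      # inverted right channel: True
```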
Thanks for this detailed analysis, Mei. Regarding the one trade-off with the hallucination clean-up (I'm assuming that you've looked at clean_subs.py): this is a work in progress, and I will adjust it piecemeal as ideas come to me. My only slight qualm with the hallucination 'filter' is that when a hallucination has started and carries on over obvious dialogue, I half think it would be better to have at least some dialogue, even wrong dialogue, when it is obvious that someone is speaking, instead of a blank screen where my script has removed the hallucination. It's probably not really relevant, but it is an aspect of the filtering process that I am not happy with. I thought about adding some funny word replacements for my own enjoyment ('have you had anal?' -> 'Do you enjoy a poke up the rump?'), but since these subs are primarily for others (Akiba) I have curtailed that inclination. The bottom line with Whisper is that, given its current fallibility, you are relegated to 'good enough' rather than 'by jove, I think I've got it'. Right now I'm working on subs for specific actresses: everything from Abe Mikado, Kanna Misaki, Lala Kudou, the Airi/Meiri twins, Nanami Nanase, and a bunch more that I can't remember right now (I'm on my laptop while my PC is hard at work creating subs). In any case, the next batch covers movies by actress, about 2000 titles I think; then I move on to my alphabetical folders, which are just a mish-mash of everything and somewhere in the vicinity of 20K files. This is going to be a very long process, so if you discover potential tweaks to either my bat or py files to make them more effective, give me a shout. Thanks Mei.
We probably need to take this conversation to another thread so we don't hijack the main theme. The work that you're planning to do (and doing) is quite valuable and deserves more attention. I will spend more time on it over the weekend, but I thought I'd do a quick exploration of your first set of subs by running some stats on them.
Here are some interesting stats:
(1) Hallucination: Your de-hallucination seems to have worked quite well for known phrases. There don't seem to be many of those occurring in the majority of the files:
View attachment 3671696
(2) Quality: The characters-per-second (CPS) metric can give an indication of the quality of the timing, which is one way of measuring the quality of a sub. I'd say anything below 10 or above 20 CPS might not be optimal.
View attachment 3671723
(3) Quality: Another thing I looked at was repetition. One metric for this is the type-token ratio (TTR), i.e. the number of distinct words divided by the total number of words. This is a somewhat difficult metric for JAV (or any porn movie), as the vocabulary is not that vast. However, one doesn't want the TTR to be too close to 0, as that can indicate problems in the Whisper output such as a repetition loop.
View attachment 3671724
These are just dumb stats with no knowledge of the genre or type of movie, so they must be taken with a grain of salt. It would be interesting to run the stats for each genre or series separately to see what their characteristics are.
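The three stats above can be approximated with a few lines of Python. This is a rough sketch, not the script that produced the charts. It makes several assumptions:

- Cues are already parsed into hypothetical `(start, end, text)` tuples.
- The subs are English; the word regex ignores anything else.
- CPS ignores whitespace. Some subtitle tools count spaces, so the numbers won't match theirs exactly.
- The phrase list is whatever list of known hallucinations you pass in.

```python
import re
from collections import Counter

# Hypothetical cue format: (start_seconds, end_seconds, text).
Cue = tuple[float, float, str]


def cps(cue: Cue) -> float:
    """Characters per second of one cue, ignoring whitespace."""
    start, end, text = cue
    duration = end - start
    chars = len(re.sub(r"\s+", "", text))
    return chars / duration if duration > 0 else float("inf")


def share_outside_cps(cues: list[Cue], low: float = 10.0, high: float = 20.0) -> float:
    """Fraction of cues whose CPS is below `low` or above `high`."""
    if not cues:
        return 0.0
    bad = sum(1 for c in cues if not low <= cps(c) <= high)
    return bad / len(cues)


def type_token_ratio(cues: list[Cue]) -> float:
    """Distinct words / total words over a whole file; near 0 hints at repetition loops."""
    words = re.findall(r"[a-z']+", " ".join(text for _, _, text in cues).lower())
    return len(set(words)) / len(words) if words else 0.0


def known_phrase_hits(cues: list[Cue], phrases: list[str]) -> Counter:
    """Count the cues that contain each known hallucination phrase (case-insensitive)."""
    hits: Counter = Counter()
    for _, _, text in cues:
        lowered = text.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                hits[phrase] += 1
    return hits


cues = [(0.0, 1.0, "Hello there"), (2.0, 2.5, "Are you okay?"), (3.0, 13.0, "Yes yes yes")]
print([round(cps(c), 1) for c in cues])
print(share_outside_cps(cues), type_token_ratio(cues))
```

Running these per file and then grouping by genre or series would give the per-category breakdown suggested above.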
Wow, that looks really great at a quick glance, but I can't compare it to my own as I don't know how to fix the audio problem.
Haha ...
View attachment 3671878
Here is my sub for WAAA-501 after fixing the audio problem.
I used the whisper-anime model for this one. Since it's a new model, I'm interested to hear your comments about the result.
It's probably "phase cancellation": one of the tracks has the opposite polarity of the other, so they cancel each other out when downmixing to mono. It can be fixed by separating the 2 tracks, inverting one of them (Effect->Special->Invert in Audacity) and then merging them back.
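In code, "inverting" a track just means multiplying its samples by -1. The following is a minimal NumPy sketch of the fix described above. It assumes the audio is already a `(samples, 2)` array with the right channel in column 1. `invert_channel` and `to_mono` are hypothetical helper names, and the plain average in `to_mono` is only a stand-in for a real mono downmix.

```python
import numpy as np


def invert_channel(stereo: np.ndarray, channel: int = 1) -> np.ndarray:
    """Return a float copy of a (samples, 2) array with one channel's polarity flipped.

    channel=1 is the right channel. Flipping polarity means multiplying every
    sample by -1, which is what Audacity's Invert effect does to a selection.
    """
    fixed = stereo.astype(np.float64)  # new float array; avoids int16 overflow on -(-32768)
    fixed[:, channel] *= -1.0
    return fixed


def to_mono(stereo: np.ndarray) -> np.ndarray:
    """Plain average of the two channels, a simple stand-in for a mono downmix."""
    return stereo.mean(axis=1)


sr = 16_000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 220 * t)
broken = np.stack([voice, -voice], axis=1)      # right channel polarity-inverted
print(np.abs(to_mono(broken)).max())            # mono mix is silent
print(np.allclose(to_mono(invert_channel(broken)), voice))  # speech is back
```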
For WAAA-501, it looks like one scene in the middle is correct and the rest of the audio is inverted. That means you'd need to select and invert the 2 bad parts separately on 1 track (apparently the right one is usually the problematic one). To find them, downmix a copy to mono and create labels (Ctrl+B) so you can easily select them afterwards. Then you can get good full audio for Whisper.
View attachment 3671964
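Instead of eyeballing a mono copy, the inverted parts could also be located by measuring the correlation between the two channels in short windows. Normal stereo speech is positively correlated, while a polarity-flipped channel gives values near -1. This is an alternative sketch, not the method from the post:

- `find_inverted_spans`, the 1 s window and the -0.5 threshold are hypothetical choices.
- Near-silent windows are skipped.
- The label output assumes Audacity's tab-separated `start end label` text format.

```python
import numpy as np


def find_inverted_spans(left, right, sr, win_s=1.0, threshold=-0.5):
    """Return (start_s, end_s) spans where the two channels are strongly anti-correlated.

    Each window of `win_s` seconds is scored with the Pearson correlation
    between L and R; a value near -1 means one channel is (mostly) a
    polarity-inverted copy of the other. Adjacent flagged windows are merged.
    """
    win = int(sr * win_s)
    spans: list[tuple[float, float]] = []
    for i in range(0, len(left) - win + 1, win):
        l_win, r_win = left[i:i + win], right[i:i + win]
        if np.std(l_win) < 1e-6 or np.std(r_win) < 1e-6:
            continue  # near-silent window: correlation is meaningless
        if np.corrcoef(l_win, r_win)[0, 1] < threshold:
            start, end = i / sr, (i + win) / sr
            if spans and spans[-1][1] == start:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((start, end))
    return spans


def to_audacity_labels(spans, name="inverted"):
    """Format spans as tab-separated 'start end label' lines (assumed Audacity label-file style)."""
    return "\n".join(f"{s:.6f}\t{e:.6f}\t{name}" for s, e in spans)


# Synthetic test: right channel inverted everywhere except the middle "scene" (2-4 s).
sr = 1000
t = np.arange(6 * sr) / sr
voice = np.sin(2 * np.pi * 50 * t)
right = -voice.copy()
right[2 * sr:4 * sr] = voice[2 * sr:4 * sr]
print(find_inverted_spans(voice, right, sr))
```

On a real file, windows near scene boundaries may need a manual nudge. Quiet or mostly mono-compatible passages can also be missed, so treat the spans as a starting point for the labels rather than a replacement for checking them.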
Worked great, thx!
Edit: Made a better picture
Wasn't there some problem with the timecodes on that model? Were you able to fix it, or is this another one?
Hi Sam,
I agree with you here, Sam. It seems unlikely that someone would deliberately encode audio merely to confuse Whisper, the way HDCP or DRM works; it just doesn't make any sense to go to that trouble.
I don't think there's any reason for them to try to prevent subtitles from being made, so it's more likely a bad job done by whoever edited it. I've seen some horrible things done to videos, so not everyone knows what they're doing, even when they are "professionals".