Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

Well, I just spent the last couple of hours goofing around with WAAA-501. It is very, very weird. It's almost as if there's a second audio track encoded into the file and Whisper is decoding that one. I tried V2 and V3, and I tried upscaling the video to see if re-encoding might resolve the issue. I'm flummoxed! The only thing I can suggest is having someone here create a subtitle file manually. There was a cat doing it a few months ago, but I can't remember his name. I wish I could have found the solution, but no dice here.
I stripped the audio into an MP3 file as normal from a reduced-mosaic version of WAAA-501 and created this sub with WhisperJAV 0.7. Some dialogue is missing, but it looks like a pretty normal Whisper sub to me. Attached is the "raw" sub with no editing.

PS: After watching the video for a bit, I noticed some lines have a very short duration and can't be read in real time while watching the movie with the sub. I suggest increasing the minimum duration to at least 2 seconds.
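A minimum-duration rule like this could be applied as a post-processing step after transcription. The sketch below is hypothetical (not something WhisperJAV is shown to do in this thread): it lengthens any cue shorter than 2 seconds but stops 50 ms before the next cue, so lines never overlap. The `Cue` class, the function names and the 50 ms gap are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def enforce_min_duration(cues, min_dur=2.0, gap=0.05):
    """Extend short cues to min_dur seconds without overlapping the next cue.

    Cues must be sorted by start time. A cue is only lengthened up to `gap`
    seconds before the next cue starts, so it can stay shorter than min_dur
    when the next line follows closely. Cues are never shortened.
    """
    out = []
    for i, cue in enumerate(cues):
        target_end = max(cue.end, cue.start + min_dur)
        if i + 1 < len(cues):
            # Never run into the following subtitle.
            target_end = min(target_end, cues[i + 1].start - gap)
        out.append(Cue(cue.start, max(cue.end, target_end), cue.text))
    return out


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


if __name__ == "__main__":
    cues = [Cue(10.0, 10.4, "Hello"), Cue(11.0, 13.0, "Welcome"), Cue(20.0, 20.3, "Yes")]
    for c in enforce_min_duration(cues):
        print(srt_timestamp(c.start), "-->", srt_timestamp(c.end), c.text)
```

With these sample cues, "Hello" only grows to 10.95 s because the next line starts at 11 s, while "Yes" gets the full 2 seconds.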
 

There's a lot of dialogue in the first half hour, but you have only 3 lines there.

The file is also only 10 KB; I got 18 KB worth of text out of it, which is how I immediately noticed something was off. I've just downloaded the 4K version to see if anything changes, but I doubt it. It's in the queue and will take a while.
I've tried different VADs on it (with super low VAD sensitivity) and a different voice-extraction method too; it didn't change much. I'll try again on the 4K version.
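A quick way to spot this kind of failure without watching the movie is to look for long stretches with no subtitles at all. A minimal sketch, assuming the cues have already been parsed into sorted (start, end) pairs in seconds; the function name and the 5-minute threshold are arbitrary assumptions.

```python
def find_gaps(cues, threshold=300.0, total_duration=None):
    """Return (gap_start, gap_end) pairs with no subtitle for > threshold s.

    cues: list of (start, end) pairs in seconds, sorted by start.
    If total_duration is given, the stretch before the first cue and
    after the last cue is checked as well.
    """
    gaps = []
    prev_end = 0.0 if total_duration is not None else None
    for start, end in cues:
        if prev_end is not None and start - prev_end > threshold:
            gaps.append((prev_end, start))
        # Track the latest end time seen, in case cues overlap.
        prev_end = end if prev_end is None else max(prev_end, end)
    if total_duration is not None and total_duration - prev_end > threshold:
        gaps.append((prev_end, total_duration))
    return gaps


if __name__ == "__main__":
    cues = [(5.0, 7.0), (12.0, 14.0), (1900.0, 1902.0)]
    for start, end in find_gaps(cues, total_duration=3600.0):
        print(f"no subtitles from {start / 60:.1f} min to {end / 60:.1f} min")
```

A sub with three early lines followed by half an hour of silence shows up immediately as one huge gap.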
 
I'm after the best translation I can find. I'll add the word timestamps argument to my batch, but could you please clarify V2 vs. V3 for me? If I can get away with V2 and speed up the process, that would be super-keano. Thanks.

We probably need to take this conversation to another thread so we don't hijack the main topic. The work you're planning to do (and doing) is quite valuable and deserves more attention. I'll spend more time on it over the weekend, but I thought I'd do a quick exploration of your first set of subs by running some stats on them. Here are some interesting ones:


(1) Hallucination: Your de-hallucination seems to have worked quite well for known phrases. There don't seem to be many of those in the majority of the files:



[Image: hallucination statistics chart]
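A count like this can be produced by scanning each subtitle file for a list of known hallucination phrases. The sketch below is not the script used for these stats: the phrase list is a hypothetical placeholder, and matching is a simple case-insensitive substring test.

```python
from collections import Counter

# Hypothetical examples only; substitute the phrases your own filter targets.
KNOWN_HALLUCINATIONS = [
    "thank you for watching",
    "please subscribe",
    "subtitles by",
]


def count_hallucinations(subtitle_lines, phrases=KNOWN_HALLUCINATIONS):
    """Count the subtitle lines that contain each phrase (case-insensitive)."""
    counts = Counter()
    for line in subtitle_lines:
        text = line.lower()
        for phrase in phrases:
            if phrase.lower() in text:
                counts[phrase] += 1
    return counts


def files_with_hallucinations(files, phrases=KNOWN_HALLUCINATIONS):
    """Given {filename: [subtitle lines]}, return the filenames with any hit."""
    return sorted(name for name, lines in files.items()
                  if count_hallucinations(lines, phrases))
```

Running `files_with_hallucinations` over a whole batch gives the share of files still affected, which is the kind of number the chart summarizes.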




(2) Quality: The characters-per-second (CPS) metric can indicate the quality of the timing, and hence of the sub, so it is one way of measuring sub quality. I'd say anything below 10 or above 20 might not be optimal.

[Image: CPS statistics chart]
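CPS is simply the number of characters in a cue divided by how long it stays on screen. A minimal sketch that flags cues outside the 10 to 20 range suggested above; counting only non-whitespace characters is an assumption here (some subtitle tools count spaces too), and cues are assumed to be (start, end, text) tuples in seconds.

```python
def cps(text, start, end):
    """Characters per second for one cue, ignoring spaces and line breaks."""
    duration = end - start
    if duration <= 0:
        return float("inf")  # zero-length cue: unreadable by definition
    chars = sum(1 for ch in text if not ch.isspace())
    return chars / duration


def flag_cps(cues, low=10.0, high=20.0):
    """Return (index, cps) for cues whose CPS falls outside [low, high]."""
    flagged = []
    for i, (start, end, text) in enumerate(cues):
        value = cps(text, start, end)
        if value < low or value > high:
            flagged.append((i, round(value, 1)))
    return flagged


if __name__ == "__main__":
    cues = [
        (0.0, 2.0, "Hello there, how are you?"),  # 21 chars / 2 s
        (3.0, 3.5, "Come here right now"),         # 16 chars / 0.5 s
        (5.0, 10.0, "Yes"),                        # 3 chars / 5 s
    ]
    print(flag_cps(cues))
```

A very high CPS usually means the cue flashes by too fast to read; a very low one often means a cue was stretched over silence or over untranscribed speech.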





(3) Quality: Another thing I looked at was repetition. One metric for it is the type-token ratio (TTR), the number of distinct words divided by the total number of words. This is a tricky metric for JAV (or any porn movie), as the vocabulary is not that vast :) However, you don't want the TTR to be too close to 0, since that can indicate problems in the Whisper output such as a repetition loop.


[Image: TTR statistics chart]
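TTR is easy to compute. A rough sketch, assuming English subtitle text and a naive regex tokenizer. Because plain TTR drops as a text gets longer, comparing whole files of different lengths can mislead; the moving-average variant (MATTR), which averages TTR over fixed-size windows of consecutive words, is one common way around that. The window size of 50 is an arbitrary assumption.

```python
import re

WORD = re.compile(r"[a-z']+")


def tokens(text):
    """Naive English word tokenizer: lower-case runs of letters/apostrophes."""
    return WORD.findall(text.lower())


def type_token_ratio(text):
    """Distinct words divided by total words (0.0 for empty text)."""
    words = tokens(text)
    return len(set(words)) / len(words) if words else 0.0


def moving_average_ttr(text, window=50):
    """MATTR: mean TTR over every run of `window` consecutive words."""
    words = tokens(text)
    if len(words) <= window:
        return type_token_ratio(text)
    ratios = [
        len(set(words[i:i + window])) / window
        for i in range(len(words) - window + 1)
    ]
    return sum(ratios) / len(ratios)
```

A repetition loop such as the same word repeated a hundred times drives both values towards 1 / window, far below what normal dialogue produces.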


These are just dumb stats with no knowledge of the genre or type of movie, so they must be taken with a grain of salt. It would be interesting to run the stats separately for each genre or series to see what their characteristics are.
 
You are right...there is definitely something squirrely about WAAA-501! Think I have wasted enough time on it too! lol
 
Not sure what's going on with WAAA-501, but I tried both the decensored version and a version with hardcoded Chinese subs and they had the same issue. 3 lines of text and then nothing more for a very long time.

So either there's something weird going on with the original sound, or the various versions out there all come from the same corrupted source.
 


You're right. There seems to be something very weird about its audio encoding. When the audio is converted to mono 16 kHz, it loses almost all of its waveform (see below). That is the cause of the subtitling problem: Whisper uses mono 16 kHz WAV audio.
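The effect can be reproduced with a toy signal: if one channel is a polarity-inverted copy of the other, the mono downmix (L + R) / 2 is silent even though each channel on its own is loud. The sketch below compares the RMS of the downmix with the channels' RMS in fixed windows, which would also show where in the file the cancellation happens. It is a hypothetical diagnostic, not the workaround from this thread; the 1-second window and 0.2 threshold are assumptions, and samples are plain Python floats.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def cancellation_ratio(left, right):
    """RMS of the mono downmix (L + R) / 2 relative to the mean channel RMS.

    Close to 1.0: channels are in phase. Close to 0.0: one channel is
    (nearly) an inverted copy of the other, so the downmix cancels out.
    Silent input returns 1.0 so silence is not reported as cancellation.
    """
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    channel_rms = (rms(left) + rms(right)) / 2
    if channel_rms == 0:
        return 1.0
    return rms(mono) / channel_rms


def find_cancelled_windows(left, right, sample_rate, window_s=1.0, threshold=0.2):
    """Start times (seconds) of windows whose mono downmix is mostly cancelled."""
    step = max(1, round(sample_rate * window_s))
    n = min(len(left), len(right))
    return [
        i / sample_rate
        for i in range(0, n - step + 1, step)
        if cancellation_ratio(left[i:i + step], right[i:i + step]) < threshold
    ]


if __name__ == "__main__":
    rate = 1000  # tiny sample rate keeps the demo fast
    left = [math.sin(2 * math.pi * 50 * n / rate) for n in range(4 * rate)]
    # Right channel: in phase for 2 s, then polarity-inverted for 2 s.
    right = [x if n < 2 * rate else -x for n, x in enumerate(left)]
    print(find_cancelled_windows(left, right, rate))  # prints [2.0, 3.0]
```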


Original waveform:
[Image: original waveform]


After mono resampling:

[Image: waveform after mono resampling]


I am now trying to find a workaround. Will update later.
 
Maybe the JAV studios have found a way to screw Whisper Subs!?
 
I'm currently running Whisper on only the left channel and it seems to be working. You might miss some dialogue that exists only on the right channel, so if you want to be sure, run it twice, once for each side. However, dialogue tends to be on both channels, so I'm not sure it's worth the extra effort.
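Extracting one channel does not need a full audio editor. A minimal sketch using only Python's standard `wave` and `array` modules, assuming a 16-bit PCM WAV input (for example one already extracted from the video with ffmpeg) and assuming the Whisper front end you use handles resampling to 16 kHz itself. The function names are hypothetical.

```python
import array
import wave


def split_channel(pcm_bytes, n_channels, channel):
    """Pick one channel out of interleaved 16-bit PCM bytes.

    Samples are only copied, never interpreted arithmetically, so the
    native byte order used by array('h') does not matter here.
    """
    samples = array.array("h", pcm_bytes)
    return samples[channel::n_channels].tobytes()


def extract_channel(src_path, dst_path, channel=0):
    """Write one channel (0 = left, 1 = right) of a 16-bit WAV as a mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = src.getnchannels()
        rate = src.getframerate()
        pcm = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(split_channel(pcm, n_channels, channel))
```

Calling `extract_channel("movie.wav", "left.wav", 0)` and then again with `1` gives the two single-channel files for the run-it-twice approach.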
 
Thanks for this detailed analysis, Mei. One trade-off with the hallucination clean-up (I'm assuming you've looked at clean_subs.py): it's a work in progress and I will adjust it piecemeal as ideas come to me. My only slight qualm with the hallucination 'filter' is what happens when a hallucination has started and carries on over obvious dialogue. I half think it would be better to have at least some dialogue, even wrong dialogue, when someone is obviously speaking, instead of a blank screen where my script has removed the hallucination. It's probably not that relevant, but it's an aspect of the filtering process I'm not happy with. I thought about adding some funny word replacements for my own enjoyment ('Have you had anal?' becoming 'Do you enjoy a poke up the rump?'), but since these are primarily for others (Akiba) I have curtailed that inclination. The bottom line with Whisper is that, with its current fallibility, you are relegated to "good enough" rather than "by Jove, I think I've got it".

Right now I'm working on specific actress subs: Abe Mikado, Kanna Misaki, Lala Kudou, the Airi/Meiri twins, Nanami Nanase, and a bunch more I can't remember right now (I'm on my laptop while my PC is hard at work creating subs). In any case, the next batch covers movies by actress, about 2000 titles I think. Then I move on to my alphabetical folders, which are a mish-mash of everything and somewhere in the vicinity of 20K files. This is going to be a very long process, so if you discover potential tweaks to my bat or py files that would make them more effective, give me a shout. Thanks, Mei.
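One possible middle ground for a repetition loop is to collapse it rather than delete it: keep one copy of the repeated line and stretch it over the whole run, so the screen isn't blank while someone is clearly talking. This is a hypothetical sketch of that idea, not how clean_subs.py works; cues are assumed to be (start, end, text) tuples sorted by start time, and lines count as repeats when they match after trimming and lower-casing.

```python
def collapse_repeats(cues, max_repeats=1):
    """Merge runs of consecutive cues whose text is identical.

    Keeps at most `max_repeats` copies of a repeated line and stretches the
    last kept copy to the end of the run, instead of deleting the run.
    """
    out = []
    run = 0  # how many repeats of the previous text have been seen so far
    for start, end, text in cues:
        if out and out[-1][2].strip().lower() == text.strip().lower():
            run += 1
            if run >= max_repeats:
                prev_start, _, prev_text = out[-1]
                out[-1] = (prev_start, end, prev_text)  # extend kept copy
                continue
        else:
            run = 0
        out.append((start, end, text))
    return out
```

With `max_repeats=1`, three consecutive "Ah" cues from 0 to 3 s become a single "Ah" shown from 0 to 3 s, and the next different line is left untouched.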
 
Just checked the first few lines and it looks a lot better than the translation basic whisper gave me.

I've attached the raw results. Left channel only; I didn't bother trying the right channel to see which one gives better results, as Whisper takes quite a while to finish on my computer.
 

Haha ...

Here is my sub for WAAA-501 after fixing the audio problem.
I used the anime-whisper model for this one. Since it's a new model, I'd be interested to hear your comments on the result.
Wow, that looks really great from a quick glance, but I can't compare this to my own as I don't know how to fix the audio problem.

I'd like to see it with actresses who sometimes talk in a very cutesy way; I'm sure a model trained on anime handles that much better. Ichika Matsumoto comes to mind (maybe MIAA-846?).

Wasn't there a problem with the timecodes on that model? Were you able to fix it, or is this a different one?
 
It's most likely "phase cancellation": one of the tracks has the opposite polarity of the other, so they cancel each other out when downmixed to mono. It can be fixed by separating the two tracks, inverting one of them (Effect->Special->Invert in Audacity), and then merging them back.

For WAAA-501, it looks like one scene in the middle is correct and the rest of the audio is inverted, so you need to select and invert the two bad parts separately on one track (apparently it's usually the right one that's problematic). To find them, downmix a copy to mono and create labels (Ctrl+B) so you can easily select those parts afterwards. Then you get good full-length audio for Whisper.
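The same fix can be expressed in code once the bad ranges are known (for example from labels found on a mono downmix): flip the sign of one channel only inside those ranges, then downmix. A minimal sketch on float samples; the function name and the (start_s, end_s) range format are assumptions, and it inverts the right channel, following the note above that it's usually the problematic one.

```python
def fix_inverted_segments(left, right, sample_rate, bad_ranges):
    """Flip the polarity of `right` inside the given (start_s, end_s) ranges,
    then return the mono downmix (L + R) / 2 as a list of floats.
    """
    fixed = list(right)  # don't modify the caller's list
    for start_s, end_s in bad_ranges:
        a = max(0, round(start_s * sample_rate))
        b = min(len(fixed), round(end_s * sample_rate))
        for i in range(a, b):
            fixed[i] = -fixed[i]
    return [(l + r) / 2 for l, r in zip(left, fixed)]
```

Passing a single range that covers the whole file reproduces the simple "invert one track" fix; for WAAA-501 you would pass the two bad parts.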

[Image: Mono_fix.jpg]
Edit: Made a better picture
 

Wow, that's a lot trickier than my quick solution. I just did
ffmpeg -i input -af "pan=mono|c0=FL" output
to get the left channel audio only (or FR if you want right only).

I don't have anything more complex than ffmpeg to edit audio, so that was the best I could do myself.

I do wish I knew where to get this anime-whisper as it seems to work much better than the regular whisper I have. Or is it one of those versions that you can't run locally but have to upload the file to some server?
 
Worked great, thx!

I tried the Python clean-up subs script; it seems a bit aggressive on some things? I might remove some stuff myself.

So I wanted to compare the anime model version against V2, but I only translated 100 lines and only compared the first 60. I need sleep and can barely keep my eyes open, but I really wanted to compare them quickly.

I think I caught about 3 clear mistakes (or audio that didn't get picked up) with the anime one in those 60 lines. Of course that's a very small sample, and even if you transcribe the same audio a second time with the same parameters and the same model, it will sometimes pick up audio it didn't before.

I'm still very interested in comparing it on an actress who talks in a really cute voice; I bet it would be better than V2 there.
 


Yes, the timecode problem is still there: anime-whisper doesn't return timestamps. My current workaround is to segment the audio around predicted vocals before feeding it to the model. I'm still testing it and plan to publish it once it's in beta-level shape.
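Segmenting around speech has a useful side effect for a model that returns text only: each segment's start and end times can become the subtitle's timestamps. The sketch below is not that workaround; it uses a crude energy threshold as a stand-in for a proper vocal or voice-activity predictor, every threshold is an assumption, and `transcribe` is a hypothetical placeholder for the model call.

```python
import math


def speech_segments(samples, sample_rate, frame_s=0.03, threshold=0.02,
                    min_silence_s=0.3, pad_s=0.1):
    """Very rough energy-based VAD: return (start_s, end_s) speech segments.

    A frame counts as speech when its RMS exceeds `threshold` (samples are
    floats in [-1, 1]). Speech frames separated by less than `min_silence_s`
    are merged, and each segment is padded by `pad_s` on both sides.
    """
    frame = max(1, round(frame_s * sample_rate))
    duration = len(samples) / sample_rate
    voiced = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        level = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if level > threshold:
            voiced.append((i / sample_rate, (i + len(chunk)) / sample_rate))
    segments = []
    for start, end in voiced:
        if segments and start - segments[-1][1] < min_silence_s:
            segments[-1] = (segments[-1][0], end)  # short pause: same segment
        else:
            segments.append((start, end))
    # Pad slightly so word onsets and endings aren't clipped.
    return [(max(0.0, s - pad_s), min(duration, e + pad_s)) for s, e in segments]


def segments_to_cues(segments, transcribe):
    """Attach text to each segment; the timestamps come from the segment.

    `transcribe(start_s, end_s)` is a hypothetical placeholder for running
    the model on that slice of audio and returning plain text.
    """
    return [(s, e, transcribe(s, e)) for s, e in segments]
```

The timing accuracy of the resulting subs is then only as good as the segmenter, which is why a dedicated vocal/VAD model is preferable to a bare energy threshold on movie audio with music and background noise.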
 
Hi Sam,
Any thoughts on whether this is a "one-off", or is this what the future is going to look like for JAVs?
 
I don't think there's any reason for them to try to prevent subtitles from being made, so it's more likely a bad job by whoever edited it. I've seen some horrible things done to some videos, so not everyone knows what they're doing, even when they're "professionals".
 
I agree with you here, Sam. It seems unlikely that someone would deliberately encode audio merely to confuse Whisper, like HDCP or DRM; it just doesn't make any sense to go to that trouble.