Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

As I was contriving various word substitutions, my 'ain't I a stinker' mindset kept coming up with weird and wonderful replacements, like swapping 'ramen' for 'old men in funny hats' or a wide variety of other stupidity, but since I was doing these subs mainly for the akiba cats I decided to go with the obvious. I think that once people realize they can easily edit these .srt files, you'll eventually start to find all kinds of creative insanity. I hope so, at least.
Yeah, I often do that (substitute words or phrases in subs) just for the fun of it. If you've seen any of the photos I've posted here, I usually include a goofy addition or two.
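For anyone who wants to try this kind of word swap without hand-editing every line, here is a minimal Python sketch. It assumes a standard .srt layout (cue number, timing line, then text), and the function name `substitute_in_srt` and the replacement pair are only illustrations, not anything the posters above actually used.

```python
import re

def substitute_in_srt(srt_text, replacements):
    """Replace whole words in subtitle text, leaving cue numbers and timings alone."""
    out = []
    for line in srt_text.splitlines():
        # Cue indices are digit-only lines; timing lines contain "-->".
        # (A dialog line made only of digits would also be skipped.)
        if line.strip().isdigit() or "-->" in line:
            out.append(line)
            continue
        for old, new in replacements.items():
            line = re.sub(rf"\b{re.escape(old)}\b", new, line, flags=re.IGNORECASE)
        out.append(line)
    return "\n".join(out)

sample = "1\n00:00:01,000 --> 00:00:03,000\nLet's get some ramen.\n"
print(substitute_in_srt(sample, {"ramen": "old men in funny hats"}))
```

Matching on word boundaries keeps the swap from firing inside longer words, and skipping the timing lines means the file stays playable.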
 
That is excellent news. Please keep us posted if you find a superior configuration. My results are mostly a battle with accuracy vs. hallucinations. Ultimately I want the best accuracy but I have made some compromises to try to avoid those damned hallucinations.

If you discover the ultimate configuration, please let me know; if it yields a better result than my recent mass-srt adventure, I will do it again with the better settings.

The discrepancies between the two files were very interesting. One thing I try to do with my configurations is prevent Whisper from 'assuming' text. I don't want it to inject what it 'guesses' is there, or fill in phrases like 'thank you for subscribing' because there is a gap in the dialog. I don't want the AI to guess; I want it to translate exactly what it 'hears' and not a word more.

I'm glad that you were able to get it working. Cheers.
Can't say I'm a fan of those settings. The VAD settings aren't supposed to be touched (besides the threshold and the VAD method, pyannote vs. silero). They are already optimized for most content. Yours are so far off the defaults that I'm really wondering how you got to them. Your threshold is also so low that you will get many hallucinations because of it.

You also can't really chat with or prompt these models; they are designed to do one thing and one thing only (transcribe/translate). Afaik prompting them can only cause more hallucinations. Just saying it's adult content might be harmless tho.
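To make the two knobs above concrete, here is a rough sketch of how they might look when driving the faster-whisper Python library directly. This is an assumption on my part: the thread does not say which front end was used, the helper name `build_transcribe_kwargs` is made up, and the threshold value is only illustrative. The Python library uses Silero for its VAD, so the pyannote option belongs to other front ends.

```python
def build_transcribe_kwargs(vad_threshold=0.5, prompt=None):
    """Collect keyword arguments for faster-whisper's WhisperModel.transcribe().

    Only the VAD threshold is overridden; every other VAD option is left
    at whatever the library ships with.
    """
    if not 0.0 < vad_threshold < 1.0:
        raise ValueError("vad_threshold must be between 0 and 1")
    kwargs = {
        "language": "ja",
        "task": "translate",    # translate Japanese speech into English
        "vad_filter": True,     # run voice activity detection before decoding
        "vad_parameters": {"threshold": vad_threshold},
    }
    if prompt:
        # Optional text hint; kept off by default because, as noted above,
        # prompting may cause more hallucinations.
        kwargs["initial_prompt"] = prompt
    return kwargs

# Usage (assumes faster-whisper is installed and "audio.wav" exists):
# from faster_whisper import WhisperModel
# model = WhisperModel("large-v2")
# segments, info = model.transcribe("audio.wav", **build_transcribe_kwargs())
```

Keeping the overrides in one small function makes it easier to remember which settings were changed from the defaults when comparing runs.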

I'm not an expert by any means, but this info is what I've gotten from nosing around in the githubs.

I'm also of the opposite opinion: I don't mind hallucinations, and I'd rather have a few more of them than miss out on dialogue. It's usually pretty obvious when a single word is wrongly transcribed, and it's very rare for me that a whole sentence is wrong unless I set the VAD threshold very low. I'm sure that with good prompting LLMs can also catch some of these bad transcriptions, but I haven't gone too deep into testing whether the line I use in my prompt ("Handle Whisper ASR transcription errors contextually") actually makes them do so.
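Since the obvious hallucinations tend to be stock phrases, one cheap check before reaching for an LLM is to flag cues that contain them. The sketch below is only that, a sketch: the phrase list is hypothetical (the first entry comes from this thread, the second is my own addition), and `parse_srt` and `flag_suspect_cues` are illustrative names.

```python
import re

# Hypothetical phrase list; extend it with whatever your own runs produce.
KNOWN_HALLUCINATIONS = ["thank you for subscribing", "thanks for watching"]

def parse_srt(srt_text):
    """Return a list of (index, timing, text) tuples from SRT content."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3 and "-->" in lines[1]:
            cues.append((lines[0].strip(), lines[1].strip(),
                         " ".join(lines[2:]).strip()))
    return cues

def flag_suspect_cues(srt_text, phrases=KNOWN_HALLUCINATIONS):
    """Return the cues whose text contains one of the listed phrases."""
    lowered = [p.lower() for p in phrases]
    return [cue for cue in parse_srt(srt_text)
            if any(p in cue[2].lower() for p in lowered)]
```

This only flags cues for review rather than deleting them, so a line that really was spoken is not lost.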
 
With Faster-Whisper-XXL, it seems like V2 does a better translation than V3 for --model large.

hhd800.com@WAAA-194_V2.srt = 71 KB
hhd800.com@WAAA-194_V3.srt = 84 KB

Dscott, have you checked Faster-Whisper V2 vs V3 to see which one does a better job?
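File size on its own does not say which output is better: a larger .srt might hold more caught dialogue or just more repeated filler. A small sketch like the following could give a few extra numbers to compare. It assumes standard .srt blocks separated by blank lines, and `srt_stats` is a made-up helper name.

```python
import re
from collections import Counter

def srt_stats(srt_text):
    """Summarize SRT content: cue count, text length, and the most repeated line."""
    texts = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3 and "-->" in lines[1]:
            texts.append(" ".join(lines[2:]).strip())
    counts = Counter(texts)
    top_line, top_count = counts.most_common(1)[0] if counts else ("", 0)
    return {
        "cues": len(texts),
        "chars": sum(len(t) for t in texts),
        "unique_lines": len(counts),
        "most_repeated": (top_line, top_count),
    }

# Usage with the two files named above (assumes they are in the current folder):
# for name in ("hhd800.com@WAAA-194_V2.srt", "hhd800.com@WAAA-194_V3.srt"):
#     with open(name, encoding="utf-8") as f:
#         print(name, srt_stats(f.read()))
```

A big gap between `cues` and `unique_lines` is a hint that the extra kilobytes are repetition rather than extra dialogue.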

1753385828745.png
 


We've had this V2 vs V3 discussion a couple of times :P.

So here are the numbers from actual tests on Japanese (higher is worse):

1753387059513.png

BUT from what I understand, largeV2 is more forgiving when the audio quality is a little worse, because it's trained on messier audio. V3 is better on clear audio like podcasts, and it's probably way better on anime as well. But as we all know, JAV doesn't always have the best audio, because most scenes are one-takes. And I doubt even the biggest studios bring actors in to dub over themselves when a line isn't clear, like they do in actual TV series and movies. I've never noticed it, at least.
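The chart's metric is not named in the text, but "higher is worse" suggests an error rate. Assuming it is something like character error rate (CER), which is a common choice for Japanese because the text has no spaces between words, a minimal sketch of how such a number is computed looks like this. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def character_error_rate(reference, hypothesis):
    """CER = edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

# Two of the five reference characters differ, so the rate is 2 / 5.
print(character_error_rate("こんにちは", "こんばんは"))
```

A score like this needs a trusted human transcript as the reference, which is why published benchmarks on clean test sets may not match what you see on noisy one-take audio.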
 

[Reducing Mosaic]HSODA-030 Hey Mom, Let’s Have Sex! A Hot And Steamy Summer Vacation With My Son And I. Yuri Honma​


I forgot that I had started this Yuri Honma sub a while back. I thought it was more erotic than either NUKA-030 or YOCH-002, so I decided to work on it. I may still work on NUKA-030, but the Whisper translation of YOCH-002 was disappointing.

I used WhisperJAV 0.7 to create these subs, and I also attempted to clean them up a bit and re-interpreted some of the meaningless/"lewd-less" dialog. Again, I don't understand Japanese, so my re-interpretations might not be totally accurate, but I try to match what is happening in the scene. Anyway, enjoy and let me know what you think.

Just added a bit at the beginning ... "to set the stage"!

LOVE Yuri Honma! She has had some delectable delicious MOM titles in her library of work. GVH-606; GVH-275; GVH-229; HSODA-033; NTRD-111; JJDA-015; FERA-163; GVH-432; SPRD-1504 among my favorites of hers.
 
May I ask, what software do you use to call the Whisper model to turn it green? Thank you
 
Thanks for the feedback, T221152. I have tweaked the crap out of WhisperAI. I could tell you horror stories of having a decent result, tweaking a little more, forgetting the original settings, and then going back to square one, removing all of the components and starting again. I had a nightmare trying to get Faster-Whisper to function. In any case, we are of differing opinions on the hallucination issue, and I see your point.

My various configuration settings are the result of endless experimentation to try to catch dialog that is whispered or that would otherwise be missed by the default settings. Another issue, for me, is the hallucination problem, and this is a constant battle: accuracy at the expense of likely hallucinations. The more specifically you design your settings, the more likely it is that you will get hallucinations. You have said that you don't mind hallucinations, and that is where we will have to disagree. Ultimately though, since Whisper is not perfect, not by a long shot, I just have to go with the best that I can get: not perfect, not even close, but good enough for me.

I have also tried all of the models in a variety of scenarios, and one of my annoyances is that once a hallucination starts, it will often continue over very clear dialog, so that Whisper completely skips that dialog. That particular aspect of hallucination really drives me nuts. In any case, for me, it was more of a situation where I was not completely satisfied with the results, but they were 'good enough'.
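One way to spot the stretches where a hallucination keeps going is to look for the same text repeated across several consecutive cues. The following is a minimal sketch under that assumption (a runaway hallucination does not always repeat word for word); `find_repeated_runs` and the `min_run` cutoff are illustrative choices, not settings from any Whisper tool.

```python
import re

def find_repeated_runs(srt_text, min_run=3):
    """Return (first_cue_index, run_length, text) for runs of identical consecutive cues."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3 and "-->" in lines[1]:
            cues.append((lines[0].strip(), " ".join(lines[2:]).strip().lower()))
    runs, start = [], 0
    for i in range(1, len(cues) + 1):
        # Close the current run when the text changes or the cues run out.
        if i == len(cues) or cues[i][1] != cues[start][1]:
            if i - start >= min_run:
                runs.append((cues[start][0], i - start, cues[start][1]))
            start = i
    return runs
```

The cue index of each run tells you where to jump in the video to check whether real dialog was skipped during that stretch.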

As for the large V2 vs. large V3 discussion, I have never completely convinced myself of one over the other, so I simply went with V3. In essence, the technology is not yet reliable. It gives you an approximation that, for me, is good enough but not perfect. My final analysis is that almost all of the dialog in a JAV title is just a rehashing of the same old thing over and over again. Where there is clearly a conversation is where I find that my settings yield a pretty accurate result. I say 'pretty accurate' because I have not found a configuration that gives me a completely perfect result.

That said, I always pay close attention to your posts because you seem to have a better handle on the specifics than I do, so I really do appreciate your feedback.
 
Thanks for the feedback, Mycl500. Yes, I have tried both models in a variety of situations, and tbh I never quite convinced myself either way, but your results here definitely imply V2 is the superior choice. I will go back and do some more V2 testing. This is an excellent example and I appreciate it.