Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

Anyone done some testing how gemini models compare to deepseek for translation?
I did some testing with the flash and pro 2.5 models 2 or 3 months ago, which in the AI universe is a century ago. Gemini was a lil worse than deepseek imo.
The flash version is really cheap and very fast, which imo makes it a real option if you don't mind the slightly worse translation.

Apparently the gemini pro 2.5 06-05 version is a lot better for translation, but it's still waaay too pricey, so I haven't even tried it, and my €300 free credits have expired. There's no flash version of it yet afaik, which should be a lot cheaper.

Since I do like 70% of my translations from Chinese sources (30% from Japanese transcriptions), it's gonna be hard to beat deepseek anyway, as I assume it's trained more on Chinese.
 
ABW-074 Yatsugake Umi's Finest Brush Wholesale 41 For The First Time In My Life, 100% Ejaculation Rate With Sex With Virgin Kun

  • Code: ABW-074
  • Release date: 2021-04-02
  • Director: Punonpen Ryou
  • Studio: Prestige
  • Label: ABW
  • Tags: Blow, Solowork, virgin man
  • Series: The finest brush stroke
  • Actor: Amateur
  • Actress: Yatsugake Umi
  1. Audio extracted and transcribed using WhisperWithVAD_Pro - both original speed audio and slowed audio.
  2. The slowed down audio transcription was adjusted back to the correct timing.
  3. Both transcription versions were AI translated using DeepSeek v3 0324.
  4. The "best" translation of each line from the two versions was used in the final translation.
I don't speak or read Japanese, so there are certain to be lots of mistakes.
It'd be extra-shiny if mistakes could be pointed out or corrected.
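
For anyone wanting to try step 2 (moving the slowed-audio transcription back to the correct timing), here is a minimal Python sketch. It is not the script used for this sub. The function name `rescale_srt` and the `speed` value are assumptions for illustration: `speed` is the playback rate the audio was slowed to (0.8 means 80% speed), so every timestamp is multiplied by it.

```python
import re

# Matches SRT timestamps such as 00:01:02,500
TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def rescale_srt(srt_text: str, speed: float) -> str:
    """Map timestamps made from slowed audio back to original-speed time.

    `speed` is the (assumed) playback rate of the slowed audio, so
    original_time = slowed_time * speed.
    """

    def fix(match: re.Match) -> str:
        h, m, s, ms = (int(g) for g in match.groups())
        total_ms = ((h * 60 + m) * 60 + s) * 1000 + ms
        total_ms = round(total_ms * speed)
        h, rem = divmod(total_ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    out = []
    for line in srt_text.splitlines():
        # Only touch timing lines, never the subtitle text itself.
        if "-->" in line:
            line = TIMESTAMP.sub(fix, line)
        out.append(line)
    return "\n".join(out) + "\n"
```

Usage would be something like `rescale_srt(open("slowed.srt", encoding="utf-8").read(), 0.8)`, where the file name is just a placeholder.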
 

Interesting. I did some testing this week, and in my opinion, based on 2 examples (Chinese source), deepseek and gemini 2.5 flash are equal in quality. Some lines are better with deepseek and some are better with gemini 2.5 flash.
 
I just downloaded a recent reduced-mosaic release of JUX-766 starring my all-time favorite JAV actress, Ayumi Shinoda. Naturally I had to update my sub too. Here is my latest JUX-766 sub... anyway, enjoy and let me know what you think.
 

Well folks, here is a new installment in the Super-Mucho-Mega SRT Adventure. What follows is E-G and is about 1600 .srt's. I'm kind of tired of doing this project so I may take a little break. Hope these are useful to some of you. Cheers.

Stay tuned for revised info
 
Anyone know why, when you use whisper in Large mode to get subtitles, the text has things like "Naokiman Show Instagramуют", "Please subscribe to the channel.", "Thank you for watching until the end today", "See you in the next video." and "Thank you for watching!" when they clearly aren't being said in the video?
 
Well folks, here is a new installment in the Super-Mucho-Mega SRT Adventure. What follows is E-G and is about 1600 .srt's. I'm kind of tired of doing this project so I may take a little break. Hope these are useful to some of you. Cheers.

E: https://drive.google.com/file/d/18lW0IFS4bOavNxVCXsjwSpBeuG77_Bg0/view?usp=sharing
F: https://drive.google.com/file/d/1SMEpdOj88ilzRRFxAiDFX5SYNKqpDfKF/view?usp=sharing
G: https://drive.google.com/file/d/1b0dKY-lChc4J5NSygbtru9j8tcNhWzJf/view?usp=sharing
Wow! You're rocking it, Dude. Thanks for sharing.
 
I have seen this a bunch of times, Freespirit, and I have gone through a ton of tweaks with my .bat and .py scripts to try to minimize these types of things. I have absolutely no idea how this happens. Once in a while it will 'hallucinate' this for half of a video. If you ever find a solution, would you please let me know? Thanks.
 
Because the AI model was trained on YouTube videos, and all AI does is guess. Since those phrases get said a lot, they're go-to phrases it uses when it's uncertain about something, especially at the points where they're usually said, either the beginning or the end of a video.
 
Thanks Scully, I was starting to think that only a few people gave a crap about them and was toying with the idea of just pm'ing those people with the links instead of just posting them in the open forum. I'll ponder that but ultimately if I go that route you are definitely on the list. Cheers.
 
That is interesting, Sam. I have my configs set to minimize AI guessing/learning and to only translate exactly what it 'hears'. I don't see it much, but I suppose you can't totally rely on Whisper not to 'guess' now and then. Thanks for the response.
 
I don't know how those settings work exactly, but what we have as AI doesn't think at all. It's all educated guesses from doing/associating something over and over again (when I see this, that is the expected result), so if the data it's trained on has issues, those issues are passed on to your results. It doesn't know it's just guessing, since for it, what it knows is what is.

Since most YouTube videos have that sentence or something similar at the end, the AI learns that this is the normal thing to say at the end of something. And if you use a VAD, you end up with a lot of ends, since the audio is all split into chunks.

The models are getting better but it's still happening.
 
Yeah, that makes a lot of sense. One day, probably past my expiry date, the technology will be better at context. Right now it reads text and puts the dots together. Sometimes I wonder if I should even bother with this subtitle project, because who knows what the technology will be like in a year, or 6 months, or tomorrow morning.
 
This cleaner removes stuff like that, as most hallucinations are always the same (like "subscribe to my channel"). It's the best one I've seen on here. I was writing my own, but I'm too lazy to constantly tweak it.
The only part I don't like about this one is that it also removes filler like "umm, well ...", but that's something you can tweak.

In the models there's also an internal list of hallucinations that get auto-removed, so it hallucinates even more than what you can see.

No amount of settings will fully remove hallucinations. I think the only things we can do are penalize repetitions and use the suppress_tokens trick. Sure, you might tweak something and the hallucination is gone, but it could just as well have been gone on a rerun with the exact same settings. I don't fiddle with it too much; atm the best way to handle this is with post-processing scripts like this cleaner.
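
To show the general idea of such post-processing (this is not the linked cleaner, just a rough sketch), here is a small Python example. The phrase list only contains the lines quoted in this thread, the names `HALLUCINATIONS`, `normalize` and `clean_srt` are made up for illustration, and it assumes a plain .srt layout (index line, timing line, text lines, blank line). It only drops a cue when the whole cue matches a known phrase, so real dialogue that merely contains one of them is left alone.

```python
import re

# Phrases quoted in this thread; extend with whatever your own runs produce.
HALLUCINATIONS = [
    "Please subscribe to the channel.",
    "Thank you for watching until the end today",
    "See you in the next video.",
    "Thank you for watching!",
]


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so small variations still match."""
    return re.sub(r"[\W_]+", " ", text).strip().lower()


BLOCKLIST = {normalize(p) for p in HALLUCINATIONS}


def clean_srt(srt_text: str) -> str:
    """Drop cues whose entire text is a known hallucination, then renumber."""
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    kept = []
    for block in blocks:
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip empty or malformed cues (no text lines)
        timing, text_lines = lines[1], lines[2:]
        if normalize(" ".join(text_lines)) in BLOCKLIST:
            continue  # whole cue is a known hallucination
        kept.append((timing, text_lines))
    # Renumber the surviving cues from 1.
    out = [
        "\n".join([str(i), timing, *text_lines])
        for i, (timing, text_lines) in enumerate(kept, 1)
    ]
    return "\n\n".join(out) + "\n"
```

Matching whole cues only is a deliberately conservative choice; a substring match would catch more junk but could also eat real lines.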
 
Upgraded Whisper config settings. I posted these a few weeks ago, but they have gained a lot of good upgrades since. I don't know how these translate to using the online version, and some of the settings, like thread usage, are based on my own specs (i9-14900K / Nvidia 4080 / 96 GB RAM), but perhaps some of these settings may have some value for you. For you advanced cats, I'm always happy to have suggestions. Cheers.
 
