akiba resident JAV subtitlers & subtitle talk★NOT A SUB REQUEST THREAD★

Hey all! I'm new here and I was wondering if anyone wants to partner with me in developing high-quality English subtitles for JAV videos.

A bit about me: I work professionally as a software engineer, mostly on the backend, so I write data pipelines for a living. I was wondering if any of you is well versed in ML/AI so we can fine-tune Whisper models.

I can create web tools and infrastructure to support the research. For example, I created https://www.simplejav.com/ (work in progress) so users can easily watch JAV on mobile and have access to English subtitles. (Currently the English subtitles are just plain Whisper output, so they're pretty rudimentary and inaccurate, but it's better than nothing.)

Or if you speak Japanese, let's collaborate :D I am looking for any help I can get. I want to create the best experience for watching JAV as an English speaker.
 

Hi @joowonton, I’m not an ML engineer but I’ve done a fair bit of playing around with Whisper. I’d be keen on any collaboration to improve and automate the process for better-quality subs. My public repo is here: WhisperJAV.

What I’ve found most difficult in improving subs (and fine-tuning Whisper) is getting a reliable test data set. I have tried to create my own data set from subbed anime and Netflix, but none of them is at a satisfactory level. The best so far has been Netflix’s The Naked Director. I specifically use the episode where Nanami Kawakami appears, as her voice is much the same as in her JAV works (and she is my top favorite ;)).

I personally think, to begin with, the most valuable partner would be a Japanese-speaking person who could help (you/us) in creating/validating a test data set. For example, a 5-minute audio clip with validated subs from each of the top 20 actresses would go a long way. So I hope some Japanese-speaking member here will reply to your call.
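As a rough illustration of what a validated test set would be used for, here is a minimal sketch that scores a Whisper transcript against a human-checked reference using character error rate (CER). CER is my own choice of metric here, not something the posts above prescribe; it is common for Japanese because the text has no spaces between words. The function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # character missing from hyp
                            curr[j - 1] + 1,      # extra character in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length, ignoring all whitespace."""
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)
```

With a handful of validated clips, averaging this score before and after a change (a new model, new settings, a fine-tune) would give a number to compare instead of a gut feeling.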

Hope to hear from other members.
 
Yeah, that's what I thought. I'm excited to check out your repo and see the work you have done so far. I know there are some phrases in a public dataset that we might be able to translate and then use as training data:

Though I'm not sure how relevant these phrases are to JAV. I've DMed you about some other possible techniques, so I won't bore this thread.
 
Hi, why is Subtitle Edit so horrible at generating subtitles? Translation makes it even worse. For example, it can't generate subtitles when there is background music. Can you suggest alternative applications?
 
Hello
You must understand that no translation machine is perfect.
They will never capture the tone, the context, the mannerisms, etc. of ANY language.
Speaking for myself, I've seen cases where the "subs look good and sound good" but...
when you hear it and understand the context, it's wrong.

They also add words and make up words that don't exist lol.

What works is that it's a base to help you understand the scenes a little better.

If you want perfection or greatness, then you will need to find a
native Japanese person, or someone who has studied the language.

Just remember, IF you are using any ML then it is up to you to clean
up the repeated lines, the hiccups, the missing dialogue, etc.
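For the repeated-lines part of that cleanup, a minimal sketch follows. It assumes the subtitles have already been parsed into (start, end, text) tuples with times in seconds; the `Cue` type, the `collapse_repeats` name, and the one-second `max_gap` default are all hypothetical choices for illustration, not part of any tool mentioned in this thread.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def collapse_repeats(cues: List[Cue], max_gap: float = 1.0) -> List[Cue]:
    """Merge back-to-back cues that carry exactly the same text.

    A cue is folded into the previous one when the text is identical and
    it starts no more than `max_gap` seconds after the previous cue ends.
    Empty cues are dropped.
    """
    merged: List[Cue] = []
    for start, end, text in cues:
        text = text.strip()
        if not text:
            continue  # nothing to show, skip it
        if merged:
            prev_start, prev_end, prev_text = merged[-1]
            if text == prev_text and start - prev_end <= max_gap:
                # Same line again right away: extend the previous cue
                merged[-1] = (prev_start, max(prev_end, end), prev_text)
                continue
        merged.append((start, end, text))
    return merged
```

This only handles exact, adjacent repeats. Whether a repeat should be kept for emphasis is still a judgment call, so it is worth reviewing what gets merged rather than trusting it blindly.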

I don't use ML, but my problem when I sub is deciding
whether I should add the repeats like "Oneechan/Onechan/Sister/Neechan",
or any word that gets repeated often after it's first said.

Just do your best and ask for help if you need it.

REAL subtitling takes a very long time.
ML subtitling will be faster and more wrong,
but it will help you understand.

I guess you can't go wrong either way; it depends on you.
 
Just wanted to add this here. I will add more from my other laptop later.

1. I'm not hitting you = You're too close.
Also, it could mean, Isn't it standing?

2. It feels good to be very slippery = It feels so nice and smooth.
Also, It's so smooth and comfortable.
It's so slippery and feels good.
:p
 
You mean SubtitleEdit, right? I don't think SubtitleEdit has any capability to accept an external glossary.
You can't edit the glossary, but if you use the free translation and the large model, it usually comes back with the same type of translations consistently. For example, "nani/what" is usually translated as "picture". You can click on Edit, then Multiple replace, and make it replace "picture" with "what". You can do this with any wording, like chenpo, go, chew, etc. SubtitleEdit isn't perfect, but hopefully they continue updating the program and Whisper until it's near perfect, hopefully before long.
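The same multiple-replace idea can be scripted outside SubtitleEdit. Below is a minimal sketch, assuming you keep your own table of recurring mistranslations; the `apply_glossary` name and the dictionary format are hypothetical, and only the "picture" to "what" pair comes from the post above.

```python
import re
from typing import Dict


def apply_glossary(text: str, glossary: Dict[str, str]) -> str:
    """Replace whole-word occurrences of each glossary key, ignoring case."""
    if not glossary:
        return text
    # Longest keys first so a longer phrase wins over a shorter overlapping one
    keys = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lowered = {k.lower(): v for k, v in glossary.items()}

    def swap(match: "re.Match[str]") -> str:
        word = match.group(0)
        repl = lowered[word.lower()]
        # Keep a leading capital if the original word had one
        return repl[:1].upper() + repl[1:] if word[0].isupper() else repl

    return pattern.sub(swap, text)


fixes = {"picture": "what"}  # recurring mistranslation -> intended word
print(apply_glossary("Picture? Picture are you doing?", fixes))
```

A blanket replace like this will also hit lines where "picture" was actually the right word, so it is safer to run it on the subtitle text and then skim the changed lines.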
 
Japanese interjections play a big role when writing subs for JAV.
For me, sometimes I get them and most of the time I ignore them. Ignoring them in your subs is probably easier :D
But if you can't ignore them, then here's an interjection that's constant in JAV:
ano あの, meaning: Well... Um... You know... What? See... Excuse me... Hey... Say... etc.
It's like an interruption in speech when you're thinking of what to say next.

Then, you might hear something similar...
an ne あんね. Also it can get a little tricky if it's made into a question. So then...
it could mean Yeah?(Yeah) Okay? Eh? See? Hey? You know? Well. etc, etc.

Again, if there are any native speakers or masters of the Japanese language, please correct me
or give further advice. We are all here to have fun, share, and learn. ;)
 
It's super irritating how the standard Japanese-to-English translation comes out in SubtitleEdit. I haven't tested enough yet; I use the translation together with Purfview's Faster-Whisper-XXL. Some of the words just make me laugh. I am not sure what they use on sextb, because I feel theirs is very good.
 
About sextb, do you know if there's a way to download their subtitles? I know there is a button, but it has never worked for me.
 
It's difficult to know, honestly. Because sextb seems, and no offense to you or anyone, quite bad/laughable too.
I gave up on trying to understand the ML side, and decided that its heart is in the right place but...
the tech has a long way to go.
 
Often you will hear a phrase/word repeated.
Sometimes it will make that word more urgent or emphasize it.
But for JAV, sometimes it's fine on its own.

For example, you might know these words or have heard them used alone.
*Hora ほら: Look. Listen. Hey. Come on. Here. See, etc.
*īkara いいから: Okay. It's okay. It's fine. It's good. Good enough. Just do it. You're good. Listen to me, etc.
*Hai はい: Yes. Hey. Hello. Hi. Okay. Here, etc.

But it can change like the examples below...


Hora hora hora ほらほらほら
Hey, hey, hey. Here, here, here.
Look, look, look. Now, now, now. Come on, come on. Come on, let's go.
Come on, all of you. Here you go, etc

īkara īkara いいからいいから
It's okay, it's okay. It's fine. I'm fine. It's okay. Alright, alright, etc.

Hai Hai はいはい
Yes, yes. Hey, hey. Hello. Hi. Okay, okay. Here you go. Yes, sir. Sure, okay, etc.

I don't think it gets too complicated in JAV so keep it simple unless you wanna do more. :D
 
Hi

I’m making English soft subs for the long version of BGN-068. This long version has about 40 min of BTS and interview material after the 3-hour main video. As far as I can tell, subtitles for that extra material haven’t been posted by anyone yet.

I’m making the subs by extracting the Jav.Guru hard subs from the main video then adding subtitles for the extra material at the end.

I’d like to know, before I post the subtitles here, if it’s allowed and considered acceptable to post on this forum extracted hard subs done by someone else.

If this is considered unacceptable I could post subtitles for only the extra material.
 
Don’t the subs made by jav-subtitles.com cover the whole length of the video?
 
Please ignore my earlier post. I was mistaken about the video code but couldn’t find a way to delete the post.
 
I took subs from javgg and subtitle cat and went through each line to determine which fit best (based on my one year of college Japanese). The movie is 2 hours 40 minutes long, but it took me way over twice that time to go through each line. There are 365 lines. I don't think I will ever try to sub anything manually again.

The story is quite nice; there's character development for both the actor and the actress. No one is inherently evil. It's just a story about how humans are able to influence change in other humans.

Maybe I'm just coping because I spent so many hours going through each voice line. Only one way to find out.
 


Hello @mei2

I've quickly tried out the notebook version of WhisperJAV_1_1 and it seems pretty good. The post-processing removes hallucinations very well. I usually run a script to remove garbage (hallucinations, vocalizations, etc) before translation but I may not need to do that with the post-processing you've implemented.

I'm running another transcription now (using balanced speed and aggressive granularity settings) on a video that has periods of many people talking at once to see how that works.

A couple of minor points.
  • The "WhisperJAV Control Panel" cell is numbered 3 when it should be 2 (this doesn't affect functionality)
  • I get an error regarding numpy being previously imported when running the notebook for the first time. If I restart the session the error is resolved.
  • And a minor request... Is it possible to add a code cell to disconnect the runtime on completion? That would avoid using compute units while the runtime sits idle after a batch is completed, without having to disconnect manually. I do this by adding the code below in a cell at the end. There are probably nicer ways to do this. I don't know if the 5-second pause before disconnecting is necessary; I just put it there to make sure the last file is saved before disconnection.
#@markdown Disconnect and delete runtime
import time
from google.colab import runtime

time.sleep(5)  # Pause for 5 seconds so the last file can finish saving
runtime.unassign()  # Disconnect and release the Colab runtime
Thanks for the new version!
 