akiba resident JAV subtitlers & subtitle talk★NOT A SUB REQUEST THREAD★

Is it possible to translate Japanese --> French?

Yes, it is. I kept the number of languages limited during the ramp-up to manage bug fixes and issues. Things are settling down, so I can open up more languages and options in the coming release. Some things still need improvement: the "pornify" tone is not working as it should, and a few other things.
 
Hello author, I am deploying using the source code. The version 1.8.6 has fewer segments, but your v1.8.10 has 80 or 64 segments, and there are frequent jumps between segments.

I'm guessing by "frequent jumps between segments" you mean these logs:


Transcribing 80 scenes with VAD-enhanced processing:
Transcribing: [------------------------------] 1/80 [1.2%] | NS-10-1 0-1 new_scene_... 2026-04-18 13:40:50
Transcribing: [=-----------------------------] 3/80 [3.8%] | NS-10-1 0-1 new_scene_... 2026-04-18 13:41:27

and


Transcribing 80 scenes with VAD-enhanced processing:
Transcribing: [------------------------------] 1/80 [1.2%] | NS-10-1 0-1 new_scene_... 2026-04-18 13:41:53
Transcribing: [=-----------------------------] 5/80 [6.2%] | NS-10-1 0-1 new_scene_... | ETA: 8.4m 2026-04-18 13:42:27



Those log printouts are only progress-report snapshots. All 80 scenes are processed one by one. The progress report is printed approximately every 30 seconds and shows how many segments have been processed and which one is being processed at that moment. Printing more often risks overwhelming the console, which can choke the system. Hence the discrete printouts.


Did that answer your question?
 
So the sections it seemed to skip, 2/80 or 4/80: were they skipped due to overload, or because there was no human voice in them? Also, is the purpose of having multiple segments to make detection and recognition of the subtitles more detailed? There are 80 segments in total. If a complete sentence gets split across segments, will it stay coherent, connect properly with the surrounding context, and keep a smooth timeline?
 
@xch8888
No scenes have been skipped or left unprocessed. Everything gets processed. What you see as the progress report on the console is a status report (snapshot) roughly every half a minute:

  • 13:40:50: The system reports it has started working on Scene 1.
  • 13:41:27: (About 37 seconds later), the next report comes in. Because the system was busy working during that gap, it has already finished Scenes 1 and 2 and is now actively working on Scene 3.
Think of the console report like a road sign on a highway. Your car (the processor) is driving at a variable speed, hitting every kilometer of the road. There isn't a sign for every single kilometer. Instead, there is a sign every few kilometers to tell you where you are right now.
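
To make the "road sign" idea concrete, here is a minimal sketch of time-throttled progress reporting. This is not WhisperJAV's actual code: the function name `process_scenes` and the parameters `report_interval`, `clock` and `emit` are hypothetical and made up for illustration. It only shows how every scene can be processed while the console gets a line roughly every 30 seconds.

```python
import time
from typing import Callable, Iterable


def process_scenes(
    scenes: Iterable[str],
    transcribe: Callable[[str], None],
    report_interval: float = 30.0,            # hypothetical: seconds between reports
    clock: Callable[[], float] = time.monotonic,
    emit: Callable[[str], None] = print,
) -> int:
    """Process every scene in order, but print progress only now and then."""
    scene_list = list(scenes)
    total = len(scene_list)
    last_report = None  # time of the previous progress line, if any
    done = 0

    for index, scene in enumerate(scene_list, start=1):
        now = clock()
        # Print a snapshot only if enough time has passed since the last one.
        if last_report is None or now - last_report >= report_interval:
            emit(f"Transcribing: {index}/{total} [{100 * index / total:.1f}%]")
            last_report = now

        # The work itself is never skipped, whether or not a line was printed.
        transcribe(scene)
        done += 1

    return done
```

If each scene takes about 15 seconds, the console shows scenes 1, 3, 5 and so on, while the returned count still equals the total number of scenes. The gaps are in the reporting, not in the processing.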

As for your other question: the probability that a scene boundary would cut through the middle of a sentence is very low. The most accurate scene-detection algorithm is "semantic" scene detect. If you come across sentences that are cut short, the most likely culprit is the VAD.

I hope this helps.
 
Thanks a lot mei2 for your impressive and useful work.
 
BTW, if you have access to GitHub, you can add feature requests and bug reports right here (they get fed automatically into my workflow):
WhisperJAV Feature Requests and Issue Reports
I've been using AI to try to find the best settings for 'voice works', i.e. Japanese ASMR. I've been getting good, quick results with the base setting and with merge. There are slight differences between some of them, but not enough for me to have a favorite. I don't know Japanese, so I can only tell if it's right by the tone and a summary of the audio story. Do you have any recommendations, or a preset I can download? Is this big enough to make a GitHub request? I've never done that before. I don't really know what the settings do, I just plug in what I'm told lol.