akiba resident JAV subtitlers & subtitle talk★NOT A SUB REQUEST THREAD★

A couple of minor points.
  • The "WhisperJAV Control Panel" cell is numbered 3 when it should be 2 (this doesn't affect functionality)
  • I get an error regarding numpy being previously imported when running the notebook for the first time. If I restart the session the error is resolved.
  • And a minor request... Is it possible to add a code cell to disconnect the runtime on completion? This would avoid using compute units while sitting idle once a batch is completed, without having to manually disconnect the runtime. I do this by adding the code below in a cell at the end. There are probably nicer ways to do this. I don't know if the 5 second pause before disconnecting is necessary; I just put it there to make sure the last file is saved before disconnection.
#@markdown Disconnect and delete runtime
import time
from google.colab import runtime

time.sleep(5)  # Pause for 5 seconds so the last file can finish saving
runtime.unassign()  # Disconnect and release the Colab runtime
Thanks for the new version!
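
One possible variation on the cell above, as a rough sketch only: wrap the batch in a helper so the runtime is released even when the batch crashes halfway through. The name `run_then_disconnect` and its parameters are made up for illustration and are not part of the WhisperJAV notebook. The only Colab call used is `runtime.unassign()` from the cell above.

```python
import time


def run_then_disconnect(job, disconnect=None, grace_seconds=5.0):
    """Run job(), then release the runtime, even if job() raises.

    `job` is any zero-argument callable (hypothetical: your batch run).
    `disconnect` defaults to google.colab's runtime.unassign, which is
    only importable inside a Colab session.
    """
    try:
        return job()
    finally:
        # Same idea as the fixed pause above: give pending writes a moment.
        time.sleep(grace_seconds)
        if disconnect is None:
            from google.colab import runtime  # available only in Colab
            disconnect = runtime.unassign
        disconnect()


# Usage inside Colab (process_batch is a placeholder for your own code):
# run_then_disconnect(process_batch)
```

The `finally` block is the whole point: an exception in the batch still propagates, but the idle runtime does not keep burning compute units afterwards.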

Thanks for the input. Yes, I've been pleased with the results of the Japanese post-processing and the timestamps. There are still some escapees every now and then. If you come across any, please let me know so I can add them to the filter. Good points on the notebook; I'll add them to the next version.
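
For anyone who wants to catch escapees in an existing .srt by hand, a minimal sketch of a phrase-based filter could look like this. It is not WhisperJAV's actual filter: the `BLOCKLIST` entries are illustrative examples of phrases Whisper tends to invent on silence, and the function name is made up.

```python
import re

# Illustrative phrases only (assumption); extend with whatever escapes you find.
BLOCKLIST = ("ご視聴ありがとうございました", "チャンネル登録")


def drop_hallucinated_cues(srt_text, blocklist=BLOCKLIST):
    """Remove cues whose text contains a blocklisted phrase, then renumber."""
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]
    kept = []
    for block in blocks:
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip cues without index, timing and at least one text line
        timing, text = lines[1], "\n".join(lines[2:])
        if any(phrase in text for phrase in blocklist):
            continue  # drop the suspected hallucination
        kept.append((timing, text))
    # Rebuild the file with consecutive cue numbers starting at 1.
    return "\n\n".join(
        f"{i}\n{timing}\n{text}" for i, (timing, text) in enumerate(kept, 1)
    ) + "\n"
```

Plain substring matching is crude, but it keeps the list easy to extend when a new escapee turns up.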
 

I updated the notebook to WhisperJAV 1.1.1.
Note: For now, just ignore the numpy error. I will need to work on it more. The core problem is that Google has loaded Colab with a bunch of fastai stuff which requires numpy 2. However, numpy 2 has broken backward compatibility with numpy 1.x.
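
To illustrate why a session restart clears the error, here is a small sketch that checks whether numpy is already imported in the current session with a different major version than the one a tool needs. The function name and the `required_major` parameter are hypothetical, and the assumption that the tool wants numpy 1.x is only inferred from the note above.

```python
import sys


def numpy_restart_needed(required_major, modules=None):
    """Return True if numpy is already loaded with a different major version.

    A freshly installed numpy only takes effect in a new interpreter session,
    so an already-imported mismatching copy means a restart is required.
    """
    modules = sys.modules if modules is None else modules
    loaded = modules.get("numpy")
    if loaded is None:
        return False  # nothing imported yet, so no stale copy in memory
    loaded_major = int(loaded.__version__.split(".")[0])
    return loaded_major != required_major


# Example: assuming the tool needs numpy 1.x
# if numpy_restart_needed(1):
#     print("numpy 2 is already loaded; restart the session after installing.")
```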
 
Hello @mei2

I've quickly tried out the notebook version of WhisperJAV_1_1 and it seems pretty good. The post-processing removes hallucinations very well. I usually run a script to remove garbage (hallucinations, vocalizations, etc) before translation but I may not need to do that with the post-processing you've implemented.
  • I get an error regarding numpy being previously imported when running the notebook for the first time. If I restart the session the error is resolved.

I just dropped a new version for Colab: notebook version 1.2.
This one fixes the numpy restart-session issue.
I also cleaned up the UX flow and set the default mode to "faster".
Cheers


 
Hi,
I want to ask: I know there are some sites providing subtitles, like subtitlecat, but usually we need to download them one by one. Is there any website or app that provides JAV subtitles for batch download, updated weekly or monthly with new releases?
 

I'd say a reliable source of "weekly" updates for mass download of subs would be the user Runbkk on Sukebei: runbkk-posts.
He packages subs together with the movie packs. If you're interested in downloading only the subs, select the subs from the torrent tree. A very big thank you to him; he has been doing this for several years. Depending on what you mean by "fresh", his packs usually come a few weeks after the release.

Another great source for same-day release of subs is the Discord thread by Heavenazer (Azer). However, this is a Discord channel, so there is no mass download, but it is still fairly easy to get a batch of fresh subs. Heavenazer++subs
 
Runbkk is a good source, but the subtitles are not in English; what I am searching for is English subtitles. Also, I can't access the Discord through the Heavenazer link above.
 
You can download the Japanese or Chinese sub, then have it automatically translated at https://translatesubtitles.co/ or even at subtitle cat. It works very well with just a minimum of effort on your part. I should say, of course, that if the original sub is badly done then the translation will suffer too, but in most cases it's very good.
 
As requested by @keanurefresh here is a tutorial with my subtitle translation workflow.

In the tutorial and below I have linked to the repos for the scripts I use.

https://github.com/TotoTheDog0/audioPreprocess

https://github.com/TotoTheDog0/subsTranslate

I'm sure the way I do translations is far from the best or quickest way - but it works for me. I hope some people find this useful.

I should note that I don't enjoy coding and I'm bad at it. If someone wants to fork and improve my scripts they're welcome to.
 


Slave Color: The Complete Series, including spinoffs. Unedited, Whisper translations with a variant model that was trained on Eroge so it translates sex noises and better identifies speech mid fucking. The hallucinations are it translating regular noises as sex noises, which is a fine tradeoff for picking up the huge amount of speech that happens during sex scenes.


For movie numbers, I use just the titles for the main series and Teacher, but for the other spinoffs I include the JAV code.

"Complete" has a few exceptions where my own local copies of the movies were corrupted. Other subs exist. My previous subs were edited more by hand; these have almost no edits, but I think overall they exceed what I had done previously.
 

Is the model available to download somewhere?
 

I think I used the CTranslate2 variant of this:



However, I am not sure about the actual usage now. I think the exact model I used is gone. I ran the CTranslate2 version with Faster Whisper in Python.
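
For reference, a rough sketch of running a CTranslate2-converted Whisper model through Faster Whisper and writing SRT-style output might look like the following. The model directory is a placeholder, since the exact model is not identified here, and the helper names are made up. `WhisperModel` and `transcribe` are the faster-whisper calls; everything else is plain Python.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn objects with .start, .end and .text into SRT text."""
    cues = []
    for i, seg in enumerate(segments, 1):
        cues.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


def transcribe_to_srt(audio_path, model_dir, task="translate"):
    """Run a CTranslate2 Whisper model on one file and return SRT text.

    model_dir is a placeholder for a local CTranslate2 model folder.
    """
    from faster_whisper import WhisperModel  # pip install faster-whisper

    model = WhisperModel(model_dir, device="auto", compute_type="default")
    segments, _info = model.transcribe(audio_path, language="ja", task=task)
    return segments_to_srt(segments)
```

Using `task="translate"` asks Whisper for English output directly; `task="transcribe"` keeps the Japanese text for a separate translation step.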
 

Haven't tried Sacmi5/whisper, but I do use Litagin/anime-whisper (mostly in xxx scenes when characters are stuttering or unclear).
Another suggestion, which I use for line checks by exporting WAV files from SE, is


I run it through Docker Desktop.