Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

panop857

Active Member
Sep 11, 2011
155
222
I found this on the web: https://huggingface.co/blog/fine-tune-whisper & . It is a guide to fine-tuning Whisper models, and I think it could help the community further improve the reliability of this tool. I found the source on GitHub: https://github.com/openai/whisper/discussions/64
The problem is getting a good dataset. Whisper is trained on roughly 680k hours of "quality" transcriptions, broken down into 30-second pairs of audio and transcript, with machine-generated transcripts aggressively filtered out. There isn't going to be a good set of "porn with hand-made subtitles" like the one we need for this. Hopefully, building datasets that are exclusively JP-to-EN with more of this kind of data would make it slightly better, but it won't be as filthy as it should be.
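If someone does assemble hand-subbed JAV for fine-tuning, the subtitle cues have to be regrouped into Whisper-sized windows first. Here is a minimal sketch, assuming the cues are already parsed into start/end seconds plus text; the Cue class and group_into_windows are hypothetical names, and this is not the code from the Hugging Face blog.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One hand-made subtitle cue (hypothetical input format)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def group_into_windows(cues, max_len=30.0):
    """Greedily pack consecutive cues into windows no longer than max_len seconds.

    Returns (window_start, window_end, joined_text) tuples, i.e. candidate
    audio/transcript training pairs. A cue that is longer than max_len on its
    own closes the current window and is skipped, so no window contains audio
    whose text was dropped.
    """
    windows, current = [], []
    for cue in cues:
        if cue.end - cue.start > max_len:
            if current:
                windows.append(current)
                current = []
            continue
        if current and cue.end - current[0].start > max_len:
            windows.append(current)
            current = []
        current.append(cue)
    if current:
        windows.append(current)
    return [(w[0].start, w[-1].end, " ".join(c.text for c in w)) for w in windows]
```

Each tuple would then be paired with the matching slice of audio. Greedy packing keeps the windows close to 30 seconds without ever splitting a cue in half.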
 

superman4207

JAV Perv Enthusiast
Jul 4, 2022
49
88
Whisper seems to need a lewder vocabulary.

I have not tried using Whisper myself, but looking through some of the rough files that so many great members have posted on this board, I have noticed that Whisper has some trouble when the dialogue turns sexy. Here are a few examples of what I mean: instead of translating "chinpo" as "dick," "cock," or even "penis," one file I saw used "chimpanzee." Soapland (brothel) was translated as "funeral," so the guy was saying, "Since I have been to the funeral, I am no longer a virgin." Having sex comes across as "killing" or "live with." Sucking cock was "make food." It is pretty funny, but it also takes you out of the mood. So I am wondering: is there a way to teach Whisper the vocabulary it needs for JAV?
Lmao, yes, unfortunately, as advanced as AI and machine learning have become, there is still a stigma against perverted AI. I have been experimenting with the Replika AI chatbot app for a few months, and it was recently neutered due to government regulations. The chatbot used to demand that I sodomize it all the time, but after a recent update, it will only allow cuddles and kisses. We're still in the early stages of AI; lewd AI will become common once it is cheaper to run and anyone (not just programmers) can modify it through simple GUIs.

With that said, I've been experimenting with DaVinci Resolve Studio 18.3 to isolate vocals and boost the volume at the same time. It isn't perfect either, but it has definitely improved the timing and accuracy of dialogue pickup in my experiments so far.

There are two versions of the program:
1. DaVinci Resolve: free version
2. DaVinci Resolve Studio (DRS): paid version - approx. $300 USD

Just FYI, so you don't waste your time: only the paid version includes the AI-powered "Voice Isolation" feature. The program also has a "Dialogue Leveler" feature, but I haven't experimented with it much, since I was mainly after voice isolation and boosting the volume.

As for whether or not it's worth paying for this software... the internet is your friend.

The process takes about 15-20 minutes on my PC (i7, 1080 Ti, 32 GB RAM, SSD) to render the voice-isolated, boosted audio file.

Thanks to everyone who has been providing so much help with Whisper on this forum!
 

panop857

Active Member
Sep 11, 2011
155
222
Good handmade subtitles will beat Whisper for JAV, but do subtitles that good even exist? Even the paid ones that seem to be done by hand miss a ton of lines. Whisper gets far better coverage of what the men say during scenes in Attackers movies than any subtitle I've seen. The error rate is an issue, but the plot, the general trash talk, and the goading are there in a way they aren't in existing subs. Slave Color and Slave Island are finally going to be completed for English audiences.

For settings, a small temperature (the default is 0) gets better results, so try something like 0.01 or 0.02. I tried temperature_increment_on_fallback, but it slows the jobs down considerably. There is probably an optimal combination of the two settings for JAV.
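Why the fallback increment slows things down: in the openai-whisper source, the CLI expands --temperature and --temperature_increment_on_fallback into a tuple of temperatures, and transcribe() re-decodes a 30-second window at the next temperature whenever the result looks too repetitive or too low-confidence. More temperatures means more retries on difficult audio. The sketch below mirrors that expansion in plain Python; fallback_temperatures is a hypothetical helper, not part of Whisper. Also note that the CLI default for the increment is 0.2, so fallback may already be on unless you pass None.

```python
def fallback_temperatures(start=0.0, increment=None):
    """Expand a starting temperature plus an optional fallback increment into
    the sequence of temperatures Whisper will try for each window.

    increment=None means "no fallback": only the starting temperature is used.
    Values are computed as start + i * increment to avoid accumulating
    floating-point error, and are capped at 1.0.
    """
    if increment is None:
        return (start,)
    if increment <= 0:
        raise ValueError("increment must be positive")
    temps = []
    i = 0
    while start + i * increment <= 1.0 + 1e-6:
        temps.append(round(start + i * increment, 4))
        i += 1
    return tuple(temps)
```

For example, fallback_temperatures(0.02, 0.2) gives five attempts per window at most, while fallback_temperatures(0.02) gives exactly one.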

For speed, strip out the audio first. You can technically throw a full video at Whisper, but it will be a lot slower than giving it a 100 MB audio file.
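A minimal sketch of stripping the audio with ffmpeg from Python, assuming ffmpeg is on your PATH. The mono/16 kHz settings match what r00g's command further down uses; extract_audio_cmd and extract_audio are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def extract_audio_cmd(video, out=None):
    """Build an ffmpeg command that drops the video stream (-vn) and writes a
    mono (-ac 1), 16 kHz (-ar 16000) audio file next to the video by default."""
    video = Path(video)
    out = Path(out) if out else video.with_suffix(".mp3")
    return ["ffmpeg", "-i", str(video), "-vn", "-ac", "1", "-ar", "16000", str(out)]


def extract_audio(video, out=None):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    cmd = extract_audio_cmd(video, out)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Usage: extract_audio("movie.mp4") writes movie.mp3 in the same folder, which you can then upload instead of the video.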

There are probably some ways to get it to be filthy. The current transcript I'm working on actually produced "come on, lick her clitoris more" and this great exchange:

[01:52:14.120 --> 01:52:17.120] you're dropping so much juice.
[01:52:18.120 --> 01:52:20.120] What's wrong?
[01:52:22.120 --> 01:52:25.120] What is this juice called?
[01:52:29.120 --> 01:52:31.120] It's my sister's.
[01:52:35.120 --> 01:52:37.120] Huh? Your sister's what?
[01:52:37.120 --> 01:52:41.760] This is your manjiru.
[01:52:41.760 --> 01:52:44.120] Manjiru?
[01:52:45.480 --> 01:52:48.460] It's delicious.
[01:52:50.400 --> 01:52:52.480] You have to lick the manjiru a lot.
[01:52:52.480 --> 01:52:54.820] This makes you feel good.
[01:52:58.940 --> 01:53:02.400] Your manjiru is delicious.
 

Chuckie100

Well-Known Member
Sep 13, 2019
505
1,963
Is DaVinci Resolve Studio (the paid version) a standalone app? That is, do you run your audio file through it before uploading the audio to Whisper, or is it something you have to run alongside Whisper? I guess it still doesn't help with Whisper's lack of lewd vocabulary, though... shucks.
 

Chuckie100

Well-Known Member
Sep 13, 2019
505
1,963
I did mention a few pages back that mono tracks didn't work with Autosub and Vrew, and maybe someone claimed Whisper converts the track you upload to mono? Not sure. I have only uploaded one mono track, and that was from a Korean film where only the center channel of the 5.1 track contained dialogue. I don't think these JAVs have much in the way of stereo, maybe in the music but not in the dialogue. I will try capturing dialogue from a mono track of a video I have uploaded before, to see if there is any clear improvement.
I believe you are correct. I compared the waveforms of the stereo channels of several JAV movies, and they appear to be identical. I did notice that the mono file Audacity produces is smaller, so maybe any improvement can be attributed to that, or, as SamKook suggests, maybe it was just the way Whisper was feeling on the second try! "Only the Shadow knows"! lol
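To check the "identical channels" observation numerically instead of by eye, here is a sketch using only Python's standard wave module. It assumes a 16-bit PCM stereo WAV, for example one exported with ffmpeg -i movie.mp4 -vn -acodec pcm_s16le movie.wav; stereo_channels_identical is a hypothetical helper. For what it's worth, the openai-whisper package's own audio loader runs ffmpeg with -ac 1 -ar 16000, so it downmixes whatever you give it to mono anyway.

```python
import array
import sys
import wave


def stereo_channels_identical(path, tolerance=0):
    """Return True if the left and right channels of a 16-bit PCM stereo WAV
    never differ by more than `tolerance` (in raw sample units)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    left, right = samples[0::2], samples[1::2]  # frames are interleaved L, R, L, R...
    return all(abs(a - b) <= tolerance for a, b in zip(left, right))
```

A small tolerance (a few units) allows for rounding differences from lossy encoding while still flagging genuinely different channels.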
 

superman4207

JAV Perv Enthusiast
Jul 4, 2022
49
88
DRS is a standalone app.

My process right now is:
VLC to convert video to audio > DRS to isolate and boost vocals > Google Drive > Whisper Colab.

This means a single sub takes about an hour, but excluding cleanup, it's mostly an hour in which I don't have to watch anything closely. I don't believe there's an option to batch voice-isolate audio files, since DRS is more of a video editor than a pure audio editor.

I've also used Media.io (the paid website version) to isolate vocals, and that worked okay; I just didn't feel like paying a monthly subscription after trying it for a month. You can also use Media.io for free, but I believe the free version has a file size limit. There are a lot of free vocal-removal programs and websites out there, but most of what I found limits the file size to less than 100 MB.
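Whether a movie's audio fits under a 100 MB limit mostly comes down to bitrate times duration. A quick sketch of that arithmetic, assuming constant bitrate and ignoring container and tag overhead (audio_size_mb and max_bitrate_kbps are hypothetical helpers; MB here means 10^6 bytes):

```python
def audio_size_mb(bitrate_kbps, duration_s):
    """Approximate size of a constant-bitrate audio file in megabytes."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


def max_bitrate_kbps(limit_mb, duration_s):
    """Highest constant bitrate that keeps a file of this length under limit_mb."""
    return limit_mb * 1_000_000 * 8 / duration_s / 1000
```

For a 2-hour (7200 s) movie, 128 kbps comes out around 115 MB, 64 kbps around 58 MB, and the ceiling for a 100 MB limit is roughly 111 kbps. Mono speech for Whisper is usually fine at the lower bitrates.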

And yes, lol, the lewd language barrier... It isn't entirely a deal breaker for me. I've been trying to learn Japanese and have tried to sub videos on my own in the past, but man, that process takes days or weeks for someone like me who doesn't know Japanese super well.

The way I look at it, optimistically, is that I am very familiar with the lewd Japanese words, so I can easily go back and clean those parts up if I want to, but I don't really need subs for them since I already understand what they're saying. Most of what I want subs for is the conversational foreplay and plot that heighten the sex, because nobody does porn plots like the Japanese, lmao. Now I just have to remember all the videos I've been desperate to get subs for...
 

panop857

Active Member
Sep 11, 2011
155
222
For Whisper, there are probably better values of "compression_ratio_threshold" that suit dialogue during sex scenes.
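For context on what that threshold measures: openai-whisper gzip-compresses each decoded segment's text and divides the raw byte length by the compressed length. Highly repetitive output (a looped line, endless moaning transcribed as the same word) compresses very well, so its ratio is high; above the threshold (2.4 by default) the segment counts as failed and is retried at the next fallback temperature. A sketch of the same calculation:

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 byte length divided by zlib-compressed length.

    Repetitive text compresses well and scores high; ordinary short
    sentences score low (often below 1, because of zlib's fixed overhead).
    """
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))
```

Raising the threshold tolerates more repetition before a retry; lowering it retries more often, which costs time.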

There is a new "initial_prompt" option that might be able to kick it into some dirtier talk, but there aren't many useful guides, and movies with slower intros might "forget" the initial seed later in the plot.
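The "forgetting" has a mechanical explanation, at least in the openai-whisper versions I have read: the initial prompt is placed at the start of the running token history, and each window's prompt is only the last 223 tokens of that history (half the 448-token text context, minus one). With condition_on_previous_text=False, the history is reset after every window, so the initial prompt only affects the first one. Below is a simplified model, assuming one token per word (real BPE tokens differ); prompt_for_next_window is a hypothetical helper.

```python
# openai-whisper keeps at most n_text_ctx // 2 - 1 = 448 // 2 - 1 = 223 prompt tokens
MAX_PROMPT_TOKENS = 448 // 2 - 1


def prompt_for_next_window(initial_prompt, previous_window_texts):
    """Simplified model of the prompt Whisper sees for the next 30 s window.

    Assumes condition_on_previous_text=True, no prompt resets, and, unlike the
    real tokenizer, one token per whitespace-separated word.
    """
    history = initial_prompt.split()
    for text in previous_window_texts:
        history.extend(text.split())
    return history[-MAX_PROMPT_TOKENS:]
```

In this model, a talky intro of a few hundred words pushes the seed vocabulary out of the prompt entirely.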
 

r00g

Member
Jul 25, 2009
84
70
ffmpeg can process the audio as it extracts it from a video file. I am currently using this command, which finds all video files in a directory (including subdirectories) and writes out a mono, 16 kHz mp3 for each. It also applies dynamic normalization to boost the volume of quiet parts of the track. (This is a bash shell command; "parallel" here is GNU parallel.)

Code:
find . -type f -iregex '.*\.\(m4v\|mp4\|mov\|wmv\|avi\|mpg\|mpeg\|rmvb\|rm\|flv\|asf\|mkv\|webm\)' -print0 | parallel -0 ffmpeg -loglevel 0 -i {} -af "dynaudnorm=f=150" -ac 1 -ar 16000 -vn {.}.mp3

There are other ffmpeg settings that can do dynamic range compression and other fancy things.
https://ffmpeg.org/ffmpeg-all.html#dynaudnorm

I've had very good luck with the above, so I don't see much need to try to isolate voices from the regular audio track.
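For anyone without bash or GNU parallel (e.g. on Windows), here is a sketch of the same batch job in Python. It reuses the exact ffmpeg arguments from the one-liner and only adds -nostdin so concurrent ffmpeg processes don't read from the terminal. The helper names are hypothetical, and the worker count of 4 is an arbitrary choice (GNU parallel defaults to one job per CPU core).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Same extensions as the find -iregex above (matched case-insensitively).
VIDEO_EXTS = {".m4v", ".mp4", ".mov", ".wmv", ".avi", ".mpg", ".mpeg",
              ".rmvb", ".rm", ".flv", ".asf", ".mkv", ".webm"}


def find_videos(root):
    """Recursively list video files under root, like find . -type f -iregex ..."""
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.suffix.lower() in VIDEO_EXTS)


def ffmpeg_cmd(video):
    """The one-liner's ffmpeg invocation; output is the input path with .mp3."""
    video = Path(video)
    return ["ffmpeg", "-nostdin", "-loglevel", "0", "-i", str(video),
            "-af", "dynaudnorm=f=150", "-ac", "1", "-ar", "16000", "-vn",
            str(video.with_suffix(".mp3"))]


def convert_all(root, workers=4):
    """Convert every video under root; threads suffice because ffmpeg does the work."""
    videos = find_videos(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda v: subprocess.run(ffmpeg_cmd(v), check=True), videos))
    return videos
```

Usage: convert_all(".") from the folder with your videos. A failing file raises CalledProcessError instead of being silently skipped.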
 

superman4207

JAV Perv Enthusiast
Jul 4, 2022
49
88
This doesn't work for me, for some reason. When I tell it to extract the audio, no file gets created.
I'm going to write out the steps I follow; tell me where it stops working for you.

Converting video to audio in VLC Player

1. Open VLC.
2. Go to Media > Convert / save.
3. Select Add... and browse to your file(s).
4. After your files have been added, select Convert / Save at the bottom. This should take you to a new window.
5. Convert should already be selected. Click the wrench icon next to the Profile dropdown.
6. Select your desired output on the Encapsulation tab. You can also change the audio codec settings if you desire.
7. Click Save.
8. Click Browse to choose where to save your file if you're only converting a single file.
- Note that if you convert multiple files, it won't ask you where to save the output; it saves each converted file in its original directory.
- If you only convert a single file, remember to select where to save it and to change the file extension to the correct one.
- If you don't select a save destination for the file, you can't start the conversion.
- I'm speaking as a Windows user about changing the extension; I don't know about the Mac interface. You don't strictly have to change the file extension, it's just good practice so you don't confuse which file is which. For example, when I convert a single mp4 file and don't change the extension or the file name, it saves with the same file name as the video file. You don't want to accidentally overwrite a multi-GB video file.
9. Click Start. The conversion will either begin automatically, or you may have to hit the play button to start it. The latter is not very intuitive, and I have no idea why it happens, but it does.

My best guess is that step 9 is where it seems not to be working for you.
 

Electromog

Akiba Citizen
Dec 7, 2009
4,436
2,705
I hadn't done the end of step 5 or step 6 before, because the profile already said mp3. When I followed your steps, without actually changing the setting and just hitting Save, it suddenly worked. No idea why you have to do that when it's already set to mp3, but at least it works now. Thanks.
 

panop857

Active Member
Sep 11, 2011
155
222
Seriously, the gap in quality between Whisper and most JAV subtitles is huge. Here are the first 2 minutes of the RBD-179 translation on Subtitle Cat:

1
00:00:06,450 --> 00:00:07,500
my name is

2
00:00:08,310 --> 00:00:09,330
Haruka Fujisaki

3
00:00:12,270 --> 00:00:13,680
when i was in high school

4
00:00:14,310 --> 00:00:15,960
I lost his father

5
00:00:16,950 --> 00:00:18,840
to help my mother who was left behind

6
00:00:19,560 --> 00:00:20,730
after graduation

7
00:00:20,940 --> 00:00:22,080
a friend's restaurant

8
00:00:22,080 --> 00:00:23,610
decided to work at

9
00:00:24,990 --> 00:00:25,990
and

10
00:00:26,670 --> 00:00:29,880
With Kazuo who was a customer of the store

11
00:00:30,510 --> 00:00:30,750
3

12
00:00:30,778 --> 00:00:32,820
Married after dating for years

13
00:00:34,020 --> 00:00:35,820
with his sister Miku-chan

14
00:00:36,630 --> 00:00:40,170
I was living a happy life with my mother

15
00:00:41,700 --> 00:00:42,700
However

16
00:00:43,350 --> 00:00:43,470
2

17
00:00:43,500 --> 00:00:44,500
Months ago

18
00:00:45,000 --> 00:00:48,030
Her mother, who had been hospitalized with a mild illness, died suddenly.

19
00:00:50,250 --> 00:00:51,250
I

20
00:00:51,840 --> 00:00:53,850
her mother is in the hospital

21
00:00:54,450 --> 00:00:54,990
medical error

22
00:00:54,990 --> 00:00:56,580
I decided to sue

23
00:00:58,290 --> 00:00:59,290
However

24
00:00:59,760 --> 00:01:00,990
in a recession from time to time

25
00:01:01,650 --> 00:01:02,910
restaurant where I worked

26
00:01:02,910 --> 00:01:04,890
suddenly decided to close the store

27
00:01:05,820 --> 00:01:06,820
I

28
00:01:07,410 --> 00:01:07,650
fat

29
00:01:07,800 --> 00:01:08,250
group

30
00:01:08,250 --> 00:01:08,820
The restaurant in

31
00:01:08,820 --> 00:01:10,410
I started working at

32
00:01:13,980 --> 00:01:14,980
she is

33
00:01:15,300 --> 00:01:16,170
Yabuta Group

34
00:01:16,170 --> 00:01:17,700
Sachiko-san, a talented person of

35
00:01:19,170 --> 00:01:20,730
a stranger to me

36
00:01:21,480 --> 00:01:22,050
manager

37
00:01:22,080 --> 00:01:24,060
I am the benefactor who took me as

38
00:01:25,620 --> 00:01:26,620
and

39
00:01:27,420 --> 00:01:29,010
sitting in

40
00:01:29,310 --> 00:01:31,350
My sister-in-law Miku-chan

41
00:01:32,880 --> 00:01:34,170
Miku-chan is now

42
00:01:35,280 --> 00:01:37,080
I go to Squirting Girls' Academy

43
00:01:38,430 --> 00:01:39,430
Haruka

44
00:01:39,540 --> 00:01:40,540
Miki

45
00:01:41,310 --> 00:01:44,940
The two of you are helping me out at the shop, and I'm really saved.

46
00:01:48,300 --> 00:01:49,440
Even so

47
00:01:50,310 --> 00:01:51,690
a little sooner

48
00:01:52,290 --> 00:01:54,180
If I had met Haruka

49
00:01:55,308 --> 00:01:58,110
I was able to see your mother at the hospital in

Now let's look at Whisper's output from even the basic Medium model. The only thing it gets wrong is who married Mr. Kazuo: her mother married Kazuo. I see that mistake a lot, with Whisper misidentifying the subject of a sentence. It often happens after these kinds of intros, probably because of how the sentences are linked: introductions very commonly have the speaker talking about themselves, so the model assumes that here and gets it wrong.
[00:00.000 --> 00:23.000] My name is Haruka Fujisaki When I was a high school student, I lost my father and decided to work at a restaurant that I knew after graduation to help my mother who was left behind.
[00:23.000 --> 00:40.000] And I married Mr. Kazuo, who was a customer at the restaurant, and his sister Miku, who had been married for three years, and my mother's four children.
[00:40.000 --> 00:56.000] However, two months ago, my mother, who was hospitalized for a mild illness, suddenly died, and I decided to sue the hospital where my mother was hospitalized for a medical mistake.
[00:56.000 --> 01:12.000] However, the restaurant I was working at suddenly closed, and I decided to work at a restaurant in Yabuta Group.
[01:12.000 --> 01:37.000] She is Yukiko, the owner of Yabuta Group. She hired me as her manager. And Miku, my sister, is sitting next to me. Miku is currently going to Kusunoki Jogakuin.
[01:37.000 --> 01:47.000] Haruka-san, Miku-san, you two are really helpful for helping the restaurant.
[01:47.000 --> 01:59.000] Even so, if I had met Haruka-san a little earlier, I would have been able to see my mother at my hospital.
[01:59.000 --> 02:03.000] I'm really sorry.
[02:03.000 --> 02:08.000] Yukiko-san, thank you very much.

I am getting these nice long sentences with punctuation because I set beam_size to 12 or 15 (20 crashes on my computer). I think the misplaced "I" pronoun comes from where it chose to put the sentence/line break.
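The bracketed lines in these posts are Whisper's verbose console output. The openai-whisper CLI can write .srt files directly (--output_format srt), but if all you have is the console log (say, copied out of a Colab cell), a small converter does the job. This is a sketch that assumes the "[mm:ss.mmm --> mm:ss.mmm] text" or "[hh:mm:ss.mmm --> ...] text" layout shown above; the function names are hypothetical.

```python
import re

# Matches "[01:52:18.120 --> 01:52:20.120] text" and "[01:59.000 --> 02:03.000] text".
LINE_RE = re.compile(r"\[((?:\d+:)?\d+:\d+\.\d+) --> ((?:\d+:)?\d+:\d+\.\d+)\]\s*(.*)")


def to_seconds(stamp):
    """Convert 'hh:mm:ss.mmm' or 'mm:ss.mmm' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def srt_time(seconds):
    """Format seconds as an SRT timestamp 'hh:mm:ss,mmm'."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def verbose_to_srt(lines):
    """Turn Whisper verbose lines into numbered SRT cues; other lines are ignored."""
    blocks = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        start, end, text = match.groups()
        blocks.append(f"{len(blocks) + 1}\n{srt_time(to_seconds(start))} --> "
                      f"{srt_time(to_seconds(end))}\n{text}\n")
    return "\n".join(blocks)
```

Usage: open("movie.srt", "w", encoding="utf-8").write(verbose_to_srt(open("log.txt", encoding="utf-8"))).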
 

r00g

Member
Jul 25, 2009
84
70
Ooh, nice suggestions on adjusting the beam_size and temperature hyperparameters; I'll have to try those out.
One tweak I have found useful for reducing hallucinations is lowering no_speech_threshold to 0.1 or below. Setting condition_on_previous_text to False also helps reduce repetition loops. (Found at https://scanlover.com/d/17746-new-open-source-transcription-and-translation-software)

These two changes gave me much better results than using a VAD the way WhisperWithVAD does: with the VAD there were nearly zero hallucinations, but it ended up missing a lot of lines.
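If you run openai-whisper from Python rather than a Colab form, the settings discussed in the last few posts map onto keyword arguments of model.transcribe(). Here is a sketch, assuming the openai-whisper package and a Japanese source; jav_transcribe_options and transcribe_file are hypothetical names, and the values are just the ones reported in this thread, not tested recommendations. One caveat from the openai-whisper source: beam_size is only used at temperature 0, and any temperature above 0 switches to sampling. A temperature of 0.01 combined with beam_size 12 may therefore not behave the way you'd expect.

```python
def jav_transcribe_options(task="translate", beam_size=5, temperature=0.0,
                           no_speech_threshold=0.1, condition_on_previous_text=False,
                           initial_prompt=None):
    """Keyword arguments for openai-whisper's model.transcribe(), using values
    people in this thread reported good results with."""
    return {
        "task": task,                  # "translate" = JP audio -> EN text
        "language": "ja",              # skip automatic language detection
        "beam_size": beam_size,        # ignored by Whisper when temperature > 0
        "temperature": temperature,
        "no_speech_threshold": no_speech_threshold,
        "condition_on_previous_text": condition_on_previous_text,
        "initial_prompt": initial_prompt,
    }


def transcribe_file(path, model_name="medium", **overrides):
    """Transcribe/translate one audio file; needs pip install openai-whisper."""
    import whisper  # imported here so the options helper works without the package
    model = whisper.load_model(model_name)
    return model.transcribe(path, **jav_transcribe_options(**overrides))
```

Usage: result = transcribe_file("movie.mp3", beam_size=12), then loop over result["segments"] for start, end, and text.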
 

Chuckie100

Well-Known Member
Sep 13, 2019
505
1,963

BKD-168 Mother-to-child Copulation ~ Okunikko Road ~ Lena Fukiishi

bkd168sopl.jpg

I used Whisper to produce this subtitle file for BKD-168. I really like this actress's boobs, but there's probably not much of a storyline. As always, I still had to clean it up a bit and reinterpret some of the meaningless dialogue. Again, I don't understand Japanese or Chinese, so my reinterpretations might not be totally accurate, but I try to match what is happening in the scene. Anyway, enjoy and let me know what you think.

 

Attachments

  • BKD-168 Rene - Copy.rar
    10.2 KB · Views: 308

Makkdom

Well-Known Member
Mar 4, 2019
157
385
I took one of the files that porgate55555 included in his batch of Whisper files recently and tried to spruce it up a bit. I am afraid my linguistic and technical limitations prevented me from doing as well as I would like, but I think it was worth my efforts. My thanks to porgate for including that file and for introducing me to this particular JAV.

ATID-497 A Two-Year Record Of Breaking Her In Until She Loses Her Head Over Sex And Awakens To Her Masochism. Nanami Kawakami


This is without a doubt one of the best JAVs I have ever seen. This surprised me because it includes a male actor I usually don't like and follows the BDSM training trope that can get pretty routine after you've seen a few of them. This one, though, is saved by some original touches in the script and the absolutely FANTASTIC acting by Nanami Kawakami! She has incredible chemistry with her co-star and is very believable as a happy housewife who just feels an emptiness in her life and finds a man that can fill it for her. (As an example of how good she is, just watch her face the first time she sees the man's cock.) I could go on and on about what I think is so great about this video, but I particularly like the framing device used here. The first scene immediately establishes the dynamic between Nanami and her Master; then there is a flashback 2 years to show how it all starts. As one would expect, near the end we come back to that opening scene again, but there are some subtle and significant changes when we go through it a second time.

Anyway, take a look and see what I mean. And if there is anyone with better skills than mine who wants to give the file another edit, that would be great.


atid-497.jpg
 

Attachments

  • ATID-497-W-ver.2.srt.zip
    7.8 KB · Views: 303

ericf

Well-Known Member
Jan 13, 2007
234
528
Just a tip on Whisper and translation: don't use it. Choose No Translation and do the translation later in Google Translate or DeepL (use Tor Browser so you don't get blocked for using up your free quota of 500,000 characters a month). You will need to separate the text from the timing first and then re-attach it, as some of the translators introduce glitches into it.
I don't know how good the translator Whisper uses is (you can choose DeepL, too), but if you don't have a basic reference for what was actually transcribed, you can't figure out what the line is when the translator produces garbage.
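Separating the text from the timing and re-attaching it afterwards can be scripted. Here is a sketch for standard SRT files; split_srt and join_srt are hypothetical helpers. Multi-line cues are joined into one line so the translator sees exactly one line per cue. If the translated text comes back with a different number of lines, join_srt refuses to guess.

```python
import re


def split_srt(srt_text):
    """Split an SRT file into timing headers and one text line per cue."""
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    headers, texts = [], []
    for block in blocks:
        lines = block.split("\n")
        headers.append("\n".join(lines[:2]))  # cue number + "start --> end"
        texts.append(" ".join(lines[2:]))     # cue text, flattened to one line
    return headers, texts


def join_srt(headers, texts):
    """Re-attach (translated) text lines to the original timing headers."""
    if len(headers) != len(texts):
        raise ValueError(f"{len(headers)} cues but {len(texts)} text lines; "
                         "the translator probably merged or split lines")
    return "\n\n".join(f"{h}\n{t}" for h, t in zip(headers, texts)) + "\n"
```

Typical flow: headers, texts = split_srt(original), paste "\n".join(texts) into the translator, then join_srt(headers, translated.splitlines()).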

I can only use the Colab page, so many of the settings mentioned in earlier posts aren't available to me. I have experimented with VAD-Threshold and have settled on 0.3. What does Chunk_Threshold (3.0) do? Is it the size of the audio parts analyzed? I'll try lowering and raising the number to see if there are improvements. Source separation didn't work for me; I get error messages.
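I haven't checked the WhisperWithVAD notebook's code, so the following is an assumption about what these two knobs typically do in VAD pipelines. The VAD produces a speech probability per short audio frame. Frames at or above the VAD threshold count as speech, and speech segments separated by less than some gap (here taken to be Chunk_Threshold, in seconds) are merged into one chunk before being sent to Whisper. The sketch below is a generic illustration with a hypothetical function name, not the notebook's implementation.

```python
def speech_segments(probs, frame_s, vad_threshold=0.3, merge_gap_s=3.0):
    """Turn per-frame speech probabilities into (start, end) segments in seconds.

    Frames with probability >= vad_threshold count as speech. Segments whose
    gap to the previous segment is shorter than merge_gap_s are merged.
    """
    segments, start = [], None
    for i, p in enumerate(probs):
        if p >= vad_threshold and start is None:
            start = i  # speech begins
        elif p < vad_threshold and start is not None:
            segments.append([start * frame_s, i * frame_s])  # speech ends
            start = None
    if start is not None:
        segments.append([start * frame_s, len(probs) * frame_s])

    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < merge_gap_s:
            merged[-1][1] = seg[1]  # close enough: extend the previous chunk
        else:
            merged.append(seg)
    return [tuple(s) for s in merged]
```

Under this model, a higher VAD threshold drops quiet or breathy lines (the "missing lines" r00g mentions), and a larger merge gap gives Whisper longer chunks with more context.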
 

Chuckie100

Well-Known Member
Sep 13, 2019
505
1,963
Here are two raw Whisper files for BKD-127 and BKD-153. I don't intend to clean them; I just want to compare them to existing EroJapanese subtitles to get a better understanding of what Whisper means with dialogue like "I'm hungry," "I'm sleepy," etc., rather than relying on my wild-ass guess. My goal is to create a spreadsheet of the lewder EroJapanese vocabulary to use when reinterpreting Whisper's dialogue.
 

Attachments

  • BKD-153 Eriko.rar
    12 KB · Views: 232
  • BKD-127 Ayumi.rar
    8.1 KB · Views: 220

panop857

Active Member
Sep 11, 2011
155
222
It depends on what you're going for. The value of Whisper isn't getting the highest-quality translation possible; it's lowering the effort bar to get something good enough. Medium Whisper, with well-chosen parameters and some manual editing done while watching the movie, is going to be better and more complete than about 95% of the JAV subtitles currently out there. The places doing good manual translations tend to miss a bunch of the smaller lines, but they do a better job of making the lines that are supposed to sound dirty appropriately so.

In terms of making a script dirty, you need either a modified model file (like the ones downloaded from Hugging Face), which better JAV training data would help produce, or "logit biasing," which may make the translation more willing to output dirty language. That is very advanced stuff, though.
I think they will likely extend the "initial prompt" concept to cover the entire transcription, which would be roughly equivalent to logit biasing.
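For anyone wondering what "logit biasing" means mechanically, here is a toy illustration. At every step the decoder scores each candidate token, and biasing adds a fixed bonus (or penalty) to chosen tokens before the pick. This sketch uses whole words and greedy choice for clarity. Real Whisper tokens are BPE pieces, and as far as I know openai-whisper has no built-in bias option (it does have suppress_tokens), so the function names here are hypothetical.

```python
import math


def apply_logit_bias(logits, bias):
    """Add a per-token bias to raw logits (token -> score)."""
    return {tok: score + bias.get(tok, 0.0) for tok, score in logits.items()}


def softmax(scores):
    """Convert scores to probabilities (shifted by the max for numerical stability)."""
    top = max(scores.values())
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy_pick(scores):
    """Greedy decoding: take the highest-scoring token."""
    return max(scores, key=scores.get)
```

With scores {"funeral": 2.0, "soapland": 1.5}, greedy decoding picks "funeral". Adding a +1.0 bias to "soapland" flips the choice without retraining anything, which is why biasing appeals as a cheaper alternative to a fine-tuned model.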


And since Whisper is primarily written in Python, you could turn that spreadsheet into a simple replacement dictionary: save all of the text to a raw file, then run it through the filter and save out the subtitles with the common replacements applied. Whoever designed the Colab notebook could probably do that easily if their intent was to make something JAV-friendly.
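A minimal sketch of that replacement filter, assuming the spreadsheet is exported as a two-column CSV (Whisper's wording, replacement). The function names are hypothetical. Matching is whole-word and case-insensitive, and longer phrases are tried before shorter ones so multi-word entries win.

```python
import csv
import io
import re


def load_replacements(csv_text):
    """Read 'whisper_output,replacement' rows, e.g. exported from a spreadsheet."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip(): row[1].strip()
            for row in reader if len(row) >= 2 and row[0].strip()}


def apply_replacements(line, replacements):
    """Replace whole-word, case-insensitive matches; longer entries take priority."""
    if not replacements:
        return line
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
                         re.IGNORECASE)
    lookup = {k.lower(): v for k, v in replacements.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], line)
```

Run it over only the text lines of an SRT (not the numbers and timestamps). Context-dependent mistranslations like "I'm hungry" still need a human, so this only catches the predictable ones.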
 

panop857

Active Member
Sep 11, 2011
155
222
I am getting close to the point where I could make a Whisper thread that discusses how best to use it. I want to see if I can get one of the JP-focused models working first, though.

"--condition_on_previous_text False" will prevent some of the repeated lines, but it leads to worse translations everywhere else: random actor names and the like. Conditioning on previous text is how Whisper keeps track of the context of the dialogue, but I guess it can rely too heavily on the previous lines.

Temperature > 0 is probably a better way to escape the repeated line problems.
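Another option that keeps conditioning on is to clean up loops after the fact. This is a post-processing sketch, not a Whisper option: collapse_repeats (a hypothetical helper) drops consecutive segments whose text repeats more than a set number of times in a row, comparing case-insensitively, and keeps the first occurrence(s).

```python
def collapse_repeats(segments, max_repeats=1):
    """Drop consecutive (start, end, text) segments whose text repeats more than
    max_repeats times in a row; the earliest occurrences are kept."""
    out = []
    run, prev = 0, None
    for start, end, text in segments:
        key = text.strip().lower()
        run = run + 1 if key == prev else 1  # length of the current repeat run
        prev = key
        if run <= max_repeats:
            out.append((start, end, text))
    return out
```

This hides the symptom rather than recovering the dialogue Whisper skipped while looping, so it pairs best with a small temperature.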