Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

Sup guys. I’d figured the red catalog was dead, but now I see you guys talking about it. I tried signing up for the newsletter, but it fails with an “invalid form” error and asks for a security code that is not shown. I’ve tried different links, but all the old passwords fail.
Got any tips?
 
Hello everyone, it's been a long time since the last post, which means I have made a lot of transcriptions. Here is the list.
(many of them are already on subtitlecat)

****All are raw files made with Whisper****

200GANA (2894)
215LES (077)
230ORECZ (006)
277DCV (027 054 130 207 221 245)
285ENDX (442 506)
328HMDNV (583)
348NTR (051)
829ANYA (015)
849OSMSM (003)
ACZD (191)
ADN (544 621)
AGAV (104)
ARAN (085 092)
ASIA (111)
ATID (233)
AUKG (589 609)
AVSA (332 354)
BBAN (408 470)
BLMC (001 012 018 019)
BLOR (247 248 251 253 156 257 261 262)
BOKD (210 281 288 290 291 295)
BOKO (005 014 015 019)
BTIS (115 120 136 137 140 142)
C (2567 2698 2758 2853 2863 2888)
CAWD (336)
CHIN (336)
DANDY (639)
DAPS (53)
DASD (434 506)
DASS (010 351 353 440 456 469 487 493 501 547 563 568 598)
DOKS (611)
FC2PPV (2386297 2498047 2498047 2526023 3135663 4011236
4352001 4356815 4435441 4470067 4501675 4510747 4535126
4546688 4550101 4552872 4554967 4556744 4573817 4574064
4576113 4576889 4576927 4583477 4584124 4586531 4588138
4588714 4588954 4589162 4592046 4592767 4595709 4596068
4597077 4597313 4607598 4615093 4655067)
FONE (091)
FSDSS (988)
FSTA (004)
FTHTD (012)
GHKQ (97)
GJDD (002)
GOJU (225)
HAZU (139)
HERY (139 145 149 151 152 156)
HMN (588)
HMNF (078)
HND (390)
HSM (037 060 064 066 071 072 074)
HSODA (047 055)
HUNTC (282)
HYBR (017)
IBW (961Z)
IPZ (447)
JRZD (759)
JRZE (182 218)
JUQ (871 952)
JUTA (151)
KTKZ (106)
LC (001)
LZDM (066)
MAS (20)
MCSR (432 549)
MEKO (113 302 326 327)
MEYD (777 946)
MIAB (198)
MIAD (898)
MIMK (008)
MISM (207 315 358)
MOGI (013 034 041 065)
MOOC (001 002)
MOON (035)
MVSD (625 629)
NAIAD (006)
NHDTA (642)
NHDTB (879 937 978)
NHDTC (005)
NVH (002 025 026 038 039 045)
NYH (110 111 112 126 127 202)
OLM (014)
OPPW (001 006 038 053 058 063 069 104 168 171 176)
PETS (036)
PMGG (001 003)
PTS (496)
RBK (100)
RCTD (917)
REAL (856)
REXD (554)
ROE (236 275)
SDAM (126)
SDJS (228 233 260 271)
SDNM (021 093 224 256 325 329 332 334 362 405 417 436 438
471 489 494 498 507)
SDNT (001)
SGSR (355)
SIRO (4985)
SONE (460)
SORA (558)
SPLY (014 015)
SPSC (84)
START (108 171 171 281)
STSK (114)
SW (982)
SXMA (009)
TANF (021)
TANP (021 026 028 029 032 033)
TCD (290 292 295 296)
TEEN (021 029)
TIMD (005)
TLZ (011 014 015)
TPNS (018)
VAIAV (006)
VICD (006)
 

Thanks Chuckie! Love this Mom Sex Education films! Looking at that cover it looks like there are 7 in the series.

btw, looks like EROJapanese is selling all their subs in their inventory! Over 1,000 for a low price! Not bad!
 
Yea, I love them too. I think there were actually 8 with the last feature in DVDES-794 being the 8th one, packaged together with shortened versions of the first 7. Not a bad run of similar storylines.
 
wow -- you've been busy :)
Thanks!
 

[Reducing Mosaic]NSFS-365 Mature Mother 33 ~Forbidden Mother And C***d Elopement Sex~ Yurine Tsukino​

I used WhisperJAV0.7 to create this sub, and I also attempted to clean it up a bit and re-interpreted some of the meaningless/"lewd-less" dialog. Again, I don't understand Japanese, so my re-interpretations might not be totally accurate, but I try to match what is happening in the scene. Anyway, enjoy and let me know what you think.
 

VENU-617 Relatives [silence] Gonna Have A Dad In Incest Next To ... Ayumi Shinoda

I recently downloaded what I thought was a reduced mosaic version of VENU-613 but what I got was a reduced mosaic version of VENU-617 starring Ayumi Shinoda my GOAT of JAV actresses! I used WhisperJAV0.7 to update a previous Sub that I had posted and I also attempted to clean it up a bit and re-interpreted some of the meaningless/ "lewd-less" dialog. Again, I don't understand Japanese so my re-interpretations might not be totally accurate but I try to match what is happening in the scene. Anyway, enjoy and let me know what you think.

The movie had a lot of action but little dialog; however, it did have a lot of Ayumi!
 

It's been a long time since I posted. I recently discovered how good DeepSeek is, and I got excited. I spent ~30 hours on this Python script to translate Chinese SRT files to English using DeepSeek v3. Why use it? The script translates in batches, so the model has the context of previous lines. To get started, create an account on Fireworks AI, click on your profile picture, and get your API key. The account comes with $1 of free credit, which can translate at least 100 subtitles; check usage under billing. I wouldn't top up, as it's not the cheapest provider. I plan to convert the script to use the official DeepSeek API, which is cheaper, but I'll wait for my ChatGPT o3 quota to refresh before coding again. I couldn't make more subs because I spent all my credit testing the same subs over and over to look for differences when changing the system prompt and temperature values.

Set your API key in PowerShell and restart the terminal. I put some instructions at the top of the script, but if you get stuck just ask AI. The script is ready to use with what I found to be the best default settings. I find temperature 0.9 to be the upper limit for explicit dirty language before responses are likely to ignore the rules and format. If you find translations mismatching with the timecode, lower the temperature by 0.1 each run. Top_p 0.95 is good. SYS_MSG is already good, but it's fine to change the erotic instructions if you want. BATCH_SIZE_DEFAULT 500 is a good upper limit: too high and it will exceed the max tokens, too low and it will make more API calls, increasing costs. If you need to change the source language to Japanese, rewrite the SYS_MSG and its example.
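A minimal sketch of how these settings might be laid out, not the actual script. TEMPERATURE, TOP_P, and BATCH_SIZE_DEFAULT use the values from the post; the environment variable name FIREWORKS_API_KEY and the helpers load_api_key and make_batches are assumptions made up for illustration.

```python
import os

# Values quoted in the post; tune them per the advice above.
TEMPERATURE = 0.9         # lower by 0.1 per run if lines stop matching timecodes
TOP_P = 0.95
BATCH_SIZE_DEFAULT = 500  # subtitle lines per API call

# Hypothetical variable name; use whatever name the script's header specifies.
API_KEY_ENV = "FIREWORKS_API_KEY"


def load_api_key(env=os.environ):
    """Return the API key from the environment, or raise with a clear message."""
    key = env.get(API_KEY_ENV, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV} and restart the terminal.")
    return key


def make_batches(lines, batch_size=BATCH_SIZE_DEFAULT):
    """Split subtitle lines into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]
```

A smaller batch_size means more API calls, and a larger one risks hitting the max token limit, which is the trade-off described above.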

I'll talk about the challenges and how the code works in case anyone wants to play with it. Fixing bugs caused by problematic responses and testing took the majority of the time. "SEQ_NUMBER\nORIGINAL_TEXT <eol>\n" is how I send each line to get translated. I omit the timecode to save tokens. Before this I tried a <space> between the number and the text, but it wasn't a robust structure and I encountered line merges all the time, so do not do this. The <eol> is a safety net that reinforces the structure. When the temperature is too high, the responses are less likely to follow the rules. Problematic responses can contain ">" after the number, an incorrect number, an English translation that merges two Chinese lines into one (this is the biggest problem, as it causes timecode mismatch when combining), and other quirky oddities. I accounted for nearly all of this in the code. The Map_translated function parses the response and cleans it up. Suspected short and missing lines get sent to the prompt again for re-translation. If you need to fix bugs, the log files are useful for checking responses. Things I wanted to test but got lazy about: different values of top_p between 0.96 and 1 combined with different temperature values, and running the code without inserting <eol>. If we truly don't need <eol>, we can save a few tokens on each line, but I'm not 100% sure whether it helps prevent line merges. Anyway, you're free to do whatever you want with the code.
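The format and clean-up described above can be sketched roughly as follows. This is an illustration under assumptions, not the script's Map_translated function: the names build_payload and parse_response are made up, each subtitle is assumed to be a single line of text, and a translated line consisting only of digits would be misread as a sequence number.

```python
import re

EOL = "<eol>"


def build_payload(texts):
    """Format lines as 'SEQ_NUMBER\\nORIGINAL_TEXT <eol>\\n', numbering from 1."""
    return "".join(f"{n}\n{text} {EOL}\n" for n, text in enumerate(texts, start=1))


def parse_response(response, expected_count):
    """Map sequence numbers to translated text and list the missing numbers.

    Tolerates a stray '>' after the number and a missing <eol> marker.
    """
    translated = {}
    current = None
    for raw in response.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.fullmatch(r"(\d+)\s*>?", line)
        if match:
            current = int(match.group(1))
            continue
        if current is None or not 1 <= current <= expected_count:
            continue  # text with no usable number in front of it
        text = line.replace(EOL, "").strip()
        if text:
            previous = translated.get(current)
            translated[current] = f"{previous} {text}" if previous else text
    missing = [n for n in range(1, expected_count + 1) if n not in translated]
    return translated, missing
```

When the model merges two source lines into one, one of the numbers never appears in the response, so it ends up in missing and can be sent again for re-translation.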

Subs of Mori Hinako (favorite actress), Akai Miki, ROE-168 (mother-son), MIAA-750 (female slutty boss, amazing loud plopping cowgirl), YUJ-031 (high energy, enthusiastic girl that kisses a lot) and more!
 

Just learned about this. Use the new model for improved translations. Change from MODEL = "accounts/fireworks/models/deepseek-v3" to MODEL = "accounts/fireworks/models/deepseek-v3-0324". My first run caused line merges at TEMPERATURE = 0.9. My new recommended default is TEMPERATURE = 0.8

Nao Jinguji, she is getting thick. Beautiful jiggly ass!
 

1500 .srt files

ATTENTION: READ THIS BEFORE DOWNLOADING SUBS


The following are the configurations for both my .bat file and my post-processing .py file. These are the commands that I gave to WhisperAI to try to get the best possible translations. One issue, for example, is that to clean up hallucinations, the .py file simply removes them instead of leaving the same words repeated over and over. This can be a problem because hallucinations sometimes carry on into places where there is dialog, so you may find spots where the actors are clearly speaking but there is no translation. This can't be helped because of the quirkiness of Whisper. What I have attempted to do is ameliorate the worst of Whisper's idiosyncrasies to yield the best possible result. That does not mean perfect. My purpose for this disclaimer is that I still have about 20K files to process. It is a long and boring process, and doing so means, even with my stats, that I have to put other processes on the shelf. So I will be continuing to upload a few hundred titles at a time as I accumulate them. I welcome any tweaking suggestions.
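As a rough illustration of the kind of clean-up described above, here is a minimal sketch that drops runs of repeated subtitle lines. It is not the clean_subs.py from this post: the function name drop_repeated_cues, the (start, end, text) tuple layout, and the max_repeats threshold are all assumptions.

```python
def drop_repeated_cues(cues, max_repeats=2):
    """Remove runs of consecutive cues whose text repeats more than max_repeats times.

    cues is a list of (start, end, text) tuples. A run longer than max_repeats
    is dropped entirely, which can also delete real dialog, as noted above.
    """
    kept = []
    i = 0
    while i < len(cues):
        j = i
        key = cues[i][2].strip().lower()
        # Advance j to the end of the run of identical text starting at i.
        while j < len(cues) and cues[j][2].strip().lower() == key:
            j += 1
        if j - i <= max_repeats:
            kept.extend(cues[i:j])
        i = j
    return kept
```

Raising max_repeats keeps more genuine repeated lines at the cost of letting more hallucinated loops through.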

So, here are my Whisper.bat and clean_subs.py data. Bear in mind the details of my rig (i9-14900K / Nvidia 4080 Super (16G VRAM) / 96G DDR5 RAM / Samsung 990 PRO NVMe / Windows 11 Pro), because if you choose to use my configs, these settings are based on a high-performance PC. OK, here are the settings, to be followed by a link to a ZIP containing around 1500 .srt subtitle files. Newer users of these files may have to make sure that the video's filename and the .srt file's name match. Cheers





Here is the link for the subs; there are about 1500 titles:
 

I just posted, following this post, a link to about 1500 .srt files. In the post I listed my whisper.bat and post-processing clean_subs.py files. You may find some of my settings of use to you. Cheers.
 

Thanks for the collection. I plan to check them out during the weekend.
Meanwhile I noticed you're not using --word_timestamps. That should give you more accurate timing.
Also, you're using --task translate with model large-v3. That model was not trained for translation task over large-v2. It was only additionally trained for --task transcribe.
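For reference, a small sketch of a command line with those options, assuming the reference openai-whisper command-line tool (the batch file in question may use a different Whisper build with different flags). The helper name whisper_command and the file name movie.mp4 are made up for illustration.

```python
import shlex


def whisper_command(media_path, model="large-v2"):
    """Build an argument list for the openai-whisper command-line tool."""
    return [
        "whisper", media_path,
        "--model", model,
        "--language", "ja",
        "--task", "translate",          # Japanese audio to English text
        "--word_timestamps", "True",    # word-level timing
        "--output_format", "srt",
    ]


# Print the command so it can be pasted into a .bat file or terminal.
print(shlex.join(whisper_command("movie.mp4")))
```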
 
Mei, thank you for this input. I have been wrestling back and forth with V2 and V3. I had originally been advised by a reliable source that V2 was actually superior to V3 with respect to Japanese-English translation. Later the same source said V3 is superior. My intent with these subs is to upload them for Akiba members, and consequently I want to have the most usable/accurate translations. I installed the faster-whisper model, but I have never been able to get it to work. I honestly would prefer to use V2 because it is quicker, but that said, my own 'research' seems to indicate V3 is better. Just an example: in one file that I tested, the interviewer in the beginning was referring to Gokkun. V2 translated it as Kokkun, while V3 translated it as Gokkun. Now, I'm sure not many people would have had a problem making the connection with V2's Kokkun, but again, I am after the most perfect translation that I can find. I'll add the word timestamps arg to my batch, but could you please clarify V2 vs. V3 for me? If I can get away with V2 and speed up the process, that would be super-keano. Thanks.
 


About contents of that stepmother/stepson secret sex - next to son's father (and remarried mature woman/stepmother's husband).
Actress: Ayumi Shinoda, who was active in the Japanese porn industry in 2010 and, after a first retirement (during which she had a significant breast enhancement), returned in 2014 and retired again in 2016. She is currently 44 years old but has not performed for 9 years. In this particular movie, VENU-617, she is 36 years old in real life.

Stepson: actor Oyoyo Nakano, who started his career in the Japanese porn industry in 2006, when he was 20 years old. In VENU-617 he is 30 years old in real life.

Son's father and Ayumi's remarried husband: actor Naoki Otsuka, close to 50 years old.


Introductory text for VENU-617 from FANZA page -

"My new remarried husband has a son from a previous marriage, called Tohru.
He seems to be at a difficult age, and even now, after six months our marriage,
Tohru still won't talk to me properly. I didn't think it would be that easy to become
a parent and child, but I wanted us to at least have a fun chat...
Then, one day, after getting out of the bath, I encountered Tohru looking through
my underwear in the dressing room.
Surprised, I confronted Tohru, but he said something I never expected..."

Just to explain the real contents of this movie, in case you don't understand Japanese...
This is a movie about stepmother/stepson intercourse, with no blood relation.

Of course, one could imagine/dream about real incest, watching whatever exciting movie. :)
 
I've done some tests as well, and V2 is just better in the tests I've done. While there were a few tests where V3 was better, in general V2 was the clear winner for me. From what I've read, V3 is only better if the audio is super clear, like a podcast. In JAV the audio quality can vary a lot.

I also get better results with silero_v4_fw than with pyannote_v3 (VAD method). mdx_kim2 (voice extraction method) also doesn't work super great for me, but I've noticed it can help in splitting different speakers. But mdx is not in faster_whisper, I think?

There's 1 specific JAV that is really bad for me, and I've tried so many different things on it but I can barely get anything out of it. It's WAAA-501. The audio is super clear as well. There's 1 subtitle on subtitlecat which looks like it has been done with whisperjav and it's having the exact same problem.
 
I'm in the early stages of converting my second round. I think that I'll reset and run it with V2 for about a week and see what I come up with. Thanks for the input, T22.
 
Well, I just spent the last couple of hours goofing around with WAAA-501. It is very, very weird. It is almost like there's a second audio track that is coded into the file and Whisper is decoding that. I tried V2 and V3, and I tried upscaling the video to see if re-encoding would resolve the issue. I'm flummoxed! The only thing I can suggest is to have someone here manually create a subtitle file. There was a cat doing it a few months ago, but I cannot remember his name. I wish that I could have found the solution, but no dice here.
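One way to check the second-audio-track hunch is to list the file's audio streams with ffprobe. The sketch below assumes ffprobe (part of FFmpeg) is installed and on PATH; the helper names are made up for illustration, and this has not been run against WAAA-501.

```python
import json
import subprocess


def ffprobe_audio_command(path):
    """Build an ffprobe call that reports only the audio streams as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=index,codec_name,channels",
        "-of", "json", path,
    ]


def audio_streams(ffprobe_json):
    """Return the list of audio stream dicts from ffprobe's JSON output."""
    return json.loads(ffprobe_json).get("streams", [])


def probe(path):
    """Run ffprobe and return the audio streams found in the file."""
    result = subprocess.run(
        ffprobe_audio_command(path), capture_output=True, text=True, check=True
    )
    return audio_streams(result.stdout)
```

If probe returns more than one stream, extracting a single track before running Whisper (for example with ffmpeg's -map 0:a:0 option) might be worth a try.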