[th]
|
[th]
|
[th]
|
[th]
|
Yea, I love them too. I think there were actually 8 with the last feature in DVDES-794 being the 8th one, packaged together with shortened versions of the first 7. Not a bad run of similar storylines.Thanks Chuckie! Love this Mom Sex Education films! Looking at that cover it looks like there are 7 in the series.
btw, looks like EROJapanese is selling all their subs in their inventory! Over 1,000 for a low price! Not bad!
wow -- you've been busyHello everyone, it's been a long time since the last post, which means i have made a lot of transcriptions and this is the list.
(many of them already are on subtitlecat)
****All are raw files made with Whisper****
200GANA (2894)
....
IStay tuned for revised infoIt's been a long time since I posted. I recently discovered how good Deepseek is, and I got excited. I spent ~30 hours on this Python Script to translate Chinese SRT files to English using Deepseek v3. Why use it? My Python script translates in batches, so it has the context of previous lines. Create an account on fireworks ai. Click on your profile picture and get your API KEY. It comes with $1 free credit. Check usage in billing. $1 can translate at least 100 subtitles. I wouldn't top up as it's not the cheapest. I plan to convert the Python script to use the official Deepseek API which is cheaper but I'll wait for my ChatGPT o3 quota to refresh before coding again. I couldn't make more subs as I spent all my credit testing the same subs over and over again for any differences when I change the system prompt and temperature values.
Set your API KEY in powershell and restart the terminal. I put some instructions at the top of the script but if you get stuck just ask AI. The script is ready to use! I put all the best default settings. I find temperature 0.9 to be the upper limit for explicit dirty language before responses are likely to ignore the rules and format. If you find translations mismatching with the timecode, then lower temperature by 0.1 each run. Top_p 0.95 is good. SYS_MSG is already good but if you want to change the erotic instructions then that's ok. BATCH_SIZE_DEFAULT 500 is a good upper limit. Too high and it will exceed the max tokens, too low and it will make more API calls, increasing costs. If you need to change source language to Japanese, then rewrite the SYS_MSG and rewrite the example.
I'll talk about the challenges and how the code works in case anyone wants to play with the code. Fixing bugs of problematic responses and testing took majority of the time. "SEQ_NUMBER\nORIGINAL_TEXT <eol>\n" is how I send it to get translated. I omit the timecode to save tokens. Before this I tried a <space> in between the number and text but it wasn't a robust structure and I encountered line merges all the time, so do not do this. The <eol> is a safety net that reinforces the structure. When temperature is too high, the responses are less likely to follow the rules. Problematic responses given back can contain ">" after the number, the incorrect number, the English translation will merge two Chinese lines and translate as one (this is the biggest problem which causes timecode mismatch when combining), and other quirky oddities. I accounted for nearly all this in the code. Map_translated function parses the response and cleans it up. Suspected short and missing lines get sent to the prompt again for re-translation. If you need to fix bugs, the log files are useful for checking responses. Things I wanted to test but got lazy: testing different values of top_p between 0.96 to 1 with different values of temperature, and testing the code without inserting <eol>. If we truly don't need <eol>, then we can save a few tokens each line, but I'm not 100% sure if it's useful in preventing line merges or not. Anyway, you're free to do whatever you want with the code.
Subs of Mori Hinako (favorite actress), Akai Miki, ROE-168 (mother-son), MIAA-750 (female slutty boss, amazing loud plopping cowgirl), YUJ-031 (high energy, enthusiastic girl that kisses a lot) and more!
View attachment 3670033View attachment 3670034View attachment 3670035View attachment 3670036View attachment 3670037
1500 .srt files
.... I welcome any tweaking suggestions.
Mei, thank you for this input. I have been wrestling back and forth with V2 and V3. I had originally be advised by a reliable source that V2 was actually superior to V3 with respect to Japanese-English translation. Later the same source said , V3 is superior. My intent with these subs is to upload them for Akiba members and consequently I want to have the most usable/accurate translations. I installed the faster-whisper model but I have never been able to get it to work. I honestly would prefer to use V2 because it is quicker but that said my own 'research' seems to indicate V3 is better. Just an example, in one file that i tested the interviewer in the beginning was referring to Gokkun. IN V2 it was translated as Kokkun while V3 translated it as Gokkun. Now I'm sure not many people wouldn't have had a problem making the connection with V2's Kokkun but again I am after the most perfect translation that i can find. I'll add the Word timestamps arg to my batch but could you please clarify for me about V2 vs. V3 because if I can get away with V2 and speed up the process that would be super-keano. Thanks.Thanks for the collection. I plan to check them out during the weekend.
Meanwhile I noticed you're not using --word_timestamps. That should give you more accurate timing.
Also, you're using --task translate with model large-v3. That model was not trained for translation task over large-v2. It was only additionally trained for --task transcribe.
VENU-617 Relatives [silence] Gonna Have A Dad In Incest Next To ... Ayumi Shinoda
View attachment 3669937
I recently downloaded what I thought was a reduced mosaic version of VENU-613 but what I got was a reduced mosaic version of VENU-617 starring Ayumi Shinoda my GOAT of JAV actresses! I used WhisperJAV0.7 to update a previous Sub that I had posted and I also attempted to clean it up a bit and re-interpreted some of the meaningless/ "lewd-less" dialog. Again, I don't understand Japanese so my re-interpretations might not be totally accurate but I try to match what is happening in the scene. Anyway, enjoy and let me know what you think.
The movie had a lot of action but little dialog; however, it did have a lot of Ayumi!.
I've done some tests as well and V2 is just better from the tests I've done. While there were a few tests where V3 was better, in general V2 was the clear winner for me. From what I've read is that V3 is only better if the audio is super clear, like a podcast. In JAV the audio level quality can vary a lot.Mei, thank you for this input. I have been wrestling back and forth with V2 and V3. I had originally be advised by a reliable source that V2 was actually superior to V3 with respect to Japanese-English translation. Later the same source said , V3 is superior. My intent with these subs is to upload them for Akiba members and consequently I want to have the most usable/accurate translations. I installed the faster-whisper model but I have never been able to get it to work. I honestly would prefer to use V2 because it is quicker but that said my own 'research' seems to indicate V3 is better. Just an example, in one file that i tested the interviewer in the beginning was referring to Gokkun. IN V2 it was translated as Kokkun while V3 translated it as Gokkun. Now I'm sure not many people wouldn't have had a problem making the connection with V2's Kokkun but again I am after the most perfect translation that i can find. I'll add the Word timestamps arg to my batch but could you please clarify for me about V2 vs. V3 because if I can get away with V2 and speed up the process that would be super-keano. Thanks.
I'm in the early stages of converting my Second round. I think that I'll re-set and run it with V2 for about a Week and see what I come up with. Thanks for the input T22.I've done some tests as well and V2 is just better from the tests I've done. While there were a few tests where V3 was better, in general V2 was the clear winner for me.
From what I've read is that V3 is only better if the audio is super clear, like a podcast. In JAV the audio level quality can vary a lot.
I guess that is one advantage of not knowing the Japanese lanuage!Of course, one could imagine/dream about real incest, watching whatever exciting movie.![]()
Well I just spent the last couple of Hours goofing around with WAAA-501. IT is very very weird. It is almost like there's a second audio track that his coded into the file and Whisper is decoding that. I tried V2 and V3, I tried upscaling the video to see if the re-encoding may resolve the issue. I'm flummoxed! The only thing I can suggest is have someone here manually create a subtitle file. There was a cat doing it a few Months ago but I cannot remember his name. I wish that i could have found the solution but no dice here.I've done some tests as well and V2 is just better from the tests I've done. While there were a few tests where V3 was better, in general V2 was the clear winner for me. From what I've read is that V3 is only better if the audio is super clear, like a podcast. In JAV the audio level quality can vary a lot.
I also have better results with silero_v4_fw vs pyannote_v3 (VAD method). mdx_kim2 (voice extraction method) also doesn't work super great for me, but can help in splitting different speakers I've noticed. But mdx is not in faster_whisper I think ?
There's 1 specific JAV that is really bad for me, and I've tried so many different things on it but I can barely get anything out of it. It's WAAA-501. The audio is super clear as well. There's 1 subtitle on subtitlecat which looks like it has been done with whisperjav and it's having the exact same problem.