Akiba resident JAV subtitlers & subtitle talk ★NOT A SUB REQUEST THREAD★

JetTee

New Member
Dec 4, 2022
Hopefully someone can help me out here, as I'm tearing my hair out over this one. I've used pyTranscriber to get some basic translations/timestamps on a video, but after getting it to work once, every single subsequent .srt file it churns out is blank. It produces .txt files just fine. Any suggestions?
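One quick way to narrow this down is to check whether a "blank" .srt is truly empty or just has text with no valid timing lines (which a player would also show as nothing). Below is a minimal Python sketch, not part of pyTranscriber, that counts cues. It assumes standard SRT timing lines like 00:01:02,345 --> 00:01:04,000.

```python
import re
from pathlib import Path

# An SRT timing line looks like "00:01:02,345 --> 00:01:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def count_cues(srt_text: str) -> int:
    """Count timing lines; an empty or text-only file gives 0."""
    return sum(1 for line in srt_text.splitlines() if TIMING.match(line.strip()))


def check_srt(path: str) -> str:
    # "utf-8-sig" also drops a byte-order mark if the tool wrote one.
    text = Path(path).read_text(encoding="utf-8-sig", errors="replace")
    if not text.strip():
        return "empty file (0 bytes or whitespace only)"
    cues = count_cues(text)
    return f"{cues} cues" if cues else "has text but no valid timing lines"
```

Usage: print(check_srt("movie.srt")), where the filename is a placeholder. "Empty file" points at the tool failing before it writes anything. "No valid timing lines" points at a formatting problem.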
 

Taako

Well-Known Member
May 25, 2017
@JetTee, a few things to try:
1. Restart your computer and try again.
2. Make sure nothing else is running in the background (check with Ctrl+Alt+Del), then try it.
3. Try uninstalling the software and then reinstalling it. By the way, I've used it three times today with no problems. I have the portable version, v1.4.
Good luck :)
 

mei2

Active Member
Dec 6, 2018
Non_Entity said:
I've written a notebook that combines Whisper with a separate VAD. It works much better than Whisper alone on long-form inputs, and also runs about 2-4x faster.

@Non_Entity, by any chance could you add a feature to your WhisperVAD notebook to go through all the files in a Drive folder? In my view, your implementation of VAD-enabled Whisper is better than any other I have seen. Well done!!!
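In the meantime, the folder loop itself is the easy part. Below is a minimal Python sketch. It assumes you already have a function that turns one file into SRT text. "transcribe" and the extension list are placeholders, not part of Non_Entity's notebook. On Colab, "folder" would be a path under /content/drive/MyDrive after mounting Drive.

```python
from pathlib import Path
from typing import Callable, List

# File types to pick up (my own choice; adjust as needed).
MEDIA_EXTS = {".mp3", ".mp4", ".m4a", ".wav", ".mkv"}


def batch_transcribe(folder: str, transcribe: Callable[[Path], str],
                     skip_existing: bool = True) -> List[Path]:
    """Write <name>.srt next to every media file in `folder`.

    `transcribe` is a placeholder for whatever turns one file into SRT text
    (e.g. a notebook's Whisper+VAD cell); it is not a real library function.
    """
    written: List[Path] = []
    for media in sorted(Path(folder).iterdir()):
        if not media.is_file() or media.suffix.lower() not in MEDIA_EXTS:
            continue
        out = media.with_suffix(".srt")
        if skip_existing and out.exists():
            continue  # lets a run resume after a Colab disconnect
        out.write_text(transcribe(media), encoding="utf-8")
        written.append(out)
    return written
```

Skipping files that already have an .srt makes it safe to simply re-run the cell after a session timeout.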
 

Prinsipe

Member
Aug 31, 2013
It's like Maload said: the clearer the movie's dialog, the better pyTranscriber works, and together with Audacity it's great. Remember, pyTranscriber will give you timing codes and some subs at the very least. The rest is up to you :D
You can use either the .mp4 (audio/video file) or an .mp3 (audio file) with pyTranscriber. I think MP3 works best, because with Audacity you can make the audio a little better.

And remember, 90% of JAV dialog is the same words/phrases.

You should be prepared to guess in some scenes, as the music and other people talking are just too hard to understand... unless you know Japanese.

If it's girls at school and it's just background talk, then just write something like Girl A: "How was the exam?" Girl B: "Um, I didn't score well."

But look/listen for the main star(s) talking louder, OR whoever talks the loudest. That's your focus when everyone is talking at once. OR just skip it. Sometimes that's better than stressing ;)
Hi, how do you prepare the audio file (MP3) in Audacity to get better transcription results? I mean, what do you edit or tweak so it's better recognized by Whisper / pyTranscriber / whatever software you use to transcribe?
 

Taako

Well-Known Member
May 25, 2017
My simple advice: increase the volume to a reasonable level, add a reasonable amount of bass, listen to it, then export to MP3 and use that MP3 in pyTranscriber. I use this a lot with some personal settings.
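If you'd rather script this than click through Audacity for every movie, ffmpeg can apply a similar volume and bass boost. The Python sketch below only builds the command. You run it with subprocess. The gain/bass values and the mono downmix are my own guesses, not Taako's personal settings. The "listen to it" step still applies, because too much gain will clip.

```python
from typing import List


def boost_cmd(src: str, dst: str, gain_db: float = 6.0,
              bass_db: float = 4.0) -> List[str]:
    """Build an ffmpeg call: extract audio, raise volume, add bass, save MP3.

    The default gain/bass values are illustrative guesses, not Taako's settings.
    """
    audio_filters = f"volume={gain_db:g}dB,bass=g={bass_db:g}"
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", src,
        "-vn",                    # drop the video stream
        "-af", audio_filters,     # volume boost, then low-shelf bass boost
        "-ac", "1",               # downmix to mono (my choice for speech)
        "-c:a", "libmp3lame", "-b:a", "192k",
        dst,
    ]
```

Usage: subprocess.run(boost_cmd("movie.mp4", "movie_boosted.mp3"), check=True). Both filenames are placeholders.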

My difficult advice is to break the movie into scenes using muxtool, and then apply the simple advice to each one. When I say scenes... I mean scenes that are usually 20-40 min long.
My difficult advice #2 I won't even recommend, even though it's not super difficult, because it's annoying :p I abandoned it long ago since it takes too long, but it works.
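One catch with splitting into scenes: each scene's .srt starts at 00:00:00. The timestamps therefore have to be shifted by where that scene starts in the full movie before the pieces are stitched back together. Below is a Python sketch of that merge step. It assumes you noted each scene's start time in seconds and that the files use standard comma-millisecond SRT timestamps. It isn't tied to muxtool or pyTranscriber.

```python
import re
from typing import List, Tuple

# SRT timestamp: HH:MM:SS,mmm
TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _to_ms(m: "re.Match[str]") -> int:
    h, mi, s, ms = map(int, m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms


def _fmt(ms: int) -> str:
    h, rem = divmod(ms, 3_600_000)
    mi, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{mi:02d}:{s:02d},{ms:03d}"


def shift_srt(srt_text: str, offset_s: float) -> str:
    """Add offset_s seconds to every timestamp in an SRT string."""
    off = round(offset_s * 1000)
    return TS.sub(lambda m: _fmt(_to_ms(m) + off), srt_text)


def merge_chunks(chunks: List[Tuple[float, str]]) -> str:
    """chunks: (scene start in the full movie in seconds, that scene's SRT).

    Shifts each scene to movie time and renumbers cues from 1.
    """
    out: List[str] = []
    for start_s, srt in chunks:
        for block in re.split(r"\n\s*\n", shift_srt(srt, start_s).strip()):
            lines = block.splitlines()
            if len(lines) < 2:
                continue  # skip empty/garbled blocks
            out.append("\n".join([str(len(out) + 1)] + lines[1:]))
    return "\n\n".join(out) + "\n"
```

Renumbering matters because every scene file restarts its cue numbers at 1.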

Remember, NOTHING is 100% perfect, and I use pyTranscriber mostly for timestamps.
 

TmpGuy

Well-Known Member
Jun 1, 2013
I've posted this before, but my posts got wiped out due to reasons (long story). I'm posting it again in case it helps anyone.

Essentially, it's a Japanese to English spreadsheet (csv format) of common words and phrases I learned when translating over a dozen titles. The Japanese phrases are written and sorted in romaji (Japanese written using English characters). There is the romaji word or phrase, a literal translation, an alternate translation (in case the phrase is a euphemism), and notes.

I'm far from literate in Japanese, so these are all based on my own research and best guesses based on context. If anyone spots flaws or inconsistencies, or wants to add their own words or phrases, let me know. Also, I translate lesbian titles, so the phrases are skewed in that direction.
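If you want to query the list from a script instead of scrolling, here is a small Python sketch that loads the CSV and looks up a romaji word. It assumes the four columns come in the order described above (romaji, literal, alternate, notes). I haven't checked this against the actual file, so adjust the unpacking if the layout differs.

```python
import csv
import io
from typing import Dict, List


def load_glossary(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Index rows by lower-cased romaji.

    Assumes the column order romaji, literal, alternate, notes (my reading of
    the post, not checked against the actual file).
    """
    glossary: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue
        romaji, literal, alternate, notes = (c.strip() for c in (row + [""] * 4)[:4])
        glossary[romaji.lower()] = {"literal": literal, "alternate": alternate, "notes": notes}
    return glossary


def lookup(glossary: Dict[str, Dict[str, str]], word: str) -> List[str]:
    """Return candidate translations, alternate (euphemism) reading first."""
    entry = glossary.get(word.strip().lower())
    if entry is None:
        return []
    return [t for t in (entry["alternate"], entry["literal"]) if t]
```

Usage, with made-up rows: lookup(load_glossary("daijoubu,Fine,Are you okay?,\n"), "Daijoubu") gives ["Are you okay?", "Fine"]. The alternate reading comes first because, for euphemisms, it's usually the one you want in the sub.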
 

Attachments

  • JAV Translation Notes.zip (2.4 KB)

maload

Active Member
Jul 1, 2008
thank you so much
 

Taako

Well-Known Member
May 25, 2017
@TmpGuy, those are indeed some very common words in JAV.
I have created a similar spreadsheet in Notepad2 and really need to organize mine. That's why I haven't released it.
I even started putting them in alphabetical order. I have over 2,000 words; some are variations of the same words (spelled differently because of how the actors say them and the context on screen).
For example, I have many variations of nande (kore, gurai, kana, koko, etc.), which is why the list is just big, messy, and insane! hahaha
Thank you for your list :cool:
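A list full of variations like the nande ones above can be tidied with a few lines of script. Below is a rough Python sketch. It assumes one romaji phrase per entry, normalizes case and spacing, drops exact duplicates, and groups entries alphabetically by their first word. The first-word grouping is my own heuristic, not how Taako's list is organized.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_variants(phrases: List[str]) -> Dict[str, List[str]]:
    """Group romaji phrases by their first word, alphabetically.

    Normalizes case and spacing and drops exact duplicates. Grouping on the
    first word is a crude heuristic (my assumption): "nande kore" and
    "nande kana" land together, but stretched spellings like "naande" don't.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for phrase in phrases:
        norm = " ".join(phrase.lower().split())
        if norm:
            groups[norm.split(" ", 1)[0]].add(norm)
    return {head: sorted(groups[head]) for head in sorted(groups)}
```

Stretched or misheard spellings would still need a manual pass or a separate mapping.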
 

Taako

Well-Known Member
May 25, 2017
klako said:
Can anyone recommend a JAV where the English sub significantly changes the story? Basically, one where the story is hard to predict if you don't understand what they are saying. This implies there's an accurate English sub, and the story is unique/unexpected, not one of the standard scenarios.
-------------------------------------------------------------------------------------------------------------------
I don't get your question. Are you asking whether some English subs changed the original story?
If that's what you're asking... then I'd say yes. Machine translations can mess up an original movie pretty badly.

Also, unless you're Japanese, the subs are never gonna be 100% accurate.
 

xsf27

New Member
May 23, 2012
Long time lurker here who is immensely grateful for all of the great work that all you talented and hard working JAV subtitlers have done! It really does add a new level of enjoyment when you actually understand what they are saying, especially in the drama-driven films which I particularly enjoy.

Now, I've been meaning to get off my butt and try my hand at attempting to make some subs for some of my favourite films and hopefully if all goes well, I'll share some of my work here. This is coming from a non-Japanese speaker so the challenges are quite numerous, as you would imagine. Thankfully, there is an abundance of AI-driven methods out there, with varying accuracy.

This takes me to the point of my post (and I apologise if it has been mentioned already): has anyone tried Adobe Premiere Pro's Speech-to-Text captioning tools yet? If so, I'd be interested to know how they compare with other, more popular or proven methods of creating captions from a video's speech from scratch, when no hard- or soft-coded subtitles are available.

I have only run a preliminary test (thankfully Japanese was one of the 13 languages supported), and it seems to work for the most part, but there is a lot of polishing to be done. Now, it only extracts the captions in the language spoken (it won't autodetect which language is spoken, you must specify) so the resulting transcription must be translated afterwards (for which I just use good ol' Google Translate).

So it looks promising, but I'm just currently racking my brain as to how to export the resulting transcription into a conventionally-recognised subtitle file format, short of painstakingly going through it to make the edits myself, so if anyone who is more versed in it can point me in the right direction, I'd greatly appreciate it!
 

soloporhoy666

Active Member
Nov 29, 2021
I'm currently using Whisper, an AI system that transcribes movie dialogue more precisely. It can be run on your own computer or, in my case, remotely, so it uses practically none of my computer's resources. Some programmers are still improving it, adding things like VAD (voice activity detection), which greatly improves the quality of the dialogue lines. Even better, it doesn't produce cut-off or invented words, or words that don't exist. Of course it's not perfect, but right now it's better than autosub or the other systems I've tried. It also gives you the .srt file in the original language or, if you prefer, in English.
Here is an example of what AI Whisper technology can achieve.
[Screenshots: Pantallazo 15-12-2022 21.19.37.png, Pantallazo 15-12-2022 21.19.57.png]
 

Taako

Well-Known Member
May 25, 2017
Seems promising.
1. How does it handle multiple people talking at once?
2. Does background noise override the speaker? Most JAV has the actress/actor's inner dialogue (talking to themselves) over music (piano music or whatever), or a musical introduction between scenes.
3. Low talking: the actress talks softly, or the sound person (mic) is too far away.
 

xsf27

New Member
May 23, 2012
@soloporhoy666 Thanks for the heads up, I'll definitely check it out. I must say, though, that it does indeed look much more polished than my rudimentary attempts with Adobe Premiere Pro.

Also, after reading @Taako's comments above, I realize I left out that Adobe automatically transcribes with multiple speakers; the cast list is populated automatically but can be edited after the captions are finished.

However, its accuracy (transcription quality and recognition of different speakers) is something I've yet to assess properly, but it does give me an excuse to peruse my favourite JAV again lol!

I'll report back once I have a more definitive answer to these questions, although the transcription quality would be better judged by people whose Japanese is better than my (more or less non-existent) Japanese.

This brings me to another question about Whisper AI (which, I gather, is currently the gold-standard in automatic transcription) and that is whether it is competent at recognising slang or colloquial terms, not to mention the many expletives prevalent in such 'exotic' movies lol?
 

soloporhoy666

Active Member
Nov 29, 2021
@Taako Hello. The Whisper AI system is really new and open source, so people with the right knowledge can add improvements to it. For example, there is currently a version that adds VAD. As I understand it, this system improves the audio, and combined with Whisper it has given me better results than any other similar program. Take the example I uploaded: that film was recorded outdoors, with between 3 and 6 people talking at a time. The Colab notebook can also be found on this page.
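"Adding VAD" means voice activity detection. It doesn't clean up the audio itself. It finds where speech is, so Whisper is only fed speech spans. Each span's timestamps are then shifted back to movie time. Below is a toy Python sketch of that idea. It uses a simple loudness threshold as a stand-in for a real VAD. Real ones, such as Silero VAD, use a trained model, and I haven't checked which one the notebook uses. "transcribe" is a placeholder for the actual Whisper call.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[float, float]


def energy_vad(samples: Sequence[float], rate: int, frame_ms: int = 30,
               threshold: float = 0.02, min_gap_s: float = 0.5) -> List[Span]:
    """Toy VAD: keep frames whose RMS level exceeds `threshold`.

    Illustration only; real VADs use a trained model rather than a fixed
    loudness threshold, which quiet speech or loud music would fool.
    """
    frame = max(1, rate * frame_ms // 1000)
    spans: List[Span] = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        rms = (sum(x * x for x in chunk) / len(chunk)) ** 0.5
        if rms < threshold:
            continue
        start, end = i / rate, (i + len(chunk)) / rate
        if spans and start - spans[-1][1] <= min_gap_s:
            spans[-1] = (spans[-1][0], end)  # bridge short pauses
        else:
            spans.append((start, end))
    return spans


def transcribe_spans(spans: List[Span],
                     transcribe: Callable[[float, float], List[Tuple[float, float, str]]]
                     ) -> List[Tuple[float, float, str]]:
    """Run `transcribe` (placeholder for a Whisper call on one span) per span
    and shift its span-relative timestamps back to movie time."""
    out: List[Tuple[float, float, str]] = []
    for span_start, span_end in spans:
        for rel_start, rel_end, text in transcribe(span_start, span_end):
            out.append((span_start + rel_start, span_start + rel_end, text))
    return out
```

Feeding only speech spans is also why VAD tends to reduce made-up lines during long music or silent stretches: Whisper never sees those parts.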
 

Taako

Well-Known Member
May 25, 2017
@xsf27, thanks, I can't wait to hear the results.
 

Taako

Well-Known Member
May 25, 2017
@soloporhoy666
1. So does it work when multiple speakers are talking at the same time?
2. Does it work when the speaker is talking low?
3. Does it work if the sound quality of the movie is low?
4. What movie did you test it on?
5. Did you try a difficult movie with a group talking, such as rctd-459? This one has a sub...
but I remember there are scenes where the "sisters" are talking while the "brother" has sex with the mom, and the brother and mom talk as well.
In this movie, the mom and daughters are unaware, so the talking continues as if the brother is not there.
I wonder how Whisper would handle it.
 

soloporhoy666

Active Member
Nov 29, 2021
To answer your other question, about the sound and the actress's voice: yesterday I transcribed a low-quality film, both audio and video (SD, 480p). The actress speaks in a low voice in several scenes, and even so, the Whisper+VAD program managed to get many clean lines of text. The AI is designed to give you real text; it doesn't insert made-up text or invent words that don't exist, and at worst it leaves some gaps untranslated. Obviously it's not perfect: I noticed that sometimes, mainly in sex scenes, it repeated some words when it couldn't make out the audio, but that was before I used the version with VAD. In conclusion, so far it's the best system I have tried, and I'm satisfied with it. I've gone from 200-line subtitles to 500+ lines depending on the movie; most give me around that many, some many more, and the record is 1,700 lines. The best thing is that this can only get better; at some point the developers were part of Tesla.
And best of all, it's free.
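On the repeated-words issue: this kind of Whisper repetition often shows up as the same line emitted several times in a row. If it still happens, a small post-processing pass can merge consecutive identical cues. Below is a Python sketch working on (start, end, text) tuples. The one-second gap limit is an arbitrary choice on my part.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]


def collapse_repeats(cues: List[Cue], max_gap_s: float = 1.0) -> List[Cue]:
    """Merge consecutive cues with identical text (ignoring extra spaces).

    A cue merges into the previous one when the text matches and it starts
    within max_gap_s of the previous cue's end; the merged cue spans both.
    """
    out: List[Cue] = []
    for start, end, text in cues:
        key = " ".join(text.split())
        if out and " ".join(out[-1][2].split()) == key and start - out[-1][1] <= max_gap_s:
            prev_start, prev_end, prev_text = out[-1]
            out[-1] = (prev_start, max(end, prev_end), prev_text)
        else:
            out.append((start, end, text))
    return out
```

Identical lines far apart in time are kept, since a character genuinely repeating a phrase later in the scene is common.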
 

Taako

Well-Known Member
May 25, 2017
Thank you. It sounds promising. I will wait to see how others like it.
Did you provide a link on how best to run this program? Is it online or a download? Is a particular system requirement needed?
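For the download route: soloporhoy666 runs it on Colab (the notebook is on this page), but OpenAI's openai-whisper package can also run locally. Below is a Python sketch that builds the command-line call. It assumes the package and ffmpeg are installed, and the model size is a guess you'd adjust for your hardware. This is plain Whisper, without the VAD step from the notebook. As far as I recall, the project's README lists roughly 5 GB of GPU memory for the medium model. It also runs on a CPU, just much more slowly.

```python
import subprocess
from typing import List


def whisper_cmd(media: str, model: str = "medium", translate: bool = True,
                out_dir: str = ".") -> List[str]:
    """Build an openai-whisper command line that writes an .srt directly.

    Assumes `pip install -U openai-whisper` and ffmpeg on PATH. The model size
    is a speed/accuracy trade-off; "medium" is just a middle-of-the-road guess.
    """
    return [
        "whisper", media,
        "--model", model,
        "--language", "ja",  # skip language auto-detection
        "--task", "translate" if translate else "transcribe",  # translate -> English
        "--output_format", "srt",
        "--output_dir", out_dir,
    ]


def run_whisper(media: str) -> None:
    subprocess.run(whisper_cmd(media), check=True)
```

With translate=False you get a Japanese .srt, which you can translate separately. A "difficult" test like rctd-459 would be a fair comparison against the VAD notebook.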

If it does well on rctd-459 as a test, I think it might be good. It's not a movie I like, but it would be a good test subject.