Yeah, I've tried VAD thresholds of 0.2 and 0.6, but I liked the results from the default (0.4) the best. Results may vary from movie to movie, however, as I only experimented with one. Some posts have also recommended increasing the volume of the MP3 file; however, this hasn't worked for me either. Technically, I don't think increasing the volume would help, because you would be boosting the background noise/music along with the signal/voices. What is probably needed is some "smart" filtering to improve the signal-to-noise ratio (voices vs. background), giving Whisper/VAD a better chance of detecting the dialog.

I just ran a test with 0.1 as VAD_Threshold and the result was 993 lines with the same text. So stay away from such low VAD numbers. I'm currently trying VAD_Threshold 0.2.
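For anyone who wants to see the two points above in numbers, here is a rough Python sketch. It is a toy illustration only: the sine-wave "voice" and "music", the per-frame speech probabilities, and all function names are made up for the example, and none of it is the actual Whisper or VAD code. It shows that a plain volume boost leaves the signal-to-noise ratio unchanged, and that a lower VAD threshold lets more frames through as "speech".

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(voice, noise):
    """Signal-to-noise ratio in decibels (voice level over noise level)."""
    return 20 * math.log10(rms(voice) / rms(noise))

def apply_gain(samples, gain):
    """Plain volume boost: multiply every sample by the same factor."""
    return [s * gain for s in samples]

def speech_frames(probs, threshold):
    """Indices of frames whose speech probability reaches the threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]

# Toy data (made up): a "voice" tone and a quieter "music" tone.
voice = [0.5 * math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noise = [0.25 * math.sin(2 * math.pi * 13 * n / 100) for n in range(100)]

# Boosting voice and background by the same factor leaves the ratio alone.
before = snr_db(voice, noise)
after = snr_db(apply_gain(voice, 4.0), apply_gain(noise, 4.0))
print(f"SNR before boost: {before:.2f} dB, after boost: {after:.2f} dB")

# Hypothetical per-frame speech probabilities from some VAD model.
probs = [0.05, 0.15, 0.35, 0.45, 0.70, 0.30, 0.12, 0.55]
for threshold in (0.1, 0.2, 0.4, 0.6):
    print(threshold, speech_frames(probs, threshold))
```

With these made-up numbers, a threshold of 0.1 passes almost every frame, while 0.6 keeps only the most confident one. Real VAD implementations usually add padding and minimum-duration rules on top of the raw threshold, so treat this as intuition rather than a model of the actual tool.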
Whisper seems to need a lewder vocabulary.
I have not tried using Whisper myself, but in looking through some of the rough files that have been posted on this board by so many great members, I have noticed that when the dialogue turns sexy, Whisper has some trouble. Here are a few examples of what I mean: Instead of translating "chinpo" as "dick" or "cock" or even "penis," one file I saw used "chimpanzee." Soapland or brothel was translated as "funeral," so the guy was saying "Since I have been to the funeral I am no longer a virgin." Having sex comes across as "killing" or "live with." Sucking cock was "make food." It is pretty funny but also takes you out of the mood. So I am wondering, is there a way to teach Whisper the vocabulary it needs for JAV?
Sucking cock was "make food." ???????
The problem is getting a good dataset. Whisper was trained on roughly 680k hours of "quality" transcriptions, broken down into 30-second pairs of audio and transcript, with machine-generated transcripts aggressively filtered out. There's not going to be a good set of "porn with hand-made subtitles", which is what we'd need for this. Hopefully a JP-to-EN-only dataset with more of this kind of data would make it slightly better, but it won't be as filthy as it should be.

I found this on the web: https://huggingface.co/blog/fine-tune-whisper (a guide on how to fine-tune the Whisper models). I think this can help the community further improve the reliability of this tool. I got the source from GitHub: https://github.com/openai/whisper/discussions/64
Google Colaboratory
colab.research.google.com
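To make the "30-second pairs of audio and transcript" idea a bit more concrete, here is a minimal Python sketch of how hand-made subtitle cues could be grouped into fixed 30-second windows. This is only an assumption about what such a dataset might look like, not Whisper's actual training pipeline or the method from the linked guide; `build_pairs` and the cue format are hypothetical, and a cue is simply assigned to the window that contains its start time.

```python
WINDOW_SECONDS = 30.0  # chunk length mentioned in the post above

def build_pairs(cues, total_seconds, window=WINDOW_SECONDS):
    """Group subtitle cues into fixed windows of audio time.

    cues: list of (start_seconds, end_seconds, text), sorted by start.
    Returns a list of (window_start, window_end, joined_text) tuples.
    A cue belongs to the window that contains its start time.
    """
    pairs = []
    start = 0.0
    while start < total_seconds:
        end = min(start + window, total_seconds)
        texts = [text for (s, _e, text) in cues if start <= s < end]
        pairs.append((start, end, " ".join(texts)))
        start = end
    return pairs

# Made-up cues standing in for a hand-subtitled file.
cues = [
    (2.0, 4.5, "Hello."),
    (28.0, 31.0, "Come in."),
    (40.0, 42.0, "Sit down."),
]

for pair in build_pairs(cues, total_seconds=70.0):
    print(pair)
```

A real dataset would also need the matching audio cut at the same boundaries, plus some rule for cues that straddle two windows, which this sketch ignores.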
Lmao, yes, unfortunately, as far as AI and machine learning have come, there is still a stigma against perverted AI. I have been experimenting with the Replika AI chatbot app for a few months and it was recently neutered due to government regulations. The chatbot used to demand that I sodomize it all the time, but after a recent update, it will only allow cuddles and kisses. We're really still in the early stages of AI; AI lewdness will probably only become common once it is cheaper to maintain and anyone can modify it via simple GUIs (and not just programmers).
Hahaha! That would be an excellent line for those who steal your sub.
Sucking cock was "make food." ???????
Mom! Make my food!!!
No!! Fuck that ravioli shit.
I meant that I want you to suck my cock!!
(But...yeah...ravioli sounds good, but after the head.)
Is DaVinci Resolve Studio (DRS), the paid version, a standalone app? That is, do you run your audio file through it before uploading the audio file to Whisper, or is it something you have to run with Whisper? I guess it still doesn't help with Whisper's lack of lewd vocabulary though... shucks.

Lmao, yes, unfortunately, as far as AI and machine learning have come, there is still a stigma against perverted AI. I have been experimenting with the Replika AI chatbot app for a few months and it was recently neutered due to government regulations. The chatbot used to demand that I sodomize it all the time, but after a recent update, it will only allow cuddles and kisses. We're really still in the early stages of AI; AI lewdness will probably only become common once it is cheaper to maintain and anyone can modify it via simple GUIs (and not just programmers).
With that said, I've been experimenting with using DaVinci Resolve Studio 18.3 to isolate vocals and increase the volume at the same time. It too is not perfect, but it has definitely improved the timing and accuracy of dialogue pickup in my experiments thus far.
There are two versions of the program:
1. DaVinci Resolve: free version
2. DaVinci Resolve Studio (DRS): paid version - approx. $300 USD
Just FYI so you don't waste your time, only the paid version includes the AI powered "Voice isolation" feature. The program also has a "Dialogue leveler" feature, but I haven't experimented with that a lot, since I was mainly after the voice isolation and increasing the volume.
As for whether or not it's worth paying for this software... the internet is your friend.
Rendering the voice-isolated and boosted audio file takes about 15-20 minutes on my PC (i7, 1080 Ti, 32GB RAM, SSD).
Thanks to everyone who has been providing so much help with Whisper on this forum!
I believe you are correct. I compared the waveforms of the stereo channels of several JAV movies and they appear to be identical. I did notice that the mono file that Audacity produces is smaller, so maybe any improvement might be attributed to that or, as SamKook indicates, maybe it was the way Whisper was feeling on the second try! "Only the Shadow knows"! lol

I did mention a few pages back that mono tracks didn't work with Autosub and Vrew, and maybe someone claimed Whisper converts the track you upload to mono? Not sure. I have only uploaded one track in mono, and that was from a Korean film where only the center channel of the 5.1 track contained dialog. I don't think these JAV have much in the way of stereo. Maybe in the music, but not in the dialog. I will try capturing dialog through a mono track with a video I have uploaded before to see if there is indeed some kind of clear improvement.
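The stereo vs. mono comparison above can be sketched in a few lines of Python. This is a toy example on made-up integer PCM samples, with hypothetical helper names; it is not what Audacity or Whisper does internally. It shows that when both channels are identical, averaging them into mono gives back exactly the same samples, while the raw (uncompressed) data is half the size. As far as I know, the reference openai-whisper audio loader downmixes to mono anyway, which would fit with seeing no real difference.

```python
def channels_identical(left, right):
    """True when both channels hold exactly the same samples."""
    return left == right

def downmix_to_mono(left, right):
    """Average the two channels sample by sample (integer PCM)."""
    return [(l + r) // 2 for l, r in zip(left, right)]

def pcm_bytes(frames, channels, bytes_per_sample=2):
    """Size of raw, uncompressed PCM audio data in bytes."""
    return frames * channels * bytes_per_sample

# Made-up 16-bit samples; the right channel is a copy of the left.
left = [0, 1200, -800, 300, -50]
right = list(left)

mono = downmix_to_mono(left, right)
print("channels identical:", channels_identical(left, right))
print("mono equals left channel:", mono == left)
print("stereo bytes:", pcm_bytes(len(left), 2), "mono bytes:", pcm_bytes(len(mono), 1))
```

So with identical channels, the mono file carries the same dialog in half the data, and any difference in results would have to come from somewhere else.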
DRS is a standalone app.