Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

Does anyone know why Whisper, when you run it with the large model to get subtitles, puts things like "Naokiman Show Instagramуют", "Please subscribe to the channel.", "Thank you for watching until the end today", "See you in the next video." and "Thank you for watching!" in the text when they clearly aren't being said in the video?

Because the AI model was trained on YouTube videos, and all an AI does is guess. Since those videos say that a lot, it's a go-to phrase the model falls back on when it's uncertain about something, especially when the timing matches where the phrase is usually said, at the beginning or end of a video.
 
Wow! You're rocking it, Dude. Thanks for sharing.
Thanks Scully, I was starting to think that only a few people gave a crap about them, and I was toying with the idea of PMing those people the links instead of posting them in the open forum. I'll ponder that, but if I do go that route you are definitely on the list. Cheers.
 
That is interesting, Sam. I have my configs set to minimize AI guessing/learning and to translate only what it actually 'hears'. I don't see it much, but I suppose you can't totally rely on Whisper not to 'guess' now and then. Thanks for the response.
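For anyone wondering what settings like that might look like: the post doesn't say which ones are in use, so this is only a sketch, assuming the openai-whisper Python package and Japanese audio. The option names are real parameters of its transcribe() call, but the values are just the library defaults or common starting points, and CONSERVATIVE_OPTIONS and transcribe_file are made-up names for illustration.

```python
# Sketch only: assumes the openai-whisper package (pip install openai-whisper).
# The values below are starting points, not tuned recommendations.
CONSERVATIVE_OPTIONS = {
    "task": "translate",                  # translate speech into English
    "language": "ja",                     # assumption: the audio is Japanese
    "temperature": 0.0,                   # greedy decoding, no random sampling
    "condition_on_previous_text": False,  # don't feed earlier output back in as a prompt
    "no_speech_threshold": 0.6,           # library default
    "logprob_threshold": -1.0,            # library default
    "compression_ratio_threshold": 2.4,   # library default, flags repetitive output
}


def transcribe_file(path, model_name="large"):
    """Hypothetical helper: run Whisper on one file with the options above."""
    import whisper  # imported here so the options can be inspected without the package

    model = whisper.load_model(model_name)
    return model.transcribe(path, **CONSERVATIVE_OPTIONS)
```

Turning off condition_on_previous_text is the one most often suggested for stopping a bad guess from snowballing into the following lines, but none of these options will remove the "Thank you for watching" lines completely.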
 
I don't know exactly how those settings work, but what we call AI doesn't think at all. It's all educated guesses built from associating things over and over again (when I see this, that is the expected result). So if the data it's trained on has issues, those issues get passed on to your results. It doesn't know it's guessing, since for it, what it knows is all there is.

Since most YouTube videos have that sentence or something similar at the end, the AI learns that this is the normal thing to say at the end of something. And if you use a VAD (voice activity detection), the audio is split into chunks, so you end up with a lot of ends.

The models are getting better, but it still happens.
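Since the same few phrases keep turning up at the ends of chunks, one workaround is to strip them after the fact. Here is a minimal sketch of that idea, assuming your subtitle lines are available as (start, end, text) tuples. The phrase list is just the examples quoted in this thread, and the function names are made up for illustration.

```python
import re

# Assumed blocklist: the phrases reported in this thread. Extend it with
# whatever shows up in your own output.
HALLUCINATED_PHRASES = {
    "please subscribe to the channel",
    "thank you for watching until the end today",
    "see you in the next video",
    "thank you for watching",
}


def normalize(text):
    """Lowercase, drop punctuation and collapse spaces before comparing."""
    cleaned = re.sub(r"[^a-z0-9 ]+", "", text.lower())
    return " ".join(cleaned.split())


def is_hallucination(text):
    """True only when the whole line is one of the known filler phrases."""
    return normalize(text) in HALLUCINATED_PHRASES


def filter_segments(segments):
    """Keep the (start, end, text) segments that are not known filler."""
    return [seg for seg in segments if not is_hallucination(seg[2])]


segments = [
    (0.0, 2.0, "Good morning."),
    (2.0, 4.0, "Thank you for watching!"),
    (4.0, 5.5, "Please subscribe to the channel."),
]
kept = filter_segments(segments)
```

It only drops a line when the whole line matches, so real dialogue that merely contains one of the phrases is left alone. It also won't catch one-off garbage like the Instagram line unless you add it to the list.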
 
Yeah, that makes a lot of sense. One day, probably past my expiry date, the technology will be better at context. Right now it reads text and connects the dots. Sometimes I wonder if I should even bother with this subtitle project, because who knows what the technology will be like in a year, or six months, or tomorrow morning.