Whisper and its many forms

ironfevers · Tuesday at 3:21 AM

ArtemisINFJ said:
What did you change? I'm using the one you uploaded before.

Before, it could only process one file and then it would fail with an out-of-memory error. Now it keeps going after each file without out-of-memory errors.

ArtemisINFJ · Tuesday at 3:26 AM

ironfevers said:
Before, it could only process one file and then it would fail with an out-of-memory error. Now it keeps going after each file without out-of-memory errors.

I'll try it later, thank you for the update. I'm currently using the previous one. It's so convenient, but haha, I need to manually clean out the moaning sounds.

ironfevers · Tuesday at 3:34 AM

ArtemisINFJ said:
I'll try it later, thank you for the update. I'm currently using the previous one. It's so convenient, but haha, I need to manually clean out the moaning sounds.

Yup, the initial_prompt tells Whisper to transcribe everything, even moans. You don't have to clean it manually. Use my clean_japanese.py; it automatically cleans all repetitions, moaning, laughing, and garbage.

ArtemisINFJ · Tuesday at 3:38 AM

ironfevers said:
Yup, the initial_prompt tells Whisper to transcribe everything, even moans. You don't have to clean it manually. Use my clean_japanese.py; it automatically cleans all repetitions, moaning, laughing, and garbage.

So I should just leave it as is for now, and it will be cleaned up later? I’ve used up my runtime for today, but I’ll continue working on the file in Colab tomorrow.

ironfevers · Tuesday at 3:42 AM

ArtemisINFJ said:
So I should just leave it as is for now, and it will be cleaned up later? I’ve used up my runtime for today, but I’ll continue working on the file in Colab tomorrow.

Keep the names of the audio file, the json file and the partial file the same, and it will load the partial srt tomorrow and continue appending to it. You can also upload those files to a different Colab account and it will continue there too.
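
For anyone curious how resuming from a partial srt could work, here is a minimal sketch, not the actual notebook code. It assumes the partial file is a normal SRT, and the function name resume_point is made up for illustration. It reads the partial text and reports the next subtitle index and the end time of the last finished cue, which is where transcription would pick up again.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def resume_point(partial_srt_text):
    """Return (next_index, last_end_seconds) for a partial SRT (hypothetical helper)."""
    next_index, last_end = 1, 0.0
    for block in re.split(r"\n\s*\n", partial_srt_text.strip()):
        lines = block.strip().splitlines()
        # A complete cue needs at least an index line and a timing line.
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMESTAMP.search(lines[1])
        if not match:
            continue
        # Groups 5-8 hold the end time of the cue.
        h, m, s, ms = (int(g) for g in match.groups()[4:])
        last_end = h * 3600 + m * 60 + s + ms / 1000
        next_index = int(lines[0]) + 1
    return next_index, last_end
```

An empty or missing partial file gives (1, 0.0), so the same code path covers a fresh start.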

ArtemisINFJ · Tuesday at 3:44 AM

I'll just wait for now, thanks for the clarification though!

ironfevers · Tuesday at 11:31 PM

Having Python installed on your PC will save you so much time. I have lots of Python scripts to share. I have Python 3.10.11 installed for reference.

My clean_japanese.py removes repetitions, cleans up moaning, deletes garbage lines and replaces mis-transcribed phrases with whatever you specify.
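
To give an idea of what this kind of cleanup involves, here is a rough sketch. It is not the contents of clean_japanese.py: the filler pattern, the replacement table and the function names are all assumptions for illustration, and it works on the subtitle text lines only (not the index or timestamp lines).

```python
import re

# Hypothetical pattern: a line made up only of moaning/filler kana and punctuation.
FILLER_ONLY = re.compile(r"^[あぁアァんンっッうぅはハー〜~…。、！!？?\s]+$")

# Hypothetical replacement table: mis-transcribed phrase -> what to use instead.
REPLACEMENTS = {"ご視聴ありがとうございました": ""}

# Any unit of 1-10 characters repeated three or more times in a row.
REPEAT = re.compile(r"(.{1,10}?)\1{2,}")


def collapse_repeats(text, keep=2):
    """Shorten runs like 'だめだめだめだめ' to `keep` copies of the repeated unit."""
    return REPEAT.sub(lambda m: m.group(1) * keep, text)


def clean_lines(lines):
    """Apply replacements, collapse repetitions, then drop empty or filler-only lines."""
    cleaned = []
    for line in lines:
        for wrong, right in REPLACEMENTS.items():
            line = line.replace(wrong, right)
        line = collapse_repeats(line).strip()
        if not line or FILLER_ONLY.match(line):
            continue
        cleaned.append(line)
    return cleaned
```

Extending REPLACEMENTS is the easy way to handle phrases Whisper keeps getting wrong in a particular file.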

Sharing my Python Selenium script that automates translating srt files on DeepL. It doesn't use any API, just the free website, and mimics what a user would do manually. You don't have to move your mouse, it does all the work! You need to download chromedriver.exe and point the script at it, for example driver_path=r'C:\Program Files (x86)\chromedriver.exe'. The script takes the srt file, strips the index numbers and timestamps, pastes only the Chinese/Japanese text into the input box, scrolls to the center, clicks the copy button, clears the textbox, and repeats. At the end, all the English text is combined into the final srt. Note that DeepL has a limit: a ~900-line subtitle file might only barely finish, and sometimes DeepL shows a popup saying something like, "You have been translating too much. Please subscribe to Pro". The script will also break whenever DeepL changes the layout of their website.
Download: https://drive.google.com/drive/folders/18xVzdHG8X1PoLZWbvaYliADLnY3kNr_L
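
The non-browser half of that workflow (strip indices and timestamps, send text in pieces, rebuild the srt) can be sketched like this. This is an illustration under assumptions, not the shared script: the function names are invented, max_chars is a made-up limit, and the Selenium part is left out entirely.

```python
import re


def parse_srt(srt_text):
    """Split SRT text into (index, timestamp, text) tuples, one per cue."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3 and "-->" in lines[1]:
            # Join multi-line cues so each cue is exactly one line of text.
            text = " ".join(part.strip() for part in lines[2:])
            cues.append((lines[0].strip(), lines[1].strip(), text))
    return cues


def chunk_lines(lines, max_chars=1500):
    """Group lines into chunks that stay under max_chars (hypothetical limit)."""
    chunks, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) + 1 > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line) + 1  # +1 for the newline used when pasting
    if current:
        chunks.append(current)
    return chunks


def rebuild_srt(cues, translated_lines):
    """Put translated lines back under the original indices and timestamps."""
    if len(translated_lines) != len(cues):
        raise ValueError("translated line count does not match cue count")
    blocks = [
        f"{index}\n{timestamp}\n{line}"
        for (index, timestamp, _), line in zip(cues, translated_lines)
    ]
    return "\n\n".join(blocks) + "\n"
```

Keeping one line of text per cue is what makes the final merge safe: if the translated output comes back with a different number of lines, rebuild_srt raises instead of silently shifting subtitles against their timestamps.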

ArtemisINFJ · Wednesday at 8:12 AM

ironfevers said:
Having Python installed on your PC will save you so much time. I have lots of Python scripts to share. I have Python 3.10.11 installed for reference.

My clean_japanese.py removes repetitions, cleans up moaning, deletes garbage lines and replaces mis-transcribed phrases with whatever you specify.

Sharing my Python Selenium script that automates translating srt files on DeepL. It doesn't use any API, just the free website, and mimics what a user would do manually. You don't have to move your mouse, it does all the work! You need to download chromedriver.exe and point the script at it, for example driver_path=r'C:\Program Files (x86)\chromedriver.exe'. The script takes the srt file, strips the index numbers and timestamps, pastes only the Chinese/Japanese text into the input box, scrolls to the center, clicks the copy button, clears the textbox, and repeats. At the end, all the English text is combined into the final srt. Note that DeepL has a limit: a ~900-line subtitle file might only barely finish, and sometimes DeepL shows a popup saying something like, "You have been translating too much. Please subscribe to Pro". The script will also break whenever DeepL changes the layout of their website.
Download: https://drive.google.com/drive/folders/18xVzdHG8X1PoLZWbvaYliADLnY3kNr_L

Thank you so much for this. Can I use it on Linux Mint? I ditched Windows 3 years ago and have never gone back to that OS.

ironfevers · Wednesday at 4:21 PM

ArtemisINFJ said:
Thank you so much for this. Can I use it on Linux Mint? I ditched Windows 3 years ago and have never gone back to that OS.

Yes, both scripts will work. If the script won't start, you're probably missing a dependency. Ask Claude, Gemini, Perplexity or ChatGPT for installation help.

Here are some quick instructions from ChatGPT:

Check your Python version:
python3 --version

To install Google Chrome (it is not in the default repositories, so add Google's apt repository or install the .deb from google.com/chrome first):
sudo apt install google-chrome-stable
or, if you prefer Chromium:
sudo apt install chromium-browser

Install ChromeDriver matching your Chrome version:
sudo apt install chromium-chromedriver # for Chromium
OR
visit https://googlechromelabs.github.io/chrome-for-testing/ and copy the chromedriver linux64 link for your version (the older chromedriver.storage.googleapis.com links only cover Chrome 114 and earlier):
wget https://storage.googleapis.com/chrome-for-testing-public/<version>/linux64/chromedriver-linux64.zip
unzip chromedriver-linux64.zip
sudo mv chromedriver-linux64/chromedriver /usr/local/bin/
sudo chmod +x /usr/local/bin/chromedriver

Install Selenium and run the script:
pip3 install selenium
python3 your_script.py
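
If the script still can't find the driver on Linux, a quick check like the one below can tell you whether chromedriver is actually on your PATH. This is only a sketch: find_driver is a made-up helper, and the second candidate name is a guess at what a snap install might expose.

```python
import shutil


def find_driver(names=("chromedriver", "chromium.chromedriver")):
    """Return the full path of the first driver binary found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    driver = find_driver()
    if driver:
        print("ChromeDriver found at:", driver)
    else:
        print("ChromeDriver not found on PATH; check the install steps.")
```

Whatever path it prints is what you would use for driver_path in the script instead of the Windows example path.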

ArtemisINFJ · Thursday at 2:49 AM

I have made my own cleaner for transcribed subs. Take a look and let me know what I should improve in the next iteration.

hobbies · Friday at 5:43 AM

After trying mei's updated script and updating pandas, tensorflow and numpy, I still get this error. It really hates tensorflow lol
