Whisper (OpenAI) - Automatic English Subtitles for Any Film in Any Language - An Intro & Guide to Subtitling JAV

Seems odd to me that you have two different Python folders involved (if you use a virtual environment, the system Python won't see modules installed in it, for example, and vice versa), but I don't know how this installation works at all, so that might be normal.
 
It's probably me screwing up the installation, lol. I'm really trying to understand Python and virtual environments like venv and pyenv, but I'm just not great at this stuff.

Edit: I'm up and running!
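For anyone else hitting the "two different Python folders" confusion: a minimal sketch of creating and activating a dedicated venv, so the project's packages stay separate from the system Python (the path here is illustrative, not whisperjav's own layout):

```shell
# Create an isolated environment (path is illustrative)
python3 -m venv "$HOME/venvs/whisperjav"

# Activate it; from here on, `python` and `pip` resolve inside the venv,
# and packages installed here are invisible to the system Python
source "$HOME/venvs/whisperjav/bin/activate"

# Confirm which interpreter is active
python -c "import sys; print(sys.prefix)"
```

Deactivate with `deactivate` when you're done; the system Python is untouched either way.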
 
Oops, I mixed up the threads. I just posted this in the other thread:

 
I ran in "Balanced" mode, and it worked beautifully (if slowly, since it was CPU-only: about 2 hours for a 2-hour video). The quality of translation is just really, really good. Thank you for your hard work!

I'm now trying "Transformers" mode, which supports Metal, as I would like to give you some benchmarks between a Mac Mini M1 (Geekbench Metal 32K) and a MBP M1 Pro (Geekbench Metal 68K). It initially tried to just translate without transcribing, but I think setting both the initial Transcription tab and the Ensemble tab to Transformers mode has fixed it. Running tests on both machines now.
 

Thanks, that would be helpful.
Meanwhile, give the 2-pass ensemble mode a try. You will like the results :)
 
I'll be sure to give them all a workout!

I'm seeing this error in the logs (on both Macs), but it still seems to run. Is it not that important?
/Users/Toast/venvs/whisperjav/lib/python3.12/site-packages/requests/__init__.py:113: RequestsDependencyWarning: urllib3 (2.6.3) or chardet (7.0.1)/charset_normalizer (3.4.4) doesn't match a supported version!
 

Yes, that is a harmless dependency version warning.
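If the warning clutters the logs, one hedged way to suppress it uses only the standard-library warnings module and a filter on the message text from the log above (installed before requests is imported; it changes nothing about the installed packages):

```python
import warnings

# Advisory-only warning emitted at import time by requests when the installed
# urllib3/chardet/charset_normalizer versions fall outside its tested range.
# The message argument is a regex matched against the start of the warning
# text, so this silences exactly that warning and nothing else.
warnings.filterwarnings(
    "ignore",
    message=r"urllib3 .* or chardet .* doesn't match a supported version!",
)
```

The alternative is simply to ignore it, as it does not affect the transcription run.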
 
Sorry for all the Mac questions!
I am trying the Metal-enabled Transformers mode:
2026-03-06 12:12:59 - whisperjav - INFO - Step 3/5: Transcribing with HF Transformers...
2026-03-06 12:12:59 - whisperjav - INFO - Transcribing full audio file...
2026-03-06 12:13:02 - whisperjav - INFO - Loading HF Transformers ASR pipeline...
2026-03-06 12:13:02 - whisperjav - INFO - Model: kotoba-tech/kotoba-whisper-bilingual-v1.0
2026-03-06 12:13:02 - whisperjav - INFO - Device: cpu
2026-03-06 12:13:02 - whisperjav - INFO - Dtype: torch.float32
2026-03-06 12:13:02 - whisperjav - INFO - Attention: sdpa
2026-03-06 12:13:02 - whisperjav - INFO - Batch: 8
`torch_dtype` is deprecated! Use `dtype` instead!
Device set to use cpu
Based on the macOS Activity Monitor, Transformers mode is using only the CPU, no GPU cycles.
I used your new guide to check: my machine is detected as MPS, arm64, and I updated PyTorch to confirm the wheels were the default builds, not CUDA.
Is there anything else I can do to force HuggingFace Transformers mode to use MPS?

Edit: I do not believe the CPU mode worked; the .SRT file was 0 bytes:
2026-03-06 13:31:54 - whisperjav - INFO - TransformersASR transcribing with task='translate' - output should be in English
Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.
2026-03-06 14:22:21 - whisperjav - WARNING - Translation mode was requested but output appears to be in Japanese (14437/17022 chars are Japanese). This may indicate HuggingFace translation is not working as expected.
2026-03-06 14:22:21 - whisperjav - WARNING - Sample output: 皆さん、こんにちは。今日も始まりました。アイヘンリオンのビジルトレーニング、皆さん3分間、しっかりと頑張っていきましょう。今日も頑張っていきましょう。いいですね。ちょっと待って、一緒に一緒に頑張ります...
[DONE] Transcription complete: 1 segments
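On the MPS question: I can't speak to whisperjav's internal device selection, but a defensive sketch of the usual PyTorch availability check looks like this (`pick_device` is a hypothetical helper, not part of whisperjav; its result is the kind of device string an HF pipeline accepts):

```python
import importlib

def pick_device() -> str:
    """Return 'mps', 'cuda', or 'cpu' based on what this PyTorch build
    reports. A sketch only; whisperjav's own selection logic may differ."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "cpu"  # no PyTorch installed at all
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

# HF transformers pipelines generally accept a device string, e.g.:
#   pipeline("automatic-speech-recognition",
#            model="kotoba-tech/kotoba-whisper-bilingual-v1.0",
#            device=pick_device())
print(pick_device())
```

If this prints "cpu" on an M1 even with a default (non-CUDA) PyTorch wheel, the build itself lacks MPS support, which would match the "Device: cpu" line in the log above.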
 