Mei's IPX-998 is pretty damn good in terms of translation. Was this manually edited?
If you do whisper --help it'll show you all the options and how to use them (kinda).
This is part of what it says:
Code:
--temperature TEMPERATURE    temperature to use for sampling (default: 0)
--best_of BEST_OF            number of candidates when sampling with non-zero temperature (default: 5)
It doesn't say how to pass multiple values, but I'd assume something like whisper --temperature (0.2, 0.4, 0.5) going by your Python example. Maybe with " instead of ( and ).
I haven't messed with extra options at all so no idea how they work.
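For what it's worth, I don't think the CLI takes a list at all. If I'm reading it right, it builds the list itself: you give it a starting --temperature and it steps up by --temperature_increment_on_fallback (0.2 by default) until it reaches 1.0, only moving to the higher values when a decode fails its quality checks. So something like:
Code:
whisper video.mp4 --temperature 0 --temperature_increment_on_fallback 0.2
should effectively try 0, 0.2, 0.4, 0.6, 0.8 and 1.0 (video.mp4 is just a placeholder). Don't quote me on the exact flag name, that's from memory.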
I looked at the help and tried all combinations of "", (), and []. Searching for help with Whisper online is just extremely difficult because it is new, the name sucks, and virtually nobody seems to be using the command line. https://blog.deepgram.com/exploring-whisper/ has examples for Python with:
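Roughly this shape, if I remember the blog right (my reconstruction, not their exact code; the filename is a placeholder):
Code:
import whisper

model = whisper.load_model("medium")
# temperature can be a tuple: 0.0 is tried first, and the higher values
# are only used as fallbacks when a decode fails the quality checks
result = model.transcribe("audio.mp3", temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0))
print(result["text"])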
This is the srt for RKI-606. It's my most recent completed raw srt. I ran the video through Premiere on Japanese detection and translated to English using Subtitle Edit. Zero touch-up. Can someone with Whisper run the same video and post it here? I'm curious how they compare.
Link to the video
Are you using a legit Adobe Premiere Pro or just a cracked one? I am planning to try it myself, but I don't want to pay the expensive subscription for Adobe products.
Pirated.
Has anybody tested how Whisper translation compares to DeepL (free version)? Up to now I found Whisper doing a very good job, but since I can't save a Japanese version for DeepL and let Whisper translate it for the same file, it's hard to tell.
Change the option of translation_mode to: No translation.
You can test it yourself by running whisper twice on the same file, once with translation and once without.
This will not work, as Whisper generates different subtitles every time you run it, so there is no one-to-one comparison unless you get the Japanese version with no translation and the translated one from the same run.
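If you go the two-run route anyway, the commands would be something like this (assuming the openai-whisper CLI; I pinned the language so detection can't differ between runs, and video.mp4 is a placeholder):
Code:
whisper video.mp4 --language Japanese --task transcribe
whisper video.mp4 --language Japanese --task translate
Each run writes an .srt next to the file. But keep the point above in mind: two separate runs aren't guaranteed to segment the audio identically.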
I know how it works; the question is whether you get better results using Whisper or DeepL.
I did one file today with only transcribe and it took nearly 1h 30min instead of approx. 30min, then ran it through DeepL, and I wasn't impressed with the result. It did not seem better than just letting Whisper do the whole job straight.
The good things with Whisper end-to-end translation are:
(a) It uses context for translation. It tries to build context for the translation task, for example guessing gender (he, she) and punctuation.
(b) It makes the entire Whisper run faster. The translate task is faster than the transcribe task. It is funny, but their main software engineer was saying that, the way the algorithm is written, the end-to-end translation task is performed faster than the transcribe task alone.
The good thing with DeepL is that it is just a better translator. Full stop. One bad thing with DeepL is that it often mixes up he/she, it/they, sir/ma'am.
For me, I decided to just stick with DeepL. I did some comparisons during the early days of Whisper (v1). I haven't done any comparison with v2, but I understand that the translation capability did not change from v1 to v2. To me, DeepL translations came out better. But then again, I don't speak Japanese, so my read might be quite wrong.
In terms of being able to compare the outputs as @SamKook suggested, one can make Whisper more deterministic by setting both temperature and beam to zero. That makes the output close to deterministic. But the pitfall is that it produces more hallucinations and repeating lines in the output.
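In CLI terms that's basically just zeroing the temperature; per the help text quoted above, --best_of only applies at non-zero temperature anyway, and I don't think the beam size flag actually accepts zero. Something like (filename is a placeholder, treat this as a sketch):
Code:
whisper video.mp4 --language Japanese --task translate --temperature 0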
DeepL is theoretically better, but there's probably some value in doing direct-to-English with the same deep learning model rather than taking the transcribed output and feeding it into a second deep learning model that isn't specifically designed to interact with the first. There's just an additional loss of information during that intermediate step.