We probably need to take this conversation to another thread so we don't hijack the main topic. The work that you're planning (and already doing) is quite valuable and deserves more attention. I will spend more time on it over the weekend, but I thought I'd do a quick exploration of your first set of subs by running some stats on them. Here are some interesting ones:
(1)
Hallucination: Your de-hallucination seems to have worked quite well for known phrases. There don't seem to be many of those occurring in the majority of the files:
View attachment 3671696
(2)
Quality: The characters-per-second (CPS) metric can give an indication of the quality of the timing, and so of the sub as a whole. I'd say anything below 10 or above 20 might not be optimal.
View attachment 3671723
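For anyone who wants to reproduce the CPS numbers, here is a rough sketch of how it can be computed per cue. It assumes SRT-style timestamps (HH:MM:SS,mmm) and counts characters without whitespace; both are my assumptions, and other tools count differently. The function names and the "slow"/"fast" labels are made up for illustration, and the 10 and 20 thresholds are just the rule of thumb from above.

```python
import re

# Assumed SRT-style timestamp, e.g. "00:01:02,500" (comma or dot before ms).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT-style timestamp to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def cue_cps(text: str, start: str, end: str) -> float:
    """Characters per second for one cue (whitespace not counted, by assumption)."""
    duration = to_seconds(end) - to_seconds(start)
    if duration <= 0:
        # A zero or negative duration is a timing error in itself.
        return float("inf")
    chars = len("".join(text.split()))
    return chars / duration


def flag_cue(cps: float, low: float = 10.0, high: float = 20.0) -> str:
    """Label a cue using the rough 10-20 CPS band (hypothetical labels)."""
    if cps < low:
        return "slow"
    if cps > high:
        return "fast"
    return "ok"


if __name__ == "__main__":
    cps = cue_cps("Hello there", "00:00:01,000", "00:00:02,000")
    print(cps, flag_cue(cps))
```

Averaging the per-cue values over a file gives one number per sub, which is enough to spot the outliers.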
(3)
Quality: Another metric I looked at was repetition. One way to measure it is the Type-Token Ratio (TTR), the number of unique words divided by the total number of words. This is a somewhat difficult metric for JAV (or any porn movie), as the vocabulary is not that vast. However, one doesn't want the TTR to be too close to 0. That can indicate problems in the Whisper output, such as a repetition loop.
View attachment 3671724
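The TTR itself is easy to compute. A minimal sketch, assuming the sub text is in a language where words are separated by spaces (for example the English translation); raw Japanese text would need a proper tokenizer first. The function name is just for illustration.

```python
import re


def type_token_ratio(text: str) -> float:
    """Unique words divided by total words; 0.0 for empty input."""
    # Lowercase and split on non-word characters (assumes space-delimited text).
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    # A repetition loop drags the ratio toward 0.
    print(type_token_ratio("yes yes yes yes"))
    # Varied text with no repeated words gives the maximum of 1.
    print(type_token_ratio("one two three"))
```

Keep in mind that TTR drops as a text gets longer, so comparing a 30-minute sub to a 3-hour one is not quite fair without normalising for length.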
These are just dumb stats with no knowledge of the genre or type of movie, so they must be taken with a grain of salt. It would be interesting to run the stats for each genre or series separately to see what their characteristics are.