A small update to the Chinese subtitle pack with about 350 extra subtitles, most of them new from the last 2-3 months.
https://pixeldrain.com/u/gPmJQrh6 (a list of all subtitles different from v2 of the pack)
Now for a full new pack: instead of ordering the subtitles by studio label, I've ordered them by actress name. I think it's currently sitting at about 3500 names.
There will be a lot of mistakes in this pack. An actress might be in it twice, just under a different name. I have an idea of how to fix it, but I need to learn how to talk to the javstash GraphQL endpoint. I've tried a bunch of things but I can't figure it out right now.
If you want to know more about what other problems there will be in this pack, you can read the workflow below.
1. First I wrote a script that checked every subtitle against the R18.dev database. I think that script ran for about 15 hours to check over 20k subtitles. The problem here is that actresses who debuted in the last 3 years are only listed in the R18.dev database under their Kanji name, so I had to translate them through either Javstash or Javgg. There will still be some Kanji names, but only for actresses with fewer than 3 subtitles. It was just too much manual work. There were about 600 in total.
2. Anything that wasn't found in the R18.dev database (about 1600 subtitles were left over from the first script) was then scraped first against Javguru and then against Javmost. But now the problem arises that names can be romanized differently (e.g. Sato vs Satou, Misuki vs Mizuki, Yuki vs Yuuki) or inverted (Rima Arai vs Arai Rima).
3. To combat that last problem I fed my full folder tree to a couple of AIs and asked whether they saw any duplicates with this problem. Deepseek and ChatGPT couldn't deal with it, they just invented duplicate names. The Claude.ai context window was too small (the tree is about 3000 actress names). Perplexity caught a couple, maybe 10. Mistral only caught name inversions. But the hero? Gemini 2.5 Pro Experimental, which caught probably about 150 duplicates. It even noticed that some folder names had an alias in parentheses after the actress name, and it found that alias as a duplicate elsewhere.
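For anyone who wants to try the duplicate hunt from steps 2 and 3 without an AI, here is a minimal Python sketch. It only catches inverted names and doubled long vowels (Satou/Sato, Yuuki/Yuki), not spelling variants like Misuki vs Mizuki, and it will produce false positives, so treat the output as candidates for manual review. The function names `name_key` and `find_duplicate_candidates` are made up for this example and are not part of the scripts used for the pack.

```python
import re
from collections import defaultdict


def name_key(name: str) -> tuple:
    """Build an order-independent, vowel-folded key for a romanized name."""
    # Drop any alias written in parentheses, e.g. "Name (Alias)".
    base = re.sub(r"\(.*?\)", "", name).lower()
    tokens = re.findall(r"[a-z]+", base)
    # Fold common long-vowel spellings so "satou" and "sato" collide.
    folded = [t.replace("ou", "o").replace("oo", "o").replace("uu", "u") for t in tokens]
    # Sorting the parts makes "Rima Arai" and "Arai Rima" collide.
    return tuple(sorted(folded))


def find_duplicate_candidates(names):
    """Return groups of names that share the same key."""
    groups = defaultdict(list)
    for name in names:
        key = name_key(name)
        if key:  # skip names with no Latin letters (e.g. Kanji-only folders)
            groups[key].append(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


if __name__ == "__main__":
    folders = ["Rima Arai", "Arai Rima", "Yui Satou", "Yui Sato", "Yuki Mizuki"]
    for group in find_duplicate_candidates(folders):
        print(" / ".join(group))
```

In practice the `folders` list would come from something like `os.listdir()` on the pack's root folder.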
There's still a big problem, which is aliases: one actress can have two different directories, each under a different alias. I know how I could fix it, but I need to figure out how to query the javstash GraphQL endpoint, as they have almost all aliases. Otherwise I need access to another, better database. It would also make it easier to search for actresses, because I could just put a .txt file named with all the aliases in each folder.
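A rough, untested sketch of what such an alias lookup might look like in Python. Everything specific here is an assumption: it assumes javstash runs stash-box, that the endpoint is `https://javstash.org/graphql`, that the schema exposes a `searchPerformer(term: ...)` query with `name` and `aliases` fields, and that authentication uses an `ApiKey` header with a key from your account. Check the actual schema before relying on any of it.

```python
import json
import urllib.request

# Assumed endpoint, not confirmed against javstash.
JAVSTASH_GRAPHQL_URL = "https://javstash.org/graphql"

# Assumed stash-box style query; field names may differ on the real schema.
SEARCH_QUERY = """
query Search($term: String!) {
  searchPerformer(term: $term) {
    id
    name
    aliases
  }
}
"""


def build_request(term: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for one performer search."""
    payload = json.dumps({"query": SEARCH_QUERY, "variables": {"term": term}})
    return urllib.request.Request(
        JAVSTASH_GRAPHQL_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json", "ApiKey": api_key},  # header name assumed
        method="POST",
    )


def extract_aliases(response_json: dict) -> dict:
    """Map each returned performer name to its list of aliases."""
    performers = (response_json.get("data") or {}).get("searchPerformer") or []
    return {p["name"]: p.get("aliases") or [] for p in performers}


def search_performer(term: str, api_key: str) -> dict:
    """Send the query and return {name: [aliases]}."""
    with urllib.request.urlopen(build_request(term, api_key), timeout=30) as resp:
        return extract_aliases(json.load(resp))
```

If the lookup works, two folders whose names both appear in the same performer's `name` plus `aliases` set would be merge candidates, and the same list could be used to name the per-folder .txt file.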
If you find anything big that is incorrect in the pack feel free to send me a DM.