Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

The following is batch 2 of my marathon subtitle adventure. There are about 2,200 SRT files, with accompanying covers for easier identification. My original plan was to zip them up into folders by actress, but that had too many complications, so this batch is just an unordered mass of SRT files.

One thing I discovered too late in this process: I had only just started using the VAD option in Whisper, and sometimes it yields a very small SRT file, 0-8 KB or so. This is caused by the sensitivity of the VAD argument. Unfortunately, you will still find a few of these in the zip. I may redo them in a future batch.
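If you want to weed those out yourself, here's a small sketch that lists every SRT file below a size cutoff. The 8 KB default is just the rough figure from above and the function name is my own; tune the threshold to taste.

```python
from pathlib import Path

# Rough cutoff based on the "0-8 KB or so" estimate; adjust as needed.
SIZE_LIMIT_BYTES = 8 * 1024

def find_tiny_srts(folder, limit=SIZE_LIMIT_BYTES):
    """Return all .srt files under `folder` (recursively) smaller than `limit` bytes, sorted by path."""
    return sorted(
        p for p in Path(folder).rglob("*.srt")
        if p.is_file() and p.stat().st_size < limit
    )
```

Usage: `for p in find_tiny_srts("batch2"): print(p.stat().st_size, p)`. Note that `rglob("*.srt")` is case-sensitive on Linux, so files ending in `.SRT` would need a second pattern.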

All future batches will be arranged by code, so the next batch will be all the files that start with A??-???. This should help with browsing. I would be interested in some feedback on whether to include the thumbnails, as I have in this zip, in future batches. If you guys don't want them, please let me know and I will leave them out. Cheers, here's the link to Batch 2.
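For anyone who wants to sort the current unordered batch the same way, here's a rough sketch that buckets files by the first letter of their code. It assumes every filename starts with the code in a `LETTERS-DIGITS` shape (e.g. `ABP-123.srt`); the regex and function name are my own guesses, and anything that doesn't match lands in a `#` bucket.

```python
import re
from collections import defaultdict

# Assumed filename shape: letters, a hyphen, then digits at the start, e.g. "ABP-123.srt".
CODE_RE = re.compile(r"^([A-Za-z]+)-(\d+)")

def group_by_first_letter(filenames):
    """Bucket filenames by the first letter of their code; unparseable names go under '#'."""
    groups = defaultdict(list)
    for name in filenames:
        m = CODE_RE.match(name)
        key = m.group(1)[0].upper() if m else "#"
        groups[key].append(name)
    return {k: sorted(v) for k, v in sorted(groups.items())}
```

For example, `["ABP-123.srt", "ABF-012.srt", "HMN-451.srt", "notes.txt"]` becomes `A: [ABF-012.srt, ABP-123.srt]`, `H: [HMN-451.srt]` and `#: [notes.txt]`.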


 
A small update to the Chinese subtitle pack: about 350 extra subtitles, most of them new from the last 2-3 months.
https://pixeldrain.com/u/gPmJQrh6 (a list of all subtitles different from v2 of the pack)
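If you want to build a list like that yourself for your own collection, one possible approach (not necessarily how this list was made) is to hash every file in both versions and compare. This sketch assumes both versions are unpacked into folders with the same internal layout; the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def file_digests(folder):
    """Map each file's path (relative to `folder`) to the SHA-256 of its contents."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }

def diff_packs(old_folder, new_folder):
    """Return (added, changed): paths only in the new pack, and shared paths whose contents differ."""
    old, new = file_digests(old_folder), file_digests(new_folder)
    added = sorted(set(new) - set(old))
    changed = sorted(k for k in set(new) & set(old) if new[k] != old[k])
    return added, changed
```

Comparing hashes instead of just names also catches subtitles that kept their filename but were re-done.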

Now a completely new pack: instead of ordering the subtitles by studio label, I've ordered them by actress name. I think it's currently sitting at about 3,500 names.

There will be a lot of mistakes in this pack. An actress might appear twice, just under a different name. I have an idea of how to fix this, but I need to learn how to talk to the javstash GraphQL endpoint. I've tried a bunch of things, but I can't figure it out right now.

If you want to know more about the other problems in this pack, you can read the workflow below.

1. First I wrote a script that checked every subtitle against the R18.dev database. I think that script ran for about 15 hours to check over 20k subtitles. The problem here is that actresses who debuted in the last 3 years are only listed under their Kanji name in the R18 DB, so I had to translate those names through either Javstash or Javgg. There will still be some Kanji names, but only for actresses with fewer than 3 subtitles; it was just too much manual work. There were about 600 in total.
2. Anything that wasn't found in the R18.dev database (about 1,600 subtitles were left after the first script) was then scraped against Javguru first and Javmost second. But now a new problem arises: names can be romanized differently (e.g. Sato vs Satou, Misuki vs Mizuki, Yuki vs Yuuki), or the names are inverted (Rima Arai vs Arai Rima).
3. To combat that last problem, I pasted my full folder tree into a couple of AIs and asked whether they saw any duplicates of this kind. DeepSeek and ChatGPT couldn't deal with it; they just invented duplicate names. Claude.ai's context window was too small (about 3,000 actress names). Perplexity caught a couple, maybe 10. Mistral only caught name inversions. But the hero? Gemini 2.5 Pro Experimental caught probably about 150 duplicates. It even noticed that some actress names were the same except that one of them had an alias in parentheses, and it flagged that alias as a duplicate.
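The romanization and inversion duplicates from step 2 can also be flagged without an AI, at least roughly. The sketch below is a heuristic of my own, not what was used for this pack: it strips parenthesized aliases, collapses long vowels ("ou" to "o", doubled vowels to one), sorts the name parts so "Rima Arai" and "Arai Rima" get the same key, and then uses `difflib.SequenceMatcher` to catch near misses like Misuki/Mizuki. The 0.85 threshold is an arbitrary assumption, and the output is only a list of candidates for manual review.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def name_key(name):
    """Normalize a romanized name: drop '(alias)' parts, lowercase,
    collapse long vowels, and sort the tokens to ignore name order."""
    name = re.sub(r"\(.*?\)", " ", name).lower()
    tokens = []
    for t in re.findall(r"[a-z]+", name):
        t = t.replace("ou", "o")                # Satou -> sato
        t = re.sub(r"([aeiou])\1", r"\1", t)     # Yuuki -> yuki
        tokens.append(t)
    return " ".join(sorted(tokens))

def likely_duplicates(names, threshold=0.85):
    """Return (name_a, name_b, similarity) for pairs whose normalized keys are (nearly) equal."""
    keyed = [(n, name_key(n)) for n in names]
    pairs = []
    for (a, ka), (b, kb) in combinations(keyed, 2):
        ratio = SequenceMatcher(None, ka, kb).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```

This compares every pair, so with ~3,500 folder names it's about 6 million comparisons. That's slow but workable as a one-off; an exact match on `name_key` alone is much faster if you only care about spelling and order variants.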

There's still a big problem: aliases. One actress can have 2 different directories, each under a different alias. I know how I could fix it, but I need to figure out how to query the javstash GraphQL API, since they have almost all aliases, or I need access to another, better database. It would also make it easier to search for actresses, because I could just put a .txt file named with all of their aliases in each folder.
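On the GraphQL side, the transport part is standard: you POST a JSON body with a `query` string and a `variables` object. The sketch below shows that much with the standard library only. Everything site-specific is an assumption: the endpoint URL is a placeholder, the `ApiKey` header name and the `searchPerformer { name aliases }` fields are modeled on a stash-box style schema and must be checked against javstash's actual schema and docs before use.

```python
import json
import urllib.request

# HYPOTHETICAL placeholder endpoint; replace with the real GraphQL URL.
ENDPOINT = "https://javstash.example/graphql"

# ASSUMED query shape (stash-box style); verify field names against the real schema.
ALIAS_QUERY = """
query FindAliases($term: String!) {
  searchPerformer(term: $term) {
    name
    aliases
  }
}
"""

def build_alias_request(term, api_key, endpoint=ENDPOINT):
    """Build a standard GraphQL-over-HTTP POST with 'query' and 'variables' in a JSON body."""
    body = json.dumps({"query": ALIAS_QUERY, "variables": {"term": term}}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "ApiKey": api_key},  # header name assumed
        method="POST",
    )

def fetch_aliases(term, api_key, endpoint=ENDPOINT):
    """Send the request and return {name: [aliases]} from the assumed response shape."""
    with urllib.request.urlopen(build_alias_request(term, api_key, endpoint)) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return {p["name"]: p.get("aliases", []) for p in payload["data"]["searchPerformer"]}
```

Once the real field names are known, the returned alias lists could be joined into the filename of that .txt file in each actress folder.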

If you find anything major that is incorrect in the pack, feel free to send me a DM.
I found a problem with the filenames; I think batch renaming caused it. For example, the HMN folder has a lot of files named HMN-004V2 all the way to HMN-004V21, when in fact HMN-004V2 is really HMN-451, etc.
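Since the real code can't be recovered from a name like HMN-004V2, the best a script can do is list the suspects so they can be checked by hand (the covers help here). This sketch assumes the broken names all look like the example, a code followed by a `V<number>` suffix before `.srt`; the pattern and function name are my own, and a legitimate code that happens to end in `V` plus digits would be a false positive.

```python
import re
from collections import defaultdict

# Matches names like "HMN-004V2.srt": letters, hyphen, digits, then a "V<digits>" suffix.
SUSPECT_RE = re.compile(r"^([A-Z]+)-(\d+)V(\d+)\.srt$", re.IGNORECASE)

def find_renamed_suspects(filenames):
    """Group suspicious names by base code, e.g. {'HMN-004': [2, 21]}."""
    groups = defaultdict(list)
    for name in filenames:
        m = SUSPECT_RE.match(name)
        if m:
            base = f"{m.group(1).upper()}-{m.group(2)}"
            groups[base].append(int(m.group(3)))
    return {base: sorted(nums) for base, nums in groups.items()}
```

Running it over the HMN folder would give something like `{"HMN-004": [2, 3, ..., 21]}`, i.e. the list of files that still need their real code looked up.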