Tutorial: OCR for hardsubbed Chinese (English) movies


Optical Character Recognition (OCR) for natural images/scenes​

RBD-186-C_001_4655.jpg
他也不是故意要那么做的 ("He didn't do that on purpose either")

This is another way to recognize text in images. It works directly on natural images, and it recognizes text better than Tesseract.


Requirements:​

Windows 10 (64-bit)
3 GB of free disk space
8 GB RAM
Internet connection (needed only once, to download the required languages)
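
Once Python is installed, the disk-space and 64-bit requirements above can be checked with a short script. This is only a sketch: the function names are made up for illustration, and it checks the drive of the current folder, which is an assumption about where you plan to install.

```python
import shutil
import sys

def free_disk_gb(path):
    """Free space, in gigabytes, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3

def is_64bit_python():
    """True when the running interpreter is a 64-bit build."""
    return sys.maxsize > 2**32

if __name__ == "__main__":
    print("64-bit Python:", is_64bit_python())
    print("Free disk space: {:.1f} GB (about 3 GB needed)".format(free_disk_gb(".")))
```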

Software​

Python

Pytorch

EasyOCR (AI)

VideoSubFinder




Installation​


Download and install Python 3.9.4 in C:\Python39
python-3.9.4

Install EasyOCR with pip
Open a command prompt (Win+R, then type "cmd")
Bash:
C:\Python39\Scripts\pip.exe install easyocr


There are two options: if you have an NVIDIA video card, run step a); if you only have an AMD/Intel video card, run step b)
a) Only for NVIDIA video cards with CUDA 11.1
Bash:
C:\Python39\Scripts\pip3.exe install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
b) For CPU only (no NVIDIA/CUDA video card)
Code:
C:\Python39\Scripts\pip3.exe install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
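
After the pip steps, a quick check like the one below can confirm that the packages are installed and whether PyTorch can see a CUDA card. It is a minimal sketch, not part of the tutorial's script: the function name is made up, and it only reports what it finds, so it also runs if a package is missing.

```python
import importlib.util

def check_ocr_environment():
    """Report which required packages are installed and whether CUDA is usable."""
    status = {
        "easyocr": importlib.util.find_spec("easyocr") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        "cuda": False,
    }
    if status["torch"]:
        try:
            import torch  # imported lazily so the check also runs without PyTorch
            status["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is reported as "no CUDA" instead of crashing.
            status["cuda"] = False
    return status

if __name__ == "__main__":
    for name, ok in check_ocr_environment().items():
        print("{}: {}".format(name, "yes" if ok else "no"))
```

If "cuda" shows "no" after step a), the OCR should still run on the CPU, only slower.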


Download VideoSubFinder and install it in C:\VideoSubFinder5x64 (rename the release_x64 folder)
VideoSubFinder 5.5
This program requires the "Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019"



Usage​

Run VideoSubFinderWXW.exe
Open the video and adjust the video area (optional, but it makes the search faster)
Press the "Run Search" button
videosubfinder.jpg
Optional: in Windows Explorer, manually delete the images that contain no text from the folder C:\VideoSubFinder5x64\RGBImages

Run the script ( Download ) with the default language and folder
Bash:
C:\Python39\python.exe easyOcrImage.py
or pass the language and the image folder yourself
Code:
easyOcrImage.py -l ch_tra -d "c:\yourDirectoryImages"
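
The -l option takes one or more language codes separated by commas. The sketch below mirrors how the script turns that text into the list it hands to EasyOCR; `parse_langs` is a name made up for illustration, as the script does this inline.

```python
def parse_langs(value):
    """Split a comma-separated -l value into a list of language codes."""
    # Spaces are dropped first, so "en, ch_tra" and "en,ch_tra" give the same list.
    return value.replace(" ", "").split(",")

print(parse_langs("ch_tra"))       # ['ch_tra']
print(parse_langs("en, ch_tra"))   # ['en', 'ch_tra']
```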

When the script finishes, the OCR text files are in the folder TXTResults
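
It can be worth checking that every image got some text. The sketch below lists the images whose text file is missing or empty. It assumes the default folders from this tutorial and the script's naming (image name plus ".txt"); the function name is made up for illustration.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}

def find_missing_or_empty(image_dir, txt_dir):
    """Return image names whose OCR text file is missing or contains no text."""
    problems = []
    for name in sorted(os.listdir(image_dir)):
        if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
            continue  # not an image, ignore it
        txt_path = os.path.join(txt_dir, name + ".txt")  # same naming as the script
        if not os.path.isfile(txt_path):
            problems.append(name)
            continue
        with open(txt_path, encoding="utf-8") as f:
            if not f.read().strip():
                problems.append(name)
    return problems

if __name__ == "__main__":
    # Default folders from this tutorial; change them if yours differ.
    images = r"C:\VideoSubFinder5x64\RGBImages"
    texts = r"C:\VideoSubFinder5x64\TXTResults"
    if os.path.isdir(images) and os.path.isdir(texts):
        for name in find_missing_or_empty(images, texts):
            print("No text for", name)
```

Images reported here either contain no subtitle or were not recognized, so they are good candidates to review by hand.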
Now press the "Create Sub From TXTResults" button in VideoSubFinder and save the subtitle as .srt

videosubfinder_createEmpty.jpg

Python:
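import os
import argparse

# Defaults used when no command-line arguments are given.
directoryDefault=r'C:\VideoSubFinder5x64\RGBImages'
extensions=[".jpg",".png",".jpeg",".bmp"]  # image types to process
languagesDefault="ch_tra"  # Traditional Chinese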
def main():
    parser = argparse.ArgumentParser(formatter_class=argparse.RawDescriptionHelpFormatter,description=r"easyOcrImage.py -l en,ch_tra -d " + directoryDefault,epilog=codeLanguages)
    parser.add_argument('-l','--langs',dest="langs",default=languagesDefault,help="Comma-separated language codes, e.g. \"en,ch_tra\" to mix English and Traditional Chinese")
    parser.add_argument('-d','--directory',dest="directory", default=directoryDefault,help='Directory that contains the images to OCR')
    args = parser.parse_args()
    if not os.path.isdir(args.directory):
        print ("Directory does not exist: " + args.directory )
        return
    parentDirectory = os.path.dirname(args.directory)
    directoryTXTResults = os.path.join(parentDirectory, "TXTResults")
    if os.path.isdir( directoryTXTResults ):
        directoryTxt=directoryTXTResults
    else:
        directoryTxt=args.directory
    os.system("title OCR for " + args.directory + " - " + args.langs)
    import easyocr
    reader = easyocr.Reader( args.langs.replace(" ","").split(",") )

    # Compare extensions in lower case so files such as IMAGE.JPG are not skipped.
    files = [x for x in os.listdir(args.directory) if os.path.splitext(x)[1].lower() in extensions]
    for i,x in enumerate(files):
        os.system("title OCR {}/{} Processed".format(i,len(files)) )
        fileImage = os.path.join(args.directory,x)
        fileTxt = os.path.join(directoryTxt,x)
        result = reader.readtext(fileImage,detail=0, paragraph=True)
        with open(fileTxt+".txt", "w", encoding="utf-8") as f:
            f.write( " ".join(result) )

codeLanguages="""Languages
Code Name
--- ----
abq    Abaza
ady    Adyghe
af    Afrikaans
ang    Angika
ar    Arabic
as    Assamese
ava    Avar
az    Azerbaijani
be    Belarusian
bg    Bulgarian
bh    Bihari
bho    Bhojpuri
bn    Bengali
bs    Bosnian
ch_sim    Simplified Chinese
ch_tra    Traditional Chinese
che    Chechen
cs    Czech
cy    Welsh
da    Danish
dar    Dargwa
de    German
en    English
es    Spanish
et    Estonian
fa    Persian (Farsi)
fr    French
ga    Irish
gom    Goan Konkani
hi    Hindi
hr    Croatian
hu    Hungarian
id    Indonesian
inh    Ingush
is    Icelandic
it    Italian
ja    Japanese
kbd    Kabardian
kn    Kannada
ko    Korean
ku    Kurdish
la    Latin
lbe    Lak
lez    Lezghian
lt    Lithuanian
lv    Latvian
mah    Magahi
mai    Maithili
mi    Maori
mn    Mongolian
mr    Marathi
ms    Malay
mt    Maltese
ne    Nepali
new    Newari
nl    Dutch
no    Norwegian
oc    Occitan
pl    Polish
pt    Portuguese
ro    Romanian
ru    Russian
rs_cyrillic    Serbian (cyrillic)
rs_latin    Serbian (latin)
sck    Nagpuri
sk    Slovak (need revisit)
sl    Slovenian
sq    Albanian
sv    Swedish
sw    Swahili
ta    Tamil
tab    Tabassaran
te    Telugu
th    Thai
tl    Tagalog
tr    Turkish
ug    Uyghur
uk    Ukrainian
ur    Urdu
uz    Uzbek
vi    Vietnamese (need revisit)"""
if __name__ == "__main__":
    main()


Reminder, Download link 28,000+ Subtitle pack! (2001-2021)
https://www.akiba-online.com/thread...not-a-sub-request-thread.1920331/post-4193115
 

Attachments

  • easyOcrImage.zip
    14.4 KB · Views: 10