I have been looking for a cheap or free Japanese OCR program for some time, mainly to help me read pages of manga. Today I found one that looks very promising. It took me an afternoon to get it working, but if you are desperate for a cheap Japanese OCR program you’ll probably think it was worth it.
You need to do the following:
Locate and download the file SmartOCRlite107. (I had to download the .rar variant, as the .zip I found first was persistently corrupt or incomplete.)
Ensure that your computer has Microsoft’s .NET 1.1 installed (I found that I had later versions of .NET in the Windows folder, plus an empty folder for .NET 1.1). You can download .NET 1.1 from Microsoft. Later versions won’t do.
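If you would rather not poke around the Windows folder by hand, the check can be sketched in a few lines of Python. This is only a rough sketch under two assumptions that are not from my notes above: that .NET 1.1 installs into a folder named v1.1.4322 under Windows\Microsoft.NET\Framework, and that a real install contains mscorlib.dll. The function name is made up for illustration. An empty folder, like the one I found, is reported as not installed.

```python
import os
from pathlib import Path

# Assumption: .NET 1.1 uses this version folder name under the Framework root.
DOTNET_11_DIRNAME = "v1.1.4322"


def dotnet_11_looks_installed(framework_root):
    """Return True if the .NET 1.1 folder exists and holds the core runtime library."""
    version_dir = Path(framework_root) / DOTNET_11_DIRNAME
    if not version_dir.is_dir():
        return False
    # An empty or leftover folder does not count as an installation.
    names = {entry.name.lower() for entry in version_dir.iterdir()}
    return "mscorlib.dll" in names


if __name__ == "__main__":
    # Assumption: the usual location of the Framework folder on Windows.
    windir = os.environ.get("WINDIR", r"C:\Windows")
    root = Path(windir) / "Microsoft.NET" / "Framework"
    if dotnet_11_looks_installed(root):
        print(".NET 1.1 appears to be installed")
    else:
        print(".NET 1.1 not found (or its folder is empty)")
```

A positive result only means the files are there; it does not prove the installation is healthy.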
Ensure that your computer displays Japanese text properly. (Japanese text should display properly everywhere, notably in Wordpad.) You may have to fiddle with “Regional and Language Options” in the XP Control Panel, and maybe ensure that Japanese fonts are loaded. In particular, do this:
- Open Regional and Language Options in Control Panel.
- On the Advanced tab, under Language for non-Unicode programs, select Japanese as the language to use.
- On the Languages tab of Regional and Language Options, install the support for East Asian languages.
- Confirm that your computer has the MS Gothic font installed. (The above actions, or some other Japanese-using program, may have installed it.) Look in Windows/Fonts on XP. This font is not installed by default and now (Nov 2012) seems to be a chargeable download. You need the right version for your OS.
If you don’t do this, or it gets un-done, Wordpad won’t display Japanese characters, which is irritating, and the Japanese OCR programs will fail subtly but completely, which is a nightmare!
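Since a missing font is so easy to overlook, here is a minimal Python sketch for checking the Fonts folder. It assumes that MS Gothic ships as a file called msgothic.ttc; that file name and the function name are my assumptions, so adjust them if your system differs. The match ignores upper and lower case.

```python
import os
from pathlib import Path

# Assumption: MS Gothic is stored as a font collection file with this name.
MS_GOTHIC_FILENAME = "msgothic.ttc"


def find_font_file(fonts_dir, filename=MS_GOTHIC_FILENAME):
    """Return the path of the font file in fonts_dir (case-insensitive), or None."""
    fonts_path = Path(fonts_dir)
    if not fonts_path.is_dir():
        return None
    wanted = filename.lower()
    for entry in fonts_path.iterdir():
        if entry.is_file() and entry.name.lower() == wanted:
            return entry
    return None


if __name__ == "__main__":
    fonts = Path(os.environ.get("WINDIR", r"C:\Windows")) / "Fonts"
    found = find_font_file(fonts)
    if found is None:
        print("MS Gothic not found in", fonts)
    else:
        print("MS Gothic found:", found)
```

This only confirms that the file exists. It says nothing about the Regional and Language settings, which still have to be checked in Control Panel.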
Unpack and install the program. If it’s installing successfully, you should get a succession of dialog boxes and buttons with Japanese text. If you don’t know which button to click, click on the right-hand button.
Start up the program. The interface looks like an OCR interface, and it’s in Japanese (yes, this is a real Japanese program!). It’s not quite as impenetrable as it looks: the top left menu button (F) takes you via (O), or Ctrl-O, to a box for opening image files, so use it to open a pre-scanned image. This should appear in the middle of the screen. Press function key F7 to perform OCR. In the File menu, item (A), or maybe F8, should open a box for saving text files, i.e. saving your output. Name and save a file, then open it in Wordpad to check the output against the original paper.
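Alongside the eyeball check in Wordpad, a quick character count can tell you whether the saved text contains real Japanese or just question marks and junk. The Python sketch below counts hiragana, katakana and kanji by their Unicode ranges. It works on text that has already been read into a string; I don’t know what encoding the program uses for its output, so decoding the file is left to you. The function name is invented for illustration, and the kanji range covers only the main CJK block.

```python
# Unicode blocks used for the count (inclusive ranges).
SCRIPT_RANGES = {
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "kanji": (0x4E00, 0x9FFF),  # CJK Unified Ideographs, main block only
}


def count_japanese(text):
    """Count the characters of each Japanese script found in text."""
    counts = {name: 0 for name in SCRIPT_RANGES}
    for ch in text:
        code = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code <= high:
                counts[name] += 1
                break
    return counts


if __name__ == "__main__":
    sample = "これはテストです。漢字"
    print(count_japanese(sample))
    # prints {'hiragana': 5, 'katakana': 3, 'kanji': 2}
```

If every count comes back zero for a page that clearly has Japanese on it, suspect the language and font setup rather than the scan.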
If all that worked, you just have to practice using the program. I found it was impressively accurate with a scan at 300 dpi from a good-quality manga volume. Wrongly recognised kanji, katakana and hiragana were almost nil. IIRC, the now unavailable Neocor KanjiScan had the same mysterious system of red and blue rectangles and yellow labels.
Alternative Source: here
Addendum: I have been informed that there’s a later variant.
This is a bigger program and requires .NET 3.5 and a faster PC. The interface looks much like the previous program. I gave it a workout converting a 74-page manga story, and found that it works well, and resolves furigana, kanji, hiragana and katakana equally well.
(for sample of manga translation result see here)
I’m also told that Japanese OCR is available as an add-on for MS Office 2007 – hints here. But I was told that it’s very basic.