Read the FAQs and then the original messages that follow.
Is it possible to get software that performs machine translation of Japanese texts?
Yes. See Neocor, Dragon Writer, etc.
Is it possible to scan Japanese text and thus input it into a computer as electronically coded text?
Yes. This is a two-part process: image scanning followed by optical character recognition (OCR). See Kanji-Scan and OmniPage.
So can I scan a page from a Japanese magazine, and get a machine translation of the text?
So how well does this work?
That's really a multi-part question; let's ask it again...
How well does scanning and OCR work in practice?
This depends on several factors.
Scanning: The quality of the original print is very important.
Clear black-and-white print on glazed paper is good; small characters on cheap paper are not good; photocopies are not good; colour and patterned backgrounds are very bad; Japanese handwriting is hopeless.
More about scanning?
You can either use the scan software bundled with your scanner or the scan software that is part of the Japanese OCR package. There are lots of variables to tweak that affect the final result. It may prove best to use your own scan software with an option like "B/W Line Art". Allowing the OCR software to convert from colour to B/W may prove nasty.
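As a rough illustration of what a "B/W Line Art" style setting does, the sketch below thresholds greyscale pixels into pure black and white. The function name, the nested-list image format and the cut-off value of 128 are assumptions made for this example; no particular scanner driver is known to work this way.

```python
def to_line_art(gray_rows, threshold=128):
    """Convert greyscale pixels (0 = black, 255 = white) to 1-bit line art.

    gray_rows is a list of rows, each a list of ints in 0..255.
    Returns rows of 0 (black ink) and 1 (white paper).
    """
    return [[1 if value >= threshold else 0 for value in row]
            for row in gray_rows]


# A tiny hypothetical 2x4 "scan": dark strokes on greyish cheap paper.
page = [
    [30, 200, 190, 40],
    [25, 120, 210, 35],
]
line_art = to_line_art(page)
```

Lowering the threshold turns more grey paper white, at the cost of thinning faint strokes. That is the kind of variable worth tweaking before the image reaches the OCR stage.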
What about the OCR?
Rather surprisingly, the results with Kanji can be startlingly good, with a very high percentage correct achieved at a very high speed. The results with hiragana/katakana are generally disappointing, with a high rate of errors, not helped by the variable size of kana characters. Performance is severely affected by the quality of the print being scanned.
You always have to proofread and correct the OCR output. If there are a lot of errors, it may well take longer to correct the text manually than it would take to type it in from scratch.
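The trade-off between correcting OCR output and typing from scratch can be put into rough numbers. The sketch below is only a back-of-the-envelope model: the function name and every parameter value (seconds per fix, proofreading and typing speeds) are assumptions for illustration, not measurements.

```python
def ocr_vs_typing(num_chars, errors_per_100, secs_per_fix,
                  proofread_cps, typing_cps):
    """Estimate seconds needed for each route to a clean text.

    OCR route: read every character once, then fix each error.
    Typing route: key in every character by hand.
    """
    proofreading = num_chars / proofread_cps
    fixing = num_chars * errors_per_100 / 100 * secs_per_fix
    ocr_seconds = proofreading + fixing
    typing_seconds = num_chars / typing_cps
    return ocr_seconds, typing_seconds


# Hypothetical figures: 1000 characters, 20 s per correction,
# proofreading at 5 chars/s, typing Japanese at 0.5 chars/s.
clean_print = ocr_vs_typing(1000, 5, 20, 5, 0.5)   # 5 errors per 100
poor_print = ocr_vs_typing(1000, 20, 20, 5, 0.5)   # 20 errors per 100
```

With these assumed figures the OCR route wins on the clean print and loses on the poor one, which matches the experience described above.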
How well does machine translation work?
Results vary from the impressive (usually with carefully prepared text, as in software suppliers' demos) to the unintelligible. Overall, what you should expect is a translation which, while not proper English, does give you a fair idea of what the original text said. You can then decide if you are sufficiently interested to have it translated properly by a human (with the aid of the machine draft). This is how the programs are actually employed in commerce and academia.
Results seem to be better with regular Japanese (as in company notices) than with slangy or colloquial texts like manga or anime magazines.
So is it worth trying an OCR on a manga?
At the present state of the art it probably isn't. The scanner and OCR performance will be so poor that you could enter the text more quickly by hand. We tried it on a regular size tankoubon (Video Girl Ai) and found that the scanner couldn't resolve the kanji properly, because of the poor print, and that it took far longer to correct the errors in the kana than it would have taken to type them.
Is it worth trying an OCR on a page from an anime magazine?
On balance, yes, if you have the budget. In a few hours a person with elementary knowledge of Japanese should get a result that would take days for them to obtain by hand.
Is it worth using a machine translator on a manga after entering the text by hand?
Somebody try it and let us know.
Can Web pages be entered and translated?
We tried using Typhoon and found that it worked well enough on corporate Web sites to show that we were looking at employees' terms and conditions (for instance). It also worked on an Evangelion anime site but not quite so well.
Do I need a Japanese operating system?
Hopefully, no. Software is available that runs on US/English Windows. Installing Japanese Windows or dual boot Japanese/US Windows is for propeller heads only.
Are there any budget or shareware packages?
Sorry, I can't recommend anything from personal experience. There are a few things from 99 US dollars upwards. I tried the Word Translator beta but couldn't get it to work properly.
Shoudouka Launchpad (displays Japanese text)
Japanese Software Digest
Web translator Translates Japanese WWW pages online.
Date: Mon, 06 Oct 1997 10:58:11
From: Dan Hauck
Subject: Re: Electronic translation of Manga into English
At 07:00 PM 10/4/97 +0100, you wrote:
>I thought this posting might be of interest to Shoujo ML members. If you
>have any hands-on experience, please let me know!
All I can say is that it's really hard. I tried to write a translator once using UNIX's lexical (.lex) libraries and it's just too unpredictable and there's too much there. So much of it is just context. There are shorter words that can mean just about anything, and this is where it goes bonkers.
>
>I have, together with a friend, started to evaluate programs that will
>a) scan Japanese text and perform OCR.
It's much harder than English / ASCII chars.
>b) machine translate Japanese electronic text into English.
I have and probably always will use JDIC (you've probably found this). It's a huge dictionary; if you type in something romanized it'll give you just about every possible definition of that word (in English).
>The potential of such programs for translating manga and Japanese anime/manga related magazines, e.g. Animage, is obvious.
Maybe...
>
>It seems there are several programs available that will machine translate Japanese text, at prices ranging from shareware through to many hundreds of dollars.
I haven't looked in about a year. Is there anything that actually works out there? At least for manga. There are differences between manga and ordinary "Japanese", i.e. the kind you learn from reading a book or taking a class.
>This still leaves the problem of input. It is possible to type in Japanese
Have you really found something that will go from romanized text to an okay English translation? I can romanize the stuff in my sleep; that's how my electronic dictionary takes its input. It's not very hard, just memorization, since manga always have furigana (hiragana next to any kanji), at least all the manga I read do.
>using either the interface of one of the afore-mentioned programs, or a stand-alone wordprocessor such as JWP. This is easier than translating it, but...
>We know of one program that will scan Japanese text, perform OCR and output an electronic Japanese text. This is KanjiScan from Neocor (www.neocor.com)
Japanese text in manga and other books is hard because there's no concept of a space " ". It makes word recognition harder.
>Test report:
>We had a quick look at the shareware machine translator (called Kanji-Word, if I remember rightly) and at Neocor's TYPHOON and KANJI-SCAN.
>Frankly, after a hard business day I couldn't make any sense of the share-ware one, which is in fact an add-on to programs such as Microsoft WORD etc.
>The Typhoon demo seemed to work, though we couldn't figure how to feed a Japanese WWW page into it. It looks like some skill in text preparation may be required for the most intelligible result.
>The Kanji-scan certainly seemed to work, though again the completed electronic text seemed to need corrections. To have something that recognises Kanji that quickly is pretty amazing.
>Needless to say, you can transfer text from Kanji-scan to Typhoon at the click of a mouse.
>(BTW, once you have Kanji in electronic form it's not hard to look up the meaning in an electronic dictionary, like that in JWP for instance)
>
>If you are at all interested in this field, please E-mail me with your own experiences, and I will reply with a better bibliography of programs, and more test reports.
Please. I tried a year or so ago to find anything and all I found was JDIC, which is useful but it's only a dictionary. If there's anything out there that can turn romanized Japanese text (I can do this, no problem) into some semblance of English that you could tell me about, I'd be very grateful. Also I might be able to translate the rest of hime-chan.
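Dan's point that Japanese text has no spaces can be made concrete with a greedy longest-match segmenter, one simple dictionary-based way to split unspaced text into words. This is a minimal sketch, not the method used by any program mentioned here; the function name and the toy lexicon are invented for the example.

```python
def segment_longest_match(text, lexicon):
    """Split unspaced text left to right, always taking the longest
    lexicon entry that matches at the current position.

    Characters that match nothing are emitted one at a time.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Toy lexicon (hypothetical); a real one would hold many thousands of entries.
lexicon = {"わたし", "は", "まんが", "を", "よむ"}
words = segment_longest_match("わたしはまんがをよむ", lexicon)
```

Greedy matching breaks down exactly where short words "can mean just about anything": a short entry can be swallowed by a longer wrong match, and anything missing from the lexicon falls apart into single characters.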
From: "Baka-sama"
To: email@example.com
Subject: Greetings
Date: Mon, 06 Oct 1997 17:21:08 PDT
Greetings,
I am definitely interested. I've been doing some computer translation work, but it's kind of tedious transcribing and looking up all those characters. Using a scanner would be great. I've mostly been translating whatever I can get my hands on. I'm currently working, very slowly, on my friend's copy of Sailor V v.1. I'm a fan of Fushigi Yuugi, Maison Ikkoku, VGAi, KOR, etc. Please get back to me. Thanks.
Steve "Baka-sama" McIntosh firstname.lastname@example.org
Baka-sama email@example.com
To: firstname.lastname@example.org
Subject: Re: Computerized translation of manga into English
References: <email@example.com>
From: Jeffrey Rowe
Date: 09 Oct 1997 10:16:10 -0700
firstname.lastname@example.org (GC) writes:
> Test report:
> We had a quick look at the shareware machine translator (called
> Kanji-Word, if I remember rightly) and at Neocor's TYPHOON and
> KANJI-SCAN.
> Frankly, after a hard business day I couldn't make any sense of the
> share-ware one, which is in fact an add-on to programs such as
> Microsoft WORD etc.
> The Typhoon demo seemed to work, though we couldn't figure how to feed
> a Japanese WWW page into it. It looks like some skill in text
> preparation may be required for the most intelligible result.
I too have tried Neocor's TYPHOON demo. In particular, I've tried translating personal letters from my in-laws into English. I find, however, that the translation of the often colloquial Japanese is nearly incomprehensible. Typically, several alternate choices are offered during the translation process, but selection requires knowledge of the original meaning, which isn't always obvious. The entertainment value from some of the mistranslations, however, shouldn't be discounted :).
On the technical side though, Japanese text input for me is a simple cut-and-paste operation from my mail reader to TYPHOON's text buffer.
Cheers,
Jeff Rowe
In article <email@example.com>, firstname.lastname@example.org wrote:
> This still leaves the problem of input. It is possible to type in
> Japanese using either the interface of one of the afore-mentioned
> programs, or a stand-alone wordprocessor such as JWP. This is easier
> than translating it, but...
Once you get up to a couple hundred kanji, you can input the "easy" stuff by its reading using an input method like Kotoeri from the Mac JLK. Assuming you can already touch-type, that is. :) Leave some particular character in the place of kanji you don't know the reading for, then go over all of them afterward, using whatever lookup method is easiest first. And cheating (such as typing the wrong reading or a jukugo you know and then deleting the excess) IS allowed.
Manga should be a LOT easier typing-wise, because there is so much less of it per page. And a lot of it has furigana. Of course that DOES tend to leave the "double-reading" (a planet name over "chikyuu" kanji, etc.) problem open... :) (And it's kinda hard to find a double-byte enabled WP that supports arbitrary furigana unless it's made for Japanese text!)
> We know of one program that will scan Japanese text, perform OCR and
> output an electronic Japanese text. This is KanjiScan from Neocor
> (www.neocor.com)
Mangajin had a review of an OCR program a couple of months back. Seems these things, while great at Kanji, have some real problems recognizing hiragana! If an OCR program isn't accurate enough, you might as well not bother, because the time to proofread could be more than the time to type it. Except in the case of kanji, it's knowing the reading you need to type it that is the troublesome part.
email@example.com (GC) writes:
>I have, together with a friend, started to evaluate programs that will
>a) scan Japanese text and perform OCR
>b) machine translate Japanese electronic text into English.
>The potential of such programs for translating manga and Japanese
>anime/ manga related magazines, e.g Animage, is obvious.
Hmmm. Potential, yes. Problems, many. MT is hard enough, but attacking the highly colloquial mangaese will be a fine challenge.
>It seems there are several programs available that will machine
>translate Japanese text, at prices ranging from shareware through to
>many hundreds of dollars.
The best affordable one I have encountered is Neocor's Tsunami.
It is several hundred dollars.
>This still leaves the problem of input. It is possible to type in
>Japanese using either the interface of one of the afore-mentioned
>programs, or a stand-alone wordprocessor such as JWP. This is easier
>than translating it, but...
It certainly is possible. Tsunami has a crudish WP builtin. JWP, NJSTAR or any Japanese WP would do.
>We know of one program that will scan Japanese text, perform OCR and
>output an electronic Japanese text. This is KanjiScan from Neocor
>We had a quick look at the shareware machine translator (called
>Kanji-Word, if I remember rightly) and at Neocor's TYPHOON and
>KANJI-SCAN.
>Frankly, after a hard business day I couldn't make any sense of the
>share-ware one, which is in fact an add-on to programs such as
>Microsoft WORD etc.
As I understand it, Kanji-Word is a word/phrase glosser, not an MT
system. There are other such systems. Really they are translation aids.
>The Typhoon demo seemed to work, though we couldn't figure how to feed
>a Japanese WWW page into it. It looks like some skill in text
>preparation may be required for the most intelligible result.
Save the page to a text file. Open up a blank Tsunami document (they
call them projects) and import the text file.
>The Kanji-scan certainly seemed to work, though again the completed
>electronic text seemed to need corrections. To have something that
>recognises Kanji that quickly is pretty amazing.
>Needless to say, you can transfer text from Kanji-scan to Typhoon at
>the click of a mouse.
>(BTW, once you have Kanji in electronic form it's not hard to look up
>the meaning in an electronic dictionary, like that in JWP for
>instance)
I suspect that for Manga, this is what you'll find most useful.
>If you are at all interested in this field, please post a response, or
>E-mail me with your own experiences, and I will reply with a better
>bibliography of programs, and more test reports.
Jim Breen Department of Digital Systems
Email: firstname.lastname@example.org Monash University
http://www.dgs.monash.edu.au/~jwb/ Clayton VIC 3168 Australia
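Jim's advice to save the Web page as a text file before importing it could be sketched as below, for the case where the browser saved raw HTML instead of plain text. The class and function names are invented here, and the default Shift_JIS encoding is an assumption (Japanese pages of the period also used EUC-JP and ISO-2022-JP), so check the page before relying on it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_bytes_to_text(raw, encoding="shift_jis"):
    """Decode a saved page (assumed encoding) and return its text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(raw.decode(encoding, errors="replace"))
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical saved page, encoded the way a 1997 Japanese site might be.
saved = ("<html><head><title>会社案内</title><style>p{}</style></head>"
         "<body><p>就業規則</p></body></html>").encode("shift_jis")
plain = html_bytes_to_text(saved)
```

The resulting plain text is what would then be written to a file and imported into the translator.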
Geoff,
Testing "yonde!!koko" from A.I.S. (Nagano-ken), I find it increasingly useful, as long as I can supply original print quality.
I am mainly translating technical Japanese into German. The Japanese are very fond of text in tables.
I scan in all the text tables and then do a good part of the translation work simply with the find-and-replace function of my Japanese word-processing program, because much of the table content is non-grammatical in nature.
The grammatical sections, i.e. the sentences, I translate manually (with the OCRed text helping me look up vocabulary by "drag-and-drop" in the very useful "kanjikai" dictionaries) and then erase the OCRed sentences one by one.
Life would be easier for me if my OCR could read bad-quality copies as well as I can.
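The find-and-replace approach to table text described above amounts to applying a term glossary. The sketch below is one way to do it in a single pass, trying longer terms first so that a compound is not half-replaced by one of its parts. The function name and the three Japanese-German term pairs are illustrative assumptions, not taken from the writer's own glossary.

```python
import re


def apply_glossary(text, glossary):
    """Replace every glossary term in text in one pass, longest terms first."""
    if not glossary:
        return text
    # Longest-first alternation so compounds win over their components.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    return pattern.sub(lambda match: glossary[match.group(0)], text)


# Hypothetical glossary for a technical table.
glossary = {
    "電圧": "Spannung",
    "電源電圧": "Versorgungsspannung",
    "温度": "Temperatur",
}
row = apply_glossary("電源電圧\t温度\t電圧", glossary)
```

Full sentences are deliberately left out of this step. As the message says, only the non-grammatical table entries lend themselves to mechanical replacement.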
Thanks for your reply. I have sent the following message to Richard; perhaps you might be interested in it. It is taken from a recent NIHONGO mailing list.
>From Sabolc@compuserve.com Wed Sep 17 19:17:17 1997
Date: Sun, 14 Sep 1997 10:07:05 -0400
From: Szabolcs Varga
Subject: Japanese OCR
Greg Dabelstein wrote:
> Does anyone know of any Japanese capable OCR software
> for either Japanese or English Win 95???
There is a multitude of them here in Japan (for Japanese Windows), their price ranging from about $100 to $1300. I can only refer to my experiences with one program called E.typist. (I got it with the HP ScanJet 4c scanner I bought in Japan. I also tried Autotype, which promptly died on my machine. So much for TWAIN compatibility.) Nothing fancy, does the job fairly well. I did have some problems with it so I set out to get something better, but I was told that basically all the Japanese OCR programs have exactly the same problems, namely:
1. It does make a lot of mistakes, but not where we gaijin would expect. It recognizes the difficult kanjis (OK, not the extremely old ones but all of the JIS 2-suijun) for it uses a dictionary and there is a very limited number of jukugo with very difficult kanji. So rest assured that it reads "kikai", "yuuutsu" and the like. But apparently it has a lot of problems making the distinction between the small and big "ya", "yu" and "yo", and especially with the voiced kana. It has to be a very neat printout to be able to have all the "daku-on" recognized.
2. It kicks the bucket with furigana (and texts with all kinds of sizes). Apparently it messes up something in its "genkoo yoshi"-like mind. I hope there are already better ones.
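Since the trouble spots named in point 1 are small versus big "ya", "yu", "yo" and the voiced kana, a proofreader could have those characters flagged in the OCR output for a second look. The sketch below is only an aid along those lines; the function name and the choice of characters are assumptions. It cannot find a voiced mark that the OCR dropped altogether, only characters that are present in the output.

```python
import unicodedata

# Big and small ya/yu/yo in hiragana and katakana.
SIZE_SENSITIVE = set("やゆよゃゅょヤユヨャュョ")
DAKUTEN = "\u3099"  # combining voiced sound mark


def suspicious_positions(text):
    """Return (index, character, reason) for kana worth re-checking."""
    flagged = []
    for index, ch in enumerate(text):
        if ch in SIZE_SENSITIVE:
            flagged.append((index, ch, "size"))
        elif DAKUTEN in unicodedata.normalize("NFD", ch):
            # Voiced kana decompose into a base kana plus the dakuten.
            flagged.append((index, ch, "voiced"))
    return flagged


flags = suspicious_positions("きょうはがっこう")
```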
I personally think that yes, it is faster to scan in and OCR a Japanese text than to type it in, but it is significantly slower than doing the same with English. To start with, the OCR process itself is a lot slower, in my case (Pentium 100) about 100 char/sec, and then of course proofreading is more difficult. To give an idea: I am still below the Nooryoku Shiken Level 1 and I tried to scan in a few pages from "Chijin no ai" by Tanizaki Jun'ichiro, from an ugly second-hand paperback edition. There are about 5 misrecognized characters in every hundred so -- for me -- proofreading one page takes about 20-30 minutes. I had a lot better results with better quality texts, but I still managed a few times to make a fool of myself when giving a presentation, because the material I had prepared (in haste, with OCR) included some tiny mistakes (like one dot missing, but producing a completely stupid word). Proofreading seems to be an ugly job. HTH.
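A figure such as "about 5 characters wrong in every hundred" can be measured by comparing OCR output with a hand-corrected reference. The sketch below uses the standard Levenshtein edit distance divided by the reference length. The function names are chosen for this example, and this is a common way to score OCR in general, not necessarily how the figure above was obtained.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute / match
        previous = current
    return previous[-1]


def char_error_rate(ocr_text, reference):
    """Fraction of reference characters the OCR got wrong."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_text, reference) / len(reference)


# Small "yo" read as big "yo": one error in three characters.
rate = char_error_rate("きよう", "きょう")
```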
Paul Naitoh 12-Oct-1997, @ 12:50:18 PDT Internet e-mail email@example.com Using Windows NavCIS PRO 1.77 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Date: Mon, 3 Nov 1997 07:17:10 -0500
From: Szabolcs Varga
Subject: Re: OCR & Translation Software.
Sender: Szabolcs Varga
To: GC
Greetinx,
> Is it possible to get software that performs machine translation
> of Japanese texts?
> Yes. See Neocor, & Dragon Writer, etc.
I would rather recommend either Logovista E/J (and J/E), Atlas from Fujitsu or Pensee from OKI. They run under Japanese OS only, though.
> How well does scanning and OCR work in practice?
> photocopies not good
Not exactly true. Depends mostly on what you photocopy.
> Japanese handwriting hopeless.
Not true again. There is one piece of software here (of course I don't remember the name) which does exactly that... more than that, it is the main product of its maker. Results are still mixed, but not hopeless.
> More about scanning?
I cannot recommend HP scanners enough. They have a feature called AccuPage that does wonders on stained and cheap material like most of the manga books. Honestly, wonders. I tried it with a coffee-stained page.
> How well does machine translation work?
This is my field so let me barge in.
> Results vary from the impressive (usually with carefully prepared
> text, as in software suppliers' demos)
Technical material with no ellipses usually gives results close to perfect. Funny thing, the more technical the text and the longer the sentences are, the better the result, as long as the original sentence had a meaning.
> to the unintelligible.
Spoken English/Japanese is still close to impossible to translate. But, honestly, many times the problems are with the human, not with the machine. I came across a project in Denmark where they tried to translate patents by machine and of course, the results were hopeless. However, when I saw the original English text, I could not believe what I saw: one sentence, more than half a page long, without a predicate. I seriously doubt that any human would have done much better on that. After breaking the text into about six sentences and providing the missing predicates, the result was practically perfect. I mean, perfect.
> Results seem to be better with regular Japanese (as in
> company notices) rather than with slangy or colloquial texts
> like manga or anime magazines.
I know NO translation software that would translate "Hirumeshi kui ni ikoo ze" correctly, with all the emphasis and roughness included. They are not for that...
> So is it worth trying an OCR on a manga?
It works with good quality manga -- like Buichi Terasawa's Gokuh and Cobra that I have.
> Is it worth using a machine translator on a manga after entering
> the text by hand?
Definitely NOT! Machine translation SW cannot handle most of the features of the spoken language, most of all ELLIPSIS. If the grammar parser fails in the translating process, do not expect anything worthwhile from the higher levels of parsing...
> Can Web pages be entered and translated?
The other way (E-J) seems to be flourishing a lot more. There is SW here especially for that, like a special Internet version of Pensee.
> Do I need a Japanese operating system?
> Hopefully, no. Software is available that runs on US/English
> Windows. Installing Japanese Windows or dual boot Japanese/US
> Windows is for propeller heads only.
I don't want to hurt anybody, but the software that runs on the Japanese OS only is of a lot better quality. I mean quality. Both OCR and MT. They start at $250 though.
Live long and prosper.
Szabolcs