Hi
It wouldn't be hard to add a translator that translates everything from one language to another, with some control/help from the user.
Ivan Tomek,
If you have a look at the Babel Fish translator that AltaVista uses (http://babelfish.altavista.com), you will see the limits and possibilities of this approach...
In Squeak, the strict "Object message: Object" grammar certainly simplifies the parsing of Squeak sentences. However, a translating browser would still have to cope with translating all of the symbols in the system (class names, messages, temp variables, etc.), and it would be better if it could make at least a credible stab at translating the user's comments. At a minimum, you'd have to cope with noun phrases such as
anOrderedCollection FormInspectView ShortIntegerArray
as well as verbs in various forms:
scanFor: addSubclass: isVariable atPutAll:
You probably know more about how to approach this than I do, but as far as I can see it would be quite difficult, requiring a fairly complex NLP engine. One of the problems is that many of the symbols are not "grammatically correct"; they are paraphrases or abbreviations of longer phrases...
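To make the noun-phrase/verb point concrete, here is a minimal sketch (in Python rather than Smalltalk, purely for illustration) of only the first step: breaking class names and keyword selectors into their component words. The helper name split_identifier is hypothetical and assumes plain camelCase; it does nothing about the harder problem of reconstructing the longer phrase an abbreviated name stands for.

```python
import re

# Matches either a word with an optional leading capital ("an", "Ordered")
# or a run of capitals not followed by a lowercase letter ("HTML" in "HTMLParser").
WORD = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")


def split_identifier(identifier: str) -> list[str]:
    """Split a Smalltalk class name or selector into lowercase words.

    Hypothetical helper for illustration: keyword colons are treated as word
    boundaries and dropped, and each camelCase hump starts a new word.
    """
    words = []
    for keyword in identifier.split(":"):
        words.extend(match.lower() for match in WORD.findall(keyword))
    return words


for name in ["anOrderedCollection", "FormInspectView", "atPutAll:", "isVariable"]:
    print(name, "->", split_identifier(name))
```

Splitting is the easy part; deciding that "atPutAll:" means something like "at these positions, put all of these elements" is exactly the paraphrase problem described above.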
However, it would be a great solution to the problem of a multi-language image - browse and make changes in whatever language you choose! :)
Russell
----------------------------------------
Russell Allen
russell.allen@firebirdmedia.com
----------------------------------------
Russell Allen wrote:
If you have a look at the Babel Fish translator that AltaVista uses (http://babelfish.altavista.com), you will see the limits and possibilities of this approach...
In Squeak, the strict "Object message: Object" grammar certainly simplifies the parsing of Squeak sentences. However, a translating browser would still have to cope with translating all of the symbols in the system (class names, messages, temp variables, etc.), and it would be better if it could make at least a credible stab at translating the user's comments.
Forget this as fast as you can. I have worked with automatic translators, and it just won't work; translating natural language is actually *easier* than translating program texts (which are, from a linguistic perspective, rudimentary natural language).
Long explanation: Parsing is not the real problem in modern automatic translation. The main problem is (and has been from the very beginning) disambiguation. Automatic translators need context to determine which of a dozen possible meanings is the right one. Smalltalk messages provide less of that context, so the problems are worse. It *might* be feasible to translate the comments on a semi-automatic basis. They are near-optimal for such a task: they (usually) use a limited vocabulary (cutting down on dictionary size), and they (usually) have a limited universe of discourse (cutting down on the number of ambiguous meanings to consider). Automated translation technology isn't ready for prime time yet. The EU and the UN have spent millions trying to get a working system, and the best they have achieved is a system that makes a professional translator more productive. The gain was a factor of about two when I last checked (about five years ago); I've been following developments in automated translation since then, and I think they have raised that factor a bit, but no substantial improvements seem to have occurred.
However, it would be a great solution to the problem of a multi-language image - browse and make changes in whatever language you choose! :)
Not technically feasible. (Unfortunately.) Human language is just too irregular to be tackled by volunteer effort; you need real money to research problems of this complexity. This doesn't mean that you need real money to implement the algorithms once they are known. And I sincerely hope that I'll see that day!
Regards, Joachim
I don't think that the problem is as hard if suboptimal translation is acceptable. Two kinds of translation would be required - comments, and Smalltalk words (selectors, class names, etc.).
Comments: These are 'natural language' but even rather poor translation would be better than none: If the original comment is in Chinese or Portuguese, even a very approximate translation to English is better (for me) than none.
Smalltalk words: There are not that many of these in the library, and they are context-free, so they can be translated by looking them up in a dictionary built jointly by the user and an automatic language-dictionary lookup. This would not require a large effort on the user's part.
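The dictionary lookup described above could look roughly like this minimal sketch (Python for illustration; the glossary contents, the German renderings, and the function name translate_selector are all hypothetical). It looks up a whole selector first, then falls back to translating a keyword selector keyword by keyword, and reports any part it could not find so the user can add it to the word list.

```python
# Hypothetical English -> German glossary, maintained jointly by users;
# the entries are illustrative only.
GLOSSARY = {
    "at:": "bei:",
    "put:": "setze:",
    "add:": "fuege:",
    "size": "groesse",
    "copyFrom:to:": "kopieVon:bis:",  # whole-selector entry overrides keyword-by-keyword
}


def translate_selector(selector: str, glossary: dict[str, str]) -> tuple[str, list[str]]:
    """Translate a selector by glossary lookup.

    Returns the translated selector and the parts that had no entry
    (those parts are left in the original language).
    """
    # 1. A whole-selector entry wins, so users can override awkward splits.
    if selector in glossary:
        return glossary[selector], []
    # 2. Otherwise split keyword selectors into keywords ("at:put:" -> ["at:", "put:"]).
    if ":" in selector:
        parts = [keyword + ":" for keyword in selector.split(":") if keyword]
    else:
        parts = [selector]  # unary or binary selector: look it up as one unit
    translated, missing = [], []
    for part in parts:
        if part in glossary:
            translated.append(glossary[part])
        else:
            translated.append(part)
            missing.append(part)
    return "".join(translated), missing


print(translate_selector("at:put:", GLOSSARY))
print(translate_selector("at:ifAbsent:", GLOSSARY))
```

Falling back to keyword-by-keyword translation is the naive choice; the whole-selector entry (copyFrom:to: here) is where a user would record a rendering that does not decompose word for word.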
Date forwarded: 25 Mar 1999 19:08:57 -0000
Date sent: Thu, 25 Mar 1999 07:21:30 +0100
From: Joachim Durchholz joachim.durchholz@munich.netsurf.de
To: squeak@cs.uiuc.edu
Subject: Re: Multilingual Squeak
Forwarded by: squeak@cs.uiuc.edu
Send reply to: squeak@cs.uiuc.edu
Ivan Tomek,
Jodrey School of Computer Science Acadia University Nova Scotia, Canada
fax: (902) 585-1067 voice: (902) 585-1467
Life would be so much easier if we could just look at the source code.
"Elegance: The Mona Lisa has it, and so does the binary search algorithm. The Golden Gate Bridge has it, as do the World Wide Web, Visicalc, Smalltalk and the U.S. Constitution. Public-key cryptography and Michelangelo's Pieta also have it." - Gary H. Anthes, Computerworld
"Beauty is more important in computing than anywhere else in technology because software is so complicated. Beauty is the ultimate defense against complexity." - David Gelernter, Professor of Computer Science, Yale University.
Ivan Tomek wrote:
I don't think that the problem is as hard if suboptimal translation is acceptable.
The problem is not that the translation is merely suboptimal; it's plain misleading. If you want an example, have AltaVista search for a few pages in (say) Chinese and translate them into English. The translations range from incomplete to misleading to incomprehensible. And let me tell you that the AltaVista translations are *not* bad; I'd rate them as average for current automatic translation technology.
Comments: These are 'natural language' but even rather poor translation would be better than none: If the original comment is in Chinese or Portuguese, even a very approximate translation to English is better (for me) than none.
OK. If you don't understand the comment, just read the code. However, readers should be aware that translated comments can be *very* misleading. So translated comments should be marked as such.
Smalltalk words: There are not that many of these in the library, and they are context-free, so they can be translated by looking them up in a dictionary built jointly by the user and an automatic language-dictionary lookup. This would not require a large effort on the user's part.
Right, it's a simple lookup. No problem here. The actual problem is coordinating the creation and maintenance of the word lists. Don't underestimate the size of this task though; it's the sort of work that mailing list and archive maintainers face, and it will never end.
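One small piece of that coordination problem, merging word lists from several contributors and spotting terms that were translated differently, can be sketched as follows (Python for illustration; the function name merge_glossaries and the contributor lists are hypothetical). It keeps the first translation it sees and collects conflicts for a human to resolve; it does not attempt to resolve them itself.

```python
def merge_glossaries(*glossaries: dict[str, str]) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Merge word lists from several contributors.

    The first translation seen for a term is kept; every term that received
    more than one distinct translation is reported in `conflicts`.
    """
    merged: dict[str, str] = {}
    conflicts: dict[str, set[str]] = {}
    for glossary in glossaries:
        for term, translation in glossary.items():
            if term not in merged:
                merged[term] = translation
            elif merged[term] != translation:
                conflicts.setdefault(term, {merged[term]}).add(translation)
    return merged, conflicts


alice = {"at:": "bei:", "size": "groesse"}  # hypothetical contributor word lists
bob = {"at:": "an:", "add:": "fuege:"}
merged, conflicts = merge_glossaries(alice, bob)
print(conflicts)  # "at:" has two competing renderings
```

Every new class or selector added to the library means another round of this, which is why the maintenance never ends.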
Regards, Joachim
squeak-dev@lists.squeakfoundation.org