(This is a resend of an off-list message. The attachment was stripped.)
-----
Hi. I am writing a simple MO reader. A first cut is attached; I don't care about performance for now. I tested it with the Japanese MOs bundled with the Ruby-gettext package, as I don't have the gettext runtime to compile POs for EToys right now.
I think we need to decide some design issues to go forward:
a) How to package MOs from those highly fragmented POs: how many MOs will we have? What is assigned to a textdomain? I am not certain whether the current packaging scheme for POs will also be good for MO/textdomain.
In typical usage a textdomain represents an "application", and a translation is provided for each app.
Also note that this decision will affect the design for good performance and resource usage.
b) How to decide the textdomain on #translated?
c) Where will the MOs reside in the runtime environment? I am thinking about squeakland-OLPC with SecurityPlugin enabled. (Is the security plugin enabled on the XO?)
/Korakurider -------------------------------------- Easy + Joy + Powerful = Yahoo! Bookmarks x Toolbar http://pr.mail.yahoo.co.jp/toolbar/
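For readers without the stripped attachment, here is a minimal sketch of what a simple MO reader does. It is written in Python purely for illustration and is not the attached Squeak code. It follows the GNU MO layout (magic number, string count, and two offset tables); the function name `parse_mo` is hypothetical, and the sketch assumes every string is UTF-8 rather than reading the charset from the catalog header.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number defined by the GNU MO file format


def parse_mo(data: bytes) -> dict:
    """Return {msgid: msgstr} for the bytes of an MO file (UTF-8 assumed)."""
    # The magic number also tells us the byte order of the file.
    if struct.unpack_from("<I", data, 0)[0] == MO_MAGIC:
        order = "<"
    elif struct.unpack_from(">I", data, 0)[0] == MO_MAGIC:
        order = ">"
    else:
        raise ValueError("not an MO file")

    # Header fields after the magic: revision, number of strings,
    # offset of the original-string table, offset of the translation table.
    _revision, count, orig_off, trans_off = struct.unpack_from(order + "4I", data, 4)

    catalog = {}
    for i in range(count):
        # Each table entry is a (length, file offset) pair of 32-bit integers.
        o_len, o_pos = struct.unpack_from(order + "2I", data, orig_off + 8 * i)
        t_len, t_pos = struct.unpack_from(order + "2I", data, trans_off + 8 * i)
        msgid = data[o_pos:o_pos + o_len].decode("utf-8")
        msgstr = data[t_pos:t_pos + t_len].decode("utf-8")
        catalog[msgid] = msgstr
    return catalog
```

In a real MO file the entry with the empty msgid holds the catalog header (including the charset), so it would show up in the returned dictionary under the key `""`.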
On Sep 20, 2007, at 16:00, korakurider@yahoo.co.jp wrote:
> Hi. I am writing a simple MO reader. First cut is attached: I don't care performance for now. I tested it with japanese MOs bundled with Ruby-gettext package, as I don't have gettext runtime to compile POs for EToys right now.
> I think we need to decide for some design issues to go forward:
> a) How to package MO from those highly fragmented POs: how many MOs will we have? what is assigned to textdomain ? I am not certain whether current packaging scheme for POs will be also good for MO/textdomain.
> In typical usage textdomain represents "application" and translation is provided for each app.
> Also note that this decision will affect design for good performance and resource usage.
The idea was to have one MO per class category, because the best equivalent to "applications" in Squeak are class categories. This way an MO file corresponds roughly to a Monticello Package, for example.
One problem is that the classes in the etoys image are not very nicely categorized - I think that was done in 3.8.1 but not 3.8. For example, there is a stray OLPC category which would better be merged with the Sugar category.
> b) how to decide textdomain on #translated?
Based on the category of the sender of #translated:
    translated
        | classAndSelector category |
        classAndSelector := thisContext sender who.
        category := classAndSelector first category.
This is rather inefficient; when loading an MO file we might want to create a cache for looking up the category from the CompiledMethod directly.
> c) Where will MO reside in runtime environment?
In a subdirectory of "Smalltalk imagePath".
> I am thinking about squeakland-OLPC with SecurityPlugin enabled. (Is security plugin enabled on XO ?)
We might need to add an exception to the SecurityPlugin allowing to read from the po path. On the XO we don't really have to enable it, but the same translations should work on the regular etoys.
- Bert -
Hi Korakurider, Bert,
This is great work! The source code helps a lot in understanding the structure of an MO file. I'm glad that it is so simple (I was doubtful about supporting MO files before looking at the code). I think it would be great if you uploaded it as a ticket for anyone who is interested in MO files (or I can do it with your permission).
Anyway, I have an idea about the fragmentation of PO files. We divided the translations because there are too many words in eToys. But if we sort the words in a reasonable way, as in Bert's idea, "Sorting in pot files" https://dev.laptop.org/ticket/3596, we don't need to divide the translations, do we? A translator can work on any reasonable fragment of a big PO file sorted by class category, class, and method.
My frustration with the many PO files comes from:
- Some words are placed in two or more PO files, and they conflict if those files have different translations. Technically this will be solved by using thisContext, but a translator has no idea which textdomain is used for a word on the screen without looking into the Smalltalk code.
- There are now 406 po and pot files, and the number will increase. It is not difficult to manage them, but it is hard to stay attentive enough to avoid small mistakes. Honestly, boring...
I totally agree with supporting textdomains for eToys applications. So my proposal is that only registered class categories are treated as separate textdomains, and the rest go into one big PO file. I'm sorry if this confuses the previously agreed discussion.
Cheers, - Takashi
Yes that sounds good. Let's try.
- Bert -
On Sep 20, 2007, at 21:34, Takashi Yamamiya wrote:
> I totally agree to support textdomain for eToys application. So my proposal is only registered class categories are considered as other textdomains, rest are in a big po file. I'm sorry if it confuses the last agreed discussion.
Thanks Bert,
To translators:
Before I change the structure of the gettext files (though maybe nothing will happen for a while), I will announce it on this email list, then merge all translation files in launchpad and trac, and export them again. I don't think we will lose any translations, so please continue translating however this discussion goes.
Cheers, - Takashi
Bert Freudenberg wrote:
> Yes that sounds good. Let's try.
Hi,
Sorting the po files is almost done. So I think I will reconstruct the po files on Thursday. I hope to change svn and the launchpad site around the 4th, 18:00 -0800 (PST). But actually, I still can't estimate how complicated it is, so it could be late.
Cheers, - Takashi
That means we will get only one po file, but sorted by categories?
- Bert -
On Oct 3, 2007, at 7:04, Takashi Yamamiya wrote:
> Sorting po files have almost done. So I think I will reconstruct po files on Thursday. I hope I will change svn and the launchpad site around 4th 18:00 -0800(PST). But actually, I still can't estimate how it is complicated, so it could be late.
Yes. Thanks to Korakurider, the generated po file is just one big file. Ah, I forgot about categories. My current code just sorts by classes -> methods -> keywords (alphabetical).
I will try sorting by categories first.
Cheers,
Takashi
Bert Freudenberg wrote:
> That means we will get only one po file, but sorted by categories?
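The ordering discussed here (class category first, then class, method, and keyword) amounts to sorting on a compound key. A minimal Python sketch, in which the `Entry` record and its field names are hypothetical and not taken from the actual exporter:

```python
from typing import NamedTuple


class Entry(NamedTuple):
    category: str    # class category the string belongs to
    class_name: str  # class in which the string literal appears
    selector: str    # method in which the string literal appears
    msgid: str       # the original (untranslated) phrase


def sort_entries(entries):
    """Order entries by category, then class, then method, then phrase."""
    return sorted(
        entries,
        key=lambda e: (e.category, e.class_name, e.selector, e.msgid),
    )
```

Because the category is the first element of the key, all phrases of one category end up adjacent in the single big PO file, so a translator can still work on one category at a time.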
--- Takashi Yamamiya tak@metatoys.org wrote:
> So my proposal is only registered class categories are considered as other textdomains, rest are in a big po file.
The combination of a big PO/MO and a few small ones sounds good. Takashi, do you have an idea of what to register as "other"? Or will OLPC EToys basically have only one big MO?
/Korakurider
Hi Korakurider,
I think one big mo is not bad. But including a couple of other textdomains may be more useful as an example for developers who want to make an eToys application with gettext. Tools-* might be a good example of such a class category.
Cheers, - Takashi
Hi Bert,
--- Bert Freudenberg bert@freudenbergs.de wrote:
> > b) how to decide textdomain on #translated?
> Based on the category of the sender of #translated:
>     translated
>         | classAndSelector category |
>         classAndSelector := thisContext sender who.
>         category := classAndSelector first category.
> This is rather inefficient, when loading an MO file we might want to create a cache for looking up the category from the CompiledMethod directly.
I think it would be safe to implement it like this if the receiver of #translated is a literal. But how about #translatedNoop? I am not sure whether the senders of #translated and #translatedNoop are in the same category.
> > c) Where will MO reside in runtime environment?
> In a subdirectory of "Smalltalk imagePath".
> > I am thinking about squeakland-OLPC with SecurityPlugin enabled. (Is security plugin enabled on XO ?)
> We might need to add an exception to the SecurityPlugin allowing to read from the po path. On the XO we don't really have to enable it, but the same translations should work on the regular etoys.
I agree that SecurityPlugin needs tweaking. But SecurityPlugin doesn't differentiate access modes (read-only or read/write); it just allows or disallows file access. So it would need major work on the plugin/VM of each architecture to support that exception. (Just allowing file access under imagePath would be dangerous.)
On Sep 21, 2007, at 6:19, korakurider wrote:
> I think it would be safe to implement like this if receiver of #translated is literal. But how about #translatedNoop? I am not sure if senders of #translated and #translatedNoop have same category.
You misunderstood - the lookup is independent of the *receiver*; it only looks at the *sender*, that is, the method in which the #translated or #translatedNoop send happens.
Or did I misunderstand your question?
> I agree it is needed to tweak SecurityPlugin. But SecurityPlugin doesn't differentiate access modes (read only or read/write), just allow/disallow file access. So It would need major work on plugin/VM of each architecture to support that exception. (Just allowing file access under imagePath should be dangerous).
That is true unfortunately.
Maybe building a gettext plugin linking to libintl.a is indeed the best solution. Other ideas?
- Bert -
--- Bert Freudenberg bert@freudenbergs.de wrote:
> You misunderstood - the lookup is independent of the *receiver*; it only looks at the *sender*, that is, the method in which the #translated or #translatedNoop send happens.
> Or did I misunderstand your question?
I think I understood correctly. The lookup decides the translation context (i.e. the textdomain) by the class category of the sender of #translated. (#translatedNoop is just for extracting the original phrase and isn't directly related to lookup.) But when the receiver literal of #translatedNoop is exported to a PO file, its category is based on the sender of #translatedNoop, not #translated. Then what if the senders of #translated and #translatedNoop are different classes with different class categories?
> Maybe building a gettext plugin linking to libintl.a is indeed the best solution. Other ideas?
I think the original GNU gettext implementation assumes that an application doesn't use multiple textdomains (i.e. MOs) simultaneously. If multiple textdomains are needed, the application must switch dynamically. I imagine frequent switching will cause performance degradation. So I am not sure using libintl.a would be appropriate if we have multiple MOs. The Ruby-gettext package that I reviewed while writing my sample code does not bind to libintl.a; it is a gettext-compliant implementation written in Ruby. And that implementation can handle multiple MOs simultaneously. I would like to go a similar way if possible.
/Korakurider
On Sep 21, 2007, at 10:51, korakurider wrote:
> The lookup decides translation context (i.e. textdomain) by class category of sender of #translated. (#translatedNoop is just for extraction of original phrase, and isn't directly related to lookup) But when exporting receiver literal of #translatedNoop to PO, that category is based on not #translated but #translatedNoop. Then what if senders of #translated and #translatedNoop are different class with different class category?
Ah, I see. How about having "#translatedNoopFor: 'classcategory'" which would be used by the exporter to place the translations in a different category?
> I think original GNU gettext implementation assumes that application doesn't use multiple textdomains (i.e. MOs) simultaneously. If multiple textdomains are need, application must switch dynamically. I imagine frequent switching will cause performance degradation. So I am not sure using libintl.a would be appropriate if we will have multiple MOs.
Actually, you can give the domain in the dgettext() call:
http://www.gnu.org/software/gettext/manual/html_node/Ambiguities.html
And even set the directory using bindtextdomain().
- Bert -
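The point about passing the domain with each call can be illustrated with a small registry that keeps several catalogs loaded at once, so nothing has to be switched globally. This is a Python sketch of the idea only; it does not call libintl, and the class name `Catalogs`, its `bind` method, and the sample data are hypothetical.

```python
class Catalogs:
    """Keeps the catalogs of several textdomains loaded at the same time."""

    def __init__(self):
        self._by_domain = {}  # textdomain -> {msgid: msgstr}

    def bind(self, domain, messages):
        """Register the messages of one textdomain (e.g. parsed from an MO)."""
        self._by_domain[domain] = dict(messages)

    def dgettext(self, domain, msgid):
        """Look up msgid in the given domain; return msgid if untranslated."""
        return self._by_domain.get(domain, {}).get(msgid, msgid)


catalogs = Catalogs()
catalogs.bind("etoys", {"open": "開く"})
catalogs.bind("tools", {"open": "オープン"})
```

Here `catalogs.dgettext("etoys", "open")` and `catalogs.dgettext("tools", "open")` can be called back to back without any global state change, which is the property the thread is asking for.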
Bert Freudenberg wrote:
> Ah, I see. How about having "#translatedNoopFor: 'classcategory'" which would be used by the exporter to place the translations in a different category?
I prefer a very simple way, at least for the First Deployment (next month). My thought is to not do any dynamic resolution such as using thisContext. An eToys application programmer should keep #translated and #translatedNoop in the same class category, or designate a text domain explicitly. Yes, this is a poor idea. And in particular it causes a problem in certain places like the parts bin (#descriptionForPartsBin). But I don't think it is a fatal issue in the short term.
> > I agree it is needed to tweak SecurityPlugin. But SecurityPlugin doesn't differentiate access modes (read only or read/write), just allow/disallow file access. So It would need major work on plugin/VM of each architecture to support that exception. (Just allowing file access under imagePath should be dangerous).
> That is true unfortunately.
> Maybe building a gettext plugin linking to libintl.a is indeed the best solution. Other ideas?
Using libintl.a is good (although this way is not so easy, because we would still have to convert UTF-8 to Squeak's internal character representation).
But placing the user's .mo files in the untrusted directory is not a bad idea. At startup we can still access imagePath, so we can read the default mo file in imagePath. If a user wants to modify a translation, the new mo file is saved into the untrusted directory, and it overrides the original mo at imagePath. Is that reasonable?
A drawback of this idea is that you cannot change to another language once eToys loads a project and enters secure mode.
Cheers, - Takashi
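The override scheme described above can be sketched as a lookup that tries the user's untrusted directory before the default location next to the image. This is a Python illustration only; the function name `find_mo` is hypothetical, and the `locale/<lang>/LC_MESSAGES/<domain>.mo` layout is an assumption borrowed from the usual gettext convention, since the thread only says "a subdirectory" of imagePath.

```python
from pathlib import Path
from typing import Optional


def find_mo(domain: str, lang: str, untrusted_dir, image_dir) -> Optional[Path]:
    """Return the MO file to load, preferring the user's copy if it exists."""
    # Order matters: a file in the untrusted directory overrides the
    # default file shipped next to the image.
    for base in (untrusted_dir, image_dir):
        candidate = Path(base) / "locale" / lang / "LC_MESSAGES" / f"{domain}.mo"
        if candidate.is_file():
            return candidate
    return None  # no translation available for this domain and language
```

In this sketch the path would have to be resolved at startup, while imagePath is still readable, which matches the stated limitation about switching languages after entering secure mode.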
etoys-dev@lists.squeakfoundation.org