Squeak to/from UTF-8 conversions
Yoshiki Ohshima
yoshiki at squeakland.org
Fri Jun 29 01:53:07 UTC 2007
As Bert suggested, the Right Thing is to build the system on the
assumption that bare Strings and Characters cannot really be displayed.
For method source, the tag is encoded as a text property, so it is
retained. The consequence is to define an XML-like (or whatever) format
in UTF-8 for storing Squeak Text and to use it almost everywhere.
-- Yoshiki
At Tue, 26 Jun 2007 00:19:04 -0700,
Andreas Raab wrote:
>
> Hi -
>
> I was working on a little improvement in UTF-8 conversion speed (so far
> it's about 150x faster for latin-1 text ;-) and for measuring the
> improvements was running a test that said:
>
> strings := String allSubInstances.
> 1 to: strings size do: [:i |
>     original := strings at: i.
>     utf8 := original squeakToUtf8.
>     copy := utf8 utf8ToSqueak.
>     original = copy ifFalse: [self error: 'Encoding problem'].
> ].
>
> When I ran this test it failed on each and every WideString instance.
> Digging into it, it seems that all of the WideStrings in Squeak have a
> language tag that is being supplied implicitly by the current
> LanguageEnvironment.
>
> Questions:
> 1) From what it looks like right now there is no way to preserve that
> language tag through a UTF-8 conversion. Is this indeed the case or am I
> missing something?
> 2) Given that my language environment is being set to Latin-1, how
> should clients treat UTF-8 to provide the "proper" language tag? For
> example, I expected that a client be able to read and write UTF-8 text
> without implicitly providing that language tag. If that's the case, then
> how does one store these in common text files? (I could see how to do
> this for formatted text but not for "plain text files" without further
> attribution)
> 3) More generally asking, isn't the language tag here more of a
> "decorator" along the lines of text attributes? This would certainly
> model more closely the effect that I'm seeing here (some attributes are
> dropped by the squeak -> utf8 -> squeak conversion) *except* that I
> didn't expect any lossy conversion for strings (contrary to Text where
> dropping text attributes is obviously lossy).
>
> Thanks for any help,
> - Andreas
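
The lossy round trip described in the quoted message can be illustrated with a small toy model. This is a Python sketch rather than Squeak code, and it rests on assumptions not stated in the thread: that a wide character packs a language tag into the bits above the code point (the 22-bit split, the tag values, and the names make_char, to_utf8 and from_utf8 are all hypothetical), and that the decoder has to take its tag from some default that stands in for the current LanguageEnvironment.

```python
# Toy model only; this is not Squeak's implementation.
# Assumption: a wide character stores a language tag in the bits above
# the Unicode code point. The 22-bit split is illustrative.
TAG_SHIFT = 22
CODE_MASK = (1 << TAG_SHIFT) - 1

# Hypothetical tag values, chosen only for the example.
LATIN1_TAG = 0
JAPANESE_TAG = 5


def make_char(tag, code_point):
    """Pack a language tag and a code point into one integer."""
    return (tag << TAG_SHIFT) | code_point


def to_utf8(tagged_chars):
    """Encode to UTF-8. Only the code point fits, so the tag is dropped."""
    return "".join(chr(c & CODE_MASK) for c in tagged_chars).encode("utf-8")


def from_utf8(data, default_tag):
    """Decode UTF-8. The tag must come from outside the byte stream;
    default_tag stands in for the current language environment."""
    return [make_char(default_tag, ord(ch)) for ch in data.decode("utf-8")]


# Two hiragana characters tagged as Japanese.
original = [make_char(JAPANESE_TAG, 0x3042), make_char(JAPANESE_TAG, 0x3044)]
utf8 = to_utf8(original)
copy = from_utf8(utf8, LATIN1_TAG)

# The code points survive the round trip, but the tagged values differ.
same_text = [c & CODE_MASK for c in original] == [c & CODE_MASK for c in copy]
same_chars = original == copy
print(same_text, same_chars)
```

In this model the equality check fails for the same reason as in the test above: the tag behaves like a text attribute that plain UTF-8 has no place to store, so it survives only if the decoder happens to supply the same tag again.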