Squeak to/from UTF-8 conversions
philippe.marschall at gmail.com
Sun Jul 1 11:45:40 UTC 2007
What's the status of these patches? Seaside shows a measurable speed
drop when doing UTF-8 encoding/decoding, so we'd be more than willing
to test them. We don't care about the stripping of language tags; we
are fine with the unification aspect of Unicode.
2007/6/26, Andreas Raab <andreas.raab at gmx.de>:
> Hi -
> I was working on a little improvement in UTF-8 conversion speed (so far
> it's about 150x faster for Latin-1 text ;-) and, to measure the
> improvements, was running a test that said:
> strings := String allSubInstances.
> 1 to: strings size do: [:i |
>     original := strings at: i.
>     utf8 := original squeakToUtf8.
>     copy := utf8 utf8ToSqueak.
>     original = copy ifFalse: [self error: 'Encoding problem']].
> When I ran this test it failed on each and every WideString instance.
> Digging into it, it seems that all of the WideStrings in Squeak have a
> language tag that is being supplied implicitly by the current
> 1) From what it looks like right now there is no way to preserve that
> language tag through a UTF-8 conversion. Is this indeed the case or am I
> missing something?
> 2) Given that my language environment is being set to Latin-1, how
> should clients treat UTF-8 to provide the "proper" language tag? For
> example, I expected that a client would be able to read and write UTF-8 text
> without implicitly providing that language tag. If that's the case, then
> how does one store these in common text files? (I could see how to do
> this for formatted text but not for "plain text files" without further
> 3) More generally, isn't the language tag here more of a
> "decorator" along the lines of text attributes? This would certainly
> model more closely the effect that I'm seeing here (some attributes are
> dropped by the squeak -> utf8 -> squeak conversion) *except* that I
> didn't expect any lossy conversion for strings (contrary to Text where
> dropping text attributes is obviously lossy).
> Thanks for any help,
> - Andreas
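To make the round-trip failure in the quoted message concrete, here is a minimal Python model of it. This is not Squeak code and not the patch under discussion. It assumes, purely for illustration, that each wide character stores a language tag in the bits above the Unicode code point; the shift of 22 bits, the tag value 5, and the helper names tag_char, to_utf8 and from_utf8 are all hypothetical.

```python
# Hypothetical model of language-tagged characters, not Squeak's implementation.
TAG_SHIFT = 22                     # assumed bit position of the language tag
CODE_MASK = (1 << TAG_SHIFT) - 1   # the low bits hold the Unicode code point


def tag_char(code_point, language_tag):
    """Pack a language tag and a code point into a single integer."""
    return (language_tag << TAG_SHIFT) | code_point


def to_utf8(tagged_chars):
    """Encode to UTF-8. Only the code point is written; the tag is dropped."""
    return "".join(chr(c & CODE_MASK) for c in tagged_chars).encode("utf-8")


def from_utf8(data, language_tag=0):
    """Decode UTF-8, giving every character one tag chosen by the reader."""
    return [tag_char(ord(ch), language_tag) for ch in data.decode("utf-8")]


text = "\u65e5\u672c"  # two CJK characters
original = [tag_char(ord(ch), language_tag=5) for ch in text]
utf8 = to_utf8(original)
copy = from_utf8(utf8)

# The tagged values differ after the round trip, so equality fails ...
print(original == copy)
# ... although the code points themselves survive unchanged.
print([c & CODE_MASK for c in original] == [c & CODE_MASK for c in copy])
```

In this model the round trip is an identity only if the reader happens to supply the same tag the writer used (from_utf8(utf8, 5) here), which is the situation questions 1) and 2) are asking about.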
More information about the Squeak-dev mailing list