Celeste encoding (was: Duplicate messages in Celeste)
lex at cc.gatech.edu
Mon Mar 13 10:31:07 UTC 2000
Bert Freudenberg <bert at isgnw.CS.Uni-Magdeburg.De> wrote:
> On Sat, 11 Mar 2000, Lex Spoon wrote:
> > > PS: I'll send a changeset soon that makes Celeste use iso8859-1 encoding
> > > internally so it is more attractive for us non-English-natives ;-)
> > Just to clarify, it's probably good if Celeste uses Squeak's encoding
> > "internally", but then converts to/from iso8859-1 when it sends stuff on
> > the Internet.
> Actually I meant what I wrote. If I get a mail it should be stored as-is
> in the database - that's what I mean with "internally", and that is what
> Celeste already does quite well :-)
> The conversion should IMHO only happen in the UI. Currently, it uses
> isoToSqueak and the Squeak default font for displaying, but it would well
> be possible to use a specially encoded font for this. Also, this way it
> would be possible to display Cyrillic or Greek or whatever charsets.
> "Don't ever munge original data"
Hmm, I disagree, but not strongly. This is email, and it is supposed to be
reasonable text. As such, it is nice to treat it as text internally.
As a clear example, Celeste transforms CRLF-delimited text to CR-delimited
text first thing after downloading, and then changes
it back when you send mail outwards. This allows standard Squeak string
utilities to work correctly. The downside is that embedded, lone CR's
and LF's get messed up. Still, it's clearly worth the tradeoff in that case.
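To make the tradeoff concrete, here is a rough sketch in Python (not Celeste's actual Smalltalk code; the function names are made up for illustration). It assumes the conversion is a plain CRLF-to-CR substitution on the way in and the reverse on the way out, and shows a lone CR failing to survive the round trip.

```python
def to_internal(wire_text: str) -> str:
    """Convert CRLF-delimited wire text to CR-delimited internal text."""
    return wire_text.replace("\r\n", "\r")


def to_wire(internal_text: str) -> str:
    """Convert CR-delimited internal text back to CRLF-delimited wire text."""
    return internal_text.replace("\r", "\r\n")


# A well-formed message survives the round trip unchanged.
clean = "Subject: hi\r\n\r\nbody\r\n"
assert to_wire(to_internal(clean)) == clean

# A message with an embedded lone CR does not: once inside, the lone CR
# is indistinguishable from a converted CRLF, so it goes out as CRLF.
odd = "one\rtwo\r\nthree"
round_tripped = to_wire(to_internal(odd))
assert round_tripped != odd
```

The information lost here is exactly which CRs were originally lone, which is why storing the message as-is and converting only for display avoids the problem.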
In the case of character encoding, there aren't too many utilities in
Squeak that will notice, so the argument is weaker in practice. It does
still mean you have to *keep track* of which strings are in which
encoding, which is overhead that's not there if you just switch to
Squeak's encoding right after downloading.
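The bookkeeping point can be sketched the same way, again in Python rather than Smalltalk. The sketch assumes, purely for illustration, that Squeak's internal encoding behaves like Mac Roman and that mail arrives as Latin-1; the function names are hypothetical.

```python
WIRE_ENCODING = "latin-1"        # iso8859-1, as used on the Internet side
INTERNAL_ENCODING = "mac_roman"  # assumption: stand-in for Squeak's encoding


def wire_to_internal(raw: bytes) -> bytes:
    """Re-encode downloaded bytes into the internal encoding."""
    text = raw.decode(WIRE_ENCODING)
    # Characters missing from the internal encoding become "?".
    return text.encode(INTERNAL_ENCODING, errors="replace")


def internal_to_wire(stored: bytes) -> bytes:
    """Re-encode internal bytes for sending."""
    text = stored.decode(INTERNAL_ENCODING)
    return text.encode(WIRE_ENCODING, errors="replace")


raw = "Grüße".encode(WIRE_ENCODING)

# Converting at the boundary round-trips for characters both sets share.
assert internal_to_wire(wire_to_internal(raw)) == raw

# Storing the raw bytes instead means every reader has to know they are
# Latin-1; interpreting them as the internal encoding yields wrong text.
assert raw.decode(INTERNAL_ENCODING) != "Grüße"
```

Characters that exist in only one of the two sets are replaced rather than preserved, which is the "munging" cost of converting right after download.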
Overall, though, the best solution is to switch Squeak to Latin-1. Even
on a Mac, I'd guess Squeak more often talks to the Internet than it does
to fellow applications on the same computer. And we could always have a
file menu that did character-set conversion, for those cases.