Extending FileList with CrLf
Ned Konz
ned at bike-nomad.com
Tue Aug 5 05:54:19 UTC 2003
On Monday 04 August 2003 07:48 pm, Richard A. O'Keefe wrote:
> What I'm getting at is that a simple binary/"raw text"/"useful
> text" distinction is a dead end; the distinction is really
> - how are the elements of the file encoded?
> - do we want those elements as characters or integers?
> and long term, we'll *have* to have
> FileStream encoding "answer the encoding"
> FileStream encoding: anEncoding "set the encoding"
>
> When you realise that, "raw text" looks like a very very very
> specific encoding, and even "smart text" looks like a bandaid.
I would agree. I just spent a day or so making UTF-8 (and X11
COMPOUND_TEXT) copy/paste to/from Squeak work under Unix, starting
with Ian's character conversion code. On most Unix systems, this uses
the iconv library; on Macs it uses CoreFoundation (because
Mac OS X lacks a built-in iconv library). This is a simple-minded
scheme that does not change the fact that characters are represented
in Squeak using single bytes. So if you copy text into Squeak that
contains characters that can't be converted, you get '?' characters in
those places. However, you can tell the VM what encoding you would
like to use for the internal characters (so for instance you could
use all Latin-1 fonts and things would work fine).
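
To illustrate the '?' substitution described above, here is a minimal
sketch in Python rather than the actual VM code (which goes through
iconv or CoreFoundation). The function name to_internal and the choice
of example encodings are assumptions made for illustration only.

```python
def to_internal(text: str, internal_encoding: str = "latin-1") -> bytes:
    """Convert text to a single-byte internal encoding.

    Characters that the target encoding cannot represent are replaced
    with '?', which mirrors the behaviour described in the message.
    """
    return text.encode(internal_encoding, errors="replace")


if __name__ == "__main__":
    pasted = "na\u00efve \u2192 caf\u00e9"  # contains an arrow, U+2192
    # Latin-1 has the accented letters but no arrow, so only the arrow
    # is replaced.
    print(to_internal(pasted, "latin-1"))  # b'na\xefve ? caf\xe9'
```

The same text converted to a different internal encoding (for example
"mac_roman") yields different byte values for the accented letters,
which is why the internal encoding has to match the fonts in use.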
And then I realized that we're translating bits and pieces of data
(file names, clipboard data, keystrokes) in the VM, but we already
have a number of similar bits of support in the image:
- #withSqueakLineEndings
- #squeakToIso/isoToSqueak
- CrLfFileStream
- the TrueType font mapping code (which I think uses #isoToSqueak to
  translate certain Windows character maps to MacRoman)
- and probably others.
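
As a rough picture of what the line-ending part of that support does,
here is a small Python sketch of normalizing foreign line endings to
the bare CR that Squeak uses internally. It is not the image's
implementation; the function name is borrowed from the selector above
purely for illustration.

```python
def with_squeak_line_endings(text: str) -> str:
    """Normalize CRLF and LF line endings to a bare CR.

    CRLF is handled first so that it becomes one CR rather than two.
    """
    return text.replace("\r\n", "\r").replace("\n", "\r")


if __name__ == "__main__":
    print(repr(with_squeak_line_endings("one\r\ntwo\nthree\r")))
```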
And of course we already have several different font encodings in
current use:
- modified MacRoman (^ and _ glyphs changed)
- straight MacRoman (produced by the TTF reader if there is a Mac
  character map)
- Latin-1 (produced by the X11 Font reader package)
- whatever isoToSqueak produces (used by the TTF reader in the absence
  of a Mac character map)
Plus there's the ambitious work that Yoshiki Ohshima and Kazuhiro Abe
have done (MultiCharacter, etc.) that really makes the image able to
deal with more than 256 different characters, including font sets,
input methods, etc. They added annotated characters, strings, fonts,
etc. and also used the OS input method support.
I think that we could benefit by having (soon):
- a standard way to access character conversion routines as primitives
(with graceful fallback if they're not present, of course)
- standard representations of encodings on file streams (with the
default being the Squeak native encoding, whatever that is declared
as).
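
The two near-term items above could look something like the following
Python sketch: a stream that answers and sets its encoding, and falls
back to the native encoding when the requested converter is missing.
Everything here is hypothetical: the class name EncodedStream, the
silent-fallback policy, and the use of "mac_roman" as a stand-in for
the Squeak native encoding are all assumptions, not an existing API.

```python
import codecs
import io


class EncodedStream:
    """Hypothetical byte stream wrapper with a settable encoding."""

    # Assumption: a stand-in for "the Squeak native encoding".
    NATIVE = "mac_roman"

    def __init__(self, raw, encoding=None):
        self._raw = raw
        self._encoding = self.NATIVE
        if encoding is not None:
            self.encoding = encoding

    @property
    def encoding(self):
        """Answer the encoding."""
        return self._encoding

    @encoding.setter
    def encoding(self, name):
        """Set the encoding, falling back if no converter is present."""
        try:
            codecs.lookup(name)  # raises LookupError if unavailable
            self._encoding = name
        except LookupError:
            self._encoding = self.NATIVE

    def read_text(self):
        """Read everything that remains and decode it."""
        return self._raw.read().decode(self._encoding, errors="replace")


if __name__ == "__main__":
    stream = EncodedStream(io.BytesIO("caf\u00e9".encode("utf-8")), "utf-8")
    print(stream.encoding, stream.read_text())
```

Whether a missing converter should fall back silently, as here, or
signal an error is a design choice this sketch does not settle.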
and (probably later):
- a wider keystroke event stream from the VM
- proper representations of characters and streams beyond the Squeak
single-byte character set.
--
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE