Extending FileList with CrLf

Ned Konz ned at bike-nomad.com
Tue Aug 5 05:54:19 UTC 2003


On Monday 04 August 2003 07:48 pm, Richard A. O'Keefe wrote:
> What I'm getting at is that a simple binary/"raw text"/"useful
> text" distinction is a dead end; the distinction is really
>  - how are the elements of the file encoded?
>  - do we want those elements as characters or integers?
> and long term, we'll *have* to have
>     FileStream encoding "answer the encoding"
>     FileStream encoding: anEncoding "set the encoding"
>
> When you realise that, "raw text" looks like a very very very
> specific encoding, and even "smart text" looks like a bandaid.

I would agree. I just spent a day or so making UTF-8 (and X11 
COMPOUND_TEXT) copy/paste to/from Squeak work under Unix, starting 
with Ian's character conversion code. On most Unix systems, this uses 
the iconv library; on Macs it uses the CoreFoundation stuff (because 
Mac OS X lacks a built-in iconv library). This is a simple-minded 
scheme that does not change the fact that characters are represented 
in Squeak using single bytes. So if you copy text into Squeak that 
contains characters that aren't convertible, you get '?' characters in 
those places. However, you can tell the VM what encoding you would 
like to use for the internal characters (so for instance you could 
use all Latin-1 fonts and things would work fine).
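
As a rough illustration of that lossy behaviour, here is a minimal sketch in Python rather than in the VM's C, with Python's built-in codecs standing in for iconv or CoreFoundation. The function name to_internal and its parameters are hypothetical and are not part of Squeak or the VM.

```python
def to_internal(clipboard_bytes, source_encoding="utf-8",
                internal_encoding="latin-1"):
    """Convert pasted bytes to a single-byte internal encoding.

    Anything that cannot be represented in the internal encoding
    (or cannot be decoded at all) ends up as '?'.
    """
    text = clipboard_bytes.decode(source_encoding, errors="replace")
    return text.encode(internal_encoding, errors="replace")


pasted = "café €5".encode("utf-8")

# Latin-1 has an e-acute but no euro sign, so the euro becomes '?'.
print(to_internal(pasted))  # b'caf\xe9 ?5'

# Choosing a different internal encoding changes which characters survive.
print(to_internal(pasted, internal_encoding="mac_roman"))
```

The result is always one byte per character, which is the constraint described above; only the choice of internal encoding decides which characters are lost.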

And then I realized that we're translating bits and pieces of data 
(file names, clipboard data, keystrokes) in the VM, but we already 
have a number of similar bits of support in the image:

	#withSqueakLineEndings
	#squeakToIso/isoToSqueak
	CrLfFileStream
	the TrueType font mapping code (which I think uses #isoToSqueak to
	  translate certain Windows character maps to MacRoman)

and probably others.
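
To make the line-ending case concrete, here is a small Python sketch of the kind of normalization that #withSqueakLineEndings and CrLfFileStream are concerned with, assuming the Squeak convention of a bare CR as the line terminator. The function name is a hypothetical stand-in, not the image's implementation.

```python
CR, LF = "\r", "\n"


def with_squeak_line_endings(text):
    """Answer a copy of text with CRLF and bare LF both turned into CR."""
    # Handle CRLF first so that a CRLF pair becomes one CR, not two.
    return text.replace(CR + LF, CR).replace(LF, CR)


sample = "one\r\ntwo\nthree\rfour"
print(repr(with_squeak_line_endings(sample)))  # 'one\rtwo\rthree\rfour'
```

Seen this way, line-ending conversion is one more translation step of the same kind as the character-set conversions.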

And of course we already have several different font encodings in 
current use:
	modified MacRoman (^ and _ glyphs changed)
	straight MacRoman (produced by the TTF reader if there is a Mac
	  character map)
	Latin-1 (produced by the X11 Font reader package)
	whatever isoToSqueak produces (used by the TTF reader in the absence
	  of a Mac character map)

Plus there's the ambitious work that Yoshiki Ohshima and Kazuhiro Abe 
have done (MultiCharacter, etc.) that really makes the image able to 
deal with more than 256 different characters, including font sets and 
input methods. They added annotated characters, strings, and fonts, 
and also used the OS input method support.

I think that we could benefit by having (soon):

- a standard way to access character conversion routines as primitives 
(with graceful fallback if they're not present, of course)

- standard representations of encodings on file streams (with the 
default being the Squeak native encoding, whatever that is declared 
as).
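
A minimal Python sketch of what these two items might look like together: a stream that carries an explicit encoding (in the spirit of the "encoding" / "encoding:" pair quoted above), defaults to a declared native encoding, and falls back to that default when the requested converter is not available. The class EncodedStream, its methods, and the choice of MacRoman as the native encoding are all assumptions made for illustration.

```python
import codecs
import io

# Assumption: a stand-in for "the Squeak native encoding, whatever that
# is declared as".
NATIVE_ENCODING = "mac_roman"


class EncodedStream:
    """Wrap a binary stream and carry an explicit, changeable encoding."""

    def __init__(self, raw, encoding=NATIVE_ENCODING):
        self._raw = raw
        self.encoding = encoding  # goes through the setter below

    @property
    def encoding(self):
        """Answer the encoding."""
        return self._encoding

    @encoding.setter
    def encoding(self, name):
        """Set the encoding, falling back if no converter is available."""
        try:
            codecs.lookup(name)
        except LookupError:
            name = NATIVE_ENCODING
        self._encoding = name

    def read_characters(self):
        """Answer the remaining elements as characters."""
        return self._raw.read().decode(self._encoding, errors="replace")

    def read_integers(self):
        """Answer the remaining elements as integers (raw byte values)."""
        return list(self._raw.read())


stream = EncodedStream(io.BytesIO("naïve".encode("utf-8")), encoding="utf-8")
print(stream.encoding)
print(stream.read_characters())
```

With this shape, "binary" versus "text" is just a question of which read method is used, and "raw text" is one encoding among many rather than a special mode.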

and (probably later):

- a wider keystroke event stream from the VM

- proper representations of characters and streams beyond the Squeak 
single-byte character set.

-- 
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE


