m17n ready to go

Ned Konz ned at bike-nomad.com
Thu Jul 29 18:16:35 UTC 2004


On Wednesday 28 July 2004 10:45 pm, Michael Rueger wrote:
> Hi all,
> the m17n stuff is finally ready for prime time.
> I've uploaded the change sets plus a install do-it to:

I have a couple of problems with some of these changes:

* They change the default behavior of (non-binary) low-level streams, assuming 
that they contain text. However, that's not always the case.

As an example, the SqueakMap checkpoints are stored as compressed text. The 
SqueakMap loader does something like:

	contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
	stream := (RWBinaryOrTextStream with: contents) reset.

With these changes, though, oldFileNamed: returns a MultiByteFileStream. That 
would be OK if its converter were the Latin1TextConverter (which maps bytes to 
characters 1:1), but it isn't; it is a UTF8TextConverter.

This causes the compressed data in the gzipped file to be decoded as UTF-8, 
which it isn't, and so the load fails.
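
The same failure is easy to reproduce outside Squeak. What follows is a minimal Python sketch, not Squeak code, and the sample data is made up to stand in for a checkpoint. It shows three things:

* gzip output is not valid UTF-8.
* A lenient UTF-8 decode silently corrupts the data.
* A Latin-1 round trip gives back exactly the original bytes.

```python
import gzip

# Made-up stand-in for a compressed checkpoint; any gzip data behaves the same.
original = ("SqueakMap checkpoint data\n" * 40).encode("ascii")
compressed = gzip.compress(original)

# Every gzip stream starts with the magic bytes 0x1f 0x8b. 0x8b is a UTF-8
# continuation byte, so it cannot appear where a character should start.
try:
    compressed.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

# A lenient UTF-8 decode "succeeds" but substitutes U+FFFD for bad bytes,
# so re-encoding does not give the original bytes back.
lenient = compressed.decode("utf-8", errors="replace")
lenient_roundtrip_ok = lenient.encode("utf-8") == compressed

# Latin-1 maps byte n to the character with code point n (0-255), so the
# round trip is exact and the data still decompresses.
latin1 = compressed.decode("latin-1")
latin1_roundtrip_ok = latin1.encode("latin-1") == compressed

print(strict_utf8_ok)       # False
print(lenient_roundtrip_ok) # False
print(latin1_roundtrip_ok)  # True
print(gzip.decompress(latin1.encode("latin-1")) == original)  # True
```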

How can we assume at open time that an arbitrary file does, in fact, contain 
text? Sure, many do, but not all.

This would seem to be knowledge that only the user of that file would have.

Similarly,

	s _ MultiByteBinaryOrTextStream on: String new.
	s converter
		=> an UTF8TextConverter

Again, the default assumption is that the String will hold text, even though 
there's nothing in it yet! It seems to me that the default converter for this 
stream should be the Latin1TextConverter. If a particular user of the stream 
needs, or knows about, a particular encoding, it can change the converter.
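
The policy I'm suggesting can be illustrated with a minimal Python sketch of an in-memory stream whose encoding can be replaced. `ConvertingBuffer` and its methods are hypothetical names for illustration only, not Squeak's MultiByteBinaryOrTextStream API. What matters is the default: Latin-1, which is byte-transparent, until a caller that knows the real encoding says otherwise.

```python
import codecs


class ConvertingBuffer:
    """Hypothetical in-memory stream with a replaceable text encoding.

    Defaults to Latin-1, so arbitrary bytes survive unchanged until a
    caller that actually knows the encoding picks a different one.
    """

    def __init__(self, data: bytes = b"", encoding: str = "latin-1") -> None:
        self._data = bytearray(data)
        self.set_encoding(encoding)

    def set_encoding(self, encoding: str) -> None:
        # codecs.lookup normalizes aliases and raises LookupError for unknown names.
        self.encoding = codecs.lookup(encoding).name

    def write_text(self, text: str) -> None:
        self._data += text.encode(self.encoding)

    def read_text(self) -> str:
        return self._data.decode(self.encoding)

    def raw_bytes(self) -> bytes:
        return bytes(self._data)


# Default: every byte value 0-255 comes back unchanged.
all_bytes = bytes(range(256))
buf = ConvertingBuffer(all_bytes)
assert buf.read_text().encode(buf.encoding) == all_bytes

# A caller that knows its data is UTF-8 says so explicitly.
text_buf = ConvertingBuffer(encoding="utf-8")
text_buf.write_text("naïve café")
print(text_buf.encoding)          # utf-8
print(text_buf.read_text())       # naïve café
print(len(text_buf.raw_bytes()))  # 12
```

Making Latin-1 the default costs nothing for data that really is text in a known encoding, because the caller changes one setting. Data that isn't text is never corrupted behind the caller's back.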

If there are places where we use files *as text* and a Latin-1 default doesn't 
work, then those callers should be changed to specify their preferred encoding.
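
Specifying the encoding at the call site might look like the Python sketch below. The file names and contents are hypothetical. Compressed data is read as raw bytes, and text is decoded with an encoding the caller states explicitly. Treating the decompressed checkpoint as UTF-8 is an assumption about this sample file only.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical files: one compressed, one plain UTF-8 text.
    checkpoint_path = os.path.join(workdir, "checkpoint.gz")
    notes_path = os.path.join(workdir, "notes.txt")

    with open(checkpoint_path, "wb") as f:
        f.write(gzip.compress("SqueakMap: Über-package\n".encode("utf-8")))
    with open(notes_path, "w", encoding="utf-8") as f:
        f.write("m17n notes: café\n")

    # Compressed data: open in binary mode so no text conversion happens,
    # decompress, and only then decode with the encoding the caller knows.
    with open(checkpoint_path, "rb") as f:
        checkpoint = gzip.decompress(f.read()).decode("utf-8")

    # Plain text known to be UTF-8: the caller names the encoding explicitly
    # instead of relying on whatever the open call defaults to.
    with open(notes_path, "r", encoding="utf-8") as f:
        notes = f.read()

print(checkpoint, end="")
print(notes, end="")
```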

However, I don't think it's right to introduce new and incompatible character 
conversion semantics into the existing file API.

-- 
Ned Konz
http://bike-nomad.com