m17n ready to go
Ned Konz
ned at bike-nomad.com
Thu Jul 29 18:16:35 UTC 2004
On Wednesday 28 July 2004 10:45 pm, Michael Rueger wrote:
> Hi all,
> the m17n stuff is finally ready for prime time.
> I've uploaded the change sets plus an install do-it to:
I have a couple of problems with some of these changes:
* They change the default behavior of (non-binary) low-level streams so that
the streams are assumed to contain text. However, that's not always the case.
As an example, the SqueakMap checkpoints are stored as compressed text. The
SqueakMap loader does something like:
    contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
    stream := (RWBinaryOrTextStream with: contents) reset.
With these changes, though, oldFileNamed: returns a MultiByteFileStream. That
would be OK if its converter were the Latin1TextConverter (which maps bytes to
characters 1:1), but it isn't: it's a UTF8TextConverter.
As a result, the compressed data in the gzipped file is decoded as UTF-8,
which it isn't, and the load fails.
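The difference between the two converters is easy to see outside Squeak. What follows is a minimal Python sketch, not Squeak code. `utf8_roundtrip` and `latin1_roundtrip` are hypothetical helper names, and the output of Python's `gzip.compress` stands in for a SqueakMap checkpoint. Gzip data can never decode as UTF-8, because its second header byte (0x8b) cannot start a UTF-8 sequence. Latin-1 maps every byte 0-255 to exactly one character, so it always round-trips.

```python
import gzip


def utf8_roundtrip(data: bytes) -> bool:
    """True if data survives decode-as-UTF-8 then re-encode unchanged."""
    try:
        return data.decode("utf-8").encode("utf-8") == data
    except UnicodeDecodeError:
        # Not valid UTF-8 at all: a strict converter rejects the data.
        return False


def latin1_roundtrip(data: bytes) -> bool:
    """True if data survives decode-as-Latin-1 then re-encode unchanged."""
    # Latin-1 maps each byte 0-255 to the character with the same code
    # point, so this decode never fails and always gives the bytes back.
    return data.decode("latin-1").encode("latin-1") == data


if __name__ == "__main__":
    # Stand-in for a compressed SqueakMap checkpoint: arbitrary binary data.
    compressed = gzip.compress(b"SqueakMap checkpoint " * 20)
    print("UTF-8 round-trip:  ", utf8_roundtrip(compressed))
    print("Latin-1 round-trip:", latin1_roundtrip(compressed))
```

Run as a script, this reports False for UTF-8 and True for Latin-1. A lenient UTF-8 decoder that substitutes replacement characters would not help either, because it silently changes the bytes instead of failing.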
How can we assume at open time that an arbitrary file does, in fact, contain
text? Sure, many do, but not all.
This would seem to be knowledge that only the user of that file would have.
Similarly,
    s _ MultiByteBinaryOrTextStream on: String new.
    s converter
    => an UTF8TextConverter
Again, the default assumption is that the String will hold text -- even though
there's nothing in it yet! It seems to me that the default converter for this
stream should be the Latin1TextConverter. Code that needs, or knows about, a
particular encoding can then set the converter explicitly.
If there is code that uses files *as text* and this policy doesn't work for
it, that code should be changed to specify its preferred encoding.
However, I don't think it's right to introduce new and incompatible character
conversion semantics on the existing file API.
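The policy argued for here can be sketched the same way: byte-transparent by default, with an explicit opt-in when the encoding is known. This is again Python for illustration only. `open_stream` is a hypothetical function. Latin-1 plays the role of the Latin1TextConverter, and passing `encoding` plays the role of setting a different converter on the stream.

```python
import io
from typing import Optional


def open_stream(data: bytes, encoding: Optional[str] = None) -> io.TextIOWrapper:
    """Hypothetical opener: byte-transparent unless the caller names an encoding.

    With no encoding, Latin-1 is used, so every byte maps 1:1 to a character
    and binary payloads (e.g. gzip data) pass through unchanged. A caller
    that knows the data is, say, UTF-8 text asks for that explicitly.
    """
    # newline="" disables newline translation, so CR/LF bytes are preserved.
    return io.TextIOWrapper(io.BytesIO(data), encoding=encoding or "latin-1", newline="")


if __name__ == "__main__":
    payload = "na\u00efve".encode("utf-8")  # 6 bytes of UTF-8 text
    print(repr(open_stream(payload).read()))                    # raw bytes as chars
    print(repr(open_stream(payload, encoding="utf-8").read()))  # decoded text
```

With no encoding, `open_stream(data).read().encode("latin-1")` returns the original bytes whatever they contain. Only the caller that knows it holds UTF-8 text gets UTF-8 decoding. That mirrors the point above: the encoding decision belongs to the user of the file, not to the file API.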
--
Ned Konz
http://bike-nomad.com
More information about the Squeak-dev
mailing list