Ned Konz wrote:
On Wednesday 28 July 2004 10:45 pm, Michael Rueger wrote:
Hi all, the m17n stuff is finally ready for prime time. I've uploaded the change sets plus an install do-it to:
I have a couple of problems with some of these changes:
- They change the default behavior of (non-binary) low-level streams, assuming
that they contain text. However, that's not always the case.
As an example, the SqueakMap checkpoints are stored as compressed text. The SqueakMap loader does something like:
    contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
    stream := (RWBinaryOrTextStream with: contents) reset.
With these changes, though, oldFileNamed: returns a MultiByteFileStream. That would be OK if its converter were the Latin1TextConverter (which maps bytes to characters 1:1), but it isn't. It is, instead, a UTF8TextConverter.
Same thing happens in ChangeList when trying to read a gzipped file.
    zipped _ GZipReadStream on: (FileStream readOnlyFileNamed: fullName).
    unzipped _ ReadStream on: zipped contents asString.
    ChangeList browseStream: unzipped
FileStream readOnlyFileNamed: returns a MultiByteFileStream and GZipReadStream fails.
Karl
This causes the compressed data in the gzipped file to be interpreted as UTF-8, which it isn't. And so the load fails.
How can we assume at open time that an arbitrary file does, in fact, contain text? Sure, many do, but not all.
This would seem to be knowledge that only the user of that file would have.
Similarly,
    s _ MultiByteBinaryOrTextStream on: String new.
    s converter => an UTF8TextConverter
Again, the default assumption is that the String will hold text -- even though there's nothing in it yet! It seems to me that the default converter for this stream should be the Latin1TextConverter. If a particular user of a String has a need for or knowledge of a particular encoding, they can change the converter.
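Changing the converter is a one-line step for the caller. A minimal sketch, assuming converter: is the setter that pairs with the converter getter shown above and that the m17n change sets are loaded:

```smalltalk
| s |
s := MultiByteBinaryOrTextStream on: String new.
"Pick the encoding explicitly instead of relying on the default.
 Latin1TextConverter is the 1:1 byte-to-character mapping."
s converter: Latin1TextConverter new.
```

A caller that knows its data is UTF-8 would pass UTF8TextConverter new instead.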
If there are cases where we're using files *as text* and this policy doesn't work, then they should be changed to specify their preferred encoding.
However, I don't think it's right to introduce new and incompatible character conversion semantics on the existing file API.
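For the file case, a caller that knows its file is UTF-8 text could say so itself. This is only a sketch: the file name is a hypothetical scratch file, and it assumes MultiByteFileStream understands converter: the same way the string stream does.

```smalltalk
| writer reader text |
"Write a small text file, stating the encoding explicitly."
writer := FileStream forceNewFileNamed: 'm17n-demo.txt'.
[writer converter: UTF8TextConverter new.
 writer nextPutAll: 'plain text']
	ensure: [writer close].

"Read it back, again stating the encoding rather than assuming a default."
reader := FileStream readOnlyFileNamed: 'm17n-demo.txt'.
[reader converter: UTF8TextConverter new.
 text := reader upToEnd]
	ensure: [reader close].
```

Code that reads compressed or otherwise non-text data would then need no changes, because nothing would be decoded unless the caller asked for it.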
-- Ned Konz http://bike-nomad.com