Hello,
As an example, the SqueakMap checkpoints are stored as compressed text. The SqueakMap loader does something like:
    contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
    stream := (RWBinaryOrTextStream with: contents) reset.
With these changes, though, oldFileNamed: returns a MultiByteFileStream. That would be fine if its converter were the Latin1TextConverter (which maps bytes to characters 1:1), but it is not: it is a UTF8TextConverter, so the compressed bytes are decoded as UTF-8 before unzipped ever sees them.
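To see why the converter matters, here is a minimal Python sketch used purely as an analogy (it is not Squeak code, and latin1_round_trips and utf8_error_offset are hypothetical helpers). Latin-1 maps each byte 0 to 255 to the character with the same value, so any binary data survives a decode/encode round trip. A strict UTF-8 decoder instead rejects gzip data at offset 1, because the second gzip magic byte, 0x8B, can never start a UTF-8 sequence.

```python
import gzip

# Python analogy only, not Squeak code: compares a 1:1 byte mapping (Latin-1)
# with a multi-byte text encoding (UTF-8) applied to compressed data.


def latin1_round_trips(data: bytes) -> bool:
    """True if decoding as Latin-1 and re-encoding gives back the same bytes."""
    text = data.decode("latin-1")
    # Latin-1 is 1:1: one byte becomes one character with the same value.
    return len(text) == len(data) and text.encode("latin-1") == data


def utf8_error_offset(data: bytes):
    """Offset of the first byte a strict UTF-8 decoder rejects, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__":
    payload = gzip.compress(b"SqueakMap checkpoint")
    print(latin1_round_trips(payload))  # True
    print(utf8_error_offset(payload))   # 1 (the 0x8B gzip magic byte)
```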
The same thing happens in ChangeList when it tries to read a gzipped file:
    zipped _ GZipReadStream on: (FileStream readOnlyFileNamed: fullName).
    unzipped _ ReadStream on: zipped contents asString.
    ChangeList browseStream: unzipped
FileStream readOnlyFileNamed: now returns a MultiByteFileStream as well, and GZipReadStream fails on it.
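The ChangeList case is the same problem seen from the other side: the gzip reader needs raw bytes, but the file hands it decoded text. A rough Python analogy follows (not Squeak code; read_gzipped_text and read_as_utf8_text are hypothetical names, and the temporary file exists only to keep the sketch self-contained). Opening the file in binary mode and decoding only the decompressed payload works, while reading the same file through a UTF-8 text stream fails on the gzip header.

```python
import gzip
import os
import tempfile

# Python analogy only, not Squeak code.


def read_gzipped_text(path: str, encoding: str) -> str:
    # Open in binary mode: the compressed file is bytes, not text.
    with open(path, "rb") as raw:
        with gzip.GzipFile(fileobj=raw) as gz:
            data = gz.read()
    # Only the decompressed payload is text; the caller says how to decode it.
    return data.decode(encoding)


def read_as_utf8_text(path: str) -> str:
    # Mimics a stream that decodes every byte through UTF-8 before any
    # gzip reader could see it; the 0x8B header byte makes this raise.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".gz")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(gzip.compress("changes log".encode("latin-1")))
        print(read_gzipped_text(path, "latin-1"))  # changes log
        try:
            read_as_utf8_text(path)
        except UnicodeDecodeError:
            print("UTF-8 text stream rejected the gzip data")
    finally:
        os.remove(path)
```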
You can always specify your converter. In this case, something like
    contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
    stream := (MultiByteBinaryOrTextStream with: contents) reset.
    stream converter: Latin1TextConverter new.
should do it.
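In Python terms (an analogy only; text_stream_over is a hypothetical helper, not part of any Squeak API), choosing the converter corresponds to wrapping the decompressed bytes in io.TextIOWrapper with an explicit encoding. Passing newline="" keeps carriage returns untranslated, so with Latin-1 every byte still becomes exactly one character.

```python
import gzip
import io

# Python analogy only: the explicit encoding plays roughly the role of
# "stream converter: Latin1TextConverter new" in the Squeak snippet.


def text_stream_over(data: bytes, encoding: str) -> io.TextIOWrapper:
    # The caller, who knows what the bytes mean, picks the decoder.
    # newline="" returns line endings untranslated, preserving the 1:1 mapping.
    return io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")


if __name__ == "__main__":
    compressed = gzip.compress(bytes([0x41, 0xE9, 0x0D]))
    contents = gzip.decompress(compressed)  # analogous to "... upToEnd unzipped"
    stream = text_stream_over(contents, "latin-1")
    print(ascii(stream.read()))  # 'A\xe9\r'
```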
This would seem to be knowledge that only the user of that file would have.
And the user can specify it.
Again, the default assumption is that the String will hold text, even though there is nothing in it yet! It seems to me that the default converter for this stream should be the Latin1TextConverter. If a particular user of a String needs a particular encoding, or knows which one applies, they can change the converter.
No. If the default were the Latin1TextConverter, there would be more problems.
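What the "more problems" are is not spelled out here. One likely case, offered as an assumption rather than as the author's argument, is text that really is UTF-8: Latin-1 accepts every byte, so decoding UTF-8 text with it produces wrong characters silently instead of raising an error. A small Python illustration (decode_both is a hypothetical helper):

```python
# Illustration only: assumes the data is UTF-8 text that a Latin-1 default
# would misread. Python analogy, not Squeak code.


def decode_both(data: bytes):
    """Decode the same bytes as UTF-8 and as Latin-1 for comparison."""
    return data.decode("utf-8"), data.decode("latin-1")


if __name__ == "__main__":
    utf8_bytes = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'
    right, wrong = decode_both(utf8_bytes)
    print(ascii(right))  # 'caf\xe9'
    print(ascii(wrong))  # 'caf\xc3\xa9' (two wrong characters, no error raised)
```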
However, I don't think it's right to introduce new and incompatible character conversion semantics on the existing file API.
The rule of thumb is that when you open a file, you should decide whether it is text or binary, and if it is text, how it should be interpreted (that is, which encoding it uses).
-- Yoshiki