Re: m17n ready to go

29 Jul 2004


Hello,
...
...
As an example, the SqueakMap checkpoints are stored as compressed text. The
SqueakMap loader does something like:
    contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
    stream := (RWBinaryOrTextStream with: contents) reset.


With these changes, though, oldFileNamed: returns a MultiByteFileStream. That
would be OK if its converter were the Latin1TextConverter (which maps bytes to
characters 1:1), but it is not. It is, instead, a UTF8TextConverter.
The same thing happens in ChangeList when trying to read a gzipped file.
    zipped _ GZipReadStream on: (FileStream readOnlyFileNamed: fullName).
    unzipped _ ReadStream on: zipped contents asString.
    ChangeList browseStream: unzipped
FileStream readOnlyFileNamed: returns a MultiByteFileStream and
GZipReadStream fails.

You can always specify your converter. In this case, something like
    contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
    stream := (MultiByteBinaryOrTextStream with: contents) reset.
    stream converter: Latin1TextConverter new.
should do it.
...
...
This would seem to be knowledge that only the user of that file
would have.

And the user can specify it.
...
...
Again, the default assumption is that the String will hold text -- even though
there's nothing in it yet! It seems to me that the default converter for this
stream should be the Latin1TextConverter. If a particular user of a String
has a need for or knowledge of a particular encoding, they can change the
converter.

No. If the default were Latin1TextConverter, there would be more
problems.
...
...
However, I don't think it's right to introduce new and incompatible character
conversion semantics on the existing file API.

The rule of thumb is that if you open a file, you should think about
whether it is text or binary, and if it is text, you should think about
how it should be interpreted.
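
A minimal sketch of that rule, for illustration only. The file names are
hypothetical, and it assumes that FileStream readOnlyFileNamed: answers a
MultiByteFileStream that understands binary and converter:, as described
earlier in this message:

```smalltalk
| file bytes text |

"Binary data (for example a gzipped file): switch the stream to binary
 so that no character conversion is applied to the bytes."
file := FileStream readOnlyFileNamed: 'archive.gz'.   "hypothetical name"
bytes := [file binary.
          file contents] ensure: [file close].

"Text: say explicitly how the bytes should be interpreted."
file := FileStream readOnlyFileNamed: 'notes.txt'.    "hypothetical name"
text := [file converter: Latin1TextConverter new.
         file contents] ensure: [file close].
```

In the second case, substitute whichever converter matches the encoding the
file was actually written in; the caller is the one who knows it.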
-- Yoshiki