m17n ready to go
Yoshiki.Ohshima at acm.org
Thu Jul 29 21:14:41 UTC 2004
> > As an example, the SqueakMap checkpoints are stored as compressed text. The
> > SqueakMap loader does something like:
> > contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
> > stream := (RWBinaryOrTextStream with: contents) reset.
> > With these changes, though, oldFileNamed: returns a MultiByteFileStream. Which
> > would be OK if its converter was the Latin1TextConverter (which maps bytes to
> > characters 1:1), but it's not. It is, instead, a UTF8TextConverter.
> Same thing happens in ChangeList when trying to read a gzipped file.
> zipped _ GZipReadStream on: (FileStream readOnlyFileNamed: fullName).
> unzipped _ ReadStream on: zipped contents asString.
> ChangeList browseStream: unzipped
> FileStream readOnlyFileNamed: returns a MultiByteFileStream and
> GZipReadStream fails.
You can always specify your converter. In this case, something like
contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
stream := (MultiByteBinaryOrTextStream with: contents) reset.
stream converter: Latin1TextConverter new.
should do it.
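
To illustrate why a converter that maps bytes to characters 1:1 matters for compressed data, here is a minimal sketch. It is written in Python rather than Squeak, so none of the names below are Squeak API; the variable names and the sample payload are made up for the illustration. It only shows the general point: gzip output survives a trip through a Latin-1 decoding unchanged, while a strict UTF-8 decoding rejects it.

```python
import gzip

# Hypothetical payload standing in for a compressed checkpoint file.
payload = b"SqueakMap checkpoint"
compressed = gzip.compress(payload)

# Latin-1 maps every byte 0..255 to the character with the same code point,
# so decoding and re-encoding returns exactly the original bytes.
as_text = compressed.decode("latin-1")
round_tripped = as_text.encode("latin-1")
latin1_ok = round_tripped == compressed

# A gzip stream starts with the bytes 0x1f 0x8b. The byte 0x8b is a UTF-8
# continuation byte with no lead byte before it, so strict decoding fails.
try:
    compressed.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The Latin-1 round trip leaves the data decompressible.
restored = gzip.decompress(round_tripped)
```

Under these assumptions, `latin1_ok` is true, `utf8_ok` is false, and `restored` equals the original payload, which is the behaviour one would want from the converter chosen for a compressed file.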
> > This would seem to be knowledge that only the user of that file
> > would have.
And the user can specify it.
> > Again, the default assumption is that the String will hold text -- even though
> > there's nothing in it yet! It seems to me that the default converter for this
> > stream should be the Latin1TextConverter. If a particular user of a String
> > has a need for or knowledge of a particular encoding, they can change the
> > converter.
No. If the default is Latin1TextConverter, there would be more
> > However, I don't think it's right to introduce new and incompatible character
> > conversion semantics on the existing file API.
The rule of thumb is that if you open a file, you should think about
whether it is text or binary, and if it is text, you should think about
how it is to be interpreted.