m17n ready to go
Karl Ramberg
karl.ramberg at chello.se
Thu Jul 29 19:38:58 UTC 2004
Ned Konz wrote:
>
> On Wednesday 28 July 2004 10:45 pm, Michael Rueger wrote:
> > Hi all,
> > the m17n stuff is finally ready for prime time.
> > I've uploaded the change sets plus an install do-it to:
>
> I have a couple of problems with some of these changes:
>
> * They change the default behavior of (non-binary) low-level streams, assuming
> that they contain text. However, that's not always the case.
>
> As an example, the SqueakMap checkpoints are stored as compressed text. The
> SqueakMap loader does something like:
>
> contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
> stream := (RWBinaryOrTextStream with: contents) reset.
>
> With these changes, though, oldFileNamed: returns a MultiByteFileStream. That
> would be OK if its converter were the Latin1TextConverter (which maps bytes to
> characters 1:1), but it's not. It is, instead, a UTF8TextConverter.
Same thing happens in ChangeList when trying to read a gzipped file.
zipped _ GZipReadStream on: (FileStream readOnlyFileNamed: fullName).
unzipped _ ReadStream on: zipped contents asString.
ChangeList browseStream: unzipped.
FileStream readOnlyFileNamed: returns a MultiByteFileStream and
GZipReadStream fails.
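The underlying problem can be reproduced outside Squeak. What follows is only a rough Python sketch, not the Squeak code path, and the sample payload and variable names are made up for the illustration. It shows that gzip output is not valid UTF-8, whereas a Latin-1 decode maps every byte 1:1 and so round-trips without loss:

```python
import gzip

# Hypothetical sample data standing in for a compressed change set.
payload = gzip.compress(b"hello " * 20)

# A gzip stream starts with the bytes 0x1f 0x8b; 0x8b cannot begin a
# UTF-8 sequence, so decoding the compressed bytes as UTF-8 fails.
try:
    payload.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 assigns one character to each byte value, so decoding and
# re-encoding returns exactly the original bytes.
latin1_roundtrip = payload.decode("latin-1").encode("latin-1") == payload

print(utf8_ok, latin1_roundtrip)
```

A converter that maps bytes to characters 1:1 leaves compressed data intact; a UTF-8 converter cannot.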
Karl
> This causes the compressed data in the gzipped file to be interpreted as
> UTF-8, which it isn't. And so the load fails.
>
> How can we assume at open time that an arbitrary file does, in fact, contain
> text? Sure, many do, but not all.
>
> This would seem to be knowledge that only the user of that file would have.
>
> Similarly,
>
> s _ MultiByteBinaryOrTextStream on: String new.
> s converter
> => an UTF8TextConverter
>
> Again, the default assumption is that the String will hold text -- even though
> there's nothing in it yet! It seems to me that the default converter for this
> stream should be the Latin1TextConverter. If a particular user of a String
> has a need for or knowledge of a particular encoding, they can change the
> converter.
>
> If there are cases where we're using files *as text* and this policy doesn't
> work, then they should be changed to specify their preferred encoding.
>
> However, I don't think it's right to introduce new and incompatible character
> conversion semantics on the existing file API.
>
> --
> Ned Konz
> http://bike-nomad.com