m17n ready to go

Karl Ramberg karl.ramberg at chello.se
Thu Jul 29 19:38:58 UTC 2004



Ned Konz wrote:
> 
> On Wednesday 28 July 2004 10:45 pm, Michael Rueger wrote:
> > Hi all,
> > the m17n stuff is finally ready for prime time.
> > I've uploaded the change sets plus an install do-it to:
> 
> I have a couple of problems with some of these changes:
> 
> * They change the default behavior of (non-binary) low-level streams, assuming
> that they contain text. However, that's not always the case.
> 
> As an example, the SqueakMap checkpoints are stored as compressed text. The
> SqueakMap loader does something like:
> 
>         contents := (self directory oldFileNamed: fname) ascii upToEnd unzipped.
>         stream := (RWBinaryOrTextStream with: contents) reset.
> 
> With these changes, though, oldFileNamed: returns a MultiByteFileStream. That
> would be OK if its converter were the Latin1TextConverter (which maps bytes to
> characters 1:1), but it's not. It is, instead, a UTF8TextConverter.

Same thing happens in ChangeList when trying to read a gzipped file.

	zipped _ GZipReadStream on: (FileStream readOnlyFileNamed: fullName).
	unzipped _ ReadStream on: zipped contents asString.
	ChangeList browseStream: unzipped

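FileStream readOnlyFileNamed: now returns a MultiByteFileStream, and
GZipReadStream fails when it reads from that stream, presumably because
the compressed bytes go through the UTF-8 converter first.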
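
A possible workaround, sketched below, is to put the file stream into
binary mode before handing it to GZipReadStream. This assumes that
sending #binary to a MultiByteFileStream makes it deliver raw bytes
without going through its text converter; that has not been checked
against the uploaded change sets. fullName comes from the surrounding
ChangeList code, as in the snippet above.

```smalltalk
| file zipped unzipped |
"fullName is supplied by the surrounding ChangeList code."
file := FileStream readOnlyFileNamed: fullName.
"Assumption: binary mode bypasses the UTF8TextConverter, so the
 compressed bytes reach GZipReadStream unchanged."
file binary.
zipped := GZipReadStream on: file.
unzipped := ReadStream on: zipped contents asString.
ChangeList browseStream: unzipped
```

If that works, only the callers that read non-text data need to change,
and the text-reading callers keep whatever default converter is chosen.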

Karl

> This causes the compressed data in the gzipped file to be interpreted as
> UTF-8, which it isn't. And so the load fails.
> 
> How can we assume at open time that an arbitrary file does, in fact, contain
> text? Sure, many do, but not all.
> 
> This would seem to be knowledge that only the user of that file would have.
> 
> Similarly,
> 
>         s _ MultiByteBinaryOrTextStream on: String new.
>         s converter
>                 => an UTF8TextConverter
> 
> Again, the default assumption is that the String will hold text -- even though
> there's nothing in it yet! It seems to me that the default converter for this
> stream should be the Latin1TextConverter. If a particular user of a String
> has a need for or knowledge of a particular encoding, they can change the
> converter.
> 
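
Changing the converter as described above might look like the following
sketch. It assumes MultiByteBinaryOrTextStream understands #converter:
and that Latin1TextConverter can be created with a plain #new; neither
has been checked against the uploaded change sets.

```smalltalk
| s |
s := MultiByteBinaryOrTextStream on: String new.
"Assumption: #converter: replaces the default UTF8TextConverter.
 Latin1TextConverter maps bytes to characters 1:1."
s converter: Latin1TextConverter new.
"Inspect the converter now installed on the stream."
s converter
```
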
> If there are cases where we're using files *as text* and this policy doesn't
> work, then they should be changed to specify their preferred encoding.
> 
> However, I don't think it's right to introduce new and incompatible character
> conversion semantics on the existing file API.
> 
> --
> Ned Konz
> http://bike-nomad.com


