[BUG]UndefinedObject(Object)>>doesNotUnderstand: #<

Yoshiki Ohshima yoshiki at squeakland.org
Wed Feb 23 23:24:15 UTC 2005


  Ned,

> I believe that what was happening may have been that the UTF-8 text converter 
> attached to the stream may have been trying to interpret the bytes.

  Yes, could be.

> Is it just me, or does anyone else think that the current way we open files 
> that we're going to use as simple streams of bytes is a bit
> inefficient?

  A bit, yes.

> Right now, if someone just says 'readOnlyFileNamed:', this is what happens:
> 
> * get the full name from the given name
> * convert the full name into a local name
> * compare that with the local name for the sources file
> * if the names are the same (that is, for any file named 'SqueakV3.sources') 
> then set the converter to a new MacRomanTextConverter, otherwise set it to a 
> new UTF8TextConverter.

  Yes.

> But what happens if we just want a stream of uninterpreted bytes? Well, we 
> have to change the converter to a Latin1TextConverter, which does nothing to the 
> incoming text, other than slow down the reading. So we go through the 
> construction of a UTF8TextConverter, then throw it away.
> 
> Look at the time comparison between two ways of opening a file and reading 
> 1000 uninterpreted binary bytes:
> 
> Time millisecondsToRun: [ 1000 timesRepeat: [ | s | s := StandardFileStream 
> readOnlyFileNamed: '/dev/zero'. s binary. s next: 1000. s close ] ] 
>  => 119
> 
> Time millisecondsToRun: [ 1000 timesRepeat: [ | s | s := FileStream 
> readOnlyFileNamed: '/dev/zero'. s converter: Latin1TextConverter new. s 
> binary.  s next: 1000. s close ] ] 
>  => 251
> 
> 40% of the second example's time is spent doing the above comparison of local 
> names with the sources file.

  Hmm, I'm not sure this benchmark really reflects real-world apps.

> I'm curious as to why this comparison with the local name of the sources file 
> is actually needed; don't we generally know how we're using a file, and so 
> could just set the text converter to the proper one in the case of the 
> sources file?

  My feeling is that once we recreate the .sources file, we can
eliminate the check.

> And it still seems to me that having 'binary' flavors of the file opening 
> routines would help in the cases where we know that we're not dealing with 
> text.

  Yes.

  Again, the rule of thumb is to be careful about whether you are
dealing with uninterpreted bytes or text, and to use #binary and #text
accordingly.  I think it is wise to suggest using #converter: only
when it really matters.
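
  A minimal sketch of that rule of thumb, for illustration only.  The
file names are hypothetical, and it assumes the stream answered by
FileStream readOnlyFileNamed: responds to #binary, #text, #next:,
#upToEnd and #close as discussed in this thread:

```smalltalk
| s bytes contents |

"Uninterpreted bytes: put the stream in binary mode before reading."
s := FileStream readOnlyFileNamed: 'data.bin'.    "hypothetical file"
bytes := [s binary. s next: 1000] ensure: [s close].

"Text: leave the stream in text mode so the default converter decodes it."
s := FileStream readOnlyFileNamed: 'notes.txt'.   "hypothetical file"
contents := [s text. s upToEnd] ensure: [s close].
```

  Neither case calls #converter: directly; that would be reserved for
files whose encoding is known to differ from the default.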

-- Yoshiki