[BUG]UndefinedObject(Object)>>doesNotUnderstand: #<
Yoshiki Ohshima
yoshiki at squeakland.org
Wed Feb 23 23:24:15 UTC 2005
Ned,
> I believe that what was happening may have been that the UTF-8 text converter
> attached to the stream may have been trying to interpret the bytes.
Yes, could be.
> Is it just me, or does anyone else think that the current way we open files
> that we're going to use as simple streams of bytes is a bit
> inefficient?
A bit, yes.
> Right now, if someone just says 'readOnlyFileNamed:', this is what happens:
>
> * get the full name from the given name
> * convert the full name into a local name
> * compare that with the local name for the sources file
> * if the names are the same (that is, for any file named 'SqueakV3.sources')
> then set the converter to a new MacRomanTextConverter, otherwise set it to a
> new UTF8TextConverter.
Yes.
> But what happens if we just want a stream of uninterpreted bytes? Well, we
> have to change the converter to a Latin1TextConverter, which does nothing to the
> incoming text, other than slow down the reading. So we go through the
> construction of a UTF8TextConverter, then throw it away.
>
> Look at the time comparison between two ways of opening a file and reading
> 1000 uninterpreted binary bytes:
>
> Time millisecondsToRun: [ 1000 timesRepeat: [ | s | s := StandardFileStream
> readOnlyFileNamed: '/dev/zero'. s binary. s next: 1000. s close ] ]
> => 119
>
> Time millisecondsToRun: [ 1000 timesRepeat: [ | s | s := FileStream
> readOnlyFileNamed: '/dev/zero'. s converter: Latin1TextConverter new. s
> binary. s next: 1000. s close ] ]
> => 251
>
> 40% of the second example's time is spent doing the above comparison of local
> names with the sources file.
Hmm, I'm not sure this benchmark really reflects real-world apps.
> I'm curious as to why this comparison with the local name of the sources file
> is actually needed; don't we generally know how we're using a file, and so
> could just set the text converter to the proper one in the case of the
> sources file?
My feeling is that once we recreate the .sources file, we can
eliminate the check.
> And it still seems to me that having 'binary' flavors of the file opening
> routines would help in the cases where we know that we're not dealing with
> text.
Yes.
Again, the rule of thumb is to be careful about whether you are
dealing with uninterpreted bytes or text, and to use #binary and #text
accordingly. I think it is wise to suggest using #converter: only when
it really matters.
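
As a rough illustration of that rule of thumb, here is a minimal
workspace sketch. The file names 'data.bin' and 'notes.txt' are
hypothetical, and it assumes the stream answered by FileStream honors
#binary by skipping the text converter, as discussed above.

```smalltalk
| bin bytes txt contents |

"Uninterpreted bytes: switch the stream to #binary before reading."
bin := FileStream readOnlyFileNamed: 'data.bin'.    "hypothetical file"
[bin binary.
 bytes := bin next: 1000]       "expected to answer a ByteArray"
	ensure: [bin close].

"Text: leave the stream in #text mode and keep its default converter."
txt := FileStream readOnlyFileNamed: 'notes.txt'.   "hypothetical file"
[txt text.
 contents := txt upToEnd]       "expected to answer a String"
	ensure: [txt close].
```

Neither case sends #converter:. That would only be needed when the
file is text in an encoding other than the default one.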
-- Yoshiki