[squeak-dev] About generating compiled method(s)

Fri Dec 11 04:52:03 UTC 2009

2009/12/11 Eliot Miranda <eliot.miranda at gmail.com>:
>
>
> On Thu, Dec 10, 2009 at 4:57 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>>
>> 2009/12/11 Eliot Miranda <eliot.miranda at gmail.com>:
>> > Hi Igor,
>> >
>> > On Thu, Dec 10, 2009 at 3:40 PM, Igor Stasenko <siguctua at gmail.com>
>> > wrote:
>> >>
>> >> This question is mostly for Eliot,
>> >> since he is the last one who dared to touch the Compiler deeply.
>> >> My concern is this usage pattern:
>> >>
>> >>        method := methodNode generate: #(0 0 0 0).
>> >>        ^method copyWithTempsFromMethodNode: methodNode
>> >>
>> >> My nitpick is the need to generate a dummy CompiledMethod
>> >> instance, which is used for nothing but making a copy with the
>> >> temp names attached.
>> >>
>> >> So the question is: is there a way to skip generating the dummy
>> >> compiled method and just do:
>> >>
>> >>  method := methodNode generateWithTempNames.
>> >>
>> >> or:
>> >>
>> >>  method := methodNode generate: (methodNode tempsTrailer).
>> >>
>> >> ?
>> >>
>> >
>> > I think so, but is it worth-while?  The temp names can only be
>> > generated correctly once the closure analysis is done, see
>> > ensureClosureAnalysisDone in generate:.  So you'd have to pass in a
>> > flag to generate: saying you wanted to append temps, compute the
>> > temps after ensureClosureAnalysisDone, derive how many bytes they
>> > would compress to, and add that to the size of the method being
>> > computed.  But that would open up details of the compression scheme
>> > to the generate: method.  But saving one instantiation amongst
>> > hundreds, perhaps thousands, in an average compile is probably not
>> > worth it.
>> > I expect that more worth-while would be to rip out my hack of Dan's
>> > hack compression algorithm for temp names and replace it with the
>> > use of gzip, which is built into the system and does a far better
>> > job than my modification of Dan's scheme.
>>
>> I doubt that gzip will be able to compress anything shorter than
>> 100 bytes well. Most Smalltalk methods contain very few temps, and
>> gzip compression does not work well on very small amounts of data.
>
> OK, this is with Nicholas' stream testing doit that crashed the Cog JIT
> because it wasn't ignoring trailing bytes:
> self tempNamesString
>   => 'timing (strm lineCount)[(strm lineCount)[(strm
> lineCount)]][(strm lineCount)][(strm lineCount)[(strm lineCount)]][(strm
> lineCount)][(strm lineCount)[(strm lineCount)]][(strm lineCount)][(strm
> lineCount)[(strm lineCount)]][(strm lineCount)]'
> ((ZipWriteStream on: (ByteArray new: 1024))
>     nextPutAll: self tempNamesString asByteArray;
>     close;
>     encodedStream) contents size
>   => 35
> self size - self endPC
>   => 167
> (ZipReadStream on: ((ZipWriteStream on: (ByteArray new: 1024))
>     nextPutAll: self tempNamesString asByteArray;
>     close;
>     encodedStream) contents) contents asString
>   => 'timing (strm lineCount)[(strm lineCount)[(strm
> lineCount)]][(strm lineCount)][(strm lineCount)[(strm lineCount)]][(strm
> lineCount)][(strm lineCount)[(strm lineCount)]][(strm lineCount)][(strm
> lineCount)[(strm lineCount)]][(strm lineCount)]'
> i.e. my adaptation of Dan's algorithm uses 167 bytes but ZipRead/WriteStream
> uses 35 bytes.  I note that the empty string takes 2 bytes.  So pretty good
> :)
>

Okay, then maybe there should be two forms, compressed and
uncompressed, selected by comparing the resulting sizes, plus one
byte indicating which form was used.
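
A rough workspace sketch of what I mean, nothing more. It assumes the
ZipWriteStream/ZipReadStream usage shown above; the flag values (0 = raw,
1 = zipped) and all variable names are made up for illustration and are
not an existing trailer format:

```smalltalk
"Hypothetical encoding: one flag byte, followed by whichever form of the
 temp names is smaller. 0 = raw bytes follow, 1 = zipped bytes follow."
| names raw zipped flag payload encoded decoded |
names := 'a b c d e f g h '.
raw := names asByteArray.
zipped := ((ZipWriteStream on: (ByteArray new: 1024))
	nextPutAll: raw;
	close;
	encodedStream) contents.

"Pick the smaller representation and remember which one it is."
zipped size < raw size
	ifTrue: [flag := 1. payload := zipped]
	ifFalse: [flag := 0. payload := raw].
encoded := (ByteArray with: flag) , payload.

"Decoding looks at the flag byte first."
decoded := (encoded first = 1
	ifTrue: [(ZipReadStream on: encoded allButFirst) contents]
	ifFalse: [encoded allButFirst]) asString.
decoded = names "print it"
```

This way the encoded form is never more than one byte longer than the
plain names, whatever gzip does with short inputs.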

>>
>> |str|
>> str := 'a b c d e f g h '.
>> { str size.  str zipped size } "print it"
>>
>> Okay, as you said, it is not possible to determine the right number
>> of temps without generating the method (more correctly: hard, not
>> impossible ;) ). However, I think that hiding this behavior inside
>> the method node is worthwhile, because a usage pattern like:
>>
>> result := something foo.
>> result := result bar: something.
>>
>> is crying out to be replaced with just:
>> result := something zork.
>>
>> because otherwise it leaks an unnecessary detail out of method
>> generation and forces users to repeat the same elaborate pattern in
>> different places.
>>
>> The trailing-bytes generation is cryptic stuff. For instance, why do
>> I see #(0 0 0 0) sent to #generate: everywhere?
>> Anyone who sees it starts asking questions: what if I send
>> #(1 2 3 4) instead, or #(1)? And generally, is it safe to pass
>> anything other than #(0 0 0 0)?
>> So, in order to reduce confusion, it is worthwhile to hide this
>> implementation detail from the user's eyes.
>>
>> If one wants to store a method source pointer, then one should tell
>> the method node to generate the method with it:
>>
>> methodNode generateWithSourcePointer: anInteger
>>
>> and if the user wants to generate a method with temp names, then he
>> can do it by telling:
>>
>> methodNode generateWithTempNames
>>
>> and if the user wants to generate a method with arbitrary trailing
>> bytes, then the user should be exterminated and replaced by an
>> advanced version (iUser) :)
>>
>
> Seems eminently reasonable :)
>

Good, then I can produce a changeset for it, which can then be
adopted in both trunk and Pharo.
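
As a first step, the new selector could simply wrap the existing two-step
pattern. This is only a sketch of a method that does not exist yet; it
uses nothing but #generate: and #copyWithTempsFromMethodNode: from the
pattern quoted at the top, and it still creates the dummy method
internally:

```smalltalk
MethodNode >> generateWithTempNames
	"Hypothetical convenience method (not in the image today): answer a
	 CompiledMethod with the temp names attached, so that senders do not
	 need to know about the trailer bytes. The dummy method is still
	 generated here; only the detail is hidden from callers."
	| method |
	method := self generate: #(0 0 0 0).
	^ method copyWithTempsFromMethodNode: self
```

Callers would then write just "method := methodNode generateWithTempNames",
and the implementation could later be changed to avoid the dummy instance
without touching any sender.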

But that is not the only problem I have.
During today's explorations I discovered that if I want to keep a
method's source somewhere other than the default location, I have no
means of telling the debugger where to retrieve it.
Instead of asking the class for the method's source, the debugger
talks directly to the method.  :(

Actually, the debugger sends #methodNode to the compiled method and
then, to get the source, sends #sourceText to the method node,
which leaves the class completely out of the loop.

But what if the class has its own way of retrieving the method's
source code? Otherwise, why do we need #sourceCodeAt: at all, if we
can't allow more than one way of storing the sources?

Of course, there is no guarantee that the class will answer correct
source code, i.e. code semantically equal to the method, but this can
be checked by comparing a method node constructed from the source
code answered by the class with the method node reconstructed by the
decompiler, or by simply compiling the suggested source and comparing
the result to the method's bytecode.
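
The second variant could look roughly like this in a workspace. It is a
sketch under assumptions: it relies on #methodClass, #selector and
Compiler>>compile:in:notifying:ifFail: behaving as they do in my image,
it uses Object>>printString only as an example method, and it compares
bytecodes only, ignoring literals and trailer bytes:

```smalltalk
"Hypothetical check: does the source answered by the class compile to
 the same bytecodes as the installed method?"
| method class source node candidate bytecodesOf same |
method := Object >> #printString.
class := method methodClass.

"Ask the class, not the method, for the source."
source := class sourceCodeAt: method selector.

"Compile the suggested source without installing it."
node := class compilerClass new
	compile: source
	in: class
	notifying: nil
	ifFail: [nil].
candidate := node ifNotNil: [node generate: #(0 0 0 0)].

"Collect the bytecodes between initialPC and endPC of a method."
bytecodesOf := [:m | (m initialPC to: m endPC) collect: [:pc | m at: pc]].

same := candidate notNil
	and: [(bytecodesOf value: method) = (bytecodesOf value: candidate)].
same "print it"
```

If the answer is false, the debugger could fall back to the decompiled
source instead of showing text that does not match the method.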

Moreover, we need to check that anyway, even for sources retrieved
in today's way, since there is no guarantee that the code has not
been affected by some buggy layer delivering the wrong sources. I
think this is a case where the debugger should not trust anyone.

Some of the Pharo guys, by the way, already had the idea of removing
the .sources file and keeping everything within the image. Allowing
the class to control where the sources are stored is one way to make
such a transition less painful.

Please, tell me what you think about it.

-- 
Best regards,
Igor Stasenko AKA sig.