[squeak-dev] Decompiler buggy (was: AW: [Etoys, Compiler] Help wanted: Trying to embed SyntaxMorphs into other tiles)

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sun Mar 29 15:14:50 UTC 2020


On Sun, 29 Mar 2020 at 15:31, Thiede, Christoph <
Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:

> Hey Nicolas,
>
>
> > > Seriously, did the Decompiler ever reliably produce re-generatable
> parse trees in the past? But it should do so, shouldn't it? :-)
> >
> > Maybe it did (see below). But I'm not sure that it was a feature...
> > Isn't it mostly used for replacing absent source code... that will
> > eventually be reparsed? (!)
>
> Well, it may be debatable whether decompiled trees should be optimized,
> but returning trees from anywhere that do not satisfy particular validity
> conditions (such as index being set iff a special selector should be
> encoded) definitely appears wrong and buggy to me.
>
> > As you see, index i passed as argument to #code: keyword (? it's
> because it's documenting the output, not the input); then code: parameter
> shadowing the index instance variable...
>
> So would you agree to patch DecompilerConstructor >> #codeAnyLiteral:,
> too? :)
>
> > After parsing, there are other compilation phases, for analyzing
> variable scope, clean blocks, etc...
>
> Hm ... the Compiler divides the compilation phase into two main
> stages (see #evaluateCue:ifFail:): The first stage is actual "compilation",
> that is translating the source into a parse tree in the parser. The second
> stage is to generate a compiled method, which is done by simply passing
> #generate(WithTempNames) to the parse tree. For me, this appears to be a
> good logical separation.
> Things like scope analysis are done, as you say, in the second stage, of
> course. But I would not expect that optimizations such as special selectors
> are already applied in the first stage (this was also kind of confusing
> when I tried to debug certain optimizations such as of #caseOf:). Isn't it
> the general idea of a parse tree to have an intermediary representation
> between a primitive code string and a VM-specific set of bytecodes? Certain
> optimizations are not even relevant for other parser clients, for example,
> any code analysis tools.
> It would be great to decouple these stages even more - let's say, we could
> move all the #noteSpecialSelector: and #transform: senders and apply them
> directly before, or inside of, #generate only. A visitor sounds like a good
> pattern for this, though I have not yet had many opportunities to apply
> this pattern in practice.
>
> > Pharo team did a complete re-engineering of compiler (OpalCompiler)
> > that you could study.
>
> Wow, I read some slides <https://de.slideshare.net/jressia/opal-compiler>
> about OpalCompiler and it sounds great! Allow me one question: why haven't
> we already adopted this concept in Squeak, and what are the disadvantages
> of this redesign? We could achieve so much more if everyone was pulling in
> the same direction (I know that it was the Pharo people who forked Squeak,
> however ...).
>
>
One disadvantage is that it's about twice as slow.
This is because bytecode instructions are reified.
Thus instead of source -> parse tree -> compiledMethod
the flow is source -> parse tree -> instructions -> compiledMethod

It would be possible to make the instructions intermediate representation
optional though.
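
For illustration, here is a minimal workspace sketch of the shorter flow in a
Squeak image. It assumes that Compiler>>compile:in:notifying:ifFail: answers
a MethodNode and that MethodNode>>generate answers a CompiledMethod (both
sends appear in the snippets quoted below); the selector foo and the
temporary names are made up for the example.

```smalltalk
| methodNode compiledMethod |
"Stage 1: source -> parse tree. The result is assumed to be a MethodNode."
methodNode := Compiler new
	compile: 'foo ^ 1 + 1'
	in: Object
	notifying: nil
	ifFail: [nil].

"Stage 2: parse tree -> CompiledMethod, without installing it in Object."
compiledMethod := methodNode generate.

"Run the uninstalled method against a fresh receiver."
compiledMethod valueWithReceiver: Object new arguments: #()
```

With a reified instruction layer, one more object graph is built and
traversed between the two stages, which is where the extra time goes.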

The second disadvantage is that, IMO, it's a bit over-engineered.
One consequence is that patching the compiler to accept methods with more
than 15 arguments (required for Smallapack),
or to accept legacy FFI syntax, took me more effort than patching the
legacy Squeak compiler.

However, it's a nice piece of code.
It should not be too hard to port to Squeak.
Though, it relies on revamped parse tree nodes (those of refactoring
browser, with slight evolutions...).

> Best,
> Christoph
>
> ------------------------------
> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
> behalf of Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
> *Sent:* Saturday, 28 March 2020 14:09:15
> *To:* The general-purpose Squeak developers list
> *Subject:* Re: [squeak-dev] Decompiler buggy (was: AW: [Etoys, Compiler]
> Help wanted: Trying to embed SyntaxMorphs into other tiles)
>
> Hi Christoph,
>
> On Sat, 28 Mar 2020 at 01:12, Thiede, Christoph <
> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>
>> Hi Eliot, hi all,
>>
>>
>> ah, I finally found the bug, but this was a really hard hunt! :D
>>
>>
>> The solution is absolutely simple, again:
>>
>>
>> codeAnySelector: selector
>>
>> ^SelectorNode new
>> key: selector
>> - index: 0
>> + index: nil
>> type: SendType
>>
>>
> Good find!
>
>> Seriously, did the Decompiler ever reliably produce re-generatable parse
>> trees in the past? But it should do so, shouldn't it? :-)
>>
> Maybe it did (see below). But I'm not sure that it was a feature...
> Isn't it mostly used for replacing absent source code... that will
> eventually be reparsed? (!)
>
>> Before the above patch, the following example was broken, too:
>>
>> class := Object newSubclass.
>> class compile: 'foo ^ 1 + 1'.
>> (class >> #foo) decompile generate valueWithReceiver: class new
>> arguments: #(). "SmallInteger does not understand #foo"
>>
>>
>> Now I'm wondering what are the actual semantics of the index variable.
>> Its method comment about "various uses depending on the class of the
>> receiver" is quite generic - do you know some more details about this?
>> Should we also use nil instead of 0 in
>> DecompilerConstructor >> #codeAnyLiteral:? At first glance, senders of
>> #encodeLiteral: do not appear to set it to zero manually (so they leave
>> it nil), but without any documentation of what index means, this is
>> only speculation, as I could not find any other example where
>> decompilation + regeneration produces a method that cannot be executed
>> properly.
>>
> It's very low level, a kind of reflection of the bytecode encoding.
> Once upon a time (< Squeak4.0), the code was even more horrible to follow!
>
> LeafNode>>key: object index: i type: type
>     self key: object code: (self code: i type: type)
>
> LeafNode>>code: index type: type
>     index isNil
>         ifTrue: [^type negated].
>     (CodeLimits at: type) > index
>         ifTrue: [^(CodeBases at: type) + index].
>     ^type * 256 + index
>
> As you see, index i passed as argument to #code: keyword (? it's because
> it's documenting the output, not the input);
> then code: parameter shadowing the index instance variable...
> And the index instance variable was not set... Kind of brainfuck.
>
> We still have code:type: and index variable shadowing in current trunk...
>
>> By the way, here is another interesting one-liner:
>>
>> (Object newSubclass environment: self environment; compile: 'foo
>> ^(ObjectTracer on: nil) class'; >> #foo) decompile generate
>> valueWithReceiver: nil arguments: #()
>>
>>
>> Interestingly, it opens a debugger - in other words, #class is sent as a
>> regular selector. The decompiler does not know anything about special
>> selectors at the moment. Is this desired behavior? I wonder whether it
>> should be the parse tree's responsibility to install such kind of
>> optimizations, rather than the responsibility of the Compiler.
>> Because in reality, Compiler is not the only client that requests code
>> generation from parse trees. Etoys is a good example for a client from
>> another domain that uses this service, too. Should all these other clients
>> be withheld these important optimizations of Smalltalk expressions?
>>
> After parsing, there are other compilation phases, for analyzing variable
> scope, clean blocks, etc...
> It's possible to scatter the implementation of various phases in the nodes
> themselves, but the trend is rather to use a visitor pattern;
> it gathers the handling in some specialized classes that hold all the
> state (rather than passing it as message arguments).
> Pharo team did a complete re-engineering of the compiler (OpalCompiler)
> that you could study.
>
>> Best,
>> Christoph
>>
>> ------------------------------
>> *From:* Thiede, Christoph
>> *Sent:* Friday, 27 March 2020 23:16
>> *To:* The general-purpose Squeak developers list
>> *Subject:* AW: [squeak-dev] [Etoys, Compiler] Help wanted: Trying to
>> embed SyntaxMorphs into other tiles
>>
>>
>> Hi Eliot,
>>
>>
>> > It looks correct.  Can you check it against the old bytecode set too?
>> We don’t want it to break old-style blocks.
>>
>> Good point. I ran
>>
>> (Object >> #asOrderedCollection) decompile generate valueWithReceiver: 42
>> arguments: #().
>>
>>
>> for both bytecode sets, and both were fine.
>>
>> But:
>>
>> (Collection >> #asArray) decompile generate valueWithReceiver: {42}
>> asOrderedCollection arguments: #().
>>
>>
>> breaks - in both bytecode sets. This is weird.
>> I will have a look into it, maybe I can discover what's wrong.
>>
>> In addition, I propose to write tests for this. But it's not the goal of
>> the decompiler to yield exactly the same parse tree or source code as the
>> original method, is it? In that case, we will need to write a lot of
>> fixtures for the tests.
>>
>> Best,
>> Christoph
>>
>>
>>
>> ------------------------------
>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>> *Sent:* Friday, 27 March 2020 21:33
>> *To:* The general-purpose Squeak developers list
>> *Subject:* Re: [squeak-dev] [Etoys, Compiler] Help wanted: Trying to
>> embed SyntaxMorphs into other tiles
>>
>> Hi Christoph,
>>
>> On Mar 27, 2020, at 12:45 PM, Thiede, Christoph <
>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>
>> 
>>
>> Hi all! :-)
>>
>> Just an update of the decompilation question:
>>
>> Christoph Thiede wrote
>> I don't know how to use #generate: exactly, but other senders usually
>> appear to recompile a method before passing it to #generate:.
>> For comparison:
>>
>> [ (Collection >> #asArray) decompile generate: CompiledMethodTrailer
>> empty ] fails, but
>>
>> [ m := (Collection >> #asArray) decompile.
>>
>>   m := Compiler new compile: m in: Collection notifying: nil ifFail: #foo.
>>   m generate: CompiledMethodTrailer empty ] works.
>> Why is that recompilation required but decompilation is insufficient? Is
>> this some bug, or is it expected behavior?
>>
>> The general approach seems to be correct, but I think I found an error in
>> the decompilation of literal variables such as Array. I sent
>> Compiler-ct.425 to the inbox which should fix this issue.
>>
>>
>> I moved this to inbox.  It looks correct.  Can you check it against the
>> old bytecode set too?  We don’t want it to break old-style blocks.
>>
>> I am going to complete the implementation of SyntaxMorph >> #parseNode :-)
>>
>> Best,
>> Christoph
>> ------------------------------
>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>> behalf of Thiede, Christoph
>> *Sent:* Tuesday, 15 October 2019 21:08:24
>> *To:* squeak-dev at lists.squeakfoundation.org
>> *Subject:* [squeak-dev] [Etoys, Compiler] Help wanted: Trying to embed
>> SyntaxMorphs into other tiles
>>
>>
>> Hi all,
>>
>>
>> I'm currently trying to implement #parseNodeWith: on SyntaxMorph, in
>> order to embed SyntaxMorphs into regular tiles. (Did this ever work in
>> the past?)
>>
>> I'm afraid the attempt in the commit below does not work yet; you can
>> create a script editor, but parsing is erroneous, so you cannot execute the
>> script.
>>
>>
>> To reproduce:
>>
>> Compile the following:
>>
>> MyPlayer >> examplePlayerCode
>>
>> self forward: 6 * 7.
>>
>> self turn: (11 raisedTo: 13 modulo: 97)
>>
>> and evaluate:
>>
>> | e p |
>> p := Morph new openInWorld assuredPlayer.
>> e := (MyPlayer >> #examplePlayerCode) decompile asScriptEditorFor: p.
>> e openInHand.
>>
>>
>> In Player>>#acceptScript:for:, #generate: is called on node, and when I
>> decompile the result, I get a strange result:
>>
>>
>> examplePlayerCodeTest
>>
>> self forward: 6 * 7.
>>
>> self
>>
>> forward: (#forward: forward: #forward:).
>>
>>
>> I don't know how to use #generate: exactly, but other senders
>> usually appear to recompile a method before passing it to #generate:.
>>
>> For comparison:
>>
>> [ (Collection >> #asArray) decompile generate: CompiledMethodTrailer
>> empty ] fails, but
>>
>> [ m := (Collection >> #asArray) decompile.
>>   m := Compiler new compile: m in: Collection notifying: nil ifFail: #foo.
>>   m generate: CompiledMethodTrailer empty ] works.
>>
>> Why is that recompilation required but decompilation is insufficient? Is
>> this some bug, or is it expected behavior?
>>
>>
>> However, in the case of SyntaxMorph, I don't know how to recompile the
>> node beforehand, as a SyntaxMorph should be able to represent a node of
>> an arbitrary type, which need not be a MessageNode. So how could
>> I solve the problem of generating code from SyntaxMorphs?
>>
>>
>> tl;dr: What is the full story of #generate: and how can it be made to
>> work in this example?
>>
>> Many thanks in advance! :-)
>>
>>
>> Best,
>>
>> Christoph
>>
>>
>> ------------------------------
>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>> behalf of commits at source.squeak.org <commits at source.squeak.org>
>> *Sent:* Tuesday, 15 October 2019 14:46
>> *To:* squeak-dev at lists.squeakfoundation.org
>> *Subject:* [squeak-dev] The Inbox: EToys-ct.367.mcz
>>
>> A new version of EToys was added to project The Inbox:
>> http://source.squeak.org/inbox/EToys-ct.367.mcz
>>
>> ==================== Summary ====================
>>
>> Name: EToys-ct.367
>> Author: ct
>> Time: 15 October 2019, 2:46:24.862129 pm
>> UUID: 1394344f-b1e3-5640-a13a-70c5dffd51f4
>> Ancestors: EToys-mt.361
>>
>> Allow for embedding SyntaxMorphs into test tiles.
>>
>> =============== Diff against EToys-mt.361 ===============
>>
>> Item was added:
>> + ----- Method: SyntaxMorph>>parseNodeWith:asStatement: (in category
>> '*Etoys-Squeakland-code generation') -----
>> + parseNodeWith: encoder asStatement: aBoolean
>> +
>> +        ^ self parseNode!
>>
>>
>>
>>
>>
>