[squeak-dev] Decompiler buggy (was: AW: [Etoys, Compiler] Help wanted: Trying to embed SyntaxMorphs into other tiles)

Eliot Miranda eliot.miranda at gmail.com
Sun Mar 29 19:57:02 UTC 2020


Hi Christoph,

On Sun, Mar 29, 2020 at 11:21 AM Thiede, Christoph <
Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:

> Hi Eliot, this sounds like a reasonable piece of work. I'll need to
> reverse-engineer all the relevant stuff first, but I will put it onto my
> list with a priority above average :)
>

Thank you!


> One question in general, both index and code appear to be referenced by
> LeafNode itself mainly for accessing and initialization purposes. Why can't
> we define these inst vars per subclass and use an abstract getter in
> LeafNode (if necessary at all)? I have the feeling that this could simplify
> explanation and understanding of the several meanings of index.
>

Sounds reasonable to me. Whatever seems best to you. But look at the
BytecodeEncoder API before you introduce too much abstraction.  And I'm
eager to review code, help, etc.
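
For concreteness, here is a minimal workspace sketch of the shape Christoph
describes: each subclass owns its own index-like inst var and the superclass
only declares an abstract #index. Every class, category and inst var name
below is hypothetical; nothing here is taken from trunk, and a real
refactoring would have to fit the BytecodeEncoder API.

```smalltalk
"Hypothetical sketch: the Sketch* classes do not exist in trunk."
| leaf literal temp |
leaf := Object subclass: #SketchLeafNode
    instanceVariableNames: 'key'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-LeafNodes'.
literal := leaf subclass: #SketchLiteralNode
    instanceVariableNames: 'literalIndex'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-LeafNodes'.
temp := leaf subclass: #SketchTempNode
    instanceVariableNames: 'tempOffset'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-LeafNodes'.

"Abstract accessor in the superclass; each subclass answers its own inst var."
leaf compile: 'index
    ^ self subclassResponsibility'.
literal compile: 'literalIndex: anInteger
    literalIndex := anInteger'.
literal compile: 'index
    ^ literalIndex'.
temp compile: 'tempOffset: anInteger
    tempOffset := anInteger'.
temp compile: 'index
    ^ tempOffset'.

"Clients keep sending #index without knowing which meaning applies."
literal new literalIndex: 3; index.   "answers 3"
```

Evaluating this installs three throwaway classes in the image; sending
removeFromSystem to temp, literal and leaf (in that order) cleans them up.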


>
> Best,
> Christoph
> ------------------------------
> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
> behalf of Eliot Miranda <eliot.miranda at gmail.com>
> *Sent:* Sunday, 29 March 2020 19:49:33
> *To:* The general-purpose Squeak developers list
> *Subject:* Re: [squeak-dev] Decompiler buggy (was: AW: [Etoys, Compiler]
> Help wanted: Trying to embed SyntaxMorphs into other tiles)
>
> Hi Christoph,
>
>     please read what I'm about to say carefully.  This message is aimed at
> you :-)
>
> On Sat, Mar 28, 2020 at 6:09 AM Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> wrote:
>
>> Hi Christoph,
>>
>> On Sat, Mar 28, 2020 at 01:12, Thiede, Christoph <
>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>
>>> Hi Eliot, hi all,
>>>
>>>
>>> ah, I finally found the bug, but this was a really hard hunt! :D
>>>
>>>
>>> The solution is absolutely simple, again:
>>>
>>>
>>> codeAnySelector: selector
>>>     ^SelectorNode new
>>>         key: selector
>>> -       index: 0
>>> +       index: nil
>>>         type: SendType
>>>
>>>
>> Good find!
>>
>>> Seriously, did the Decompiler ever reliably produce re-generatable parse
>>> trees in the past? But it should do so, shouldn't it? :-)
>>>
>> Maybe it did (see below). But I'm not sure that it was a feature...
>> Isn't it mostly used for replacing absent source code... that will
>> eventually be reparsed? (!)
>>
>>> Before the above patch, the following example was broken, too:
>>>
>>> class := Object newSubclass.
>>> class compile: 'foo ^ 1 + 1'.
>>> (class >> #foo) decompile generate valueWithReceiver: class new
>>> arguments: #(). "SmallInteger does not understand #foo"
>>>
>>>
>>> Now I'm wondering what are the actual semantics of the index variable.
>>> Its method comment about "various uses depending on the class of the
>>> receiver" is quite generic - do you know some more details about this?
>>> Should we also use nil instead of 0 in
>>> DecompilerConstructor >> #codeAnyLiteral:? At first glance, senders of
>>> #encodeLiteral: do not
>>> appear to set it to zero manually (so they leave it nil), but unless there
>>> is any documentation of the index meaning, this is speculation only, as I
>>> could not find any other example where decompilation + regeneration produce
>>> a method that cannot be executed properly.
>>>
>> It's very low level, a kind of reflection of the bytecode encoding.
>> Once upon a time (before Squeak 4.0), the code was even more horrible
>> to follow!
>>
>> LeafNode>>key: object index: i type: type
>>     self key: object code: (self code: i type: type)
>>
>> LeafNode>>code: index type: type
>>     index isNil
>>         ifTrue: [^type negated].
>>     (CodeLimits at: type) > index
>>         ifTrue: [^(CodeBases at: type) + index].
>>     ^type * 256 + index
>>
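
To see what this encoding does with the two index values from the patch, the
old method can be replayed in a workspace. This is only a sketch: the table
contents and the numeric value of SendType below are assumptions modelled on
ParseNode class>>initialize, so check them against your own image before
relying on the numbers.

```smalltalk
"Workspace replay of the pre-4.0 LeafNode>>code:type: quoted above.
 sendType, codeBases and codeLimits hold ASSUMED values, not ones
 confirmed anywhere in this thread."
| sendType codeBases codeLimits encode |
sendType := 5.                      "assumed value of SendType"
codeBases := #(0 16 32 64 208).     "assumed contents of CodeBases"
codeLimits := #(16 16 32 32 16).    "assumed contents of CodeLimits"

"Same three cases as the original method, written as a block."
encode := [:index :type |
    index isNil
        ifTrue: [type negated]
        ifFalse: [
            (codeLimits at: type) > index
                ifTrue: [(codeBases at: type) + index]
                ifFalse: [type * 256 + index]]].

encode value: nil value: sendType.   "-5: negated type, no slot chosen"
encode value: 0 value: sendType.     "208: short form, base + 0"
encode value: 20 value: sendType.    "1300: long form, 5 * 256 + 20"
```

If those table values hold, a nil index produces a negative code while index 0
produces a concrete short-form code for slot 0. That would be consistent with
the regenerated sends reported in this thread all resolving to the same, wrong
literal, but it is an interpretation rather than something the thread states.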
>
> Exactly.  This is actually obsolete genius by Dan Ingalls.  If you have a
> look at the original Smalltalk-80 bytecode compiler you'll see that the
> parse tree nodes both represent the parse tree *and* generate the output
> bytecodes.  This was really important on 16-bit Smalltalk-80 since it meant
> that the bytecode compiler was extremely compact and concise.  Objects were
> in extremely short supply, 32k objects in a normal implementation (with
> 15-bit SmallIntegers), and 48k objects in a "stretch" implementation that
> had 14-bit SmallIntegers.
>
> Now that we have 32-bit and 64-bit implementations, this concision is
> obsolete and what we need is flexibility and clarity.
>
> I had done some reimplementation work on the bytecode compiler in 2009 to
> add the closure bytecodes, and to add a proper code generation back end in
> the BytecodeEncoder framework, but I never finished the cleanup. The index
> and code inst vars in the LeafNode hierarchy are vestiges of the old
> implementation.  It would be really good to get rid of the code inst var
> altogether and to be left only with index, and index being the literal
> index for literal nodes (perhaps negative indices being used for special
> selectors), index being the inst var index for inst var nodes, and index
> being the temp var offset for temp var nodes, etc.
>
> But this really needs someone with fresh eyes and energy.  My plate is
> full.  When I did think of doing this I realized that it is probably wise
> to clone the compiler altogether and do the development and testing work in
> the clone before moving it back to LeafNode et al for the first functional
> commit.  This is to avoid breaking the compiler while trying to fix it.
>
> So Christoph, do you accept my challenge and will you try and eliminate
> the code inst var from LeafNode?
>
>
>
>>
>> As you see, the index i is passed as the argument of the #code: keyword
>> (? perhaps because the keyword documents the output, not the input);
>> then the code: parameter shadows the index instance variable...
>> And the index instance variable was not set... Kind of brainfuck.
>>
>> We still have code:type: and index variable shadowing in current trunk...
>>
>>> By the way, here is another interesting one-liner:
>>>
>>> (Object newSubclass environment: self environment; compile: 'foo
>>> ^(ObjectTracer on: nil) class'; >> #foo) decompile generate
>>> valueWithReceiver: nil arguments: #()
>>>
>>>
>>> Interestingly, it opens a debugger - in other words, #class is sent as a
>>> regular selector. The decompiler does not know anything about special
>>> selectors at the moment. Is this desired behavior? I wonder whether it
>>> should be the parse tree's responsibility to install this kind of
>>> optimization, rather than the responsibility of the Compiler. After
>>> all, Compiler is not the only client that requests code generation
>>> from parse trees. Etoys is a good example of a client from another
>>> domain that uses this service, too. Should all these other clients be
>>> denied these important optimizations of Smalltalk expressions?
>>>
>> After parsing, there are other compilation phases, for analyzing
>> variable scope, clean blocks, etc...
>> It's possible to scatter the implementation of the various phases in the
>> nodes themselves, but the trend is rather to use a visitor pattern;
>> it gathers the handling in some specialized classes that hold all the
>> state (rather than passing it as message arguments).
>> The Pharo team did a complete re-engineering of the compiler
>> (OpalCompiler) that you could study.
>>
>>> Best,
>>> Christoph
>>>
>>> ------------------------------
>>> *From:* Thiede, Christoph
>>> *Sent:* Friday, 27 March 2020 23:16
>>> *To:* The general-purpose Squeak developers list
>>> *Subject:* AW: [squeak-dev] [Etoys, Compiler] Help wanted: Trying to
>>> embed SyntaxMorphs into other tiles
>>>
>>>
>>> Hi Eliot,
>>>
>>>
>>> > It looks correct.  Can you check it against the old bytecode set
>>> too?  We don’t want it to break old-style blocks.
>>>
>>> Good point. I ran
>>>
>>> (Object >> #asOrderedCollection) decompile generate valueWithReceiver:
>>> 42 arguments: #().
>>>
>>>
>>> for both bytecode sets, and both were fine.
>>>
>>> But:
>>>
>>> (Collection >> #asArray) decompile generate valueWithReceiver: {42}
>>> asOrderedCollection arguments: #().
>>>
>>>
>>> breaks - in both bytecode sets. This is weird.
>>> I will have a look into it, maybe I can discover what's wrong.
>>>
>>> In addition, I propose to write tests for this. But I take it that it is
>>> not the goal of the decompiler to yield exactly the same parse tree or
>>> source code as the original method? In that case, we will need to write
>>> a lot of fixtures for the tests.
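
One way to sidestep fixtures, sketched here under the assumption that
behavioural equivalence is what matters, is to compare the results of running
the original and the regenerated method rather than their source. The snippet
reuses only messages that already appear in this thread; the choice of
'foo ^ 1 + 1' as the method under test is arbitrary.

```smalltalk
"Round-trip check: run a method and its decompiled-and-regenerated twin
 against the same kind of receiver and compare the answers."
| class original regenerated |
class := Object newSubclass.
class compile: 'foo ^ 1 + 1'.
original := class >> #foo.

"Decompile to a parse tree, then generate a fresh CompiledMethod from it."
regenerated := original decompile generate.

"Both methods should answer the same value."
self assert:
    (original valueWithReceiver: class new arguments: #())
        = (regenerated valueWithReceiver: class new arguments: #()).
```

The same comparison could be looped over a hand-picked list of methods inside
an SUnit test case; methods with side effects or receiver-dependent results
would need more care than this sketch takes.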
>>>
>>> Best,
>>> Christoph
>>>
>>>
>>>
>>> ------------------------------
>>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>>> behalf of Eliot Miranda <eliot.miranda at gmail.com>
>>> *Sent:* Friday, 27 March 2020 21:33
>>> *To:* The general-purpose Squeak developers list
>>> *Subject:* Re: [squeak-dev] [Etoys, Compiler] Help wanted: Trying to
>>> embed SyntaxMorphs into other tiles
>>>
>>> Hi Christoph,
>>>
>>> On Mar 27, 2020, at 12:45 PM, Thiede, Christoph <
>>> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>>>
>>>
>>>
>>> Hi all! :-)
>>>
>>> Just an update of the decompilation question:
>>>
>>> Christoph Thiede wrote
>>> I don't know how to use #generate: exactly, but other senders usually
>>> appear to recompile a method before passing it to #generate:.
>>> For comparison:
>>>
>>> [ (Collection >> #asArray) decompile generate: CompiledMethodTrailer
>>> empty ] fails, but
>>>
>>> [ m := (Collection >> #asArray) decompile.
>>>
>>>   m := Compiler new compile: m in: Collection notifying: nil ifFail:
>>> #foo.
>>>   m generate: CompiledMethodTrailer empty ] works.
>>> Why is that recompilation required but decompilation is insufficient? Is
>>> this some bug, or is it expected behavior?
>>>
>>> The general approach seems to be correct, but I think I found an error
>>> in the decompilation of literal variables such as Array. I sent
>>> Compiler-ct.425 to the inbox which should fix this issue.
>>>
>>>
>>> I moved this to inbox.  It looks correct.  Can you check it against the
>>> old bytecode set too?  We don’t want it to break old-style blocks.
>>>
>>>
>>> I am going to complete the implementation of SyntaxMorph >>
>>> #parseNode :-)
>>>
>>> Best,
>>> Christoph
>>> ------------------------------
>>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>>> behalf of Thiede, Christoph
>>> *Sent:* Tuesday, 15 October 2019 21:08:24
>>> *To:* squeak-dev at lists.squeakfoundation.org
>>> *Subject:* [squeak-dev] [Etoys, Compiler] Help wanted: Trying to embed
>>> SyntaxMorphs into other tiles
>>>
>>>
>>> Hi all,
>>>
>>>
>>> I'm currently trying to implement #parseNodeWith: on SyntaxMorph, in
>>> order to embed SyntaxMorphs into regular tiles. (Did this ever work in
>>> the past?)
>>>
>>> I'm afraid the attempt in the commit below does not work yet; you can
>>> create a script editor, but parsing is erroneous, so you cannot execute the
>>> script.
>>>
>>>
>>> To reproduce:
>>>
>>> Compile the following:
>>>
>>> MyPlayer >> examplePlayerCode
>>>     self forward: 6 * 7.
>>>     self turn: (11 raisedTo: 13 modulo: 97)
>>>
>>> and evaluate:
>>>
>>> | e p |
>>> p := Morph new openInWorld assuredPlayer.
>>> e := (MyPlayer >> #examplePlayerCode) decompile asScriptEditorFor: p.
>>> e openInHand.
>>>
>>>
>>> In Player>>#acceptScript:for:, #generate: is called on node, and when I
>>> decompile the result, I get a strange result:
>>>
>>>
>>> examplePlayerCodeTest
>>>     self forward: 6 * 7.
>>>     self
>>>         forward: (#forward: forward: #forward:).
>>>
>>>
>>> I don't know how to use #generate: exactly, but other senders
>>> usually appear to recompile a method before passing it to #generate:.
>>>
>>> For comparison:
>>>
>>> [ (Collection >> #asArray) decompile generate: CompiledMethodTrailer
>>> empty ] fails, but
>>>
>>> [ m := (Collection >> #asArray) decompile.
>>>   m := Compiler new compile: m in: Collection notifying: nil ifFail:
>>> #foo.
>>>   m generate: CompiledMethodTrailer empty ] works.
>>>
>>> Why is that recompilation required but decompilation is insufficient? Is
>>> this some bug, or is it expected behavior?
>>>
>>>
>>> However, in the case of SyntaxMorph, I don't know how to recompile the
>>> node beforehand, as a SyntaxMorph should be able to represent a node of
>>> an arbitrary type that need not be a MessageNode. So how can I generate
>>> code from SyntaxMorphs?
>>>
>>>
>>> tl;dr: What is the full story of #generate: and how can it be made to
>>> work in this example?
>>>
>>> Many thanks in advance! :-)
>>>
>>>
>>> Best,
>>>
>>> Christoph
>>>
>>>
>>> ------------------------------
>>> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on
>>> behalf of commits at source.squeak.org <commits at source.squeak.org>
>>> *Sent:* Tuesday, 15 October 2019 14:46
>>> *To:* squeak-dev at lists.squeakfoundation.org
>>> *Subject:* [squeak-dev] The Inbox: EToys-ct.367.mcz
>>>
>>> A new version of EToys was added to project The Inbox:
>>> http://source.squeak.org/inbox/EToys-ct.367.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: EToys-ct.367
>>> Author: ct
>>> Time: 15 October 2019, 2:46:24.862129 pm
>>> UUID: 1394344f-b1e3-5640-a13a-70c5dffd51f4
>>> Ancestors: EToys-mt.361
>>>
>>> Allow for embedding SyntaxMorphs into test tiles.
>>>
>>> =============== Diff against EToys-mt.361 ===============
>>>
>>> Item was added:
>>> + ----- Method: SyntaxMorph>>parseNodeWith:asStatement: (in category
>>> '*Etoys-Squeakland-code generation') -----
>>> + parseNodeWith: encoder asStatement: aBoolean
>>> +
>>> +        ^ self parseNode!
>>>
>>>
>>>
>>>
>>>
>>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
>

-- 
_,,,^..^,,,_
best, Eliot