[Newcompiler] Properties for AST
Stéphane Ducasse
stephane.ducasse at univ-savoie.fr
Mon Apr 9 13:42:58 UTC 2007
On 9 Apr. 07, at 10:23, Marcus Denker wrote:
>
> On 09.04.2007, at 09:00, Stéphane Ducasse wrote:
>
>> Hi
>>
>> I have a question: is the information retained useful for other tools?
>>
>
> In case of the Tokens, I think we don't need them later. The whole
> idea of providing a high-level
> representation is to be able to throw away the low level one. What
> we need from the Tokens
> is meta data: comments and positions in source. The position data
> we then can flush if it is identical
> to the one the pretty printer would re-create. This way, we don't
> need any position/whitespace data
> on the AST of any methods of the system other than newly created
> ones by people not using the
> pretty printer.
OK, but doesn't the debugger, for example, use such information to
highlight the messages?
>
> On the other hand, information from semantic analysis is indeed
> useful later (e.g. which variables with
> the same name are indeed accessing the same variable, taking name
> resolution (shadowing) into account).
>
> One example for this is that before throwing out the semantic
> analysis data, I change the class of RBVariableNodes
> to RBInstVariableNode, RBTempVariableNode, .... so I can later know
> what kind of variable I deal with.
Yes, and the visitor can be much better using these classes.
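
To make the point about the visitor concrete, here is a minimal sketch in
Python (not Smalltalk, and not the NewCompiler code). It assumes a
hypothetical specialize() step that runs before the semantic analysis data
is thrown away and changes the class of each variable node in place. The
class names only mirror RBVariableNode, RBInstVariableNode and
RBTempVariableNode; the method names and the visitor are invented for
illustration.

```python
class VariableNode:
    """Generic variable node, as the parser would create it."""

    def __init__(self, name):
        self.name = name

    def accept(self, visitor):
        return visitor.visit_variable(self)


class InstVariableNode(VariableNode):
    def accept(self, visitor):
        return visitor.visit_inst_variable(self)


class TempVariableNode(VariableNode):
    def accept(self, visitor):
        return visitor.visit_temp_variable(self)


def specialize(node, inst_var_names, temp_names):
    """Hypothetical helper: change the node's class in place.

    Temporaries are checked first because a temporary shadows an
    instance variable of the same name.
    """
    if node.name in temp_names:
        node.__class__ = TempVariableNode
    elif node.name in inst_var_names:
        node.__class__ = InstVariableNode
    return node


class KindCollector:
    """Visitor that needs no lookup tables: dispatch encodes the kind."""

    def visit_variable(self, node):
        return ("unresolved", node.name)

    def visit_inst_variable(self, node):
        return ("inst", node.name)

    def visit_temp_variable(self, node):
        return ("temp", node.name)


nodes = [
    specialize(VariableNode(name), {"x", "count"}, {"x"})
    for name in ["x", "count", "Foo"]
]
kinds = [node.accept(KindCollector()) for node in nodes]
```

Once the nodes carry their kind in their class, the scope information used
by specialize() can be dropped and the visitor still dispatches correctly.
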
>> I was wondering if it would not make sense to have multiple
>> representations of the tree.
>
> In the end, this is what properties now provide... but properties
> are not that nice, as they
> make a lot of things implicit that should be explicit, e.g. there
> is no way to know which properties
> exist, and which user of the tree adds which properties.
Exactly.
>
> It's very easy to completely mess up the design of a software with
> properties, Morphic is a good
> example, especially the BookMorph abuses properties to an extent
> that nobody understands the
> code. One of the problems of properties is that they are added by
> evaluating code at runtime; as with
> dynamically scoped variables, there is no way to statically know if
> a variable is declared or not.
Exactly, this is why I asked :)
> So I think we should experiment a bit... having properties is the
> right thing for now, but I think
> I will not change the AST representation too much, but instead do
> an additional pass over it
> that transforms it into the representation I need (e.g. no Tokens).
>
>> Then I was wondering if this is not a clear indication that we
>> would need
>> hashBasedSubclass: in addition to the one we have
>>
>
> Indeed. This would solve one problem: iVars taking space. What
> it doesn't solve
> is that this way, all possible properties of all clients (e.g.
> compiler phases) would need
> to be declared in the AST classes. Here I like Wide Classes...
> where a Compiler Phase could
> declare "I need to extend the node object with this state".
>
> And in addition, properties remain useful for annotating data
> structures: Annotations should be
> possible even if the framework author does not anticipate the need
> for the annotation, and they
> have an extent that is possibly unlimited in time, so wide classes
> are not a solution for attaching
> meta data.
True.
I hate the term "wide class" when it is the objects that are changing
class :)
> Marcus
>
>
>> Stef
>>
>>> Hello,
>>>
>>> Yesterday I merged some work done earlier into the current
>>> version of the AST package.
>>>
>>> The problem with the RB AST is size: Just using the AST and
>>> NewCompiler as is, the AST
>>> is quite huge. An image that keeps the complete AST of all
>>> methods is ca. 800MB in size.
>>>
>>> The reason for that is that far too much information is retained:
>>> the AST holds on to the original text,
>>> all the scanner tokens, the Intermediate representation,
>>> information from semantic analysis... and
>>> all this information is saved in instance variables of the Node
>>> objects, which in turn are nil most
>>> of the time. (once you have 1.5 Million objects of something, an
>>> instance variable that is nil
>>> does indeed cost memory ;-))
>>>
>>> Over the next weeks/months I will slowly work on making the
>>> representation far more compact.
>>>
>>> As a first step, there is now a property interface on
>>> RBProgramNode. This allows reducing the number
>>> of instance variables: The idea is to have the AST encode just
>>> what really is the AST directly (including
>>> names of variables), keeping all the transient data (e.g.
>>> semantic analysis data of the compiler) or
>>> meta-data (formatting, comments) as properties.
>>>
>>> The next step will be to get rid of the Scanner Tokens.
>>>
>>> Marcus
>>
>
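
For readers who have not seen a property interface, the idea from the
announcement quoted above can be sketched roughly as follows. This is
Python rather than Smalltalk, and it is only an illustration under
assumptions: the class stands in for RBProgramNode, and the method names
(property_at, property_at_put, has_property, remove_property) are
hypothetical, not the actual AST package API. The point is that a node
pays for one slot instead of one instance variable per kind of transient
data, and the dictionary is only allocated when a property is set.

```python
class ProgramNode:
    """Stand-in for RBProgramNode; all names here are hypothetical."""

    # One fixed slot for all optional data instead of one per client.
    __slots__ = ("parent", "_properties")

    def __init__(self, parent=None):
        self.parent = parent
        self._properties = None  # allocated lazily, on first write

    def property_at(self, key, default=None):
        """Return the property stored under key, or default if absent."""
        if self._properties is None:
            return default
        return self._properties.get(key, default)

    def property_at_put(self, key, value):
        """Store value under key, creating the dictionary if needed."""
        if self._properties is None:
            self._properties = {}
        self._properties[key] = value
        return value

    def has_property(self, key):
        return self._properties is not None and key in self._properties

    def remove_property(self, key):
        """Drop a property; release the dictionary once it is empty."""
        if self._properties is None:
            return None
        value = self._properties.pop(key, None)
        if not self._properties:
            self._properties = None
        return value


node = ProgramNode()
node.property_at_put("comments", ['"answer the receiver"'])
node.property_at_put("sourceInterval", (1, 12))
stored = node.property_at("comments")
node.remove_property("comments")
node.remove_property("sourceInterval")
```

The sketch also shows the drawback raised in the thread: nothing in the
class says which keys exist or which client adds them, so the set of
properties is only discoverable at runtime.
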