Here are two short replies condensed into one, to save a bit of mailing list traffic.
Peter van Rooijen peter@vanrooijen.com wrote:
Of course I'm curious how Chuck got its name :-).
Mostly, it is just a humble name a la Smalltalk itself. Also, though, it works by "chucking" ideas at a wall and seeing what sticks.
I have just a suggestion: could this AutoChuck feature be turned on only for certain packages, such as Kernel but not Morphic, for example? This could spare some time and quite a bit of space...
I'm working on the space requirements, but wanted to go ahead and post the system as is for the diehards. Memory usage is not the only area that is lacking, I'm afraid!
The major memory requirement, it turns out, is that AutoChuck is keeping around parse trees for every method. So I'm trying to either get rid of that requirement or make the parse trees take less memory.
The particular idea of caching part of the system seems difficult to implement. If you ask what the type of Morph>>width is, it will end up tracing into methods in Rectangle, which will in turn trace into methods in the numeric hierarchy. So what parts do you *really* need if you just want to analyze morphic stuff? It's hard to tell.
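To make the cascade concrete, here is the chain the query has to follow (abbreviated from the Squeak sources; the exact code may differ slightly between versions):

    Morph >> width
        ^ bounds width           "so now we must type Rectangle>>width"

    Rectangle >> width
        ^ corner x - origin x    "so now we must type Point>>x and the numeric #- methods"

A query that starts in Morphic quickly pulls in Graphics and the Kernel.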
Romain Robbes rrobbes@info.unicaen.fr wrote:
A while ago I started (I mean, played 3 hours ;-)) a little program to store type information which could then be used by several tools, such as a type inferencer, and the toy I was writing, which was collecting run-time types, so they could "collaborate". This could then be used by a code completion system, or various navigation systems (how about clicking on a variable and being automagically teleported to its class?).
Oh yes, this is a great tool too, and it complements what Chuck does.
Chuck examines the program and gives you an upper bound on what the types might ever be; your tool examines executions and thus finds a *lower* bound on the types. Your tool is, I'm sure, consistently fast; on the other hand, Chuck will find a good many answers even in code that has yet to be executed. This is nice if you have not yet figured out how to run the code you are studying!
Also, upper bounds on the types have different applications in downstream tools such as compilers and dead code removers; a lower bound can inform the compiler that a case is worth optimizing (e.g., this method really does get run with FloatArrays), and an upper bound can let the compiler *avoid* dealing with certain cases (e.g., this message send can only ever invoke one particular method, so inline it).
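To make the two bounds concrete, consider a hypothetical method (both the method and the reported type sets below are invented for illustration):

    scaleBy: aNumber
        ^ self width * aNumber

A run-time collector that has watched the tests run might report a lower bound like {SmallInteger}, because only integers were actually passed. Chuck, analyzing every sender it can see, might report an upper bound like {SmallInteger. Float. Fraction}. The true set of argument types lies somewhere between the two.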
Lex
"Lex Spoon" lex@cc.gatech.edu wrote:
The major memory requirement, it turns out, is that AutoChuck is keeping around parse trees for every method. So I'm trying to either get rid of that requirement or make the parse trees take less memory.
It might be useful to recall one of Eliot's battle cries - "CompiledMethods are flattened parsetrees"
tim
--
Tim Rowledge, tim@sumeru.stanford.edu, http://sumeru.stanford.edu/tim
Klingon Code Warrior:- 10) "This code is a piece of crap! You have no honor!"
Tim Rowledge tim@sumeru.stanford.edu wrote:
"Lex Spoon" lex@cc.gatech.edu wrote:
The major memory requirement, it turns out, is that AutoChuck is keeping around parse trees for every method. So I'm trying to either get rid of that requirement or make the parse trees take less memory.
It might be useful to recall one of Eliot's battle cries - "CompiledMethods are flattened parsetrees"
It is very tempting to try and use the compiled methods themselves as the compressed parse trees. That would mean there is zero overhead from "keeping around parse trees". However, I don't know if I will implement this idea myself. There is one very annoying issue: the decompiler does not always produce *exactly* the parse tree that was compiled to begin with.
One option would be to treat the decompiled parse trees as the "real" parse trees. However, then one faces difficulties in hooking up to the user-level tools. If a user highlights some code and asks for the type of it, then the highlighted expression had better actually exist in the parse trees! Also, when the inferencer reports a result, it gives a derivation of that result. All of the expressions mentioned in the derivation should, again, be expressions that exist in the code. In short, I want Chuck to be useful for program understanding tools, and such tools need to operate in the same universe of code that the programmers are in.
It may be possible to tweak the compiler and decompiler until they are symmetric. I really don't know; it would take a careful analysis of the compiler and decompiler to do it, though, and I'd rather avoid this.
That leaves one major option: include a tree diff alongside each compiled method. To get the user-visible parse tree, one could then decompile the compiled method, and then apply the tree diff to the result. The diffs should be small; in fact, most methods will probably decompile exactly, and thus the diffs would not require much memory. This looks like an excellent solution that will very clearly work, except for one tiny detail: I know absolutely nothing about tree diff algorithms and would have to go do some hefty studying before implementing one!
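For concreteness, the lookup under that scheme might go something like this (a minimal sketch; diffTableAt:ifAbsent: and applyTo: are hypothetical, while CompiledMethod>>decompile is real):

    userParseTreeFor: aCompiledMethod
        | decompiled diff |
        decompiled := aCompiledMethod decompile.
        diff := self diffTableAt: aCompiledMethod ifAbsent: [nil].
        ^ diff isNil
            ifTrue: [decompiled]                  "method decompiled exactly"
            ifFalse: [diff applyTo: decompiled]   "patch back the lost details"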
In summary, if anyone wants to try one of the above ideas to make compiled methods behave *exactly* like compressed parse trees, I will heartily try it out. For now, however, I plan simply to do a dumb flattening of the trees into a binary form, which seems to produce plenty of space savings by itself after just a couple of hours of coding.
-Lex
On May 28, 2004, at 4:46 PM, Lex Spoon wrote:
Tim Rowledge tim@sumeru.stanford.edu wrote:
"Lex Spoon" lex@cc.gatech.edu wrote:
The major memory requirement, it turns out, is that AutoChuck is keeping around parse trees for every method. So I'm trying to either get rid of that requirement or make the parse trees take less memory.
It might be useful to recall one of Eliot's battle cries - "CompiledMethods are flattened parsetrees"
That's an interesting position given his work on the AOStA bytecode optimizer. My guess would be that the more you optimize the bytecodes, the less they look like the parse tree that created them.
It is very tempting to try and use the compiled methods themselves as the compressed parse trees. That would mean there is zero overhead from "keeping around parse trees". However, I don't know if I will implement this idea myself. There is one very annoying issue: the decompiler does not always produce *exactly* the parse tree that was compiled to begin with.
This is something I've been working on addressing for the next version of OmniBrowser. Essentially, the next version of OB will operate on an AST that begins with Package at the top and extends down through Class and Method to sub-method parse nodes like MessageSend and Literal.
One of the goals with this is to get consistency between different representations of the code, and so the AST is designed to be as lossless as possible when converting back and forth between representations. The SourcePrinter, for example, is essentially a null pretty-printer; it reconstructs the source code almost exactly the way the programmer entered it. (And this, incidentally, is the reason I didn't just reuse the RB parse nodes. They don't preserve enough formatting information.)
Some things I plan to do with this:
- Hook it up to the closure compiler and generate bytecodes from the AST
- Adapt the closure decompiler to generate an AST from bytecodes (exactly)
- Create a binary serialization format. There is some interesting literature on compressing ASTs. Essentially the abstract grammar is used as a statistical model for an arithmetic encoder. This gives better compression than, for example, gzipped bytecodes. This could be a very nice way to distribute code - more compact and faster to load than changesets.
- Enhance Monticello. This representation of code is much richer than Monticello's current list of definitions, and would let us implement features that are currently difficult or impossible. Tree-diffing will be an important part of this, and might allow us to implement RB-like features in Monticello. (For example, you could compare two versions of a package and get a diff expressed in terms of refactorings: "Method A extracted from B, Method X pushed up to class Y.")
- Allow structured annotations and additions to the AST. This could be used for all sorts of things. Stéphane's idea of attaching type information, for example, might be interesting in relation to Chuck.
So this could be useful for Chuck in terms of either decompilation from byte codes or AST compression.
Going in the other direction, I'm interested in exploring ways that tools like OmniBrowser and Monticello can take advantage of the type information provided by Chuck. The thing that leaps immediately to mind is inter-package dependencies. I bet it would be feasible to deduce the minimal set of packages required to run, say, the unit tests for a package. (And the minimal code from the "as yet unpackaged" base image too!)
I'm also imagining a panel in OB for static type analysis. It could be a little like the test runner - buttons for analysing the code in various scopes (method, class, package, etc.) and a list for displaying errors. Click on an error and the browser jumps to the relevant method and highlights the dangerous send. Or how about enhancing SLint with type analysis?
All in all, I'm pretty excited by the possibilities that Chuck opens up. Thanks Lex!
Colin
Colin Putney cputney@wiresong.ca wrote:
This is something I've been working on addressing for the next version of OmniBrowser. Essentially, the next version of OB will operate on an AST that begins with Package at the top and extends down through Class and Method to sub-method parse nodes like MessageSend and Literal.
Yep, I would love to have something like this around and not maintain my own. Already I have stopped using my custom parser in favor of RB's, and I could probably have dumped my parse nodes too, though that has been more trouble than it's worth so far. I still am having to deal with keeping an object model of the code in the image (or *not* in the image, depending on what you are analyzing). A lot of this object model could surely be offloaded into a tool like OmniBrowser... but of course, it doesn't actually exist yet. :)
The thing that leaps immediately to mind is inter-package dependencies. I bet it would be feasible to deduce the minimal set of packages required to run, say, the unit tests for a package. (And the minimal code from the "as yet unpackaged" base image too!)
Yeah, that's an interesting application of a dead code remover. Dead code removal is worth pursuing, if nothing else because Agesen did it with his inferencer. :) Unfortunately it is not trivial with Chuck being demand-driven; it takes some thought to make it work, and I am not completely sure whether it will be effective. It will certainly improve on the current image-shrinking scripts, but it might not be good enough to give you good dependency information between packages.
-Lex
On Jun 1, 2004, at 2:28 PM, Lex Spoon wrote:
Colin Putney cputney@wiresong.ca wrote:
This is something I've been working on addressing for the next version of OmniBrowser. Essentially, the next version of OB will operate on an AST that begins with Package at the top and extends down through Class and Method to sub-method parse nodes like MessageSend and Literal.
Yep, I would love to have something like this around and not maintain my own. Already I have stopped using my custom parser in favor of RB's, and I could probably have dumped my parse nodes too, though that has been more trouble than it's worth so far. I still am having to deal with keeping an object model of the code in the image (or *not* in the image, depending on what you are analyzing). A lot of this object model could surely be offloaded into a tool like OmniBrowser... but of course, it doesn't actually exist yet. :)
Well, it does exist, it's just that OmniBrowser doesn't use it yet. The latest version is here:
http://monticello.wiresong.ca/ob/CodeModel-cwp.41.mcz
To run the tests, you'll also need these three packages, which have dummy code:
http://monticello.wiresong.ca/ob/BogusInfo-cwp.5.mcz
http://monticello.wiresong.ca/ob/Bogus-cwp.13.mcz
http://monticello.wiresong.ca/ob/BogusExt-cwp.1.mcz
I'll put this on SM once I get OmniBrowser using it and things stabilize a bit.
Another possible use of Chuck that occurred to me is in translation. With type information one could do a version of Slang that was a lot closer to Smalltalk, using actual objects and polymorphic messages.
Colin
Colin Putney cputney@wiresong.ca wrote:
A lot of this object model could surely be offloaded into a tool like OmniBrowser... but of course, it doesn't actually exist yet. :)
Well, it does exist, it's just that OmniBrowser doesn't use it yet. The latest version is here:
Sorry, I meant *didn't* exist, when I was implementing that part of Chuck.
Another possible use of Chuck that occurred to me is in translation. With type information one could do a version of Slang that was a lot closer to Smalltalk, using actual objects and polymorphic messages.
Yeah, good idea. Note, of course, that translation and compilation are really the same thing!
-Lex
On Jun 1, 2004, at 4:07 PM, Colin Putney wrote:
On May 28, 2004, at 4:46 PM, Lex Spoon wrote:
Tim Rowledge tim@sumeru.stanford.edu wrote:
"Lex Spoon" lex@cc.gatech.edu wrote:
The major memory requirement, it turns out, is that AutoChuck is keeping around parse trees for every method. So I'm trying to either get rid of that requirement or make the parse trees take less memory.
It might be useful to recall one of Eliot's battle cries - "CompiledMethods are flattened parsetrees"
That's an interesting position given his work on the AOStA bytecode optimizer. My guess would be that the more you optimize the bytecodes, the less they look like the parse tree that created them.
Yes, but that would be totally transparent to the user (i.e., the developer). A dynamic optimizing system always de-optimizes as soon as you try to look harder (e.g., with the debugger).
So all the magic of AOStA is pretty much invisible. On the other hand: having a Jitter around by default means that we don't care how bad the non-optimized code is with respect to execution speed. We could use the AST directly. (Or better, a compressed version like Franz's Slim Binaries...)
So "CompiledMethods are flattened parsetrees" does not mean that you would use the cm we have now as the AST, but use the AST as the compiled methods...
Marcus
On Jun 5, 2004, at 11:45 AM, Marcus Denker wrote:
Yes, but that would be totally transparent to the user (i.e., the developer). A dynamic optimizing system always de-optimizes as soon as you try to look harder (e.g., with the debugger).
So all the magic of AOStA is pretty much invisible. On the other hand: having a Jitter around by default means that we don't care how bad the non-optimized code is with respect to execution speed. We could use the AST directly. (Or better, a compressed version like Franz's Slim Binaries...)
Yes, I've been thinking about this as well. There's an interesting paper on compressing syntax trees using the abstract grammar as a statistical model, which I'm going to implement at some point. http://www.ics.uci.edu/~cstork/ire2001.pdf. It's designed so that most of the work is done by the compressor - decompression is fast.
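The core trick, roughly, is that at each point in the tree the grammar already rules out most symbols, so the encoder only has to distinguish among the few productions that are legal in that context. A toy sketch of the encoding walk (the classes and messages here are invented, not the paper's actual algorithm):

    encodeNode: aNode context: aContext on: anEncoder
        | legal |
        legal := aContext legalProductions.   "often just a handful of choices"
        anEncoder
            encodeIndex: (legal indexOf: aNode production)
            withWeights: (aContext frequenciesFor: legal).
        aNode children
            with: (aContext childContextsFor: aNode production)
            do: [:child :ctx | self encodeNode: child context: ctx on: anEncoder]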
The first step would be to create a binary format for distributing Squeak code that would be faster to load than fileOuts - i.e., decompress the AST and generate bytecode without having to do any parsing. Beyond that would be modifying the interpreter to execute the binary format directly.
So "CompiledMethods are flattened parsetrees" does not mean that you would use the cm we have now as the AST, but use the AST as the compiled methods...
Yes, "Parse trees are structured CompiledMethods". I'd love to see Squeak executing ASTs with the hotspots optimized by Exupery.
Colin
Hi Colin,
Alex has been working on a package format (loading byte-code); I will ask him to release it soon. We were a bit disappointed by the gain: in VisualWorks, the difference between loading a fileOut and using the parcel loader was really big. There are certainly some tricks we missed.
Stef
On Jun 5, 2004, at 8:22 PM, Colin Putney wrote:
The first step would be to create a binary format for distributing Squeak code that would be faster to load than fileOuts - i.e., decompress the AST and generate bytecode without having to do any parsing. Beyond that would be modifying the interpreter to execute the binary format directly.
Colin Putney cputney@wiresong.ca wrote:
Yes, I've been thinking about this as well. There's an interesting paper on compressing syntax trees using the abstract grammar as a statistical model, which I'm going to implement at some point. http://www.ics.uci.edu/~cstork/ire2001.pdf. It's designed so that most of the work is done by the compressor - decompression is fast.
Sounds neat. Decompression speed is very important if you are doing it to save space in an image.
The first step would be to create a binary format for distributing Squeak code that would be faster to load than fileOuts - i.e., decompress the AST and generate bytecode without having to do any parsing. Beyond that would be modifying the interpreter to execute the binary format directly.
Note that there are differences between compressing for distribution and compressing to save space in an image. In the former case you need to go all the way down to bytes, but in the latter you can choose to use object pointers in some places. One place this would likely help is in encoding symbols; you can store #nextPut: as a 4-byte object pointer instead of an 8+ byte string "n e x t P u t :".
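As a quick sanity check in an image:

    #nextPut: size   "answers 8 - eight bytes of characters, plus header, versus one 4-byte oop"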
-Lex
On Jun 5, 2004, at 9:17 PM, Lex Spoon wrote:
The first step would be to create a binary format for distributing Squeak code that would be faster to load than fileOuts - i.e., decompress the AST and generate bytecode without having to do any parsing. Beyond that would be modifying the interpreter to execute the binary format directly.
Note that there are differences between compressing for distribution and compressing to save space in an image. In the former case you need to go all the way down to bytes, but in the latter you can choose to use object pointers in some places. One place this would likely help is in encoding symbols; you can store #nextPut: as a 4-byte object pointer instead of an 8+ byte string "n e x t P u t :".
Agreed. The paper I cited earlier mentions replacing strings in the AST with indexes into a string table, as the same strings tend to appear in several places in the tree. So the only real difference would be the table used to intern symbols - for distribution a local symbol table would be included in the file; within the image, we'd just use the regular symbol table.
I suppose there would be some processing required to load a slim binary method from a file into the equivalent of a CompiledMethod. References to shared variables would be resolved, literals converted to objects, symbols re-interned etc.
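That loading step might look roughly like this (a sketch only; the literal-spec protocol is entirely hypothetical, though Smalltalk associationAt: and asSymbol are real):

    materialize: aDecodedMethod
        | literals |
        literals := aDecodedMethod literalSpecs collect: [:spec |
            spec isSymbolSpec
                ifTrue: [spec string asSymbol]                    "re-intern"
                ifFalse: [spec isGlobalSpec
                    ifTrue: [Smalltalk associationAt: spec name]  "resolve shared variable"
                    ifFalse: [spec value]]].                      "plain literal"
        ^ self assembleMethodFrom: aDecodedMethod bytecodes literals: literals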
Colin
Hi Colin,
I am currently working on a binary saving mechanism for Monticello. I am about to release it.
Cheers, Alexandre
On Sun, Jun 06, 2004 at 08:32:55PM -0500, Colin Putney wrote:
On Jun 5, 2004, at 9:17 PM, Lex Spoon wrote:
The first step would be to create a binary format for distributing Squeak code that would be faster to load than fileOuts - i.e., decompress the AST and generate bytecode without having to do any parsing. Beyond that would be modifying the interpreter to execute the binary format directly.
Note that there are differences between compressing for distribution and compressing to save space in an image. In the former case you need to go all the way down to bytes, but in the latter you can choose to use object pointers in some places. One place this would likely help is in encoding symbols; you can store #nextPut: as a 4-byte object pointer instead of an 8+ byte string "n e x t P u t :".
Agreed. The paper I cited earlier mentions replacing strings in the AST with indexes into a string table, as the same strings tend to appear in several places in the tree. So the only real difference would be the table used to intern symbols - for distribution a local symbol table would be included in the file; within the image, we'd just use the regular symbol table.
I suppose there would be some processing required to load a slim binary method from a file into the equivalent of a CompiledMethod. References to shared variables would be resolved, literals converted to objects, symbols re-interned etc.
Colin
On SqueakMap I created a new entry named "BinarySavingMC". You can try it by looking at the class MCBinaryWriterTest.
Actually, I do not get the gain I expected. It is fast to load methods, but not really to create the classes. There must be something wrong, but I do not know what...
For instance, #testCreatingVersionSavingLoading shows gains with big methods and few classes:

Test 1 ==> Time to load 100 methods using ascii file: 23298
Test 1 ==> Time to load 100 methods using binary file: 7957

(roughly a 2.9x speedup). But #testCreatingVersionSavingLoading2 says that classes are really costly:

Test 2 ==> Time to load 3 methods using ascii file: 13705
Test 2 ==> Time to load 3 methods using binary file: 14803

(here the binary path is actually slightly slower).
Note that the tests should be run without any Monticello browser open, and the results of the benchmarks are shown in the Transcript.
Cheers, Alexandre
On Tue, Jun 08, 2004 at 11:17:13AM -0500, Colin Putney wrote:
On Jun 8, 2004, at 7:14 AM, Alexandre Bergel wrote:
Hi Colin,
I am currently working on a binary saving mechanism for Monticello. I am about to release it.
Excellent. I look forward to seeing it.
Colin
On 5 Jun 2004, at 19:22, Colin Putney wrote:
On Jun 5, 2004, at 11:45 AM, Marcus Denker wrote:
Yes, but that would be totally transparent to the user (i.e., the developer). A dynamic optimizing system always de-optimizes as soon as you try to look harder (e.g., with the debugger).
So all the magic of AOStA is pretty much invisible. On the other hand: having a Jitter around by default means that we don't care how bad the non-optimized code is with respect to execution speed. We could use the AST directly. (Or better, a compressed version like Franz's Slim Binaries...)
Yes, I've been thinking about this as well.
Me 2. ;-)
However, I think it would be even better if we decoupled the execution-object-tree (whatever you want to call it) from the syntax tree generated by the compiler. After all, one particular syntax is just another view onto the code (MVC), right? ;-)
Yes, "Parse trees are structured CompiledMethods".
"ExpressionObjects" can be structured CompiledMethods and much, much more...
I'd love to see Squeak executing ASTs with the hotspots optimized by Exupery.
:-)
Marcel
On Jun 7, 2004, at 4:35 PM, Marcel Weiher wrote:
However, I think it would be even better if we decoupled the execution-object-tree (whatever you want to call it) from the syntax tree generated by the compiler. After all, one particular syntax is just another view onto the code (MVC), right? ;-)
Hmm, I'm not sure I buy that. If there's some representation of a Smalltalk method that is a more general high-level abstraction than an AST, I'd like to know what it is. (Note that I'm talking about an abstract, not concrete, syntax tree).
The AST for a Smalltalk method is pretty clean. There isn't a whole lot in the syntax that isn't related to the semantics of the method's execution. So it wouldn't be difficult to create parsers and pretty printers to support alternate syntax. But really, why bother? Does anyone actually use the alternate syntax we already have? I am interested in alternate views of the code, just not alternate *syntax.*
Also, keeping the AST reasonably equivalent to its textual representation has a lot of advantages - you can write tools like the refactoring browser or the debugger that present the code to the programmer textually, but operate on the AST internally.
"ExpressionObjects" can be structured CompiledMethods and much, much more...
Like what?
Colin
Colin Putney writes:
Yes, "Parse trees are structured CompiledMethods". I'd love to see Squeak executing ASTs with the hotspots optimized by Exupery.
I do like the idea, and Exupery is progressing slowly towards being usable. In fact, my current efforts are aimed solely at making Exupery useful for hot leaf methods. It's easier to make a compiler perform for leaf methods than for sends. Sends are next, after tolerably good leaf method performance. But...
An AOStA-style approach with two levels of bytecode could make a lot of sense. The first bytecode would be high-level: the flattened AST, basically Squeak bytecode without jumps, so no inlining of conditionals (ifTrue:ifFalse: etc.) and loop control-flow methods. The second, low-level bytecode would be closer to Squeak's current full bytecode, with maybe a type-check bytecode to make inlining fast. Inlining from the high-level form down to the low-level form, all done in Smalltalk above the bytecode level, could provide a workable fast execution engine for what, I think, you are describing.
The double-bytecode inlining interpreter would probably be faster for normal code than our current engine. Sends are very expensive, so removing them should provide a decent speed improvement. It's a design, as Marcus suggested, that is very much in the AOStA spirit, with both a high-level bytecode and a low-level bytecode.
Of course it would be nice to use a compiler that went all the way down to machine code but it's always best to know that you can succeed without relying on other projects meeting their milestones. A backup plan doesn't need to be executed, it just needs to reduce the total risk. Once everything is done and working together there is no risk left.
How is AOStA going? Is there a decent bytecode level inliner that could be borrowed for other projects?
Bryce -- who does hope, and is working to make, Exupery ready before it is needed for your work.
On Jun 7, 2004, at 5:58 PM, Bryce Kampjes wrote:
Colin Putney writes:
Yes, "Parse trees are structured CompiledMethods". I'd love to see Squeak executing ASTs with the hotspots optimized by Exupery.
An AOStA-style approach with two levels of bytecode could make a lot of sense. The first bytecode would be high-level: the flattened AST, basically Squeak bytecode without jumps, so no inlining of conditionals (ifTrue:ifFalse: etc.) and loop control-flow methods. The second, low-level bytecode would be closer to Squeak's current full bytecode, with maybe a type-check bytecode to make inlining fast. Inlining from the high-level form down to the low-level form, all done in Smalltalk above the bytecode level, could provide a workable fast execution engine for what, I think, you are describing.
Well, not quite. I'm actually suggesting that the abstract syntax tree would be executed directly. The interpreter would traverse the tree, updating the state of the activation context as it goes. This would be slower than bytecode interpretation, but not unreasonably so - this is how the Ruby interpreter works, for example.
The double-bytecode inlining interpreter would probably be faster for normal code than our current engine. Sends are very expensive, so removing them should provide a decent speed improvement. It's a design, as Marcus suggested, that is very much in the AOStA spirit, with both a high-level bytecode and a low-level bytecode.
I just had a bizarre idea. An AST interpreter with PICs!
PICs actually make even more sense in an AST, since all you have to do is point to the method you're inlining. No need to rewrite the send site, and deoptimization is easier.
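A minimal sketch of what such a send node might look like (all names invented; this is the shape of the idea, not working interpreter code):

    "SendNode holds: selector receiverNode argumentNodes cache (a Dictionary)"
    evaluateIn: aContext
        | receiver method |
        receiver := receiverNode evaluateIn: aContext.
        method := cache
            at: receiver class
            ifAbsentPut: [receiver class lookupSelector: selector].
        ^ method
            valueWithReceiver: receiver
            arguments: (argumentNodes collect: [:n | n evaluateIn: aContext])

The cache maps receiver classes directly to methods, so there is no send-site rewriting, and throwing the cache away is all the de-optimization needed.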
The dual-bytecode idea is certainly workable, and would probably be an improvement over the status quo, but I don't think AOStA would be ideal for Squeak. Optimizing bytecode makes a lot of sense in VW, where they have a non-optimizing JIT compiler to native code at the VM level. But let's go blue-sky here. Let's ignore, for the moment, that the only thing we actually have working right now is a somewhat optimized bytecode interpreter, and imagine that we have the resources to implement whatever we want.
The simplest, cleanest, most OO way of executing code would be to have the VM execute an AST directly. This eliminates the need for a compiler - a parser is sufficient to translate source into something executable. Now that's kind of slow, so let's optimize the code that gets run frequently. The most effective way would be to compile to native code. Bytecode is a nice compromise, in that it's fairly high-level semantically and fairly efficient to execute, but if you're going to introduce multiple levels of executable forms, why not differentiate them more?
In general I like the idea of progressive levels of optimization, triggered by an execution counter for each method. The first few rounds of optimization would be tree-based: PICs, dead code removal, whatever can be done at the semantic level that the AST represents. Then at some level, native code generation kicks in, and we generate machine code that reflects the optimizations we've already done. As the execution counter hits new thresholds, we progressively optimize that until we run out of ways to speed things up.
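A sketch of that counter-driven tiering (the thresholds and the optimization hooks are invented):

    noteActivationOf: aMethodNode
        aMethodNode incrementExecutionCount.
        aMethodNode executionCount = 100
            ifTrue: [self applyTreeOptimizationsTo: aMethodNode].   "PICs, dead code removal"
        aMethodNode executionCount = 10000
            ifTrue: [self compileToNativeCode: aMethodNode]         "hand off to something like Exupery"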
Of course it would be nice to use a compiler that went all the way down to machine code but it's always best to know that you can succeed without relying on other projects meeting their milestones. A backup plan doesn't need to be executed, it just needs to reduce the total risk. Once everything is done and working together there is no risk left.
Well, the backup plan is the status quo. The bytecode interpreter we have is fast enough for most purposes, and we can certainly improve the dev tools, implement slim binaries for distribution, do type inference and so on without changing that.
And frankly, this is all just speculation on what might be possible. My priority is to make OmniBrowser and Monticello ready for production use with features that exploit the AST I've been working on. I figure that will take me well into next year. I may get into messing with the interpreter after that.
How is AOStA going? Is there a decent bytecode level inliner that could be borrowed for other projects?
Marcus mentioned off-list that he had translation to SSA form and back more-or-less working, but I don't think it does inlining.
Bryce -- who does hope, and is working to make, Exupery ready before it is needed for your work.
Bryce, please don't mistake my enthusiasm about Exupery for pressure to get it done faster. I know you only have so much spare time, and Exupery is a non-trivial project, to say the least. Not only that, but it's a lot further along than any of the stuff I've been talking about in this thread.
So consider this congratulations and encouragement, rather than pressure.
Colin
On Jun 8, 2004, at 9:15 AM, Colin Putney wrote:
The simplest, cleanest, most OO way of executing code would be to have the VM execute an AST directly.
I'm not convinced this is true: bytecode is a much simpler interface between the image and the VM than an executable AST would be. I also wouldn't want the VM to depend on even the abstract syntax of Smalltalk, since this would make it much more difficult to experiment with targeting other languages to the Squeak VM - they'd have to be translated to a Smalltalk AST rather than compiled to a neutral bytecode. My "Sorrow" concatenative compiler, for example, would have been much harder to write.
The stack machine is a valuable abstraction; I don't think we want to throw it away.
Avi
On Jun 8, 2004, at 11:38 AM, Avi Bryant wrote:
On Jun 8, 2004, at 9:15 AM, Colin Putney wrote:
The simplest, cleanest, most OO way of executing code would be to have the VM execute an AST directly.
I'm not convinced this is true: bytecode is a much simpler interface between the image and the VM than an executable AST would be. I also wouldn't want the VM to depend on even the abstract syntax of Smalltalk, since this would make it much more difficult to experiment with targeting other languages to the Squeak VM - they'd have to be translated to a Smalltalk AST rather than compiled to a neutral bytecode. My "Sorrow" concatenative compiler, for example, would have been much harder to write.
The stack machine is a valuable abstraction; I don't think we want to throw it away.
Yeah, that probably should have read "way of executing *Smalltalk* code."
I agree that the stack machine is a valuable abstraction, and bytecode forms a simple and general interface between the image and VM, which also has value.
However, it's not clear to me that an interpreter for an AST is any more complicated than a bytecode interpreter. And looking strictly at the image, I do think my assertion is correct - executable ASTs are simpler and more object-oriented. So I'll take your comment to mean that there are other considerations than simplicity in the image - decoupling the image from the VM is valuable as well. Agreed.
Colin
On 28 May 2004, at 03:17, Tim Rowledge wrote:
"Lex Spoon" lex@cc.gatech.edu wrote:
The major memory requirement, it turns out, is that AutoChuck is keeping around parse trees for every method. So I'm trying to either get rid of that requirement or make the parse trees take less memory.
It might be useful to recall one of Eliot's battle cries - "CompiledMethods are flattened parsetrees"
But I don't want parse-trees. And I don't want them flattened, either ;-) Which would explain why I don't want CompiledMethods...
Marcel
I have just a suggestion: could this AutoChuck feature be turned on only for certain packages, such as Kernel but not Morphic, for example? This could spare some time and quite a bit of space...
I'm working on the space requirements, but wanted to go ahead and post the system as is for the diehards. Memory usage is not the only area that is lacking, I'm afraid!
Right, it's always best to get early feedback.
The major memory requirement, it turns out, is that AutoChuck is keeping around parse trees for every method. So I'm trying to either get rid of that requirement or make the parse trees take less memory.
The particular idea of caching part of the system seems difficult to implement. If you ask what the type of Morph>>width is, it will end up tracing into methods in Rectangle, which will in turn trace into methods in the numeric hierarchy. So what parts do you *really* need if you just want to analyze morphic stuff? It's hard to tell.
Maybe just being lazy could work... But I was rather thinking of splitting the image into chunks, using PackageInfo for example, and maybe including package dependency information (I don't know how far TFNR went on this), so that you could use dependencies to analyse packages one at a time (assuming Kernel doesn't depend on something else...)
Romain Robbes rrobbes@info.unicaen.fr wrote:
A while ago I started (I mean, played 3 hours ;-)) a little program to store type information which could then be used by several tools, such as a type inferencer, and the toy I was writing, which was collecting run-time types, so they could "collaborate". This could then be used by a code completion system, or various navigation systems (how about clicking on a variable and being automagically teleported to its class?).
Oh yes, this is a great tool too, and it complements what Chuck does.
Chuck examines the program and gives you an upper bound on what the types might ever be; your tool examines executions and thus finds a *lower* bound on the types. Your tool is, I'm sure, consistently fast; on the other hand, Chuck will find a good many answers even in code that has yet to be executed. This is nice if you have not yet figured out how to run the code you are studying!
Does that mean you don't program test-first? ;-) Seriously, do you think several tools could collaborate using a common knowledge base? (I guess I should check out Chuck ASAP.)
Also, upper bounds on the types have different applications in downstream tools such as compilers and dead code removers; a lower bound can inform the compiler that a case is worth optimizing (e.g., this method really does get run with FloatArrays), and an upper bound can let the compiler *avoid* dealing with certain cases (e.g., this message send can only ever invoke one particular method, so inline it).
This kind of optimization seems great, but also harder to do in Squeak, where a method can be added anywhere at any time... There is surely a lot of work to do in this field...
Romain
Lex
"rrobbes" rrobbes@info.unicaen.fr wrote:
But I was rather thinking of splitting the image into chunks, using PackageInfo for example, and maybe including package dependency information (I don't know how far TFNR went on this), so that you could use dependencies to analyse packages one at a time (assuming Kernel doesn't depend on something else...)
It's an interesting idea, but I won't be going down this path any time soon. A huge question is the same one I mentioned regarding Stéphane's idea of type annotations: are the annotations trustworthy? I don't think that automatic dependencies will help more than what Chuck is already doing. However, it may well be possible to come up with a module system that does.
Keep in mind that even though modules are *statically* disjoint, they may end up holding lots of pointers to each other *dynamically*. OrderedCollections hold references to all kinds of things *dynamically*, even though statically the OrderedCollection class might not have many external references. A type inferencer is trying to predict this dynamic behavior, so it may be interested in dependencies that might not be visible as normal package inter-dependencies.
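A deliberately trivial example of the static/dynamic gap:

    | coll |
    coll := OrderedCollection new.
    coll add: Morph new.
    coll first class   "answers Morph"

Statically, the collection classes never mention Morph; dynamically, this OrderedCollection now points straight into Morphic.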
Does that mean you don't program test-first? ;-) Seriously, do you think several tools could collaborate using a common knowledge base? (I guess I should check out Chuck ASAP.)
I use test-first but not all code I might download from the net does.
Anyway, *humans* can certainly use multiple sources of information. And tools can be written to provide an integrated display of all the information they can derive. There is nothing stopping a browser from saying "Integer < x < Number++" in order to show both lower and upper bounds that it has found through different mechanisms.
Chuck does have a use for knowing some dynamic types, by the way, because it can use them as a starting point. It turns out that it is good to start optimistic and then make the types progressively worse; if one has already found dynamic types, then one may as well start there instead of starting with the "Nothing" type.
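A sketch of that seeding step (ChuckType and the tracer protocol here are hypothetical):

    seedTypeFor: aVariable
        | observed |
        observed := self dynamicClassesObservedFor: aVariable.
        ^ observed isEmpty
            ifTrue: [ChuckType nothing]               "the classic optimistic start"
            ifFalse: [ChuckType ofClasses: observed]  "head start from the run-time tracer"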
This kind of optimization seems great, but also harder to do in Squeak, where a method can be added anywhere at any time... There is surely a lot of work to do in this field...
Yeah, whenever the code changes you have to throw away a lot of your optimizations. This adds some challenges.
-Lex