Colin Putney <cputney@wiresong.ca> wrote:
Yes, I've been thinking about this as well. There's an interesting paper on compressing syntax trees using the abstract grammar as a statistical model, which I'm going to implement at some point (http://www.ics.uci.edu/~cstork/ire2001.pdf). It's designed so that most of the work is done by the compressor, so decompression is fast.
Sounds neat. Decompression speed is very important if you are doing it to save space in an image.
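To make the "grammar as a statistical model" idea concrete, here is a rough sketch. It is not the paper's algorithm. The toy grammar, the node names, and `ideal_bits` are all made up for illustration. It shows only that, once the grammar says which alternatives are legal at a node, an adaptive per-node-kind frequency model can price each choice, and an arithmetic coder could approach that size.

```python
import math

# Hypothetical toy abstract grammar: for each node kind, the alternatives
# that may legally appear there. Not taken from the paper or from Squeak.
GRAMMAR = {
    "Stmt": ["Assign", "Return", "ExprStmt"],
    "Expr": ["Literal", "Variable", "Send"],
}

def ideal_bits(choices):
    """Return the ideal code length, in bits, of a sequence of
    (node_kind, alternative) choices under an adaptive frequency model.

    Each node kind starts with a count of 1 per alternative; the count of
    the chosen alternative is bumped after it is priced, so a decoder that
    mirrors the updates sees the same probabilities.
    """
    counts = {kind: {alt: 1 for alt in alts} for kind, alts in GRAMMAR.items()}
    bits = 0.0
    for kind, alt in choices:
        table = counts[kind]
        probability = table[alt] / sum(table.values())
        bits += -math.log2(probability)
        table[alt] += 1
    return bits

# A tree dominated by message sends costs far less than a flat
# log2(3) bits per expression node.
sends = [("Expr", "Send")] * 10
print(ideal_bits(sends), 10 * math.log2(3))
```

The grammar restricts the alphabet at every node, and the counts skew the code toward what actually occurs, which is where the savings would come from.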
The first step would be to create a binary format for distributing Squeak code that would be faster to load than fileOuts, i.e., decompress the AST and generate bytecode without having to do any parsing. The step beyond that would be modifying the interpreter to execute the binary format directly.
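As a rough illustration of loading an AST without parsing, here is a minimal sketch of a prefix-order binary encoding. The tags, the tuple-shaped nodes, and the byte layout are all assumptions made up for the example. They are not a proposed Squeak format, and no compression is applied.

```python
import struct

# Hypothetical node tags; a real format would derive these from the grammar.
LITERAL, VARIABLE, SEND = 0, 1, 2

def encode(node, out):
    """Append a prefix-order encoding of a tuple-shaped AST node to out."""
    kind = node[0]
    if kind == "literal":            # ("literal", int)
        out.append(LITERAL)
        out.extend(struct.pack(">i", node[1]))
    elif kind == "variable":         # ("variable", name)
        name = node[1].encode("utf-8")
        out.append(VARIABLE)
        out.append(len(name))
        out.extend(name)
    elif kind == "send":             # ("send", selector, receiver, [args])
        selector = node[1].encode("utf-8")
        out.append(SEND)
        out.append(len(selector))
        out.append(len(node[3]))
        out.extend(selector)
        encode(node[2], out)
        for arg in node[3]:
            encode(arg, out)
    else:
        raise ValueError("unknown node kind: %r" % (kind,))
    return out

def decode(data, pos=0):
    """Rebuild one node starting at pos; return (node, next_pos).

    There is no tokenizing or grammar lookahead: each tag byte says exactly
    what follows.
    """
    tag = data[pos]
    pos += 1
    if tag == LITERAL:
        (value,) = struct.unpack_from(">i", data, pos)
        return ("literal", value), pos + 4
    if tag == VARIABLE:
        length = data[pos]
        name = bytes(data[pos + 1:pos + 1 + length]).decode("utf-8")
        return ("variable", name), pos + 1 + length
    if tag == SEND:
        selector_length, arg_count = data[pos], data[pos + 1]
        pos += 2
        selector = bytes(data[pos:pos + selector_length]).decode("utf-8")
        pos += selector_length
        receiver, pos = decode(data, pos)
        args = []
        for _ in range(arg_count):
            arg, pos = decode(data, pos)
            args.append(arg)
        return ("send", selector, receiver, args), pos
    raise ValueError("unknown tag: %d" % tag)

# Roughly "stream nextPut: 65" as a tree.
tree = ("send", "nextPut:", ("variable", "stream"), [("literal", 65)])
data = encode(tree, bytearray())
decoded, end = decode(data)
```

A bytecode generator could walk `decoded` directly, which is the step that skips the parser.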
Note that there are differences between compressing for distribution and compressing to save space in an image. In the former case you need to go all the way down to bytes, but in the latter you can choose to use object pointers in some places. One place this would likely help is in encoding symbols; you can store #nextPut: as a 4-byte object pointer instead of an 8+ byte string "n e x t P u t :".
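A small sketch of the symbol case, with made-up names throughout: `SymbolTable` stands in for the image's interned symbols, and a 4-byte table index plays the role of an object pointer. The one-byte length prefix in the distribution form is also just an assumption for the example.

```python
import struct

class SymbolTable:
    """Stand-in for the image's interned symbols (hypothetical)."""

    def __init__(self):
        self._index = {}
        self._symbols = []

    def intern(self, name):
        """Return a stable index for name, adding it on first use."""
        if name not in self._index:
            self._index[name] = len(self._symbols)
            self._symbols.append(name)
        return self._index[name]

    def lookup(self, index):
        return self._symbols[index]

def encode_for_distribution(selector):
    """Go all the way down to bytes: a one-byte length, then the characters."""
    raw = selector.encode("ascii")
    return bytes([len(raw)]) + raw

def encode_for_image(selector, table):
    """Inside an image, a fixed 4-byte reference to the shared symbol is enough."""
    return struct.pack(">I", table.intern(selector))

table = SymbolTable()
print(len(encode_for_distribution("nextPut:")))   # length byte plus 8 characters
print(len(encode_for_image("nextPut:", table)))   # always 4 bytes
```

The in-image form stays at 4 bytes however long the selector is, but it only means something next to that particular table, which is why it can't be used for distribution.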
-Lex