Cheap updates

Richard A. O'Keefe ok at atlas.otago.ac.nz
Tue Jun 6 00:07:06 UTC 2000


A recent issue of SIGMOD Record had an article about XMill,
a compressor for XML.  It gets rather better compression than
gzip does, despite being built atop zlib.

How come?

To the extent that I understand it, by splitting the input stream into
tree and contents, then partitioning the contents by context, then
compressing each partition separately with zlib.  The key idea here
is that similar pieces of content, once moved together, compress
better than they do when scattered through the document.
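
To make the idea concrete, here is a rough sketch in Python of the
split-and-group step.  This is not XMill's actual algorithm or file
format, only an illustration of the principle.  The function names
(split_compress, join_decompress) and the (context, text) input shape
are my own assumptions, and the sketch assumes that contexts are
non-empty and contain no newlines, and that text values contain no
NUL characters.

```python
import zlib
from collections import defaultdict


def split_compress(items):
    """Compress (context, text) pairs, grouping text by context.

    Hypothetical helper: 'items' is a list of (context, text) pairs in
    document order, e.g. ("selector", "printOn:").
    """
    structure = []                  # sequence of contexts: the "tree" part
    containers = defaultdict(list)  # context -> text values seen in it
    for context, text in items:
        structure.append(context)
        containers[context].append(text)

    packed_structure = zlib.compress("\n".join(structure).encode("utf-8"))
    packed_containers = {
        context: zlib.compress("\0".join(values).encode("utf-8"))
        for context, values in containers.items()
    }
    return packed_structure, packed_containers


def join_decompress(packed_structure, packed_containers):
    """Rebuild the original list of (context, text) pairs."""
    raw = zlib.decompress(packed_structure).decode("utf-8")
    structure = raw.split("\n") if raw else []
    # One iterator per container; values come back in their original order.
    queues = {
        context: iter(zlib.decompress(blob).decode("utf-8").split("\0"))
        for context, blob in packed_containers.items()
    }
    return [(context, next(queues[context])) for context in structure]
```

Whether the grouped form actually beats compressing the whole stream
in one go depends on the data; it would have to be measured on real
input.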

While FileOuts and ChangeSets are not XML, they are nonetheless
structured, and it is plausible that e.g. splitting structure, method
bodies, and comments and compressing them separately might do very well.

What would happen, for example, if the compression dictionary were
"primed" with class and instance variable names before compressing
each method?
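
One way to try this out is zlib's preset dictionary support, which
Python exposes as the zdict argument of zlib.compressobj and
zlib.decompressobj.  The sketch below is only an experiment harness,
not anything Squeak does; the helper names and the choice to join the
variable names with spaces are my own assumptions.  The same
dictionary must be available on the decompressing side, which is
plausible here since the class definition is already known.

```python
import zlib


def make_dictionary(names):
    """Build a preset dictionary from class and instance variable names.

    Hypothetical helper: zlib only needs bytes likely to recur in the
    input; joining the names with spaces is one simple choice.
    """
    return " ".join(names).encode("utf-8")


def compress_primed(method_source, names):
    """Deflate one method's source with the dictionary pre-loaded."""
    compressor = zlib.compressobj(level=9, zdict=make_dictionary(names))
    return compressor.compress(method_source.encode("utf-8")) + compressor.flush()


def decompress_primed(blob, names):
    """Inverse of compress_primed; needs exactly the same names, in order."""
    decompressor = zlib.decompressobj(zdict=make_dictionary(names))
    return (decompressor.decompress(blob) + decompressor.flush()).decode("utf-8")
```

Comparing len(compress_primed(source, names)) against
len(zlib.compress(source.encode("utf-8"), 9)) over a real image's
methods would show whether the priming pays for itself.  The zlib
documentation suggests placing the most commonly used strings towards
the end of the dictionary.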





More information about the Squeak-dev mailing list