Squeak as Metaverse reminds me of something concrete...

Raab, Andreas Andreas.Raab at disney.com
Wed Jul 19 02:36:17 UTC 2000


Tom,

Interesting idea. I think that there's one major difference between XML and
Squeak ImageSegments here. XML is a "well defined" (yeah, sort of)
externalized description of objects; ImageSegments, in particular, are a
partial object memory. The "largest" difference (read that literally) is
that XML is 90% overhead. So I think that rearranging groups of objects
mainly helps to compress the overhead better :)

I'm leaving aside the tons of repeated tags, where it's obvious why
rearranging helps. Another interesting thing to note about XML is that it
uses limited byte ranges, so it's a good idea to group objects that use the
same ranges closely together to improve compression. Textual representations
of Integers or Floats are an example: remember, SmallInteger maxVal (the
largest value we can represent in a 4-byte word) is 1073741823, which takes
10 bytes when printed. Although this ignores the cases where a printed
representation is smaller than the Squeak object representation, I think it
gives you an idea of where (I believe :) the xmill compression improvements
come from.

As for Squeak and ImageSegments, you have to keep in mind that most of what
is stored in an ImageSegment is just OOPs (that is, rather arbitrary 32-bit
pointers), which have to be kept pointing to the right objects. While this
may look like a mere maintenance issue at first, it is critical for finding
out whether rearranging is likely to help. If you can, for instance, increase
object locality (that is, arrange objects so that pointer references span
'minimal' distances), compression will improve (more zeros in the 32-bit
pointers).
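A rough Python sketch of the locality effect follows. It assumes a hypothetical encoding in which each reference is stored as the unsigned 32-bit distance from the referencing slot to its target; real ImageSegments do not necessarily store pointers this way. Short distances leave the high bytes of each word zero, which a compressor (zlib here) can exploit.

```python
import random
import struct
import zlib

def encode_relative(refs):
    """Pack (source, target) word addresses as unsigned 32-bit distances.

    Hypothetical encoding for illustration only, not the real ImageSegment layout.
    """
    return b"".join(struct.pack("<I", (t - s) & 0xFFFFFFFF) for s, t in refs)

def zero_byte_fraction(blob: bytes) -> float:
    """Share of zero bytes in the blob."""
    return blob.count(0) / len(blob)

rng = random.Random(42)
N = 10_000
local = [(i, i + rng.randint(1, 64)) for i in range(N)]     # short hops only
scattered = [(i, rng.randrange(2**24)) for i in range(N)]   # anywhere in 16M words

for name, refs in (("local", local), ("scattered", scattered)):
    blob = encode_relative(refs)
    print(name, round(zero_byte_fraction(blob), 2), len(zlib.compress(blob, 9)))
```

With local references every distance fits in the low byte, so three of every four bytes are zero.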

On the other hand, if you can't improve locality and have large numbers of
pointers, it's unlikely that any vertical alignment will improve compression.
In this case it should be much better to arrange objects by their *class*.
Why?! Because every object that is written has a header, and the header
points to the class, so if you store all instances of a class together you
end up with more (local) redundancy (i.e., better compression for any LZ
compressor).
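A sketch of arranging by class, under made-up assumptions: objects are modeled as (class oop, fields) pairs with a one-word header holding the class oop, and the class oops, nil oop and field layouts are hypothetical constants. The sketch only reorders objects; a real implementation would also have to relocate every pointer to a moved object, which is omitted here. It prints the compressed size of each layout rather than asserting which is smaller.

```python
import random
import struct
import zlib

# Hypothetical oops; in a real image the header would point at the class object.
CLASS_OOPS = {"Point": 0x00101000, "Association": 0x00102000, "Rectangle": 0x00103000}
NIL_OOP = 0x00100000

def make_objects(n, seed=1):
    """Build (class_oop, fields) pairs in a mixed, allocation-like order."""
    rng = random.Random(seed)
    objs = []
    for _ in range(n):
        name = rng.choice(sorted(CLASS_OOPS))
        if name == "Point":  # two tagged SmallIntegers
            fields = [(rng.randrange(100) << 1) | 1, (rng.randrange(100) << 1) | 1]
        elif name == "Association":  # key from a small table, value nil
            fields = [0x00200000 + 4 * rng.randrange(8), NIL_OOP]
        else:  # Rectangle: origin and corner point at one of 16 Points
            fields = [0x00300000 + 12 * rng.randrange(16), 0x00300000 + 12 * rng.randrange(16)]
        objs.append((CLASS_OOPS[name], fields))
    return objs

def serialize(objs):
    """Write each object as a header word (class oop) followed by its fields."""
    return b"".join(struct.pack(f"<{1 + len(f)}I", cls, *f) for cls, f in objs)

def group_by_class(objs):
    """Stable sort by class oop; pointer relocation after the move is NOT done here."""
    return sorted(objs, key=lambda obj: obj[0])

def class_switches(objs):
    """Count how often the header's class pointer changes between neighbours."""
    return sum(1 for a, b in zip(objs, objs[1:]) if a[0] != b[0])

objs = make_objects(3000)
for label, layout in (("allocation order", objs), ("grouped by class", group_by_class(objs))):
    print(label, class_switches(layout), len(zlib.compress(serialize(layout), 9)))
```

Grouping turns thousands of class-pointer changes into one per class, which is the local redundancy an LZ compressor can pick up.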

Oh well, there is so much stuff one can do when it comes to compression ;-)
I'll just stop the rant here...

  - Andreas

> -----Original Message-----
> From: Tom Morgan [mailto:tmorgan at acm.org]
> Sent: Tuesday, July 18, 2000 6:15 PM
> To: squeak at cs.uiuc.edu
> Cc: recipient list not shown
> Subject: Squeak as Metaverse reminds me of something concrete...
> 
> 
> The recent 'Squeak as Metaverse' posting made me think of something
> that might be useful for the image segmentation/project sharing over
> the network.
> 
> It will take me a minute to get where I am going.
> 
> I have been experimenting with something called 'xmill',
> which does a remarkable job of squishing XML documents.
> It offers a very nice processing-time vs. space trade-off.
> 
> See:
> 
>   http://www.research.att.com/sw/tools/xmill/
> 
> I think the trick(s) it uses are potentially applicable to
> compressing image segments for network transmission.
> 
> The first thing that comes to mind to make things flow
> fast over the network is to gzip the blob that represents
> the segment.
> 
> In effect, the compression would go 'horizontally' across
> the bits of the image, encountering object pointers,
> manifest data and what not in whatever order.
> 
> The possibly applicable trick from 'xmill' is to use the
> fact that we are sitting on an object graph here, and
> to use the graph to build clusters of data, 'vertically' so to speak,
> putting all of the occurrences of similar instance variables 'next
> to each other', and applying gzip or another suitable compressor to
> each 'vertical' column of object data (sketch attached). It would be
> possible to have class-specific compressors for even better
> compression.
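A Python sketch of the horizontal vs. vertical idea for a single hypothetical three-slot class (a SmallInteger, a pointer to the next instance, and nil). It assumes every record has the same shape, as instances of one fixed-size class would, and uses zlib in place of gzip. xmill's actual container format is not reproduced here; the function names are illustrative.

```python
import random
import struct
import zlib

def horizontal(records):
    """Row-major: the whole segment as one blob, compressed 'across' objects."""
    flat = [w for rec in records for w in rec]
    return zlib.compress(struct.pack(f"<{len(flat)}I", *flat), 9)

def to_columns(records):
    """Transpose same-shaped records into one word list per instance variable."""
    return [list(col) for col in zip(*records)]

def from_columns(columns):
    """Inverse of to_columns."""
    return [list(rec) for rec in zip(*columns)]

def vertical(records):
    """Column-major: compress each instance-variable column separately."""
    return [zlib.compress(struct.pack(f"<{len(col)}I", *col), 9) for col in to_columns(records)]

rng = random.Random(7)
NIL_OOP = 0x00100000  # hypothetical oop for nil
records = [[(rng.randrange(100) << 1) | 1, 0x00400000 + 12 * (i + 1), NIL_OOP]
           for i in range(5000)]

print("horizontal", len(horizontal(records)))
print("vertical", sum(len(c) for c in vertical(records)))
```

Keeping each column in its own stream is what would allow a different, class- or slot-specific compressor per column.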
> 
> For example, delta encoding should work well for oops for anything
> other than an enormous segment. In many cases, the
> 'vertically' arranged data has more nearly alike bits, closer
> to each other.
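Delta encoding an oop column could look like the following Python sketch. The zigzag/varint packing is one common choice assumed here, not something the thread specifies, and the addresses are made up.

```python
def delta_encode(oops):
    """Replace each oop by its difference from the previous one (the first is kept)."""
    prev, out = 0, []
    for oop in oops:
        out.append(oop - prev)
        prev = oop
    return out

def delta_decode(deltas):
    """Running sum restores the original oops."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out

def zigzag(n):
    """Map signed deltas to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1

def varint(u):
    """Little-endian base-128 (LEB128-style): small numbers take a single byte."""
    out = bytearray()
    while True:
        byte, u = u & 0x7F, u >> 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def pack_oops(oops):
    """Delta-encode, zigzag and varint-pack a list of oops."""
    return b"".join(varint(zigzag(d)) for d in delta_encode(oops))

# Hypothetical segment: 1000 consecutive 12-byte objects starting at 0x00400000.
oops = [0x00400000 + 12 * i for i in range(1000)]
print(len(oops) * 4, len(pack_oops(oops)))  # prints: 4000 1003
```

The gain depends entirely on successive oops being close together; widely scattered references produce large deltas that need several bytes each.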
> 
> The processor vs. network time trade-offs would all have
> to be thought through, but my money is on having
> more cycles than bandwidth for the foreseeable future.
> 
>    ...Tom M
> 