New SqueakMap on the air... and we got problems Houston!

Wed Apr 5 07:52:32 UTC 2006

Hi Andreas!

Andreas Raab <andreas.raab at gmx.de> wrote:
> goran at krampe.se wrote:
> >> How about some sort of well-formed output, like (dare I use the word?) 
> >> XML?
> > 
> > Hehe, well, I did consider that (a year or two back) but avoided it
> > because I didn't want to take on the "luggage" that serialization code
> > is (to be maintained for shape changes etc when the model evolves). Back
> > in the beginning of SM I used Smalltalk as format and it was cumbersome
> > to maintain it - but of course, then I also used an incremental model
> > which made it even more complex.
> 
> Interesting arguments. In my experience, maintaining shape changes with 
> image segments gets very hard very quickly.

Yes, I agree - in *general*. But in the case of SM it has been different
because I didn't have the issue of having to be able to read *old*
ImageSegments into new code. I only have the issue of being able to read
ImageSegments into the same code that produced them, *but* living in
different Squeak versions.

But if I use my own serialization format then I need to maintain that
code when the model evolves. Using ImageSegments I haven't had to do
that, since there is no code to maintain.

> Utilizing well-defined 
> external formats has (again, in my experience) proven to be much more 
> robust when it comes to changes.

Yes, again I agree *in general*, especially if you take care and design
the format so that it doesn't reflect the actual
implementation/representation you use in your code. But in this case,
given how it works, it doesn't matter. Until now of course when the
Squeak versions have diverged so much that the leaf base classes my
domain model uses have changed.

> Updating incrementally is certainly 
> something that's a bit non-trivial, but if you look at the data records 
> generated I think it'd be straightforward to merge them incrementally 
> (discounting deletions but there are ways to deal with that, too).

Well, with the plan to actually store SMAccounts separately we will not
need to go the incremental route to get good performance (bandwidth
performance that is - there are quite a few people on modems that don't
like the ~1MB download penalty when updating the map) - when updating we
will only need to download those accounts that have changed their
content - and at any given point in time that will be very few.
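
To make the per-account update idea concrete, here is a minimal sketch (in
Python rather than Squeak, purely for illustration). It assumes that both
client and server can produce a small manifest mapping an account id to
some change stamp (a version number or checksum); that manifest, and the
function names, are hypothetical and not something SM has today.

```python
def accounts_to_fetch(local, remote):
    """Ids of accounts that are new on the server or whose stamp differs locally."""
    return sorted(aid for aid, stamp in remote.items() if local.get(aid) != stamp)


def accounts_to_drop(local, remote):
    """Ids of accounts the client still has but the server no longer lists."""
    return sorted(aid for aid in local if aid not in remote)


# Hypothetical manifests: account id -> change stamp.
local_manifest = {"alice": 3, "bob": 7, "carol": 1}
remote_manifest = {"alice": 3, "bob": 8, "dave": 1}

to_fetch = accounts_to_fetch(local_manifest, remote_manifest)
to_drop = accounts_to_drop(local_manifest, remote_manifest)
```

Only the accounts in `to_fetch` would need to be downloaded, so the cost
of an update follows the number of changed accounts and not the size of
the whole map.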

> >> I've attached a quick hack (well, okay it's been about an hour of 
> >> work ;-)
> > 
> > Don't you have better things to do? Like getting Crocodile out the door?
> > ;) I can't help wondering why you ended up coding on this? Just curious.
> > :)
> 
> Oh, I just needed an hour of clean, mindless fun to relax from some of 
> the harder things I'm working on ;-) Usually I play a game of 
> Minesweeper but writing some XML exporter code works just as well. And 
> besides, I *do* think this is an important issue and I don't think SM 
> should be 3.8+ only for a reason as simple to solve as this.

I agree.

> > Mmmm, let me say that I would like to hear more views on this issue
> > before deciding for the future. As I said XML has been on my radar but
> > my main idea had actually been to use SmartRefStreams and not just a
> > single one but instead "split" the model into accounts and enable them
> > to be stored on multiple servers (along the lines previously discussed
> > enabling personal, department, company and global wide maps etc).
> > 
> > But on the other hand - SmartRefStream and friends is probably just as
> > sensitive to these things as ImageSegment is. So...
> 
> SmartRefStream is less sensitive to shape changes than image segments 
> but it would be affected by the Byte vs. WideString problem in the same 
> way (I think; with some retrofitting of the earlier Squeak versions you 
> also may be able to change it to read ByteString and do something 
> sensible about WideString but that seems a harder problem than just 
> using a well-defined format).

Yes, a harder problem when the base classes evolve - but it would still
be less pain to maintain otherwise. What I mean by that is that the
manually written XML saving/loading code needs to be maintained when the
SM model evolves. But using a non-manual serialization scheme would not
need maintenance for that.

But as you say, when the base classes evolve like this then a manual
approach is simpler to adapt. I wonder how the various automatic XML
serializers we have could cope with this situation - if they have hooks
and/or if they use "reasonable code" to instantiate base classes then we
might be able to use one of those. This one comes to mind:

	http://map.squeak.org/packagebyname/sixx

Hmmm, well, no - not very good out of the box at least. :) It looks
kinda neat but... it generated 19MB of XML for the map, which compressed
to 900KB. ;) And no, it is too low level and verbose at the same time
for my taste.
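
For contrast, here is a rough sketch of what a hand-written, model-level
export could look like (Python with its standard xml.etree.ElementTree,
only as an illustration). The account fields (id, name, packages) and the
element names are made up for the example and are not the real SM model.
The point is that strings travel as plain XML text, so the reading side
decides which string class to instantiate.

```python
import xml.etree.ElementTree as ET


def account_to_xml(account):
    """Serialize one account dict to a compact XML string."""
    el = ET.Element("account", id=account["id"])
    ET.SubElement(el, "name").text = account["name"]
    for pkg in account["packages"]:
        ET.SubElement(el, "package").text = pkg
    return ET.tostring(el, encoding="unicode")


def account_from_xml(text):
    """Rebuild the account dict from the XML produced by account_to_xml."""
    el = ET.fromstring(text)
    return {
        "id": el.get("id"),
        "name": el.findtext("name"),
        "packages": [p.text for p in el.findall("package")],
    }


# Hypothetical sample data, including a non-ASCII name.
sample = {"id": "42", "name": "Göran Krampe", "packages": ["SqueakMap", "SIXX"]}
round_tripped = account_from_xml(account_to_xml(sample))
```

The price is the one discussed above: every time the model gains or loses
a field, these two functions have to be touched by hand.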

> >> PPS. If you're interested in a solution like this I should be able to 
> >> spend another hour or two on the read back side as well.
> > 
> > Let us see what others say in the community but yes, after typing this
> > email I am probably leaning towards this route - even though it "hurts"
> > going back to coding these things - Magma or ImageSegments etc are so
> > darn nice (when they work). ;)
> 
> Sure. And I have no problem at all throwing that solution away. But I 
> thought it'd be interesting to see what the speed/space tradeoffs are 
> since the classic discussion goes along the lines of "oh, but it's so 
> large and so slow"

No, I have never considered "large and slow" to be a problem with XML
when *I* am in control of the XML. :) But as SIXX shows it *can* be
large and slow. I have built quite advanced XML exports/imports in Java
(for saving and loading newspaper pages in a DTP program similar to
QXPress, written in Java) so I know how to do it.

> and I wanted to be able to see if that's true (it is 
> not - with the three most obvious optimizations applied space goes down 
> to 600k and speed to 3secs and that is roughly on par with the current 
> format).

Yes, good. Ok, send me your latest code and I will start working on
this. Having peeked at your code I might also want to generalize this a
bit and make it slightly more high level too (it looks to me like there
is a bit of code redundancy going on when it comes to serializing the
collections).
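
What I have in mind for the collections is roughly one shared pair of
helpers instead of a loop per collection. A small sketch, again in Python
only to show the shape; the helper names and the "categories" example are
hypothetical and have nothing to do with Andreas' actual code.

```python
import xml.etree.ElementTree as ET


def write_collection(parent, tag, item_tag, items):
    """Append <tag><item_tag>..</item_tag>..</tag> to parent; return the new element."""
    coll = ET.SubElement(parent, tag)
    for item in items:
        ET.SubElement(coll, item_tag).text = str(item)
    return coll


def read_collection(parent, tag, item_tag):
    """Return the item texts under parent/tag, or an empty list if tag is absent."""
    coll = parent.find(tag)
    if coll is None:
        return []
    return [e.text or "" for e in coll.findall(item_tag)]


# Every collection in the model goes through the same two helpers.
root = ET.Element("package")
write_collection(root, "categories", "category", ["Tools", "Network"])
categories = read_collection(root, "categories", "category")
```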

> Cheers,
>    - Andreas

regards, Göran