[Seaside] htmlcssparser package/discovering size of objects

Wed Jul 30 14:38:52 UTC 2008

Hello everyone, a slightly involved, two-part question:
I'm using the package at http://www.squeaksource.com/htmlcssparser (HTML/CSS
Parser, or "the parser") to scrape pages (about two or three a day, on top
of about a thousand existing pages), so I can extract parts of them to put
into an RSS feed. If I let the root object for a parse (the Validator's dom
object) be garbage collected, the rest of the parse tree stops working
properly, apparently because other objects that are only weakly referenced
then get collected too (AFAICT).

So, my first question is whether there's a way to measure how much memory
it would cost to keep each of these objects around indefinitely.
My second is whether anyone has advice on another way to do it - by
using a different parser, by copying the data into a different structure,
or something else.