<div dir="ltr">Hello everyone, a slightly involved, multi-part question:<br>I'm using the HTML/CSS Parser package at <a href="http://www.squeaksource.com/htmlcssparser">http://www.squeaksource.com/htmlcssparser</a> ("the parser") to scrape pages (two or three new ones a day, plus a backlog of about a thousand existing pages) so I can extract parts of them for an RSS feed. If I let the root object of a parse (the Validator's dom object) be garbage collected, the rest of the parse tree stops working, because objects that are only weakly referenced elsewhere then get collected, AFAICT.<br>
<br>So, my first question is whether there's a way to assess the memory overhead of keeping each of these root objects around indefinitely.<br>My second is whether anyone has advice on a different approach: using a different parser, copying the data out into some other structure, or something else entirely.<br>
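<br>For what it's worth, one rough way I could imagine measuring the overhead is to force a full garbage collection before and after retaining one parse and compare free space. This is only a sketch: it assumes Smalltalk garbageCollect answers the bytes available after a full GC (as it does in Squeak), and the actual parsing call is left as a comment since I don't want to guess the package's API here.<br>

```smalltalk
"Sketch: estimate how much memory one retained parse costs by
comparing free space around building and holding a DOM root."
| before after roots |
roots := OrderedCollection new.
before := Smalltalk garbageCollect.   "bytes free after a full GC"
"... parse one page here and add its root (the Validator's dom) to roots ..."
after := Smalltalk garbageCollect.
Transcript show: 'approx. bytes retained by one parse: ',
	(before - after) printString; cr.
```

<br>Multiplying that figure by the ~1000 pages would give a crude upper bound; it would probably need a few runs, since the free-space numbers fluctuate between collections.<br>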
</div>