<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Well, as the perpetrator of that bit of hackery, I can certainly explain why it breaks if you let the head object go away.<div><br><div>A node knows its parent through a weak reference, and it knows its own offset/length in the original parsed string. &nbsp;The top object owns the parsed string.</div><div><br></div><div>When a node tries to print itself, it traverses its parents to get the original text buffer, then takes the appropriate substring out of it and prints that. &nbsp;If the top object has been garbage collected, that traversal can no longer reach the buffer, which is why things break.</div><div>This was really useful during debugging, since I could see exactly what hunk of text each node thought it represented (especially since the nodes parse themselves). &nbsp;Reprinting the document should reproduce the original text buffer, or something is wrong somewhere. &nbsp;So that makes for a cheap and cheerful integrity check.</div><div><br></div><div>Anyhow, making the parent weak was perhaps not a great choice, but it was meant to make some DOM editing operations easier in the future (anticipating possible JavaScript integration).</div><div><br></div><div>Two fixes/workarounds: either never let go of the root, or change the parent code in parsed node to use strong references. &nbsp;It amounts to the same thing.</div><div><br></div><div><br><div><div>On Jul 30, 2008, at 7:38 AM, Marcin Tustin wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div><div dir="ltr">Hello everyone, a slightly involved and multi-part question:<br>I'm using the package at <a href="http://www.squeaksource.com/htmlcssparser">http://www.squeaksource.com/htmlcssparser</a> (HTML/CSS Parser, or "the parser") to scrape multiple pages (in fact about two or three a day, and about a thousand existing pages), so I can extract parts of them to put into an rss feed. 
If I let the root object for a parse (the Validator's dom object) be garbage collected, none of the rest of the parse tree really works (because then other objects only referred to weakly get collected, AFAICT).<br> <br>So, my first question is whether there's a way to assess what kind of memory overhead there would be for keeping each of these objects hanging around indefinitely.<br>My second is whether anyone has any advice for another way to do it - by using a different parser, or by copying the data into a different structure somehow, or something else.<br> </div> _______________________________________________<br>Beginners mailing list<br><a href="mailto:Beginners@lists.squeakfoundation.org">Beginners@lists.squeakfoundation.org</a><br>http://lists.squeakfoundation.org/mailman/listinfo/beginners<br></div></blockquote></div><br></div></div></body></html>