[squeak-dev] Re: HTML parser (again) (again)

Peter Kenny peter at pbkresearch.co.uk
Fri Oct 29 13:46:14 UTC 2010



Sean P. DeNigris wrote:
> 
> All the threads in the mailing list seem to die off unresolved.  What are
> the options available in current Squeak, and what are the differences?
> 
> 2. HTML & CSS Validating Parser (Squeaksource) - It loads, but I don't
> have the slightest clue how to use it.  I found references to people using
> it.  They must be Alan Kay's close relatives, or live in machine world
> like the Lawnmower Man because I couldn't find a shred of documentation or
> even one class that looked plausible as a starting point.
> 
> 
Sean

I can't speak for the others, but in my opinion this is the most
brilliant parser ever. The starting point is the class HtmlValidator; read
its class comment to see how to get started. Maybe the clue is meant to be in
the name: this is a *validating* parser.
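
A minimal workspace sketch of those first steps follows. It assumes that onUrl: and on: (the two entry points described in point 1 below) are class-side messages on HtmlValidator. That is an assumption, so check it against the class comment. The URL and the HTML string are placeholders. The sketch only opens an inspector on whatever comes back, and it does not guess at the validator's other messages.

```smalltalk
"Sketch only: ASSUMES onUrl: and on: are class-side messages on HtmlValidator.
 The HtmlValidator class comment is the authority on the real protocol."
| fromWeb fromString |
"Parse a page fetched from the web; linked CSS is fetched and taken into account.
 The URL is a placeholder."
fromWeb := HtmlValidator onUrl: 'http://www.squeak.org/'.
fromWeb inspect.

"Parse HTML already held in the image as a String (placeholder markup).
 Per this post, this route falls over if the HTML references CSS on the web."
fromString := HtmlValidator on: '<html><head><title>Test</title></head><body><p>Hello</p></body></html>'.
fromString inspect.
```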

Just a couple of points:

1. It works fine when you load a page from the web with the onUrl: method;
it fetches any linked CSS and takes it into account. It will also parse a
string of HTML from your own system using the on: method, but this falls
over if the HTML is a downloaded web page that references CSS on the web.
There may be a way round this, but I haven't found it.

2. It will fail in the current Squeak 4.1, because this version has fouled
up the concatenation of strings. I have argued with the Squeak maintainers
that what they have done is nuts, but they are sticking with their changes.
The details are in Mantis issue 7564, if you have access to it. There are
two possible workarounds:

a. If you have access to Mantis
(http://bugs.squeak.org/view_all_bug_page.php), go to the details of issue
7564, download the change set posted by Andreas Raab and file it into Squeak
4.1.

b. After loading the parser, open the method HtmlDOMNode>>parseContents:,
find the two occurrences of the expression ('/', Character separators), and
change each of them to (Character separators, '/'). I know both are valid
Smalltalk and ought to have the same effect, but in Squeak 4.1 they don't;
that's why I say it's nuts.
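
Workaround b amounts to swapping the operands of the comma. The workspace lines below show the two forms side by side. The original form is commented out because, as described above, it fails in Squeak 4.1. The variable name delimiters is hypothetical and is not taken from HtmlDOMNode>>parseContents:. Evaluating the last lines lets you inspect what the swapped expression produces in your image.

```smalltalk
"Original form, as found twice in HtmlDOMNode>>parseContents:.
 Reported in this thread to fail under Squeak 4.1, so left commented out:
     ('/', Character separators)"

"Workaround b: the same two operands, swapped.
 'delimiters' is a hypothetical name used only for this illustration."
| delimiters |
delimiters := Character separators, '/'.
delimiters inspect.
```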

Your rant may have some validity, but we must be realistic: if you have
written a package, you know how to use it, and writing detailed instructions
for someone else is a pain. Todd Blanchard wrote this parser around 2006, and
it has been on Squeaksource ever since. I downloaded it then, adapted it to
run in Dolphin Smalltalk, and have relied on it since.

If you have any questions, just ask.

Peter Kenny


-- 
View this message in context: http://forum.world.st/HTML-parser-again-again-tp3018595p3019125.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.