[squeak-dev] HTML parser (again)

Andrei Stebakov lispercat at gmail.com
Wed Aug 18 05:50:36 UTC 2010


I've been looking for a nice, fast HTML parser.
I found Zulq Alam's Soup
(http://www.squeaksource.com/@vHckXt8_6gVtXFxy/XMrjDbIs). It looks nice,
but it's way too slow for me: it takes 5 sec to parse the page, while my
current Lisp parser takes about 1 sec for the same page.
I found another one, Todd Blanchard's HTML and CSS parser
(http://www.squeaksource.com/@iMgHmTKVxU00wEdz/A0jkqk71), but I
couldn't load it into Pharo 1.1 or Squeak 4.1.
It complains about a syntax error and leaves behind a progress bar
that I can't kill.
Could anyone (Todd?) take a look at the parser and figure out how to
fix it?

What other options do I have for an HTML parser?
Looking at Pharo's speed, I wonder whether there is any way to optimize
it. Is a JIT or some other speed optimization planned for Pharo/Squeak?

Thank you,
Andrei


