[ANN] HTML and CSS Parser on SqueakSource

Todd Blanchard tblanchard at mac.com
Sat Apr 15 05:39:20 UTC 2006


I've released the underlying technology behind http://www.badpage.info
and placed it on SqueakSource.

http://www.squeaksource.com/htmlcssparser

Project Description
This is an HTML and CSS parser and DOM that handles rotten HTML and  
broken CSS quite well. I wrote it to provide validation of web pages  
and it is the underlying technology behind http://www.badpage.info.  
The tag nesting and attribute rules are determined by interpreting  
the DTDs published by the W3C. Hopefully this will make it fairly  
future-proof. The CSS parser understands most of CSS 2 and some CSS 3 and  
the CSS selectors can tell if they match a DOM node. There is no  
visual rendering and no calculation of layout.
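
To illustrate what "selectors can tell if they match a DOM node" might
look like, here is a minimal sketch in Python. It is not the code or API
of the Squeak package; Node, matches_compound and matches are
hypothetical names made up for this example, and only type, #id, .class
and descendant (space) selectors are handled.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical DOM element: a tag, its attributes, and a parent link."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None

    def classes(self):
        # The class attribute is a whitespace-separated list of names.
        return set(self.attrs.get("class", "").split())


# One piece of a compound selector: "div", "#main" or ".note".
_PART = re.compile(r"([#.]?)([\w-]+)")


def matches_compound(node, compound):
    """True if node satisfies a compound selector such as 'div#main.note'."""
    for prefix, name in _PART.findall(compound):
        if prefix == "#" and node.attrs.get("id") != name:
            return False
        if prefix == "." and name not in node.classes():
            return False
        if prefix == "" and node.tag.lower() != name.lower():
            return False
    return True


def matches(node, selector):
    """True if node matches compounds joined by descendant combinators."""
    parts = selector.split()
    if not parts or not matches_compound(node, parts[-1]):
        return False
    # Walk right to left, taking the nearest ancestor that fits each part.
    ancestor = node.parent
    for compound in reversed(parts[:-1]):
        while ancestor is not None and not matches_compound(ancestor, compound):
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True


body = Node("body")
div = Node("div", {"id": "main", "class": "note wide"}, body)
p = Node("p", {"class": "intro"}, div)
print(matches(p, "div.note p.intro"))
print(matches(p, "span p"))
```

The matcher starts at the candidate node and climbs toward the root, so
a selector can answer the question for a single node without searching
the whole document.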

I hereby license it free for almost any use with the understanding  
that it may not be used to provide website QA software or services  
such as might compete with http://badpage.info.

Otherwise, do whatever you like with it. I think it would make a  
dandy base for a real web browser. I also find it quite useful for  
scraping web pages.
-----
SqueakMap is not presently responding to requests to send me a new  
password and I can't remember my old one.  When it regains its  
senses, I'll put it up there as well.

-Todd Blanchard