[squeak-dev] Re: HTML parser (again) (again)

Peter Kenny peter at pbkresearch.co.uk
Sat Oct 30 16:22:54 UTC 2010



Sean P. DeNigris wrote:
> 
> 
> Peter Kenny wrote:
>> 
>> It will parse a string of HTML from your system, using the on: method,
>> but this falls over if the HTML is a downloaded web page including a
>> reference to CSS on the web. There may be a way round this, but I haven't
>> found it.
>> 
> Of course this is what i needed to use it for, lol.
> 

Sean

Because this is obviously a deal-breaker for you, I have looked a bit more
closely at the effect of the on: method. My comments may have been coloured
by my earlier experiences with my Dolphin adaptation; I have only recently
started experimenting with the parser on Squeak/Pharo, and I have not tried
it as widely. I have today experimented on Pharo (I'm sure Squeak would show
the same), and I can say that my statement above is too strong. The parser
*may* fall over in some circumstances, but it will work in many cases.
Specifically, it should work OK if any CSS references in the text are to a
full absolute URL (i.e. everything from the http: onward). This makes sense
to me; if it is a relative address, the parser has only the string to go on
and would not know the base URL to resolve it against.

I think it would be worth your while to try out some of your HTML strings
with '(HtmlValidator on: aString) dom' and see if they work. If you are
using relative addresses for CSS files, it could be worthwhile editing them
to the full absolute URLs just to get the string to parse.
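
As a rough illustration of that edit, here is a minimal workspace sketch. The
HTML string, the file name style.css and the site www.example.com are all
made-up placeholders, and I am assuming the relative reference appears
exactly as written in the string; copyReplaceAll:with: is the ordinary
String method, and the last line is the same expression quoted above.

```smalltalk
"Hypothetical input: a page whose stylesheet link uses a relative address."
| html patched dom |
html := '<html><head><link rel="stylesheet" type="text/css" href="style.css"></head><body><p>Hello</p></body></html>'.

"Rewrite the relative href as a full absolute URL.
 www.example.com is a placeholder; use the site the page really came from."
patched := html
	copyReplaceAll: 'href="style.css"'
	with: 'href="http://www.example.com/style.css"'.

"Parse the patched string with the on: method."
dom := (HtmlValidator on: patched) dom.
```

A page with several relative references would need one replacement per
reference, or something more general; this only shows the idea.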

BTW, if you think of trying the parser in Pharo, note that you will first
have to patch Pharo as suggested in
http://code.google.com/p/pharo/issues/detail?id=2797.

As to your point on documentation, I agree that it is useful to give a basic
pointer to how to use the package. My Dolphin version of the parser has a
package comment which quotes the full description from Squeaksource and adds
a pointer to the HtmlValidator class comment. There does not seem to be the
same adoption of package comments with Squeak packages; the ones I have seen
seem to be just a few words explaining what is changed in this version. Is
there any way to give an extended description in a Squeak package?

Hope this helps.

Peter Kenny

