RSS Reader in 10 lines of code

James Robertson jarober at gmail.com
Tue Oct 4 13:08:31 UTC 2005


It's even worse than that, Marcus.  There are often character encoding 
issues and illegal characters to deal with.  The bottom line: you can't 
use a fully strict parser and expect to handle syndicated content.  I've 
done a fair bit of work in this area in BottomFeeder...
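As a rough illustration of what "not fully strict" can mean in practice, here is a minimal Python sketch. It is not BottomFeeder's code, which is Smalltalk. It decodes the raw bytes itself, trying UTF-8 and falling back to Windows-1252. It discards the XML declaration, whose encoding may be wrong, and strips control characters that XML 1.0 forbids. It also escapes bare ampersands outside CDATA sections before handing the text to a standard parser. The fallback encoding and the names `decode_feed` and `lenient_parse` are assumptions chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 forbids (everything below U+0020
# except tab, LF and CR), plus the non-characters U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")

# An "&" that does not start an XML predefined entity or a numeric
# character reference.
_BARE_AMPERSAND = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")

# CDATA sections (captured, so re.split keeps them) and the XML declaration.
_CDATA = re.compile(r"(<!\[CDATA\[.*?\]\]>)", re.DOTALL)
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def decode_feed(raw: bytes) -> str:
    """Decode feed bytes: try UTF-8 (dropping a BOM), else Windows-1252.

    The fallback is an assumption; a real reader would also consult
    the HTTP Content-Type header and the XML declaration."""
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


def _escape_bare_ampersands(text: str) -> str:
    # Even indices of the split are outside CDATA; "&" is literal inside it.
    parts = _CDATA.split(text)
    for i in range(0, len(parts), 2):
        parts[i] = _BARE_AMPERSAND.sub("&amp;", parts[i])
    return "".join(parts)


def lenient_parse(raw: bytes) -> ET.Element:
    """Clean up common feed breakage, then parse with ElementTree."""
    text = decode_feed(raw)
    # Already decoded, so the (possibly wrong) declared encoding is dropped.
    text = _XML_DECL.sub("", text)
    text = _ILLEGAL_XML_CHARS.sub("", text)
    text = _escape_bare_ampersands(text)
    return ET.fromstring(text)
```

A strict `ET.fromstring` on a feed that claims UTF-8 but contains a Latin-1 "é", a stray control character and a bare "&" raises `ParseError`; `lenient_parse` recovers the text. Real feeds break in more ways (undeclared HTML entities such as `&nbsp;`, mismatched tags), and this sketch only neutralizes or ignores those.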

<snip>

>It's even worse: inside the <description> there cannot be any raw HTML,
>as RSS is not a superset of HTML.
>RSS readers are very forgiving (nobody checks the DTD, and in many
>cases they even tolerate XML that is not well-formed).
>
>But in general, to make this really work, the HTML inside the RSS
>needs to be wrapped in a CDATA section:
>
><description>
><![CDATA[
>
>now the html
>
>]]>
></description>
>
>    Marcus
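The CDATA advice above can be sketched in Python as a small helper. The names `cdata` and `description_element` are illustrative and do not come from any library. The stdlib ElementTree has no way to emit CDATA sections, so the markup is built as a string here. One detail the quoted snippet leaves out: a CDATA section ends at the first `]]>`, so that sequence has to be split if it appears in the HTML. Escaping the HTML as entities (`&lt;p&gt;`) is an equally valid alternative; an XML parser yields the same text either way.

```python
import xml.etree.ElementTree as ET


def cdata(html: str) -> str:
    """Wrap an HTML fragment in a CDATA section.

    A CDATA section ends at the first "]]>", so any "]]>" inside the
    HTML is split across two adjacent CDATA sections."""
    return "<![CDATA[" + html.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def description_element(html: str) -> str:
    """Build an RSS <description> element carrying an HTML fragment."""
    return "<description>" + cdata(html) + "</description>"


if __name__ == "__main__":
    xml = description_element("<p>Tom & Jerry</p>")
    print(xml)  # <description><![CDATA[<p>Tom & Jerry</p>]]></description>
    # Parsing it back yields the HTML as plain text, not as child elements.
    print(ET.fromstring(xml).text)  # <p>Tom & Jerry</p>
```

Without the CDATA wrapper, `<p>` inside `<description>` would parse as a child element, and a bare `&` would make the document ill-formed.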

<Talk Small and Carry a Big Class Library>
James Robertson, Product Manager, Cincom Smalltalk
http://www.cincomsmalltalk.com/blog/blogView
