[Newbies] Html Parser
bert at freudenbergs.de
Mon Oct 11 17:54:22 UTC 2010
On 09.10.2010, at 12:27, Levente Uzonyi wrote:
> On Sat, 9 Oct 2010, Sayth Renshaw wrote:
>> I was wondering if there is an HTML parser for Squeak. I want to
>> capture data from a website, convert it to XML, and export it
>> into an Excel program I have.
>> Is this possible in Squeak?
> Yes, it is. We are using Soup (http://www.squeaksource.com/Soup.html) to parse HTML files. It's pretty good, though not perfect. There are also 2-3 other HTML parsers for Squeak. We're using this one because it's designed to parse HTML files that are not standards-compliant (which are very common). The tools for building XML are in the Squeak image; look for XMLNode and its subclasses (XMLDocument, XMLNodeWithElements, XMLString, etc.).
Oh great, I had no idea there was a Beautiful Soup port for Squeak. It's excellent for scraping web pages.
- Bert -