[squeak-dev] Interesting retrieval error with XMLDOMParser

Marcel Taeumel marcel.taeumel at hpi.de
Thu Sep 23 08:59:19 UTC 2021


Hi --

It would be nice to have streamlined support for loading XML documents from URLs. I think that WebClient would be the way to go for HTTP downloads:
(WebClient httpGet: 'https://w1.weather.gov/xml/current_obs/index.xml') contentStream


"XML-Parser" currently depends on "Files" because of SAXHandler class >> #parserOnFileNamed:. A dependency on "WebClient" might not hurt ... maybe through the "Tools" package? Hmmm...

Well, I don't like the class-side interface of SAXHandler and XMLDOMParser. Instead, I prefer the reader/writer style, like this:

XMLDOMParser onFileNamed: '...'
XMLDOMParser onURL: '...'


For the resulting document, I like the interface of Form (for PNGs, JPEGs, etc.):

XMLDocument fromFileNamed: '...'
XMLDocument fromURL: '...'

Please find attached a change set that makes the proposed interface work in Squeak Trunk.

Best,
Marcel

On 22.09.2021 22:30:08, gettimothy via Squeak-dev <squeak-dev at lists.squeakfoundation.org> wrote:
Hi Folks,


No biggie, just interesting...



    | tree url |
    url := 'https://w1.weather.gov/xml/current_obs/index.xml'.
    tree := (XMLDOMParser on: (HTTPLoader default retrieveContentsFor: url) contents) parseDocument.
    tree explore.


(XMLDOMParser onURL: 'https://w1.weather.gov/xml/current_obs/index.xml') parseDocument explore.  "throws an error in Squeak"



The first form works in Squeak.

The second form works in Pharo.

The second form does NOT work in Squeak: it throws an XMLHTTPException: Forbidden (403) error.


It is interesting that a platform difference would cause that error.


No biggie, but I would like to include a retrieval method that works on both platforms in the XPathHelp (I may turn it into a book down the road, too).


cheers.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: xml-parsing.1.cs
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20210923/48266f74/attachment.ksh>

