[squeak-dev] Interesting retrieval error with XMLDOMParser

gettimothy gettimothy at zoho.com
Thu Sep 23 13:21:02 UTC 2021


Hi Marcel



Thank you for your time.



Bear with me, I am a bit slow on the uptake...



I found where Squeak and Pharo differ while debugging them side by side. (Btw, 'self break' does not appear to work in Pharo, but it does throw an error, so it is at least an inadvertent breakpoint!)



In 



DTDExternalEntityResolver >>resolveNonLocalExternalEntityURI: https://w1.weather.gov/xml/current_obs/index.xml upToLimit:nil






the call anXMLURIOrURIString asXMLHTTPRequest returns two different things.



Squeak returns an XMLHTTPWebClientRequest, while Pharo returns an XMLHTTPZincRequest.



This means the problem lies somewhere in XMLHTTPWebClientRequest.



So.....





I am going to reread your changeset and then see if I can fix the XMLHTTPWebClientRequest.





OK... here is what I see as a problem with your changeset: the proposal bypasses the path to DTDExternalEntityResolver, where the correct branch to the Zinc request is made for Pharo. So applying that changeset on Pharo would break Pharo.



IMHO, it is worth poking around in XMLHTTPWebClientRequest to see if I can fix that.



cheers.






...









---- On Thu, 23 Sep 2021 08:59:58 -0400 Marcel Taeumel <marcel.taeumel at hpi.de> wrote ----



Hi --



Not sure which error you mean. Well, I had this little "contentStream" typo. This is the right call:



(WebClient httpGet: 'https://w1.weather.gov/xml/current_obs/index.xml') content.



Yet, I understand your "interop" concerns.



Best,

Marcel



On 23.09.2021 14:29:42, gettimothy <gettimothy at zoho.com> wrote:

Hi Marcel.



It throws a malformed error on Squeak, and on Pharo the dependency on WebClient is not met.



Thank you for trying, I appreciate the effort.



I am going to look deeper into the original error and see if I can fix it there.



Interop is important because it makes the Help portable, and I want to turn my work into a Pillar booklet as a first step towards a full book on using the XML package on Squeak or Pharo.



Thanks again.



t











---- On Thu, 23 Sep 2021 04:59:19 -0400 Marcel Taeumel <marcel.taeumel at hpi.de> wrote ----



Hi --



It would be nice to have streamlined support for loading XML documents from URLs. I think that WebClient would be the way to go for HTTP downloads:

(WebClient httpGet: 'https://w1.weather.gov/xml/current_obs/index.xml') contentStream



"XML-Parser" currently depends on "Files" because of SAXHandler class >> #parserOnFileNamed:. A dependency on "WebClient" might not hurt ... maybe through the "Tools" package? Hmmm...



Well, I don't like the class-side interface of SAXHandler and XMLDOMParser. Instead, I prefer the reader/writer notion like this:



XMLDOMParser onFileNamed: '...'

XMLDOMParser onURL: '...'



Considering the actual result, I like the interface of Form (for PNGs, JPEGs, etc.):



XMLDocument fromFileNamed: '...'

XMLDocument fromURL: '...'



Please find attached a change set that makes the proposed interface work in Squeak Trunk.



Best,

Marcel





On 22.09.2021 22:30:08, gettimothy via Squeak-dev <squeak-dev at lists.squeakfoundation.org> wrote:

Hi Folks,



No biggie, just interesting...





    | tree url |
    url := 'https://w1.weather.gov/xml/current_obs/index.xml'.
    tree := (XMLDOMParser on: (HTTPLoader default retrieveContentsFor: url) contents) parseDocument.
    tree explore.



(XMLDOMParser onURL: 'https://w1.weather.gov/xml/current_obs/index.xml') parseDocument; explore.  "throws an error"







The first form works on Squeak.

The latter form works on Pharo.

The latter does NOT work on Squeak because an XMLHTTPException: Forbidden (403) error gets thrown.



It is interesting that a platform difference would cause that error.



No biggie, but I would like to include a common retrieval method for both platforms in the XPathHelp. (I may turn it into a book down the road, too.)
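
A minimal, untested sketch of what such a common retrieval method might look like. It assumes WebClient is loaded on Squeak and ZnClient (Zinc) on Pharo, looks both up by name so the snippet compiles on either platform, and bypasses XMLDOMParser class >> #onURL: entirely; it is not taken from the XML package itself:

    | url webClient xml tree |
    url := 'https://w1.weather.gov/xml/current_obs/index.xml'.
    "Assumption: WebClient is only present on Squeak; nil means we are on Pharo."
    webClient := Smalltalk at: #WebClient ifAbsent: [nil].
    xml := webClient
        ifNil: ["Assumption: ZnClient is present on Pharo."
            (Smalltalk at: #ZnClient) new get: url]
        ifNotNil: [:wc | (wc httpGet: url) content].
    "Parse the downloaded string the same way as the first form above."
    tree := (XMLDOMParser on: xml) parseDocument.
    tree explore.

Looking the classes up through Smalltalk at: avoids a hard compile-time reference to a class that is missing on the other platform.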



cheers.