[squeak-dev] Interesting retrieval error with XMLDOMParser
gettimothy
gettimothy at zoho.com
Thu Sep 23 14:20:44 UTC 2021
Ok....
I believe this is the problem.
Set a breakpoint in WebClient >> #sendRequest:contentBlock:.
For the simple
(WebClient httpGet: 'https://w1.weather.gov/xml/current_obs/index.xml')
the WebRequest object looks like this:
WebRequest(GET /xml/current_obs/index.xml HTTP/1.1
User-Agent: WebClient/1.5 (WebClient-Core-dtl.127; Squeak6.0alpha-20633; unix)
Accept-Encoding: gzip
Host: w1.weather.gov
)
For the
(XMLDOMParser onURL: 'https://w1.weather.gov/xml/current_obs/index.xml' upToLimit:nil)
the WebRequest looks like this:
WebRequest(GET /xml/current_obs/index.xml HTTP/1.1
Host: w1.weather.gov
)
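Looking a little deeper, the headers are different: the XMLDOMParser request is missing the User-Agent and Accept-Encoding headers that the WebClient request carries.
So I will try to figure out where those headers are set in one code path and not the other.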
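One way to test whether the missing headers are what triggers the 403 would be to send the WebClient request without its User-Agent and look at the status code. This is only a minimal sketch: it assumes that WebClient >> #httpGet:do: hands the WebRequest to the block after the default User-Agent has been set and before the request is sent, and that WebRequest understands #removeHeader:. It does not touch Accept-Encoding.

```smalltalk
| client response |
client := WebClient new.
[response := client
	httpGet: 'https://w1.weather.gov/xml/current_obs/index.xml'
	do: [:request |
		"Assumption: the default User-Agent is already present at this point.
		 Drop it so the request resembles the one XMLDOMParser sends."
		request removeHeader: 'User-Agent'].
"Print or inspect the status code; a 403 here would point at the missing User-Agent."
Transcript showln: response code printString]
	ensure: [client close].
```

If the status code stays at 200, the User-Agent is probably not the cause and the Accept-Encoding header would be the next thing to check.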
---- On Thu, 23 Sep 2021 08:59:58 -0400 Marcel Taeumel <marcel.taeumel at hpi.de> wrote ----
Hi --
Not sure which error you mean. Well, I had this little "contentStream" typo. This is the right call:
(WebClient httpGet: 'https://w1.weather.gov/xml/current_obs/index.xml') content.
Yet, I understand your "interop" concerns.
Best,
Marcel
On 23.09.2021 14:29:42, gettimothy <gettimothy at zoho.com> wrote:
Hi Marcel.
It throws a malformed error on Squeak, and on Pharo the dependency on WebClient is not respected.
Thank you for trying, I appreciate the effort.
I am going to look deeper into the original error and see if I can fix it there.
Interop is important because it makes the Help portable, and I want to turn my work into a Pillar booklet as a first effort towards a full book on using the XML package on Squeak or Pharo.
Thanks again.
t
---- On Thu, 23 Sep 2021 04:59:19 -0400 Marcel Taeumel <marcel.taeumel at hpi.de> wrote ----
Hi --
It would be nice to have streamlined support for loading XML documents from URLs. I think that WebClient would be the way to go for HTTP downloads:
(WebClient httpGet: 'https://w1.weather.gov/xml/current_obs/index.xml') contentStream
"XML-Parser" currently depends on "Files" because of SAXHandler class >> #parserOnFileNamed:. A dependency on "WebClient" might not hurt ... maybe through the "Tools" package? Hmmm...
Well, I don't like the class-side interface of SAXHandler and XMLDOMParser. Instead, I prefer the reader/writer notion like this:
XMLDOMParser onFileNamed: '...'
XMLDOMParser onURL: '...'
Considering the actual result, I like the interface of Form (for PNGs, JPEGs, etc.):
XMLDocument fromFileNamed: '...'
XMLDocument fromURL: '...'
Please find attached a change set that makes the proposed interface work in Squeak Trunk.
Best,
Marcel
On 22.09.2021 22:30:08, gettimothy via Squeak-dev <squeak-dev at lists.squeakfoundation.org> wrote:
Hi Folks,
No biggie, just interesting...
|tree url |
url := 'https://w1.weather.gov/xml/current_obs/index.xml'.
tree := (XMLDOMParser on: (HTTPLoader default retrieveContentsFor: url) contents) parseDocument.
tree explore.
(XMLDOMParser onURL: 'https://w1.weather.gov/xml/current_obs/index.xml') parseDocument; explore. "throws an error"
The first form works on Squeak.
The latter form works on Pharo.
The latter does NOT work on Squeak because an XMLHTTPException: Forbidden (403) error gets thrown.
It is interesting that a platform difference would cause that error.
No biggie, but I would like to include a common retrieval method for both platforms in the XPathHelp (I may turn it into a book down the road, too).
cheers.