[squeak-dev] Interesting retrieval error with XMLDOMParser (solution found, needs guru attention)

gettimothy gettimothy at zoho.com
Thu Sep 23 17:16:21 UTC 2021

I think I found a fix.


(WebClient httpGet: 'https://w1.weather.gov/xml/current_obs/index.xml')

httpGet: calls httpGet:do:, where the WebClient sets headers on the WebRequest (the User-Agent and Accept-Encoding lines below).

httpGet: urlString do: aBlock
    "GET the response from the given url"
    "(WebClient httpGet: 'http://www.squeak.org') content"
    | request |
    self initializeFromUrl: urlString.
    request := self requestWithUrl: urlString.
    request method: 'GET'.
    userAgent ifNotNil: [:ua | request headerAt: 'User-Agent' put: ua].
    self contentDecoders ifNotNil: [:decoders | request headerAt: 'Accept-Encoding' put: decoders].
    aBlock value: request.
    ^self sendRequest: request

I copied the userAgent part of that into XMLHTTPWebClientRequest>>basicSend:


    self webClientClient userAgent ifNotNil: [:ua | webClientRequest headerAt: 'User-Agent' put: ua].
    "self webClientClient contentDecoders ifNotNil: [:decoders | webClientRequest headerAt: 'Accept-Encoding' put: decoders]."   <-- note commented out
    ^ self responseClass
        request: self
            (self webClientClient
                "#sendRequest: unfortunately requires #initializeFromUrl:
                to be sent first"
                initializeFromUrl: self url;
                sendRequest: self webClientRequest)

The following calls work:

(XMLDOMParser parseURL: 'https://www.w3schools.com/xml/simple.xml') explore.
(XMLDOMParser onURL: 'https://w1.weather.gov/xml/current_obs/index.xml' upToLimit: nil) parseDocument; explore.
(XMLDOMParser onURL: 'https://w1.weather.gov/xml/current_obs/index.xml') parseDocument; explore.
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.

To summarize:

(WebClient httpGet: 'https://w1.weather.gov/xml/current_obs/index.xml') content.

Here the WebClient correctly sets the headers on the WebRequest.

For XMLHTTPWebClientRequest, which holds both a WebClient and a WebRequest as instance variables, the WebRequest headers are not set properly.

While I do not know enough about these systems to say definitively, it seems to me that the place to set them correctly is when the instance variable for WebRequest is created.
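
A quick way to check whether the User-Agent header really is what makes the difference is to send the same GET twice from a workspace, once with the default WebClient and once with the header suppressed. This is only a sketch: it assumes the installed WebClient has a userAgent: setter and that the response answers its status via code, and it relies on the userAgent ifNotNil: test in httpGet:do: shown above to skip the header when the value is nil.

```smalltalk
"Sketch only: compare a GET with and without the User-Agent header."
| url withAgent withoutAgent |
url := 'https://w1.weather.gov/xml/current_obs/index.xml'.

"Default WebClient: httpGet:do: adds User-Agent when userAgent is not nil."
withAgent := WebClient new httpGet: url.

"Assumption: a nil userAgent suppresses the header, which should mimic
 a request whose User-Agent was never set."
withoutAgent := WebClient new
    userAgent: nil;
    httpGet: url.

Transcript
    showln: 'with User-Agent: ', withAgent code printString;
    showln: 'without User-Agent: ', withoutAgent code printString.
```

If the two status codes differ, that would support setting the header at the point where the WebRequest is created rather than in basicSend.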
