[squeak-dev] Possible change/fix for WebMessage>>streamDirectlyFrom:to:size:progress: in 5.2

Tim Johnson digit at sonic.net
Sat Jun 20 03:52:01 UTC 2020


Hi,

I've been running a Squeak web server on the wide-open internet for over a year now (WebClient/WebServer/Seaside/nginx proxy).  Within a few days it will always stop answering the phone, begin eating ~99% of the CPU, and eventually segfault (though it has become less segfaulty on this May 2020 VM).  I've tried the recommended Unix tweaks for it (file handle limits, etc) but I believe there is something going on inside the image.  Today's issue may be related (at least) to some code in WebMessage.

tl;dr:

I'm proposing a change to WebMessage>>#streamDirectlyFrom:to:size:progress: 
...but my proposed change could very well be wrong, or be poor logic.  So I'm writing up my lengthy rationale below.

What do you think about changing:

	[(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[

to:

	[sizeOrNil == nil or: [totalRead < size and: [srcStream atEnd not]]] whileTrue:[

?


My logic could be wrong, and I don't understand the sizeOrNil check here.  But after I made this change, the infinite loop went away, GC destroyed hundreds of stale Sockets and Processes, and my Seaside app began answering again (note that this is the first time I've been successful at bringing it back to life).  

If the original logic is correct as intended, it seems to be a trade-off between continuing to read from the stream if we haven't yet read #size characters, versus not trying to read from a stream that is #atEnd.  My version breaks out of the loop if the stream is #atEnd even if the method believes there are characters left to read.  The reason I think this is worthwhile, is that #totalRead will (endlessly) be 0 if the Socket underneath the SocketStream is closed, and there are no more bytes to read, thus causing an infinite loop.  It is *possible* (and mentioned below) that the design is supposed to catch a #ConnectionClosed exception, but in my case one never got thrown.  (note the Socket's state was #otherEndClosedButNotThisEnd)

It's also possible that the method is designed to continue reading from the SocketStream in case it is expecting more data which just hasn't arrived yet (as I think I saw in a method comment somewhere), but this may not make sense when the underlying Socket has been closed and the method keeps reading zero characters from it.  

Of course, actually, anything is possible.  What do I know?  :D

The rest of this message contains my exploration, debugging session, and further rationale. 

Thanks,
a Tim

-------

After running for eight days, Squeak had 300 socket connections open.  This is far too many for my very, very unpopular website:

$ sudo netstat -a --program | awk '/http.*squeak/ { print $6 }' | sort | uniq -c
    297 CLOSE_WAIT
      6 ESTABLISHED
      1 LISTEN

A scan of Processes belonging to WebServer returned over 500, and climbing:

Process allSubInstances select: [:proc | 
	proc suspendedContext
		ifNil: [ false ]
		ifNotNil: [:context | 
			context closure
				ifNil: [ false ]
				ifNotNil: [:closure | | receiver |
					receiver := closure outerContext receiver.
					(receiver respondsTo: #outerContext)
						ifTrue: [ receiver outerContext receiver class == WebServer ] 
						ifFalse: [ false ] ] ] ]


Note that none of them are terminated:

Bag newFrom: (self collect: [:proc | proc isTerminated])
-> a Dictionary(false->507 true->1 )

Most/all of them seem to be hanging out in this method:

method: 	(BlockClosure>>#newProcess "a CompiledMethod(3871190)")
closureOrNil: 	[closure] in BlockClosure>>newProcess
receiver: 	[closure] in WebServer>>asyncHandleConnectionFrom:

It seems that WebServer uses SocketStream on a Socket.  So how many SocketStreams exist and where are they?

Bag newFrom: (SocketStream allInstances collect: [:ea | ea socket statusString asSymbol])
-> a Dictionary(#invalidSocketHandle->32 #otherEndClosedButNotThisEnd->1 )

So, there's one SocketStream that's closed but we're holding on to it.  Why?

I opened a Process Browser, turned on CPU Watcher, turned on Auto-Update and found WebServer's listener process eating > 90% CPU.  I opened a debugger on it.  It seemed it was in an infinite loop centered around WebMessage>>streamDirectlyFrom:to:size:progress:  on this very request.

[] in SocketStream>>beSignalingWhile:
BlockClosure>>ensure:
SocketStream>>beSignalingWhile:
[] in SocketStream>>next:into:startingAt:
BlockClosure>>on:do:
SocketStream>>next:into:startingAt:
WebRequest(WebMessage)>>streamDirectlyFrom:to:size:progress:
WebRequest(WebMessage)>>streamFrom:to:size:progress:
[] in WebRequest(WebMessage)>>getContentWithProgress:
ByteString class(SequenceableCollection class)>>new:streamContents:
WebRequest(WebMessage)>>getContentWithProgress:
WebRequest(WebMessage)>>getContent
WebRequest(WebMessage)>>content
WAWebServerAdaptor>>requestBodyFor:
WAWebServerAdaptor(WAServerAdaptor)>>requestFor:
WAWebServerAdaptor(WAServerAdaptor)>>contextFor:
WAWebServerAdaptor(WAServerAdaptor)>>process:
WAWebServerAdaptor>>process:
MessageSend>>valueWithArguments:
WebServer>>invokeAction:request:

Some exploit attempt sending an HTTP POST request to /tmUnblock.cgi and with a Content-Length of 227 was causing WebMessage to lose its mind.  (This is when I realized that my nginx proxy configuration might be wrong, and some obvious exploit attempts could be getting through to Squeak.  But that's a different matter.)

Now I believe any of the following could have been happening:

* WebMessage & friends seem to take the supplied Content-Length header in an HTTP request as being literal and true, even though a client could be supplying a wrong or misleading value.  I don't know if there are tests against this but I might write one.

* If the Socket is closed, the code on the stack trace above will keep trying to read from it.  I can't tell if:

a) there is an arithmetic problem somewhere, or
b) if #ConnectionClosed isn't being raised or caught in the proper way, or 
c) if there's a logic error in WebMessage>>streamDirectlyFrom:to:size:progress: , or 
d) this is totally intentional, or
e) all of the above, none of the above, or something else.  

I opted to fix (c).

* This portion of WebMessage>>streamDirectlyFrom:to:size:progress: is troublesome to me, and my change seems to make the infinite loop end:

	[(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[

In this case, totalRead was always less than size, because the connection was closed and the methods would always return empty strings for what had been read (thus, totalRead = 0).  So, [totalRead < size] was always True, which would override the check for [srcStream atEnd not] and it would continue to try reading from a SocketStream on a closed Socket.



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200619/8a499a45/attachment.html>


More information about the Squeak-dev mailing list