[squeak-dev] Possible change/fix for WebMessage>>streamDirectlyFrom:to:size:progress: in 5.2

Sat Jun 20 15:18:42 UTC 2020

Hi Tim,

On Fri, 19 Jun 2020, Tim Johnson wrote:

> Hi,
> I've been running a Squeak web server on the wide-open internet for over a year now (WebClient/WebServer/Seaside/nginx proxy).  Within a few days it will always stop answering the phone, begin eating ~99% of the CPU, and
> eventually segfault (though it has become less segfaulty on this May 2020 VM).  I've tried the recommended Unix tweaks for it (file handle limits, etc) but I believe there is something going on inside the image.  Today's
> issue may be related (at least) to some code in WebMessage.
> 
> tl;dr:
> 
> I'm proposing a change to WebMessage>>#streamDirectlyFrom:to:size:progress: 
> ...but my proposed change could very well be wrong, or be poor logic.  So I'm writing up my lengthy rationale below.
> 
> What do you think about changing:
> 
> [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[

I read the above line (combined with the assignment to size and totalRead) 
as: if no size was specified (sizeOrNil == nil, size = 0), then read while 
there's data available from the source stream (srcStream atEnd not).
If there was a size specified, assume that its the actual number of bytes 
available, and read that many bytes (totalRead < size).
The last line is problematic, as you pointed out in your analysis, when 
the source stream and the size are both provided by the other party.

> 
> to:
> 
> [sizeOrNil == nil or: [totalRead < size and: [srcStream atEnd not]]] whileTrue:[

The problem with the above is that it'll become and infinite loop if 
sizeOrNil is nil.

So, if you want to read while there's data available from the stream, then 
I think the following is the right choice:

[ srcStream atEnd not and: [ sizeOrNil isNil or: [ totalRead < sizeOrNil ] ] whileTrue: [

The above always takes into account whether there's data available from 
the stream and directly uses sizeOrNil for size comparison to make it 
more legible. However, there seem to be a few more things to clean up in 
that method.

Levente

> 
> ?
> 
> 
> My logic could be wrong, and I don't understand the sizeOrNil check here.  But after I made this change, the infinite loop went away, GC destroyed hundreds of stale Sockets and Processes, and my Seaside app began answering
> again (note that this is the first time I've been successful at bringing it back to life).  
> 
> If the original logic is correct as intended, it seems to be a trade-off between continuing to read from the stream if we haven't yet read #size characters, versus not trying to read from a stream that is #atEnd.  My version
> breaks out of the loop if the stream is #atEnd even if the method believes there are characters left to read.  The reason I think this is worthwhile, is that #totalRead will (endlessly) be 0 if the Socket underneath the
> SocketStream is closed, and there are no more bytes to read, thus causing an infinite loop.  It is *possible* (and mentioned below) that the design is supposed to catch a #ConnectionClosed exception, but in my case one never
> got thrown.  (note the Socket's state was #otherEndClosedButNotThisEnd)
> 
> It's also possible that the method is designed to continue reading from the SocketStream in case it is expecting more data which just hasn't arrived yet (as I think I saw in a method comment somewhere), but this may not make
> sense when the underlying Socket has been closed and the method keeps reading zero characters from it.  
> 
> Of course, actually, anything is possible.  What do I know?  :D
> 
> The rest of this message contains my exploration, debugging session, and further rationale. 
> 
> Thanks,
> a Tim
> 
> -------
> 
> After running for eight days, Squeak had 300 socket connections open.  This is far too many for my very, very unpopular website:
> 
> $ sudo netstat -a --program | awk '/http.*squeak/ { print $6 }' | sort | uniq -c
>     297 CLOSE_WAIT
>       6 ESTABLISHED
>       1 LISTEN
> 
> A scan of Processes belonging to WebServer returned over 500, and climbing:
> 
> Process allSubInstances select: [:proc | 
> proc suspendedContext
> ifNil: [ false ]
> ifNotNil: [:context | 
> context closure
> ifNil: [ false ]
> ifNotNil: [:closure | | receiver |
> receiver := closure outerContext receiver.
> (receiver respondsTo: #outerContext)
> ifTrue: [ receiver outerContext receiver class == WebServer ] 
> ifFalse: [ false ] ] ] ]
> 
> 
> Note that none of them are terminated:
> 
> Bag newFrom: (self collect: [:proc | proc isTerminated])
> -> a Dictionary(false->507 true->1 )
> 
> Most/all of them seem to be hanging out in this method:
> 
> method:  (BlockClosure>>#newProcess "a CompiledMethod(3871190)")
> closureOrNil:  [closure] in BlockClosure>>newProcess
> receiver:  [closure] in WebServer>>asyncHandleConnectionFrom:
> 
> It seems that WebServer uses SocketStream on a Socket.  So how many SocketStreams exist and where are they?
> 
> Bag newFrom: (SocketStream allInstances collect: [:ea | ea socket statusString asSymbol])
> -> a Dictionary(#invalidSocketHandle->32 #otherEndClosedButNotThisEnd->1 )
> 
> So, there's one SocketStream that's closed but we're holding on to it.  Why?
> 
> I opened a Process Browser, turned on CPU Watcher, turned on Auto-Update and found WebServer's listener process eating > 90% CPU.  I opened a debugger on it.  It seemed it was in an infinite loop centered
> around WebMessage>>streamDirectlyFrom:to:size:progress:  on this very request.
> 
> [] in SocketStream>>beSignalingWhile:
> BlockClosure>>ensure:
> SocketStream>>beSignalingWhile:
> [] in SocketStream>>next:into:startingAt:
> BlockClosure>>on:do:
> SocketStream>>next:into:startingAt:
> WebRequest(WebMessage)>>streamDirectlyFrom:to:size:progress:
> WebRequest(WebMessage)>>streamFrom:to:size:progress:
> [] in WebRequest(WebMessage)>>getContentWithProgress:
> ByteString class(SequenceableCollection class)>>new:streamContents:
> WebRequest(WebMessage)>>getContentWithProgress:
> WebRequest(WebMessage)>>getContent
> WebRequest(WebMessage)>>content
> WAWebServerAdaptor>>requestBodyFor:
> WAWebServerAdaptor(WAServerAdaptor)>>requestFor:
> WAWebServerAdaptor(WAServerAdaptor)>>contextFor:
> WAWebServerAdaptor(WAServerAdaptor)>>process:
> WAWebServerAdaptor>>process:
> MessageSend>>valueWithArguments:
> WebServer>>invokeAction:request:
> 
> Some exploit attempt sending an HTTP POST request to /tmUnblock.cgi and with a Content-Length of 227 was causing WebMessage to lose its mind.  (This is when I realized that my nginx proxy configuration might be wrong, and
> some obvious exploit attempts could be getting through to Squeak.  But that's a different matter.)
> 
> Now I believe any of the following could have been happening:
> 
> * WebMessage & friends seem to take the supplied Content-Length header in an HTTP request as being literal and true, even though a client could be supplying a wrong or misleading value.  I don't know if there are tests
> against this but I might write one.
> 
> * If the Socket is closed, the code on the stack trace above will keep trying to read from it.  I can't tell if:
> 
> a) there is an arithmetic problem somewhere, or
> b) if #ConnectionClosed isn't being raised or caught in the proper way, or 
> c) if there's a logic error in WebMessage>>streamDirectlyFrom:to:size:progress: , or 
> d) this is totally intentional, or
> e) all of the above, none of the above, or something else.  
> 
> I opted to fix (c).
> 
> * This portion of WebMessage>>streamDirectlyFrom:to:size:progress: is troublesome to me, and my change seems to make the infinite loop end:
> 
> [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[
> 
> In this case, totalRead was always less than size, because the connection was closed and the methods would always return empty strings for what had been read (thus, totalRead = 0).  So, [totalRead < size] was always True,
> which would override the check for [srcStream atEnd not] and it would continue to try reading from a SocketStream on a closed Socket.
> 
> 
> 
> 
>