[squeak-dev] Possible change/fix for WebMessage>>streamDirectlyFrom:to:size:progress: in 5.2
Levente Uzonyi
leves at caesar.elte.hu
Sat Jun 20 15:18:42 UTC 2020
Hi Tim,
On Fri, 19 Jun 2020, Tim Johnson wrote:
> Hi,
> I've been running a Squeak web server on the wide-open internet for over a year now (WebClient/WebServer/Seaside/nginx proxy). Within a few days it will always stop answering the phone, begin eating ~99% of the CPU, and
> eventually segfault (though it has become less segfaulty on this May 2020 VM). I've tried the recommended Unix tweaks for it (file handle limits, etc) but I believe there is something going on inside the image. Today's
> issue may be related (at least) to some code in WebMessage.
>
> tl;dr:
>
> I'm proposing a change to WebMessage>>#streamDirectlyFrom:to:size:progress:
> ...but my proposed change could very well be wrong, or be poor logic. So I'm writing up my lengthy rationale below.
>
> What do you think about changing:
>
> [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[
I read the above line (combined with the assignment to size and totalRead)
as: if no size was specified (sizeOrNil == nil, size = 0), then read while
there's data available from the source stream (srcStream atEnd not).
If there was a size specified, assume that its the actual number of bytes
available, and read that many bytes (totalRead < size).
The last line is problematic, as you pointed out in your analysis, when
the source stream and the size are both provided by the other party.
>
> to:
>
> [sizeOrNil == nil or: [totalRead < size and: [srcStream atEnd not]]] whileTrue:[
The problem with the above is that it'll become and infinite loop if
sizeOrNil is nil.
So, if you want to read while there's data available from the stream, then
I think the following is the right choice:
[ srcStream atEnd not and: [ sizeOrNil isNil or: [ totalRead < sizeOrNil ] ] whileTrue: [
The above always takes into account whether there's data available from
the stream and directly uses sizeOrNil for size comparison to make it
more legible. However, there seem to be a few more things to clean up in
that method.
Levente
>
> ?
>
>
> My logic could be wrong, and I don't understand the sizeOrNil check here. But after I made this change, the infinite loop went away, GC destroyed hundreds of stale Sockets and Processes, and my Seaside app began answering
> again (note that this is the first time I've been successful at bringing it back to life).
>
> If the original logic is correct as intended, it seems to be a trade-off between continuing to read from the stream if we haven't yet read #size characters, versus not trying to read from a stream that is #atEnd. My version
> breaks out of the loop if the stream is #atEnd even if the method believes there are characters left to read. The reason I think this is worthwhile, is that #totalRead will (endlessly) be 0 if the Socket underneath the
> SocketStream is closed, and there are no more bytes to read, thus causing an infinite loop. It is *possible* (and mentioned below) that the design is supposed to catch a #ConnectionClosed exception, but in my case one never
> got thrown. (note the Socket's state was #otherEndClosedButNotThisEnd)
>
> It's also possible that the method is designed to continue reading from the SocketStream in case it is expecting more data which just hasn't arrived yet (as I think I saw in a method comment somewhere), but this may not make
> sense when the underlying Socket has been closed and the method keeps reading zero characters from it.
>
> Of course, actually, anything is possible. What do I know? :D
>
> The rest of this message contains my exploration, debugging session, and further rationale.
>
> Thanks,
> a Tim
>
> -------
>
> After running for eight days, Squeak had 300 socket connections open. This is far too many for my very, very unpopular website:
>
> $ sudo netstat -a --program | awk '/http.*squeak/ { print $6 }' | sort | uniq -c
> 297 CLOSE_WAIT
> 6 ESTABLISHED
> 1 LISTEN
>
> A scan of Processes belonging to WebServer returned over 500, and climbing:
>
> Process allSubInstances select: [:proc |
> proc suspendedContext
> ifNil: [ false ]
> ifNotNil: [:context |
> context closure
> ifNil: [ false ]
> ifNotNil: [:closure | | receiver |
> receiver := closure outerContext receiver.
> (receiver respondsTo: #outerContext)
> ifTrue: [ receiver outerContext receiver class == WebServer ]
> ifFalse: [ false ] ] ] ]
>
>
> Note that none of them are terminated:
>
> Bag newFrom: (self collect: [:proc | proc isTerminated])
> -> a Dictionary(false->507 true->1 )
>
> Most/all of them seem to be hanging out in this method:
>
> method: (BlockClosure>>#newProcess "a CompiledMethod(3871190)")
> closureOrNil: [closure] in BlockClosure>>newProcess
> receiver: [closure] in WebServer>>asyncHandleConnectionFrom:
>
> It seems that WebServer uses SocketStream on a Socket. So how many SocketStreams exist and where are they?
>
> Bag newFrom: (SocketStream allInstances collect: [:ea | ea socket statusString asSymbol])
> -> a Dictionary(#invalidSocketHandle->32 #otherEndClosedButNotThisEnd->1 )
>
> So, there's one SocketStream that's closed but we're holding on to it. Why?
>
> I opened a Process Browser, turned on CPU Watcher, turned on Auto-Update and found WebServer's listener process eating > 90% CPU. I opened a debugger on it. It seemed it was in an infinite loop centered
> around WebMessage>>streamDirectlyFrom:to:size:progress: on this very request.
>
> [] in SocketStream>>beSignalingWhile:
> BlockClosure>>ensure:
> SocketStream>>beSignalingWhile:
> [] in SocketStream>>next:into:startingAt:
> BlockClosure>>on:do:
> SocketStream>>next:into:startingAt:
> WebRequest(WebMessage)>>streamDirectlyFrom:to:size:progress:
> WebRequest(WebMessage)>>streamFrom:to:size:progress:
> [] in WebRequest(WebMessage)>>getContentWithProgress:
> ByteString class(SequenceableCollection class)>>new:streamContents:
> WebRequest(WebMessage)>>getContentWithProgress:
> WebRequest(WebMessage)>>getContent
> WebRequest(WebMessage)>>content
> WAWebServerAdaptor>>requestBodyFor:
> WAWebServerAdaptor(WAServerAdaptor)>>requestFor:
> WAWebServerAdaptor(WAServerAdaptor)>>contextFor:
> WAWebServerAdaptor(WAServerAdaptor)>>process:
> WAWebServerAdaptor>>process:
> MessageSend>>valueWithArguments:
> WebServer>>invokeAction:request:
>
> Some exploit attempt sending an HTTP POST request to /tmUnblock.cgi and with a Content-Length of 227 was causing WebMessage to lose its mind. (This is when I realized that my nginx proxy configuration might be wrong, and
> some obvious exploit attempts could be getting through to Squeak. But that's a different matter.)
>
> Now I believe any of the following could have been happening:
>
> * WebMessage & friends seem to take the supplied Content-Length header in an HTTP request as being literal and true, even though a client could be supplying a wrong or misleading value. I don't know if there are tests
> against this but I might write one.
>
> * If the Socket is closed, the code on the stack trace above will keep trying to read from it. I can't tell if:
>
> a) there is an arithmetic problem somewhere, or
> b) if #ConnectionClosed isn't being raised or caught in the proper way, or
> c) if there's a logic error in WebMessage>>streamDirectlyFrom:to:size:progress: , or
> d) this is totally intentional, or
> e) all of the above, none of the above, or something else.
>
> I opted to fix (c).
>
> * This portion of WebMessage>>streamDirectlyFrom:to:size:progress: is troublesome to me, and my change seems to make the infinite loop end:
>
> [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[
>
> In this case, totalRead was always less than size, because the connection was closed and the methods would always return empty strings for what had been read (thus, totalRead = 0). So, [totalRead < size] was always True,
> which would override the check for [srcStream atEnd not] and it would continue to try reading from a SocketStream on a closed Socket.
>
>
>
>
>
More information about the Squeak-dev
mailing list
|