[Seaside] production systems experiences

Tue May 4 19:02:26 CEST 2004

radoslav hodnicak wrote:
> I have one seaside/squeak system running in production, it ran
> trouble-free for weeks, but today I've seen some very strange behaviour
> that I can't explain from looking at my code - it looks like some action
> blocks got somehow triggered twice. I know that mixing blocks,
> continuations, exceptions, etc. is messy - has anyone seen similar problems?
> Any tips on how to (try to) diagnose this?

Hmm... it's hard to see how that could happen, although I do remember
seeing it occasionally in the past.  Is this still 2.3?  I wonder
whether some processes were terminated and left the blocks marked as
though code were still executing them, or something like that.  The
session should be protected by WAProcessMonitor, which makes sure that
two requests never run in the same session at the same time.  However,
2.3 has a few concurrency problems under high load.  I fixed these at
work and merged the fixes into 2.5, but I should probably look at
pushing them into 2.3 as well.  So if you were getting fairly heavy
load (we were testing with 20 concurrent users hitting it pretty hard),
that could also be the cause.
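As a rough illustration of the guarantee described above (at most one request running per session at any time), here is a minimal Python sketch that uses one lock per session. It is only an analogy under stated assumptions, not how WAProcessMonitor is actually implemented; `SessionRegistry` and `handle_request` are hypothetical names.

```python
import threading
import time
from collections import defaultdict


class SessionRegistry:
    """Hypothetical registry that hands out one lock per session id."""

    def __init__(self):
        self._guard = threading.Lock()      # protects creation of per-session locks
        self._locks = {}
        self.active = defaultdict(int)      # requests currently running, per session
        self.max_active = defaultdict(int)  # highest concurrency seen, per session

    def lock_for(self, session_id):
        with self._guard:
            return self._locks.setdefault(session_id, threading.Lock())


def handle_request(registry, session_id, work_seconds=0.01):
    # Requests for the same session run one at a time;
    # requests for different sessions may still run in parallel.
    with registry.lock_for(session_id):
        registry.active[session_id] += 1
        registry.max_active[session_id] = max(
            registry.max_active[session_id], registry.active[session_id])
        time.sleep(work_seconds)            # stand-in for rendering a response
        registry.active[session_id] -= 1


registry = SessionRegistry()
threads = [threading.Thread(target=handle_request, args=(registry, sid))
           for sid in ["a", "a", "a", "b", "b"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(registry.max_active.items()))  # [('a', 1), ('b', 1)]
```

If the per-session lock were bypassed (for example, by a bug under load), `max_active` for a session could exceed 1, which is the kind of overlap that might make an action block appear to fire twice.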

> Related topic - code updates on a live system. I've seen some problems
> with that too (I think the stored blocks in Squeak don't use the
> new/changed method).

Yeah, you would have that problem alright.  Each block stores the
context in which it was created, including variable bindings, its
position in the executing code, etc.  The block's code is also defined
inside its method, and a stored block keeps referring to the old
compiled version of that method.  Once you recompile the method, the
new version of the block may have different code and quite likely
refers to different variables.  You'd encounter the same problem if a
process were executing inside a method when you recompiled it: that
process would keep running through the old method, since it can't
switch to the new version midway through.
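Python closures behave analogously, which makes the effect easy to see outside Squeak. In this minimal sketch (an analogy, not Squeak code; `make_callback` is a hypothetical name), a closure created before the function is redefined keeps the old code and bindings, and only closures created afterwards pick up the new version.

```python
def make_callback():
    # The returned closure is tied to THIS version of the function's code.
    greeting = "old"
    return lambda: f"callback from {greeting} version"


stored = make_callback()        # like a block stored in a session


def make_callback():            # "recompile" the method with new code
    greeting = "new"
    return lambda: f"callback from {greeting} version"


print(stored())                 # callback from old version
print(make_callback()())        # callback from new version
```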

You can see this if you change the #go method of a task while partway
through the task.  The old task code will continue to be used until the
task is entered from the beginning again.
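A Python generator shows the same behavior for code that is already running. In this minimal sketch (an analogy only; `go` here is a hypothetical generator standing in for a task's #go method), a generator started before the redefinition keeps executing the old code, and only a fresh start runs the new version.

```python
def go():
    # Version 1 of a hypothetical two-step task.
    yield "v1 step 1"
    yield "v1 step 2"


task = go()            # a user starts the task...
first = next(task)     # ...and is now partway through it


def go():              # the task is redefined while that user is mid-task
    yield "v2 step 1"
    yield "v2 step 2"


second = next(task)    # the running task continues through the OLD code
fresh = next(go())     # only a fresh start runs the NEW code
print(first, "|", second, "|", fresh)  # v1 step 1 | v1 step 2 | v2 step 1
```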

I personally consider loading code into a live system that is in use a
little risky at this point.  For example, a user could make a request
while the new code is only half-loaded (we don't have atomic fileins).
You're also completely exposed if something goes wrong during the
filein and leaves the system unusable.  At work, we build a new image
and then use our load balancing system to bring the new image up for
new users, while existing sessions run to completion in the old image.
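The deployment scheme described above amounts to session affinity plus switching the target for new sessions. Here is a minimal Python sketch, assuming a load balancer that can identify the session of each request. `Router`, `deploy`, `route`, and the image names are hypothetical, and this is not the actual load balancing setup mentioned above.

```python
class Router:
    """Hypothetical balancer that pins each session to the image it started on."""

    def __init__(self, current_image):
        self.current_image = current_image  # image that receives new sessions
        self.pinned = {}                    # session id -> image name

    def deploy(self, new_image):
        # Only the target for NEW sessions changes; pinned sessions are untouched.
        self.current_image = new_image

    def route(self, session_id):
        # A session's first request pins it to whatever image is current.
        return self.pinned.setdefault(session_id, self.current_image)

    def end_session(self, session_id):
        self.pinned.pop(session_id, None)

    def images_in_use(self):
        # An old image can be retired once no session is pinned to it.
        return set(self.pinned.values()) | {self.current_image}


router = Router("image-v1")
router.route("alice")                 # alice starts on image-v1
router.deploy("image-v2")
print(router.route("alice"))          # image-v1 (existing session stays put)
print(router.route("bob"))            # image-v2 (new session gets new code)
router.end_session("alice")
print(router.images_in_use())         # {'image-v2'}
```

Because no code is ever loaded into an image that is serving requests, the half-loaded and failed-filein risks never arise; the cost is running two images side by side until the old sessions expire.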

Julian