Prim error returns (was Re: [squeak-dev] The Primitive: I am not a number- I am a named prim! - SqueakPeople article)

John M McIntosh johnmci at smalltalkconsulting.com
Wed Jul 2 07:12:46 UTC 2008


Ok, well the discussions date from late April 2005.  Some are below.

On Jul 1, 2008, at 11:46 PM, tim Rowledge wrote:

> OK; be aware that there is a pathological case that might impact  
> your code in this area, mostly restricted to non-virtual memory  
> systems. Somewhere in the GC code it will try to grab more memory  
> for forwarding blocks and if none is provided by the OS (as in RISC  
> OS for example) then some of the reserved space will be stolen  
> *without* proper checks and notifications. This can result in the  
> system trying to handle a lowSpace with only a few hundred bytes of  
> free memory. It doesn't go so well after that.... I've been trying  
> to find relevant emails to illustrate better but no luck so far. I'm  
> reasonably sure we never came up with a good solution but the  
> problem surfaced about 4 years ago and just possibly got fixed  
> somewhere.
>
> tim
> --
> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
> "Bother" said Piglet, as Pooh smeared him in honey.

	From: 	johnmci at smalltalkconsulting.com
	Subject: 	initializeMemoryFirstFree
	Date: 	April 28, 2005 10:03:36 PM PDT (CA)
	To: 	tim at sumeru.stanford.edu

initializeMemoryFirstFree
Some thoughts.

a) We must have, oh say, 100,000 bytes free; reduce fwdBlockBytes from
its optimal calculation if need be.
b) fwdBlockBytes must be > 100,000; if not, die. This is an arbitrary
value, not sure what the min really should be.

	fwdBlockBytes = foo->totalObjectCount & 4294967292U;
	if (!((foo->memoryLimit - fwdBlockBytes - 100000) >= (firstFree + BaseHeaderSize))) {
		fwdBlockBytes = foo->memoryLimit - (firstFree + BaseHeaderSize) - 100000;
	}
	if (fwdBlockBytes < 0)
		error("Death no memory");

So this does allow me to see and get the dialog, but then I don't
think the right process gets stopped, since I don't have control, I
can't click or use the keyboard, and we die with
error("Death no memory");
Is it suspending the UI process?

200,000 is also not much headroom, try 1,000,000 for better safety.

&

Code at bottom to test with.

freeMemory  fwdBlockBytes  endOfMemory  memoryLimit

254204 759056 128479936 129238992
206044 759056 128479936 129238992
202500 759056 128479936 129238992

Once we fall under 200,000, gcmove moves some bytes about, so we see 5
iterations.
Note how fwdblocks changed from 759056 to 859152; we cap that and
give back 102404 bytes of free space.

102404 859152 128379840 129238992
102404 859152 128379840 129238992
102404 877184 128361808 129238992
102404 877184 128361808 129238992
102404 877184 128361808 129238992

JMMJMM TOSS signal LOWSPACE  Yes, under the 200,000, so do the signal.
JMMJMM BEEEEEEEEEEEPPPPPPPPP     This is the lowspace process waking up and running.
JMMJMM primitiveSignalAtBytesLeft called     The lowspace process changed the threshold to zero.

Watch how we carve away at the fwdblock bytes, keeping 100K + 4 bytes free.

102404 865684 128373308 129238992
102404 871032 128367960 129238992
102404 871484 128367508 129238992
102404 865364 128373628 129238992
102404 871800 128367192 129238992
102404 869176 128369816 129238992
102404 873180 128365812 129238992
102404 871452 128367540 129238992
102404 864308 128374684 129238992
102404 864120 128374872 129238992
102404 870744 128368248 129238992
102404 870164 128368828 129238992
102404 870664 128368328 129238992
102404 866160 128372832 129238992
102404 869940 128369052 129238992
102404 868352 128370640 129238992
102404 864672 128374320 129238992
102404 870100 128368892 129238992
102404 822776 128416216 129238992
102404 872888 128366104 129238992
102404 872348 128366644 129238992
102404 871832 128367160 129238992
102404 870560 128368432 129238992
102404 869212 128369780 129238992
102404 867052 128371940 129238992
102404 867372 128371620 129238992
102404 867116 128371876 129238992
102404 869252 128369740 129238992
102404 867344 128371648 129238992
102404 866688 128372304 129238992
102404 864384 128374608 129238992
102404 863436 128375556 129238992
102404 860632 128378360 129238992
102404 858804 128380188 129238992
102404 860312 128378680 129238992
102404 859980 128379012 129238992
102404 860172 128378820 129238992
102404 859692 128379300 129238992
102404 860108 128378884 129238992
102404 859524 128379468 129238992
102404 858408 128380584 129238992
102404 860200 128378792 129238992
102404 860200 128378792 129238992
102404 859804 128379188 129238992
102404 859424 128379568 129238992
102404 856880 128382112 129238992
102404 854388 128384604 129238992
102404 855116 128383876 129238992
102404 854748 128384244 129238992
102404 855036 128383956 129238992
102404 855688 128383304 129238992
102404 853028 128385964 129238992
102404 852592 128386400 129238992
102404 852292 128386700 129238992
102404 852940 128386052 129238992
102404 852552 128386440 129238992
102404 852624 128386368 129238992
102404 852732 128386260 129238992
102404 852036 128386956 129238992
102404 855304 128383688 129238992
102404 855000 128383992 129238992
102404 854744 128384248 129238992
102404 854668 128384324 129238992
102404 855340 128383652 129238992
102404 856028 128382964 129238992
102404 853628 128385364 129238992
102404 854504 128384488 129238992
102404 854248 128384744 129238992
102404 854852 128384140 129238992
102404 853556 128385436 129238992
102404 854204 128384788 129238992
102404 852532 128386460 129238992
102404 853840 128385152 129238992
102404 853564 128385428 129238992
102404 853088 128385904 129238992
102404 854196 128384796 129238992
102404 854164 128384828 129238992
102404 854176 128384816 129238992
102404 852560 128386432 129238992
102404 854456 128384536 129238992
102404 852580 128386412 129238992
102404 850528 128388464 129238992
102404 839720 128399272 129238992
102404 848844 128390148 129238992
102404 847364 128391628 129238992
102404 847684 128391308 129238992
102404 847044 128391948 129238992
102404 846788 128392204 129238992
102404 847236 128391756 129238992
102404 850816 128388176 129238992
102404 846164 128392828 129238992
102404 839576 128399416 129238992
102404 848688 128390304 129238992
102404 849004 128389988 129238992
102404 848696 128390296 129238992
102404 838656 128400336 129238992
102404 851656 128387336 129238992
102404 797844 128441148 129238992
102404 793268 128445724 129238992
102404 791064 128447928 129238992
102404 841436 128397556 129238992
102404 840376 128398616 129238992
102404 839528 128399464 129238992
102404 786560 128452432 129238992
102404 835652 128403340 129238992
102404 835652 128403340 129238992
102404 783732 128455260 129238992
102404 782336 128456656 129238992
102404 836676 128402316 129238992
102404 832308 128406684 129238992
102404 832068 128406924 129238992
102404 832636 128406356 129238992
102404 833612 128405380 129238992
102404 834320 128404672 129238992
102404 833980 128405012 129238992
102404 835372 128403620 129238992
102404 780488 128458504 129238992
102404 778388 128460604 129238992
102404 826284 128412708 129238992
102404 827288 128411704 129238992
102404 827288 128411704 129238992
102404 827416 128411576 129238992
102404 827416 128411576 129238992
102404 827544 128411448 129238992
102404 827224 128411768 129238992
102404 828588 128410404 129238992
102404 826084 128412908 129238992
102404 774280 128464712 129238992
102404 837180 128401812 129238992
102404 840692 128398300 129238992
102404 841520 128397472 129238992
102404 796540 128442452 129238992
102404 748380 128490612 129238992
102404 700220 128538772 129238992
102404 652060 128586932 129238992
102404 603900 128635092 129238992
102404 555740 128683252 129238992
102404 507580 128731412 129238992
102404 459420 128779572 129238992
102404 411260 128827732 129238992
102404 363100 128875892 129238992
102404 314940 128924052 129238992
102404 266780 128972212 129238992
102404 218620 129020372 129238992
102404 170460 129068532 129238992
102404 122300 129116692 129238992
102404 74140 129164852 129238992

Grind down forward blocks to 32K, note that free goes to 95616, then  
47456, then 2500.

95616 32768 129206224 129238992
47456 32768 129206224 129238992
2500 32768 129206224 129238992

LOTS of these....

2500 32768 129206224 129238992
2500 32768 129206224 129238992
16720 32768 129206224 129238992
2592 32768 129206224 129238992
2496 32768 129206224 129238992
2496 32768 129206224 129238992
2680 32768 129206224 129238992
2588 32768 129206224 129238992
2484 32768 129206224 129238992
2668 32768 129206224 129238992
2576 32768 129206224 129238992
2472 32768 129206224 129238992
2656 32768 129206224 129238992
2564 32768 129206224 129238992
2460 32768 129206224 129238992
2644 32768 129206224 129238992
2552 32768 129206224 129238992
2448 32768 129206224 129238992
2632 32768 129206224 129238992
2540 32768 129206224 129238992
2436 32768 129206224 129238992
2620 32768 129206224 129238992
2528 32768 129206224 129238992
2424 32768 129206224 129238992
2608 32768 129206224 129238992
2516 32768 129206224 129238992
2412 32768 129206224 129238992
2596 32768 129206224 129238992
2504 32768 129206224 129238992
2400 32768 129206224 129238992
2584 32768 129206224 129238992
2492 32768 129206224 129238992
2492 32768 129206224 129238992
2400 32768 129206224 129238992
2400 32768 129206224 129238992
2308 32768 129206224 129238992
2308 32768 129206224 129238992
2216 32768 129206224 129238992
2216 32768 129206224 129238992
2124 32768 129206224 129238992
2124 32768 129206224 129238992
2032 32768 129206224 129238992
2032 32768 129206224 129238992
1940 32768 129206224 129238992
1940 32768 129206224 129238992
1848 32768 129206224 129238992
1848 32768 129206224 129238992
1756 32768 129206224 129238992
1756 32768 129206224 129238992
1664 32768 129206224 129238992
1664 32768 129206224 129238992
1572 32768 129206224 129238992
1572 32768 129206224 129238992
1480 32768 129206224 129238992
1480 32768 129206224 129238992
1388 32768 129206224 129238992
1388 32768 129206224 129238992
1296 32768 129206224 129238992
1296 32768 129206224 129238992
1204 32768 129206224 129238992
1204 32768 129206224 129238992
1112 32768 129206224 129238992
1112 32768 129206224 129238992
1020 32768 129206224 129238992
1020 32768 129206224 129238992
928 32768 129206224 129238992
928 32768 129206224 129238992
836 32768 129206224 129238992
836 32768 129206224 129238992
744 32768 129206224 129238992
744 32768 129206224 129238992
652 32768 129206224 129238992
652 32768 129206224 129238992
560 32768 129206224 129238992
560 32768 129206224 129238992
468 32768 129206224 129238992
468 32768 129206224 129238992
376 32768 129206224 129238992
376 32768 129206224 129238992
284 32768 129206224 129238992
284 32768 129206224 129238992
192 32768 129206224 129238992
192 32768 129206224 129238992
8 32768 129206224 129238992
8 32768 129206224 129238992

And we die: no space to allocate a context record....

int initializeMemoryFirstFree(int firstFree) {
register struct foo * foo = &fum;
    int fwdBlockBytes;

	/* ideal reserve: one byte per object, rounded down to a multiple of 4 */
	fwdBlockBytes = foo->totalObjectCount & 4294967292U;
	/* shrink the reserve if it would leave less than 100K free */
	if (!((foo->memoryLimit - fwdBlockBytes) >= ((firstFree + BaseHeaderSize) + (100 * 1024)))) {
		fwdBlockBytes = (foo->memoryLimit - (firstFree + BaseHeaderSize)) - (100 * 1024);
	}
	/* but keep at least 32K of forwarding blocks, if they fit at all */
	if (fwdBlockBytes < (32 * 1024)) {
		fwdBlockBytes = 32 * 1024;
		if (!((foo->memoryLimit - fwdBlockBytes) >= (firstFree + BaseHeaderSize))) {
			fwdBlockBytes = foo->memoryLimit - (firstFree + BaseHeaderSize);
		}
	}
	foo->endOfMemory = foo->memoryLimit - fwdBlockBytes;
	foo->freeBlock = firstFree;
	/* begin setSizeOfFree:to: */
	longAtput(foo->freeBlock, ((foo->endOfMemory - firstFree) & AllButTypeMask) | HeaderTypeFree);
	/* begin setSizeOfFree:to: */
	longAtput(foo->endOfMemory, (BaseHeaderSize & AllButTypeMask) | HeaderTypeFree);
	if (DoAssertionChecks) {
		if (!((foo->freeBlock < foo->endOfMemory) && (foo->endOfMemory < foo->memoryLimit))) {
			error("error in free space computation");
		}
		if (!((foo->endOfMemory + (foo->headerTypeBytes[(longAt(foo->endOfMemory)) & TypeMask])) == foo->endOfMemory)) {
			error("header format must have changed");
		}
		if (!((objectAfter(foo->freeBlock)) == foo->endOfMemory)) {
			error("free block not properly initialized");
		}
	}
}
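
For anyone wanting to check the log against the routine above, here is a minimal standalone sketch of just the sizing decision. It is not VM code: the function names, the 64-bit integer types, and the `wanted` parameter (standing in for the rounded totalObjectCount) are assumptions made for illustration, and BaseHeaderSize is assumed to be 4.

```c
#include <assert.h>
#include <stdint.h>

enum {
    BASE_HEADER_SIZE = 4,          /* assumed 32-bit object header size */
    FREE_TARGET      = 100 * 1024, /* free space the patch tries to keep */
    FWD_FLOOR        = 32 * 1024   /* smallest forwarding reserve it accepts */
};

/* Hypothetical helper: forwarding reserve chosen by the patched policy.
   `wanted` stands in for totalObjectCount (one byte per object). */
static int64_t reserve_fwd_bytes(int64_t memory_limit, int64_t first_free,
                                 int64_t wanted)
{
    int64_t min_free_end = first_free + BASE_HEADER_SIZE;
    int64_t fwd = wanted & ~(int64_t)3;   /* round down to a multiple of 4 */
    assert(memory_limit > min_free_end);

    /* Regime 1: shrink the reserve so that about 100K stays free. */
    if (memory_limit - fwd < min_free_end + FREE_TARGET)
        fwd = memory_limit - min_free_end - FREE_TARGET;

    /* Regime 2: never go below 32K, even if free space then drops. */
    if (fwd < FWD_FLOOR) {
        fwd = FWD_FLOOR;
        /* Regime 3: not even 32K fits; leave room for one bare header. */
        if (memory_limit - fwd < min_free_end)
            fwd = memory_limit - min_free_end;
    }
    return fwd;
}

/* Hypothetical helper: size of the free block left after the reservation. */
static int64_t free_bytes(int64_t memory_limit, int64_t first_free,
                          int64_t wanted)
{
    return memory_limit
         - reserve_fwd_bytes(memory_limit, first_free, wanted)
         - first_free;
}
```

With memoryLimit 129238992 and a firstFree of 128277436 (derived from the logged endOfMemory minus the logged free size), a large `wanted` value is capped to 859152 and 102404 bytes stay free, matching the "102404 859152 128379840" row. A firstFree of 129110608 lands in the 32K regime and leaves 95616 bytes free, matching the first row after the grind-down.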

...


At 200,000 we don't see the signal, because after the full GC at the
200K boundary we gobble up all the memory for the fwdtable, leaving
4 or 6 bytes. Then we immediately die because we can't allocate the
next context record. For a 64MB memory block we've about a MB or so
tied up in fwdspace; certainly it wants over 200K after the fullGC.

As you noticed, changing the limit to be larger (I think 400,000 was
about the cutoff for my test case) gave me the debugger.
Let me check at 512MB.
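
To make the "4 or 6 bytes left" failure concrete, here is a rough sketch of the unaltered sizing rule as it is described in this thread: take one byte per object, and back off only far enough to leave a bare object header free. The function names and the numbers in the note below are made up for illustration, and BaseHeaderSize is assumed to be 4.

```c
#include <assert.h>
#include <stdint.h>

enum { BASE_HEADER_SIZE = 4 };  /* assumed 32-bit object header size */

/* Hypothetical helper: forwarding reserve under the unaltered rule.
   There is no free-space target, only a header-sized minimum. */
static int64_t unpatched_fwd_bytes(int64_t memory_limit, int64_t first_free,
                                   int64_t total_object_count)
{
    int64_t fwd = total_object_count & ~(int64_t)3;
    assert(memory_limit > first_free + BASE_HEADER_SIZE);
    if (memory_limit - fwd < first_free + BASE_HEADER_SIZE)
        fwd = memory_limit - (first_free + BASE_HEADER_SIZE);
    return fwd;
}

/* Hypothetical helper: free block size after the reservation. */
static int64_t unpatched_free_bytes(int64_t memory_limit, int64_t first_free,
                                    int64_t total_object_count)
{
    return memory_limit
         - unpatched_fwd_bytes(memory_limit, first_free, total_object_count)
         - first_free;
}
```

As an invented example: in a 64MB block (67108864 bytes) with firstFree at 66008864, a reserve of 900,000 bytes would leave 200,000 bytes free. If the recalculated object count asks for 1,200,000 bytes, the reserve swallows everything except 4 bytes, so the next context allocation has nowhere to go.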

On Apr 29, 2005, at 3:29 PM, Tim Rowledge wrote:

> OK, I have some slightly off-to-the-side news on this.
>
> A VM with the store-errant-process-oop + an image with dtl's code to make
> use of that is much more stable if the lowSpaceThreshold is raised a good
> bit. What is likely happening to your system is that the lowspace is being
> signalled but the long faringabout with gc attempts means the event tickler
> is interrupted instead of the UI process 'at fault'. Thus the bad boy goes
> ahead and messes you up. With the fix in, things are a bit more sensible in
> that the 'right' process is interrupted. Of course, not much help if lots
> of processes are using up memory!
>
> Still, a code-recursion test - ie fill up memory with contexts - is
> passable with 'only' half-meg of lowSpaceThreshold. Likewise the fill
> memory with bitblts. It seems to need 1mb to survive the fill with Links
> though. This is at least encouraging enough to maybe help us find the
> problem with my tree walker changes that led down this path in the first
> place.
>
> The essence of the problem is that we really want one byte per object
> available to the fwdTable. In the worst case, we could have almost all of
> OM filled with plain Objects - ie 8byte chunks - and so would really wish
> for ~12% of available memory reserved. So much for direct pointers 'saving
> the wasted space of an object table', eh?
>
> more later but I have to dash out for the dog's massage appointment.
> Really.
>
>
> tim
> --
> Tim Rowledge, tim at sumeru.stanford.edu, http://sumeru.stanford.edu/tim
> Strange OpCodes: SEXI: Sign EXtend Integer
>
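
The "~12%" figure in the message above follows from simple arithmetic: one reserved byte for every 8-byte object is one eighth of the space those objects occupy. A small sketch, with hypothetical function names, assuming the heap is packed with objects of a single size:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: forwarding bytes wanted for a heap packed with
   objects of one size, at one reserved byte per object. */
static int64_t fwd_bytes_wanted(int64_t heap_bytes, int64_t object_bytes)
{
    assert(heap_bytes >= 0 && object_bytes > 0);
    return heap_bytes / object_bytes;
}

/* The same reserve expressed as a percentage of the object bytes. */
static double reserve_percent(int64_t heap_bytes, int64_t object_bytes)
{
    assert(heap_bytes > 0);
    return 100.0 * (double)fwd_bytes_wanted(heap_bytes, object_bytes)
                 / (double)heap_bytes;
}
```

For 64MB of 8-byte objects this asks for 8388608 bytes of reserve, which is 12.5% of the object bytes. Larger average object sizes bring the figure down quickly, so the worst case is pessimistic for typical images.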

Apr 30th, 2005 John M McIntosh wrote:
>
> On Apr 30, 2005, at 8:00 PM, Andreas Raab wrote:
>
>> Hi Tim -
>>
>>> After having problems trying to debug some TK4 code that blew up with
>>> lowspace problems but never let me catch and debug, I spent some time
>>> adding the lowspace-process stuff we recently discussed. I had to make a
>>> few alterations to match it up with the latest 64bit clean code but no
>>> problems with that part.
>>
>> What am I missing? I don't remember low-space stuff - I only  
>> remember interrupt-related stuff.
>
> There was a mantis bug about low-space issues and some patches to
> record which process caused the lowspace signal. Mind, this in my
> opinion is wrong.
>
>>
>>> Depending upon the exact size of object memory in use the 200kb used as
>>> the lowSpaceThreshold can be gobbled up in one swallow by the
>>> initializeMemoryFirstFree: method making sure there is a byte per object
>>> that survived the markPhase. In using useUpMemory we can get to having 4
>>> bytes of free space when the next allocate is attempted.... Ka-Boom.
>>
>> Well, so don't eat up the memory. There is no reason why  
>> initializeMemoryFirstFree: would have to reserve that much memory -  
>> like the comment says the reserve "should" be chosen so that  
>> compactions can be done in one pass but there is absolutely no such  
>> requirement. Multi-pass compactions have happened in the past and  
>> there is nothing wrong with them (in a low-space situation).
>>
>>> This assumes that we really need to have one byte per object of course.
>>> The original rationale was to keep the number of compact loops down to
>>> eight (see Dan's comment in initializeMemoryFirstFree:) for Alan's large
>>> demo image. The nicest solution would be to come up with a way to do our
>>> GC & compacting without needing any extra space. Commence headscratching
>>> now... John suggested making sure the fwd gets less than the
>>> byte-per-object if things are tight, and accepting the extra compaction
>>> loops.
>>
>> Yes. That's the only reasonable way of dealing with it.
>
> What happens is the fwdblocks calculation grabs all the available
> free memory when it's recalculated after the full GC. The check for
> this condition actually backs it off to allow one object header
> free, 4 or 6 bytes I believe; usually you die right away because
> someone attempts to allocate a new context record and we don't have
> 98ish bytes free. I gave Tim a change set that attempts to keep
> freespace at 100K by reducing fwdblocks down to 32k; once you hit
> the 32k limit, freespace then heads towards zero of course.
>
> Note that once freespace goes under 200,000 we do signal the  
> lowspace semaphore btw.
>
> These changes do require a VM change, but we did notice, as Tim
> points out, that if you increase the lowspace threshold (say to 1MB in
> my testing the other night) we'll get the semaphore signaled with a
> current VM; this would not occur before in an unaltered VM.
>
>>
>>> Bad news- consider Tweak. With lots of processes whizzing away, merely
>>> stopping the one that did the allocation and triggered the lowspace is
>>> not going to be much good. Stopping everything except the utterly
>>> essential stuff to debug the lowspace will be needed. Probably.
>>
>> Uh, oh. Are you telling me that the "low space stuff" you are  
>> referring to above actually suspends the process that triggers the  
>> low-space condition? Bad, bad, bad idea. Ever considered that this  
>> might be the timer process? The finalization process? Low-space is  
>> *not* a per-process condition; suspending the currently running  
>> process is something that should be done with great care (if at all).
>>
>> Please, don't suspend that process - put it away for the image to  
>> examine but by all means do NOT suspend it. If you give me a nice  
>> clean semaphore signal for Tweak to handle a low-space condition I  
>> know perfectly well what to do but if you just suspend a random  
>> process which may have absolutely nothing with the low space  
>> condition, then, yes, we are in trouble (if this were a tweak  
>> scheduler process you'd be totally hosed).
>
> Tim and I were considering suspending all user processes and any others
> we don't know to be untouchable; then I pointed out that Tweak
> spawns all these processes, so what do we do about them? Certainly
> we can call something to say lowspace Mr Tweak beware...
>
> The Process Browser logic has a table identifying processes of the
> VM; we assume a process the user created is causing the problem.
> The earlier fix suggested stopping the process that was running when
> the lowspace condition occurred, but I doubt you can 100% say that
> is the process in question, and it could as you know be the finalization
> process or other critical task. Still this is not harmful because
> the evil process in question is still running and will terminate
> your image in short order.
>
>>
>> Cheers,
>>  - Andreas
>>
>
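
The trade-off discussed in the exchange above (a smaller forwarding reserve in return for more compaction loops) can be estimated with a simple model. This is a sketch under stated assumptions, not the VM's logic: it assumes each forwarding block takes 8 bytes, which is inferred from the thread's pairing of "one byte per object" with "eight" loops, and it assumes the worst case where every surviving object has to move. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { FWD_BLOCK_BYTES = 8 };  /* assumed size of one forwarding block */

/* Hypothetical helper: worst-case number of compaction passes when each
   pass can relocate at most one object per available forwarding block. */
static int64_t compaction_passes(int64_t surviving_objects,
                                 int64_t fwd_table_bytes)
{
    int64_t blocks = fwd_table_bytes / FWD_BLOCK_BYTES;
    assert(surviving_objects >= 0 && blocks > 0);
    return (surviving_objects + blocks - 1) / blocks;  /* ceiling division */
}
```

Under these assumptions, one million surviving objects with a one-byte-per-object reserve need 8 passes, while the same heap squeezed down to a 32K reserve would need 245. That is slow, but as noted in the thread, multi-pass compaction is acceptable in a low-space situation.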




--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================






