[Vm-dev] hampering the desire of the VM and image to visit every object at startup time (multiple times)

John M McIntosh johnmci at smalltalkconsulting.com
Tue Apr 14 17:04:20 UTC 2009

Yes, so WikiServer  ( http://www.mobilewikiserver.com ) is an image  
file of 10.5 MB.

At startup only 4.5 MB of OOPS memory is faulted in, and about 700K of
memory is altered, which reduces initial memory use by 6MB and reduces
the startup time by 3+ seconds.
Given the slow speed of the iPhone, and the fact I've a 64MB limit,  
6MB is a lot, and 3 seconds is welcome. Unfortunately a full GC will  
fault all that 10.5 MB in, however by doing some GC tuning one
can avoid a full GC until things are quite stressed.

I note on os-x desktop machines the entire 10.5MB is read in and the  
pages marked as non-referenced, but obviously the rules for the  
virtual memory subsystem are different.

First let me suggest we change


	/* header size in bytes; do not change! */

	headerSize = 64;
	f = sqImageFileOpen(imageName, "wb");

from 64 bytes to 4096 bytes, if possible.

Now let's explore why, and what is going on.

On unix, if you decide to use mmap versus malloc to allocate storage
for oops space, it does

  mmap(0, heapLimit, MAP_PROT, MAP_FLAGS, devZero, 0)

where heapLimit is usually 1GB, start location zero.

This returns a start location somewhere in memory, never zero, and
generally we have to swizzle all the object pointers. On a save and
restart of the squeak app the address
you get back *may be* the same address. If it is the same we don't need
to swizzle pointers, however OpenBSD based systems likely will always
give a different location for security reasons.

I had then set a start location of 128MB, but found on os-x as the
number of apps goes up you don't get 128MB, so I settled on 500MB which
seems ok. 8GB pro macs running 52 applications fail at 500MB, but the
failure just means mmap chooses its own address, so we don't care...

Well yes that limits your squeak image to 3.5GB but it's doubtful that  
a 32bit system will let you allocate a contiguous chunk of memory > 2  
GB anyway.

Now the next issue was the original memory allocation logic would give  
you the 1GB, and you would read the entire image into that memory area.

In thinking about this I thought, why can't you mmap the image file
for the size of the file rounded up to the page size, then mmap the
memory after that
to anonymous memory up to the desired heap size?

So two mmaps, one for the file, followed by another for young space.

I implemented this for the os-x vm and the iPhone VM.

In testing with a 500 MHz PowerPC laptop, I found the startup time was
reduced by 30%, because it would fault in, say, a 20MB image page by
page as it did the needless flush primitive calls logic, versus reading
the 20MB into memory. The virtual memory pager was just more efficient
at pulling in the data, either by better I/O processing or faster logic
in finding the free pages.


It turns out there is a flaw in the OS-X BSD mmap logic when you mmap  
files on NFS drives, it hangs, and some people with I think  
overstressed systems reported issues with the first mmap failing.
Because of this I reverted back to the old logic by default, and put a  
flag in the info.plist SqueakUseFileMappedMMAP to enable the new logic.
Obviously for Linux you have to decide if this flaw exists and has not  
been fixed?

Now the problem with headerSize

In the file mmap case the entire file is mapped into memory at 500MB,
but the oops space starts at 64 bytes, so memory is at 500MB+64.
In the anonymous mmap case memory starts at 500MB, but the oops space
starts at 0, so memory is at 500MB.

If the headerSize was 4096 we could mmap the file at 500MB-4096.
Or, alternatively, in the anonymous case we could allocate at 500MB but
stick the oops space at 500MB+64 (header size).

However by using a headerSize of 4096 we can then get the oops space  
to start on a page boundary, which may or may not have implications.
Anyway it would be good to resolve this bit of tricky logic.
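
To make the arithmetic concrete, here is a small sketch. The helper names
oopsStart and isPageAligned are hypothetical, and the usage note assumes a
4096 byte page, which is not guaranteed on every platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Address where the oops space begins when the file is mapped at fileBase
   and the image header occupies the first headerSize bytes. */
static uintptr_t oopsStart(uintptr_t fileBase, size_t headerSize)
{
	return fileBase + headerSize;
}

/* Nonzero if addr lies on a page boundary (pageSize must be a power of two). */
static int isPageAligned(uintptr_t addr, size_t pageSize)
{
	return (addr & (pageSize - 1)) == 0;
}
```

With a base of 500MB, oopsStart(base, 64) is 500MB+64 and is not page
aligned, while oopsStart(base - 4096, 4096) is exactly 500MB and is, which
matches the anonymous mmap case.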

I stuck the following code into ioRelinquishProcessorForMicroseconds(),
since ioRelinquishProcessorForMicroseconds will only get triggered
once the
image finishes all its startup logic and becomes *idle*, so that I
could determine how each page was viewed by the virtual memory
subsystem:

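	/* needs <stdlib.h>, <unistd.h> and <sys/mman.h> */
	extern unsigned char *memory;
	extern usqInt	sqGetAvailableMemory();
	extern size_t fileRoundedUpToPageSize;
	size_t pageSize= getpagesize();
	size_t vmpagesize=sqGetAvailableMemory()/pageSize + 1;
	/* one status byte per page, filled in by mincore() */
	char *what = malloc(vmpagesize);
	int err = mincore(memory, sqGetAvailableMemory(), what);
	int countRef=0, countMod=0,countZero=0, countOne=0, i;
	/* only look at the pages backed by the image file */
	/* OS X flags: MINCORE_INCORE 0x1, MINCORE_REFERENCED 0x2, MINCORE_MODIFIED 0x4 */
	for (i=0;i<fileRoundedUpToPageSize/pageSize;i++) {
		if(what[i] == 0) countZero++;	/* not resident */
		if(what[i] == 1) countOne++;	/* resident */
		if(what[i] == 3) countRef++;	/* resident and referenced */
		if(what[i] == 7) countMod++;	/* resident, referenced and modified */
	}
	/* break for debugging here, then check err and the counts */
	free(what);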

On 14-Apr-09, at 1:58 AM, Bert Freudenberg wrote:

> On 14.04.2009, at 07:26, John M McIntosh wrote:
>> I created a pharo entry to track the problem the VM & image has in  
>> wanting to visit every smalltalk object multiple times at startup  
>> time.
>> Although this behavior is masked by Gigahertz processors, it's very
>> evident as a problem on the iPhone. Fixing it results in reducing
>> MB of RAM memory usage and saves actual *seconds* of clock time at
>> startup.
>> http://code.google.com/p/pharo/issues/detail?id=737&colspec=ID%20Type%20Status%20Summary%20Milestone&start=200
> Very nice. We experimented in that direction for OLPC which also is  
> comparatively slow CPU wise, and even slower loading the whole image  
> from the flash disk (which involves decompressing). Mmapping only  
> the pages needed should give a considerable boost.
> Do we have evidence that an mmap base address of 500 MB works across  
> platforms?
> - Bert -

John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
