[Vm-dev] Spur Memory segment and OS allocation

Mon May 7 23:31:33 UTC 2018

Hi Clément,

On Wed, May 2, 2018 at 11:42 PM, Clément Bera <bera.clement at gmail.com>
wrote:

> Hi Eliot,
>
> I am more annoyed about using mmap to get memory at higher addresses and
> segment positioning than using mmap itself.
>
> Allocating memory at higher addresses:
> - is impossible on some platforms such as rumpkernel
> - is annoying since it relies on APIs such as sbrk, which is deprecated in
> SUSv2 and not present at all in POSIX.
>
> Malloc is not amazing, but it's much more portable. I would rather have
> something like:
>
> #ifdef mmap
>   mmap(...)
> #else
>   posix_memalign(...)
> #endif
>

I think the thing to do is something like
- a cross platform define, USE_MALLOC (see e.g. USE_MMAP in the Unix
sources)
- either files that implement each interface, e.g.
    sqUnixSpurMMapMemory.c, sqUnixSpurMallocMemory.c
  and a small file that includes one or the other
- or each function implemented twice, surrounded by #if USE_MALLOC ...
#else ... #endif /* USE_MALLOC */
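
A minimal sketch of the second option, with each function written twice under
one switch. The names allocSegment and freeSegment are hypothetical, not the
VM's actual interface, and USE_MALLOC is the proposed (not yet existing)
define:

```c
#define _DEFAULT_SOURCE 1   /* expose MAP_ANON and posix_memalign on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef USE_MALLOC
# define USE_MALLOC 0       /* proposed build flag; default to memory mapping */
#endif

#if !USE_MALLOC
# include <sys/mman.h>
#endif

#if USE_MALLOC
/* malloc-family variant: alignment can be requested, placement cannot. */
void *allocSegment(size_t bytes, size_t alignment)
{
    void *mem = NULL;
    return posix_memalign(&mem, alignment, bytes) == 0 ? mem : NULL;
}

void freeSegment(void *mem, size_t bytes)
{
    (void)bytes;            /* free() does not need the size */
    free(mem);
}
#else
/* mmap variant: the result is page aligned, so alignment requests up to
 * the page size are met implicitly; larger ones are not handled here. */
void *allocSegment(size_t bytes, size_t alignment)
{
    (void)alignment;
    void *mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    return mem == MAP_FAILED ? NULL : mem;
}

void freeSegment(void *mem, size_t bytes)
{
    munmap(mem, bytes);
}
#endif /* USE_MALLOC */
```

Callers see the same two functions either way; only the mmap variant could
later be extended to take an address hint.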

But before you go there, how do you hope to get posix_memalign to answer
memory where you need it?  It seems to me that if you go with malloc then
you're forced to allocate memory at start-up, because there's no guarantee
that memory will appear at higher or lower addresses than the initial
alloc.  This was the situation with the original V3 allocator; it did a
large allocation, defaulting to 512MB, and allocated the heap from that.
One couldn't release memory back to the OS, one couldn't have a small
footprint and be allowed to grow afterwards, etc.  And the only interface I
know that allows address hints is the memory mapping one.
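
To make the address-hint point concrete, here is a rough sketch (not the
code in the Unix sources). Without MAP_FIXED, mmap treats its first argument
only as a hint, so the caller has to check where the mapping actually landed.
The helper names allocSegmentNear and allocSegmentAtOrAbove are made up for
illustration:

```c
#define _DEFAULT_SOURCE 1   /* expose MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Ask for bytes of zeroed memory at or near hint.  The kernel is free to
 * ignore the hint, so the answer may be anywhere in the address space. */
void *allocSegmentNear(void *hint, size_t bytes)
{
    void *mem = mmap(hint, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    return mem == MAP_FAILED ? NULL : mem;
}

/* Answer a segment that starts at or above minAddress, or NULL if the OS
 * placed the mapping lower (in which case it is released again). */
void *allocSegmentAtOrAbove(void *minAddress, size_t bytes)
{
    void *mem = allocSegmentNear(minAddress, bytes);
    if (mem != NULL && (uintptr_t)mem < (uintptr_t)minAddress) {
        munmap(mem, bytes);
        return NULL;
    }
    return mem;
}
```

A real allocator would retry with a different hint rather than give up after
one attempt. posix_memalign has no equivalent of the hint argument at all.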

I get that there are platforms for which mmap doesn't work.  But I would
suggest that on these platforms one has to do something very different than
what makes sense for desktop OSs, and so one has to accommodate their
limitations somehow.

> *1. all primitive functions are above 1024*
>
> That's not a problem whatsoever.
>

I know.  I was just being precise.

>
>
> *2. New space is below all old space segments*
>
> Here there's the alignment solution, which improves performance and
> removes the constraint.
>

What, that objects are allocated on a 0 modulo 16 byte boundary or an 8
modulo 16 byte boundary?  That's fine, but it wastes 5% of the heap.  And
don't we encounter more problems using the alignment solution if we want to
do shared segments?  (I guess not; we simply choose the 0 modulo 16
alignment for old space segments).  Have you implemented the alignment
solution?
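
For reference, a small sketch of how I understand the alignment scheme, not
an existing implementation. It assumes young objects start at 0 modulo 16
and old objects at 8 modulo 16, so bit 3 of an object's address tells the
two apart. All names are hypothetical, and immediates are assumed to have
been filtered out already:

```c
#include <assert.h>
#include <stdint.h>

#define OLD_SPACE_BIT ((uintptr_t)8)   /* assumed: bit 3 set means "old" */

int isYoungByAlignment(uintptr_t oop) { return (oop & OLD_SPACE_BIT) == 0; }
int isOldByAlignment(uintptr_t oop)   { return (oop & OLD_SPACE_BIT) != 0; }

/* Round address up to the next address whose residue modulo 16 is
 * 8 (old) or 0 (young). */
uintptr_t alignForSpace(uintptr_t address, int old)
{
    uintptr_t residue = old ? 8 : 0;
    return ((address - residue + 15) & ~(uintptr_t)15) + residue;
}

/* Store check: only an old object that is made to point at a young one
 * needs to be remembered.  No segment boundaries are consulted. */
int storeNeedsRemembering(uintptr_t target, uintptr_t value)
{
    return isOldByAlignment(target) && isYoungByAlignment(value);
}
```

With this scheme the store check no longer depends on where segments sit in
the address space. The price is that every object start is rounded up to a
16-byte slot.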

> *3. the code zone is below new space. This allows isReallyYoungObject: to
> use two comparisons, instead of three.*
>
> isReallyYoungObject: objOop
> <api>
> "Answer if obj is young. Require that obj is non-immediate. Override to
> filter-out Cog methods"
> self assert: (self isNonImmediate: objOop).
> ^(self oop: objOop isLessThan: newSpaceLimit)
>   and: [self oop: objOop isGreaterThanOrEqualTo: newSpaceStart]
>
> I don't think that method would change.
>

Right.  Sorry.

> I think the method isMachineCodeFrame: would change (2 comparisons
> instead of one).
>
> I dream of a world where young space, code zone and old space segments are
> in different segments which do not have any position requirement. That way:
> - no constraints for platforms like rumpkernel
> - no reliance on API such as sbrk.
> - quicker segment alloc since 0 can be used as the address (OS allocates
> segment wherever it wants)
> - quicker write barrier with bit check instead of cmp with constant
> - growing code zone at runtime is fairly easy (divorce allFrames, alloc
> new segment and free old one)
> - growing new space at runtime is fairly easy (do a tenureAll, alloc new
> segment and free the old new space segment)
>

Sure.  Some things to consider:
- boundary checks are frequent.  The ParcPlace code (written by David Ungar
and Frank Jackson) was very careful to have as many single boundary checks
as possible.
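
A sketch of what the ordering buys, assuming the layout code zone < new
space < old space. HeapBounds and inSegment are illustrative names only; the
other functions loosely mirror the Smalltalk selectors but are not the
generated VM code:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uintptr_t newSpaceStart;   /* first address of new space */
    uintptr_t newSpaceLimit;   /* assumed equal to the start of old space */
} HeapBounds;

/* With the ordering in place, old versus not-old is a single comparison. */
int isOldObject(const HeapBounds *h, uintptr_t oop)
{
    return oop >= h->newSpaceLimit;
}

/* Also a single comparison, but it answers true for code zone addresses. */
int isYoungObject(const HeapBounds *h, uintptr_t oop)
{
    return oop < h->newSpaceLimit;
}

/* Filtering out the code zone costs a second comparison. */
int isReallyYoungObject(const HeapBounds *h, uintptr_t oop)
{
    return oop < h->newSpaceLimit && oop >= h->newSpaceStart;
}

/* Without any ordering guarantee, membership needs a range check against
 * each segment, i.e. two comparisons per segment. */
int inSegment(uintptr_t start, uintptr_t limit, uintptr_t oop)
{
    return oop >= start && oop < limit;
}
```

If segments can land anywhere, each single comparison above turns into a
range test or a bit test, which is where the alignment idea comes in.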

And my own complaint.
- sbrk is regrettable, but an interface like mmap that allows for one to
supply a position hint and then doesn't provide a convenient way of finding out
what the memory map is seems to me to be broken without sbrk.

> It's just details all right. I will see if I can try that someday.
>

Right.  And implementing things using subclasses (e.g. of
SpurMemoryManager) means we can mix and match and experiment.

P.S.  sorry to have replied so slowly...

> On Wed, May 2, 2018, 02:19 Eliot Miranda <eliot.miranda at gmail.com> wrote:
>
>> Hi Clément,
>>
>>    sorry for the late reply...
>>
>> On Sat, Apr 28, 2018 at 1:57 AM, Clément Bera <bera.clement at gmail.com>
>> wrote:
>>
>>> Hi Eliot, Hi all,
>>>
>>> On mac and linux, Spur uses mmap to allocate new segments. The V3
>>> memory manager used malloc instead. I've looked into many other VMs
>>> (Javascript and Java), and most of them use posix_memalign (basically
>>> malloc where you can ask for specific alignment).
>>>
>>
>> And on Windows it uses VirtualAlloc.  So it is consistent in using memory
>> mapping to allocate segments across the platforms, where available.
>>
>>
>>> I am wondering why we are using mmap over posix_memalign / malloc. The
>>> only reason I can find is that Spur always allocates new memory segments at
>>> a higher address than past segments to guarantee that young objects are on
>>> lower addresses than old objects for the write barrier. Is that correct?
>>>
>>
>> Well, I don't like using malloc because one is layering unnecessarily and
>> hence there is wastage.  Many malloc implementations are optimized for
>> small block sizes and allocating a huge block
>> - may have a segment allocated all to itself
>> - won't necessarily be on a page boundary (especially on systems with
>> very large pages)
>>
>>
>>> Assuming it is correct, let's say I change Spur to implement the write
>>> barrier differently (typically, I change all objects to be aligned on 128
>>> bits instead of 64 and have different allocation alignment for young (128
>>> bits alignment) and old objects (128+64 bits alignment)). Will we be able to
>>> use posix_memalign / malloc to allocate new memory segment if I do that?
>>>
>>
>> Sure, but why?  Given that using mmap/VirtualAlloc gives page alignment,
>> one is going to get alignment up to at least 256 bytes (ancient VAX page
>> size) and more typically 4k bytes (x86/x86_64).
>>
>>
>>> Or does the VM rely on segments being on higher addresses for other
>>> reasons ? For example, does the VM assume CogMethods are on lower addresses
>>> than objects on heap and rely on it to check if a stack frame is mframe or
>>> iframe ?
>>>
>>
>> Well indeed being able to rely on ordering makes the boundary checks in
>> the store checks simpler.  I think you wrote a blog post on this so you've
>> actually captured this info before.  But to reiterate, the Cog and Stack VM
>> assumes the following memory orderings:
>>
>> 1. all primitive functions are above 1024.  This allows the quick
>> primitives to be stored in the method cache with a primitive function
>> pointer that is their index and for executeNewMethod et al to compare the
>> primitiveFunctionPointer against MaxQuickPrimitiveIndex and dispatch to
>> quickPrimitiveResponse
>>
>> 2. New space is below all old space segments, and is immediately below
>> the first old space segment.  This allows isOldObject:/isYoungObject: et al
>> to compare an oop against newSpaceLimit/oldSpaceStart/nilObj (yes we
>> have three different names for exactly the same value; we only need two;
>> the fact that nilObj = oldSpaceStart is incidental).
>>
>> 3. the code zone is below new space.  This allows isReallyYoungObject: to
>> use two comparisons, instead of three.
>>
>> So let me ask you the corollary.  Why, if mmap/VirtualAlloc provides
>> memory aligned on a page boundary, with no overhead, and control over
>> placement, why would one use posix_memalign or malloc to allocate memory?
>>
>
>>
>>> Thanks,
>>>
>>> --
>>> Clément Béra
>>> https://clementbera.github.io/
>>> https://clementbera.wordpress.com/
>>>
>>
>>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
>>
>

-- 
_,,,^..^,,,_
best, Eliot