[Vm-dev] New Cog VMs available

Thu Jul 14 00:53:29 UTC 2016

Hi Henrik,

On Tue, Jul 12, 2016 at 1:07 AM, Henrik Johansen <
henrik.s.johansen at veloxit.no> wrote:

>
> Could also be something gone wonky with the interpreters method cache,
> causing excessive misses and the fallback code (lookupOrdinary... and
> addNewMethod..) to run.
> Initially I'd suspect there might've been changes that'd affect hashing
> (such as classTag - index in class table, being used instead of class
> pointer), but I guess that happened before the first version.
>
> I would note, in the generated code for spur in the git repo, there's a
> case on top of addNewMethodToCache that doesn't actually add the found
> method to cache:
> if (!(((GIV(newMethod) & (tagMask())) == 0)
> && (((((usqInt) (longAt(GIV(newMethod)))) >> (formatShift())) &
> (formatMask())) >= (firstCompiledMethodFormat())))) {
> primitiveFunctionPointer = primitiveInvokeObjectAsMethod;
> return;
> }
>
> I don't know exactly what firstCompiledMethodFormat() entails, but if it
> means found method that is a CompiledMethod, that would pretty much mean
> the MethodCache is entirely unused, no?
>

The expression is ! (not) "is it non-immediate with compiled method
format".  So this expression adds the primitiveInvokeObjectAsMethod as the
primitive whenever what the lookup finds is /not/ a compiled method.  This
is the implementation of "objects as methods".  All compiled methods are
non-immediates and have a format >= firstCompiledMethodFormat:

Behavior>>instSpec
"Answer the instance specification part of the format that defines what
kind of object
an instance of the receiver is.  The formats are
0 = 0 sized objects (UndefinedObject True False et al)
1 = non-indexable objects with inst vars (Point et al)
2 = indexable objects with no inst vars (Array et al)
3 = indexable objects with inst vars (MethodContext AdditionalMethodState
et al)
4 = weak indexable objects with inst vars (WeakArray et al)
5 = weak non-indexable objects with inst vars (ephemerons) (Ephemeron)
6 = unused
7 = immediates (SmallInteger, Character)
8 = unused
9 = 64-bit indexable
10-11 = 32-bit indexable (Bitmap) (plus one odd bit, unused in 32-bits)
12-15 = 16-bit indexable (plus two odd bits, one unused in 32-bits)
16-23 = 8-bit indexable (plus three odd bits, one unused in 32-bits)
24-31 = compiled methods (CompiledMethod) (plus three odd bits, one unused
in 32-bits)
Note that in the VM instances also have a 5 bit format field that relates
to their class's format.
Formats 11, 13-15, 17-23 & 25-31 are unused in classes but used in
instances to define the
number of elements missing up to the slot size.  For example, a 2-byte
ByteString instance
has format 18 in 32-bits, since its size is one 32-bit slot - 2 bytes ((18
bitAnd: 3) = 2), and
22 in 64 bits, since its size is one 64-bit slot - 6 bytes ((22 bitAnd: 7)
= 6)."
^(format bitShift: -16) bitAnd: 16r1F

So the format encodes both "it is a compiled method" and the number of
unused trailing elements.

> Cheers,
> Henry
>
> On 07 Jul 2016, at 10:04 , Clément Bera <bera.clement at gmail.com> wrote:
>
> Yes. Number of samples and percentage of overall samples. You're right
> it's not that many. Here is the result when I run 5 times the bench (I
> usually use the profiler just as a hint to tweak performance, then the
> in-image results matter).
>
> New VM:
>
> % of vanilla vm code (% of total) (samples) (cumulative)
> 20.72%    (  9.98%) scavengeReferentsOf (2943) (20.72%)
> *10.52%    (  5.07%) lookupOrdinaryNoMNUEtcInClass (1494) (31.24%)*
>   6.63%    (  3.19%) processWeakSurvivor (941) (37.87%)
>   5.62%    (  2.71%) copyAndForward (798) (43.49%)
>  * 4.94%    (  2.38%) addNewMethodToCache (702) (48.43%)*
>   4.86%    (  2.34%) doScavenge (690) (53.29%)
>   3.20%    (  1.54%) primitiveStringReplace (455) (56.50%)
>   3.06%    (  1.48%) moveFramesInthroughtoPage (435) (59.56%)
>   2.94%    (  1.42%) compact (418) (62.50%)
>  * 2.87%    (  1.38%) interpret (407) (65.37%)*
>  * 2.52%    (  1.21%) ceSendFromInLineCacheMiss (358) (67.89%)*
>
> Old VM:
>
> % of vanilla vm code (% of total) (samples) (cumulative)
> 27.19%    (10.97%) scavengeReferentsOf (2615) (27.19%)
>   9.80%    (  3.95%) processWeakSurvivor (943) (36.99%)
>   8.76%    (  3.54%) copyAndForward (843) (45.75%)
>   7.06%    (  2.85%) doScavenge (679) (52.81%)
>   4.81%    (  1.94%) primitiveStringReplace (463) (57.63%)
>   4.28%    (  1.73%) moveFramesInthroughtoPage (412) (61.91%)
>   *4.15%    (  1.67%) lookupOrdinaryNoMNUEtcInClass (399) (66.06%)*
>   2.44%    (  0.99%) marryFrameSP (235) (68.50%)
>   2.30%    (  0.93%) isWidowedContext (221) (70.80%)
>   2.19%    (  0.88%) handleStackOverflow (211) (72.99%)
>   1.64%    (  0.66%) primitiveCompareString (158) (74.63%)
>   1.50%    (  0.60%) findMethodWithPrimitiveFromContextUpToContext (144)
> (76.13%)
>   1.50%    (  0.60%) ceBaseFrameReturn (144) (77.63%)
>   1.45%    (  0.58%) ceStackOverflow (139) (79.07%)
>   1.16%    (  0.47%) ceNonLocalReturn (112) (80.24%)
>   1.11%    (  0.45%) allocateNewSpaceSlotsformatclassIndex (107) (81.35%)
>   1.00%    (  0.40%) isBytes (96) (82.35%)
>   0.95%    (  0.38%) fetchClassOfNonImm (91) (83.29%)
>   0.93%    (  0.37%) stackValue (89) (84.22%)
>   0.90%    (  0.36%) returnToExecutivepostContextSwitch (87) (85.12%)
>  * 0.90%    (  0.36%) interpret (87) (86.03%)*
>  * 0.84%    (  0.34%) addNewMethodToCache (81) (86.87%)*
> *  0.83%    (  0.34%) ceSendFromInLineCacheMiss (80) (87.70%)*
>
> Now if I look at the 4 methods highlighted, instead of the 3 I looked at
> before, I see ~7% overhead. I'm not sure if compact got really slower or if
> it used to be inlined somewhere during slang compilation. I still think we
> should use these results as a hit and rely on the real benchmark for
> performance evaluation.
>
> If it's really a slang inlining problem, it will have side-effect on other
> functions, many will get a little bit faster, and it really seems this is
> the problem.
>
>
>
> On Thu, Jul 7, 2016 at 8:50 PM, Holger Freyther <holger at freyther.de>
> wrote:
>
>>
>> > On 07 Jul 2016, at 19:41, Clément Bera <bera.clement at gmail.com> wrote:
>> >
>> > Hi Holger,
>>
>>
>> Hi!
>>
>>
>> > I'm sorry for the delay since you reported the bug. Everyone working on
>> the VM is very busy (In my case, my time is consumed by my phd and sista,
>> I'm trying to open an alpha of the image+runtime before ESUG running at
>> ~1.5x).
>>
>> great.
>>
>>
>> >
>> > Now that the VM profiler is fully working I was able to look into the
>> regression.
>> >
>> > Old VM vanilla code:
>> >
>> > 27.94%    (11.54%)    scavengeReferentsOf
>>                                (528)   (27.94%)
>> > 10.21%    (  4.22%)   processWeakSurvivor
>>                                (193)   (38.15%)
>> >   8.15%    (  3.36%)  copyAndForward
>>                               (154)   (46.30%)
>> >   7.41%    (  3.06%)  doScavenge
>>                                       (140)   (53.70%)
>> >   5.19%    (  2.14%)  lookupOrdinaryNoMNUEtcInClass
>>                (98)            (58.89%)
>> >   4.71%    (  1.94%)  primitiveStringReplace
>>                               (89)            (63.60%)
>> >   4.29%    (  1.77%)  moveFramesInthroughtoPage
>>                        (81)            (67.88%)
>> >   2.06%    (  0.85%)  isWidowedContext
>>                                       (39)            (69.95%)
>> > [...]
>> >   0.90%    (  0.37%)  ceSendFromInLineCacheMiss
>>                        (17)            (85.61%)
>> >   0.85%    (  0.35%)  addNewMethodToCache
>>                        (16)            (86.46%)
>> >
>> > New VM vanilla code:
>> >
>> > % of vanilla vm code (% of total)
>>                                                (samples) (cumulative)
>> > 22.41%    (10.44%)    scavengeReferentsOf
>>                                (609)   (22.41%)
>> > 14.46%    (  6.74%)   lookupOrdinaryNoMNUEtcInClass
>>                (393)   (36.87%)
>> >
>>
>>
>> > I highlighted the problematic methods. It is likely that it has to do
>> with slang inlining.
>> >
>> > The 3 methods highlighted seems to be responsible for an overhead of
>> ~8.5% in the overall runtime. You seem to have an overhead of ~14% on my
>> machine. There's quite a difference. The slang inlining overhead may have
>> impacted other functions, and C profilers are never 100% accurate, so we
>> will see when the problem is fixed if something else is also problematic.
>> Now that we know what the problem is, I believe it will be fixed in the
>> incoming weeks when someone (likely Eliot, maybe me) has time. Thanks for
>> reporting the regression.
>>
>> What does (393) and (98) mean? Is it number of samples?
>>
>> holger
>>
>>
>>
>>
>>
>
>
>

-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20160713/45b26150/attachment-0001.htm