Yes. Number of samples and percentage of overall samples. You're right it's not that many. Here is the result when I run 5 times the bench (I usually use the profiler just as a hint to tweak performance, then the in-image results matter).
New VM:
% of vanilla vm code (% of total) (samples) (cumulative)
20.72% ( 9.98%) scavengeReferentsOf (2943) (20.72%)
10.52% ( 5.07%) lookupOrdinaryNoMNUEtcInClass (1494) (31.24%)
6.63% ( 3.19%) processWeakSurvivor (941) (37.87%)
5.62% ( 2.71%) copyAndForward (798) (43.49%)
4.94% ( 2.38%) addNewMethodToCache (702) (48.43%)
4.86% ( 2.34%) doScavenge (690) (53.29%)
3.20% ( 1.54%) primitiveStringReplace (455) (56.50%)
3.06% ( 1.48%) moveFramesInthroughtoPage (435) (59.56%)
2.94% ( 1.42%) compact (418) (62.50%)
2.87% ( 1.38%) interpret (407) (65.37%)
2.52% ( 1.21%) ceSendFromInLineCacheMiss (358) (67.89%)
Old VM:
% of vanilla vm code (% of total) (samples) (cumulative)
27.19% (10.97%) scavengeReferentsOf (2615) (27.19%)
9.80% ( 3.95%) processWeakSurvivor (943) (36.99%)
8.76% ( 3.54%) copyAndForward (843) (45.75%)
7.06% ( 2.85%) doScavenge (679) (52.81%)
4.81% ( 1.94%) primitiveStringReplace (463) (57.63%)
4.28% ( 1.73%) moveFramesInthroughtoPage (412) (61.91%)
4.15% ( 1.67%) lookupOrdinaryNoMNUEtcInClass (399) (66.06%)
2.44% ( 0.99%) marryFrameSP (235) (68.50%)
2.30% ( 0.93%) isWidowedContext (221) (70.80%)
2.19% ( 0.88%) handleStackOverflow (211) (72.99%)
1.64% ( 0.66%) primitiveCompareString (158) (74.63%)
1.50% ( 0.60%) findMethodWithPrimitiveFromContextUpToContext (144) (76.13%)
1.50% ( 0.60%) ceBaseFrameReturn (144) (77.63%)
1.45% ( 0.58%) ceStackOverflow (139) (79.07%)
1.16% ( 0.47%) ceNonLocalReturn (112) (80.24%)
1.11% ( 0.45%) allocateNewSpaceSlotsformatclassIndex (107) (81.35%)
1.00% ( 0.40%) isBytes (96) (82.35%)
0.95% ( 0.38%) fetchClassOfNonImm (91) (83.29%)
0.93% ( 0.37%) stackValue (89) (84.22%)
0.90% ( 0.36%) returnToExecutivepostContextSwitch (87) (85.12%)
0.90% ( 0.36%) interpret (87) (86.03%)
0.84% ( 0.34%) addNewMethodToCache (81) (86.87%)
0.83% ( 0.34%) ceSendFromInLineCacheMiss (80) (87.70%)
Now if I look at the 4 methods highlighted, instead of the 3 I looked at before, I see ~7% overhead. I'm not sure if compact got really slower or if it used to be inlined somewhere during slang compilation. I still think we should use these results as a hit and rely on the real benchmark for performance evaluation.
If it's really a slang inlining problem, it will have side-effect on other functions, many will get a little bit faster, and it really seems this is the problem.