Eliot,
Right, exactly. We want to know what is cheap on current hardware, what is expensive, and what could be made cheaper.
The last one is not easy. That is my complaint about Urs Hölzle's 1995 ECOOP paper, where he concluded that the special hardware in Sparc didn't help Self. Part of his results were due to how costly traps were on Sparc V8 under Solaris (more than 1000 clock cycles), so using them to handle tags and register window overflow/underflow wasn't a good idea. He compared the code generated by his Self compiler with that generated by C and saw no difference (hardly surprising, since Sparc was optimized for C and his compiler was optimized for Sparc). The problem is that he couldn't measure the effect that any hardware features Sparc didn't include would have had if they were added. He could make his compiler use register windows or not, and report less than 1% difference. But he couldn't guess what would happen if a PIC instruction like the one I proposed were added.
http://www.cs.ucsb.edu/~urs/oocsb/papers/ecoop95-arch.pdf
Right. The Cog JIT generates abstract instructions in an assembly-like style. We can easily add state to the abstract instructions. So if we modified some generation routines to set a "context" flag, such as "doing allocation", "doing dispatch", "doing store check", "doing marshalling", etc., we could label each instruction in the sequences we're interested in with that flag.
Great idea! I had not considered looking at the Smalltalk code executing while generating an instruction instead of looking at the generated instructions. And I was thinking of Slang-to-C, which matters for the Interpreter but not for code generated by Cog.
Then, for example, when we generate the code we could reduce that flag to a set of bit vectors: one bit per byte of instruction space, one bit vector per "interesting code type", and one bit vector that ors them all together.
If you have fewer than 8 interesting categories you could instead have one label byte per instruction byte. That would be easy but wasteful, since only the bits in the byte corresponding to the start of an instruction would matter.
Then on simulating each instruction we can test its address against the bit vectors and find out what kind it is. It will slow down simulation, but we're happy to pay that price when we're gathering statistics.
You could increment counters corresponding to each of the bits.
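To make the bookkeeping concrete, here is a rough C sketch of the whole scheme. All the names (CAT_ALLOCATION, markRange, countInstruction) and the code-zone size are made up for illustration; the real thing would be Smalltalk inside the Cog simulator:

    #include <stdint.h>

    enum { CAT_ALLOCATION, CAT_DISPATCH, CAT_STORE_CHECK,
           CAT_MARSHALLING, NUM_CATEGORIES };

    #define CODE_SPACE_BYTES (1 << 20)   /* assumed size of the code zone */

    /* One bit per byte of instruction space, one vector per category,
       plus one vector that ors them all together. */
    static uint8_t categoryBits[NUM_CATEGORIES][CODE_SPACE_BYTES / 8];
    static uint8_t anyBits[CODE_SPACE_BYTES / 8];
    static uint64_t counters[NUM_CATEGORIES];

    /* Generation time: label every byte of a flagged instruction sequence. */
    static void markRange(uint32_t start, uint32_t len, int category)
    {
        for (uint32_t a = start; a < start + len; a++) {
            categoryBits[category][a / 8] |= 1 << (a % 8);
            anyBits[a / 8] |= 1 << (a % 8);
        }
    }

    /* Simulation time: only the address at which an instruction starts
       (the simulated pc) is ever tested, which is why the byte-per-byte
       label scheme above would waste most of its bits. */
    static void countInstruction(uint32_t pc)
    {
        if (!(anyBits[pc / 8] & (1 << (pc % 8))))  /* fast common case */
            return;
        for (int c = 0; c < NUM_CATEGORIES; c++)
            if (categoryBits[c][pc / 8] & (1 << (pc % 8)))
                counters[c]++;
    }

Testing the or'd vector first keeps the common case, an instruction that belongs to no interesting category, down to a single test per simulated instruction.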
As for instruction counts, they are certainly very important, even if less helpful than cycle counts.
Useful enough data for not too much effort. Even very inaccurate cycle counts would be a lot more work and a much harder set of data to prove correct. In any case they depend on the processor implementation, and given that those implementations evolve quickly I have never thought it worthwhile to do micro-measurements to find out which instruction sequences are fast on specific versions. People do do this and get great results; I've simply never worked in a situation where I felt I could afford the effort.
For my own project I want to focus on very simple pipelined implementations, so instruction counts would be good enough. But for the J Extension working group we would need to know how the proposals would affect a more advanced implementation like BOOM (the Berkeley Out-of-Order Machine).
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-157.html
I have not yet looked at how multiple bytecode sets are handled in Cog.
Claus's scheme is to maintain a single variable (at both interpretation and JIT time) called bytecodeSetOffset, which has values 0, 256, 512, 768, etc., and this is added to the byte fetched. bytecodeSetOffset must be set on activating a method and on returning from one. It is essentially the same idea as maintaining BCTableBase as a variable.
It would be trivial to convert one to the other (just add or subtract a constant). In fact, if you arrange things so that the table is at address zero in code space, then they are the same.
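A toy C rendering of the equivalence, assuming four bytecode sets of 256 entries concatenated into one dispatch table (the layout and the helper setBytecodeSet are mine; only the two variable names come from the discussion above):

    #include <stdint.h>

    typedef void (*BytecodeHandler)(void);

    /* Four bytecode sets of 256 handlers each, in one flat table. */
    static BytecodeHandler table[4 * 256];

    /* Claus's scheme: one offset with values 0, 256, 512, 768. */
    static uint32_t bytecodeSetOffset;
    #define DISPATCH_OFFSET(byte) (table[bytecodeSetOffset + (byte)])

    /* The variable-base scheme: a pointer moved to the set's first entry. */
    static BytecodeHandler *BCTableBase = table;
    #define DISPATCH_BASE(byte) (BCTableBase[(byte)])

    /* On activating a method or returning to one, select its set.
       Converting between the two schemes is adding a constant: */
    static void setBytecodeSet(uint32_t setIndex)  /* 0..3 */
    {
        bytecodeSetOffset = setIndex << 8;         /* 0, 256, 512, 768 */
        BCTableBase = table + bytecodeSetOffset;
    }

And if the table were placed at address zero in code space, BCTableBase and the (scaled) offset would hold the very same value.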
-- Jecel