"Just Curious" VM question
andreas.raab at gmx.de
Mon Sep 15 23:50:11 UTC 2003
> Unless we want to make damn big table somewhere I'd suggest
> putting the address of the relevant prim function in the cache
> instead of the index we currently use. Slang isn't terribly
> clean for this but never mind. Sure we can come up with something.
Been there, done that ;-) Turns out that some ABIs (==MacOS) impose a really
stupid overhead when a function call goes "cross-fragment", and the burden of
determining this falls on the caller, not the callee. In other words, if you
call a function through a pointer you pay a measurable overhead when you do
it often enough. I wanted to get this fixed back when we introduced the
external prims (my idea was to switch everything to named prims and hand out
primitive indexes dynamically) but using the table dispatch slowed down the
Mac VM measurably (nothing huge but IIRC 2-3% went away). Given that all of
the numbered prims are the "truly time-critical ones" it didn't seem worth
the effort (and also, named prims were new and we hadn't thought about how
to make them really fast so the "default" at this time was still numbered).
Also, we use some of the primitive numbers right in the interpreter loop
(like the quick prims) which would be hard to do based on primitive address.
Of course, if we wanted to get really fancy we could declare that no named
primitive's address may ever fall into the lower 1k range (which I think is
reasonable) and just store the address of the named function anyway. But then, unless
there's a _need_ for it I don't see any point wasting effort to make things
infinitely fast in places where it doesn't count ;-)
> Further advantages; could cache more specialized
> versions of byteAt/put/wordat/put/etc thus
> speeding up the frequent scanning of homogeneous lists.
Not sure I understand this. Can you elaborate?
Completely OT point here, but one of the places where I _would_ dramatically
like the ability to "store a function address" in the mcache is the FFI
- with a bit of help from some native code generation this could make
marshalling incredibly fast, with only very little help from something like
Ian's ccg ;-)