[Vm-dev] VMBIGENDIAN question (was: A proposal to split VMMaker into subpackages)

David T. Lewis lewis at mail.msen.com
Fri Mar 22 23:41:42 UTC 2013


On Fri, Mar 22, 2013 at 10:25:14PM +0100, Nicolas Cellier wrote:
>  
> 2013/3/22 Bert Freudenberg <bert at freudenbergs.de>:
> >
> > On 2013-03-22, at 05:43, David T. Lewis <lewis at mail.msen.com> wrote:
> >
> >> "It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so."
> >> -- Mark Twain
> >>
> >> An interpreter VM compiled with the normal C macros in sqMemoryAccess.h (for "performance"):
> >>
> >> 0 tinyBenchmarks. '417277913 bytecodes/sec; 14395420 sends/sec'
> >> 0 tinyBenchmarks. '414239482 bytecodes/sec; 14646769 sends/sec'
> >> 0 tinyBenchmarks. '417277913 bytecodes/sec; 14406658 sends/sec'
> >>
> >> The same interpreter VM with C macros replaced by Smalltalk slang (class MemoryAccess):
> >>
> >> 0 tinyBenchmarks. '455111111 bytecodes/sec; 14217973 sends/sec'
> >> 0 tinyBenchmarks. '451897616 bytecodes/sec; 14485815 sends/sec'
> >> 0 tinyBenchmarks. '453900709 bytecodes/sec; 14497194 sends/sec'
> >>
> >> Dave
> >
> > That is ... unexpected :)
> >
> 
> Well, that's almost the same code in both MemoryAccess and
> sqMemoryAccess.h, right?
> Maybe a bit different if you define USE_INLINE_MEMORY_ACCESSORS:
> 
>   static inline sqInt byteAt(sqInt oop)            { return byteAtPointer(pointerForOop(oop)); }
>   static inline sqInt byteAtPointer(char *ptr)     { return (sqInt)(*((unsigned char *)ptr)); }
>   static inline char *pointerForOop(usqInt oop)    { return sqMemoryBase + oop; }
> 
> Though I do not quite see why it would not inline such a simple piece
> of code, gcc has license to not honour the inline request.


I think the different variants are there to support platforms whose
compilers do not handle the inline functions well.
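
For reference, when USE_INLINE_MEMORY_ACCESSORS is not defined, the same
accessors are defined as plain preprocessor macros instead. Roughly, and
paraphrasing from memory rather than quoting the header, the fallback
looks like this:

  #define pointerForOop(oop)   ((char *)(sqMemoryBase + (oop)))
  #define byteAtPointer(ptr)   ((sqInt)(*((unsigned char *)(ptr))))
  #define byteAt(oop)          byteAtPointer(pointerForOop(oop))

After preprocessing (or successful inlining) the compiler should see much
the same expression either way, so the interesting question is whether the
inline versions really do get inlined.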


> On the other hand, MemoryAccess will always inline, since we asked the
> code generator to (self inline: true).
> It would be worth verifying whether any of the static functions end up
> in the executable (with nm -a or something).


IIRC a few of the methods must provide static functions to support linking
from some of the support code, but otherwise everything is fully inlined.


> 
> But I also see other subtle differences like this:
> 
> intAtPointer: ptr put: val
> 	self inline: true.
> 	self var: #ptr type: 'char *'.
> 	self var: #val type: 'unsigned int'.
> 	^ self cCoerce:
> 			((self cCoerce: ptr to: 'unsigned int *')
> 				at: 0
> 				put: val)
> 		to: 'sqInt'
> 
> while the header says
>   static inline sqInt intAtPointerput(char *ptr, int val)   { return (sqInt)(*((unsigned int *)ptr)= (int)val); }
> 
> OK, you might think that casting int->unsigned int is a no-op on
> two's-complement machines.
> But that's a distraction; we can ignore the intermediate (*(unsigned int
> *)) and just consider that the return value is assigned from the
> parameter val.
> So the header just copies an int->int, but MemoryAccess uses the
> opposite cast, unsigned int->int.
> It's also a no-op, except that:
> - the cast can overflow, which would be UB.
> - gcc has licence to presume you don't rely on UB, and thus can further
> assume that the returned int is always >= 0.
> That assumption cannot be made in the case of sqMemoryAccess.h.
> 
> So all I see here has nothing to do with premature optimization.
> It has to do with a lack of understanding of the modern C standards, and
> the absolutely casual attitude we take toward signed and unsigned types.
> 
> Nicolas

I'm not sure I understand the point you are making. The intent of MemoryAccess
was to reproduce the existing macros as closely as possible, while writing
them in Smalltalk. So, taking the intAtPointer accessor as an example, the
existing C macro is this:

  static inline sqInt intAtPointer(char *ptr)   { return (sqInt)(*((unsigned int *)ptr)); }


The Smalltalk method that I wrote as a replacement is this:


intAtPointer: ptr
	"Answer the unsigned integer value at a machine address. The result is a signed
	sqInt value with binary value in the range 0 through 16rFFFFFFFF. If the
	size of sqInt is 8 bytes (64-bit object memory) and size of integer is 4, then the
	high order 4 bytes of the result are zero."

	"sqInt intAtPointer(char *ptr) { return (sqInt)(*((unsigned int *)ptr)) }"

	self inline: true.
	self var: #ptr type: 'char *'.
	^ self cCoerce:
			((self cCoerce: ptr to: 'unsigned int *') at: 0)
		to: 'sqInt'
		
And the generated C code is this:

/*	Answer the unsigned integer value at a machine address. The result is a signed
	sqInt value with binary value in the range 0 through 16rFFFFFFFF. If the
	size of sqInt is 8 bytes (64-bit object memory) and size of integer is 4, then the
	high order 4 bytes of the result are zero. */
/*	sqInt intAtPointer(char *ptr) { return (sqInt)(*((unsigned int *)ptr)) } */

static sqInt intAtPointer(char *ptr) {
	return ((sqInt) ((((unsigned int *) ptr))[0]));
}

The generated C code is different from the macro only insofar as it expresses
the pointer dereference as an array offset, which of course is an equivalent
expression.
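
For the intAtPointer:put: case that Nicolas quoted, the comparison comes
out much the same way. As a sketch (reconstructed here, not pasted from
the generated sources), the two versions are roughly:

  /* sqMemoryAccess.h, inline function form, as quoted above: */
  static inline sqInt intAtPointerput(char *ptr, int val)   { return (sqInt)(*((unsigned int *)ptr)= (int)val); }

  /* MemoryAccess, as the code generator would render intAtPointer:put: */
  static sqInt intAtPointerput(char *ptr, unsigned int val) {
      return ((sqInt) ((((unsigned int *) ptr))[0] = val));
  }

The bodies again differ only in the array-offset notation; the visible
difference is that the MemoryAccess version declares val as unsigned int,
which is the signed/unsigned asymmetry that Nicolas is pointing at.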

I was using MemoryAccess as an example, to illustrate that using C macros
for performance is not necessarily a good idea unless you have actually
measured the performance.

MemoryAccess was written as a direct replacement for the corresponding
C macros. I was attempting to make the generated code match the macros
as closely as possible (right or wrong). I did this so I would be able
to understand what the macros were doing, and so that I could see the
actual code in a debugger and profile it at a low level with gprof. But
perhaps most importantly, I did it in order to enable the compiler to
issue warnings about type declaration problems that otherwise are hidden
by the macros.
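
As a toy illustration of that last point (made-up names here, not code
from the VM): a cast buried inside a macro will silently accept a
mismatched argument, while the equivalent inline function gives the
compiler a declared parameter type to check against.

  #define byteAtM(ptr)  (*((unsigned char *)(ptr)))   /* macro: the cast hides the argument's real type */
  static inline unsigned char byteAtF(char *ptr) { return *((unsigned char *)ptr); }

  int main(void) {
      int word = 0x12345678;
      unsigned char a = byteAtM(&word);   /* compiles silently; the int* vs char* mismatch is invisible */
      unsigned char b = byteAtF(&word);   /* gcc warns about passing an 'int *' for a 'char *' parameter */
      return a + b;
  }

That is the kind of diagnostic that the macros were suppressing.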

I was really completely surprised to find that there was no performance
penalty for doing this. In fact, it is actually faster by some measurements.

I would like to think that perhaps this says more about the overall
goodness of our Smalltalk CCodeGenerator and inliner than it says about
the collective badness of our C programming skills :)

Dave


> 
> > But it again shows the importance of the 3rd rule of Optimization.
> > http://c2.com/cgi/wiki?RulesOfOptimization
> >
> > - Bert -
> >

