[Vm-dev] VMBIGENDIAN question (was: A proposal to split VMMaker into subpackages)

Sat Mar 23 00:03:51 UTC 2013

2013/3/23 David T. Lewis <lewis at mail.msen.com>:
>
> On Fri, Mar 22, 2013 at 10:25:14PM +0100, Nicolas Cellier wrote:
>>
>> 2013/3/22 Bert Freudenberg <bert at freudenbergs.de>:
>> >
>> > On 2013-03-22, at 05:43, David T. Lewis <lewis at mail.msen.com> wrote:
>> >
>> >> "It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so."
>> >> -- Mark Twain
>> >>
>> >> An interpreter VM compiled with the normal C macros in sqMemoryAccess.h (for "performance"):
>> >>
>> >> 0 tinyBenchmarks. '417277913 bytecodes/sec; 14395420 sends/sec'
>> >> 0 tinyBenchmarks. '414239482 bytecodes/sec; 14646769 sends/sec'
>> >> 0 tinyBenchmarks. '417277913 bytecodes/sec; 14406658 sends/sec'
>> >>
>> >> The same interpreter VM with C macros replaced by Smalltalk slang (class MemoryAccess):
>> >>
>> >> 0 tinyBenchmarks. '455111111 bytecodes/sec; 14217973 sends/sec'
>> >> 0 tinyBenchmarks. '451897616 bytecodes/sec; 14485815 sends/sec'
>> >> 0 tinyBenchmarks. '453900709 bytecodes/sec; 14497194 sends/sec'
>> >>
>> >> Dave
>> >
>> > That is ... unexpected :)
>> >
>>
>> Well, that's almost the same code in both MemoryAccess and
>> sqMemoryAccess.h, right?
>> Maybe a bit different if you define USE_INLINE_MEMORY_ACCESSORS:
>>
>>   static inline sqInt byteAt(sqInt oop)           { return byteAtPointer(pointerForOop(oop)); }
>>   static inline sqInt byteAtPointer(char *ptr)    { return (sqInt)(*((unsigned char *)ptr)); }
>>   static inline char *pointerForOop(usqInt oop)   { return sqMemoryBase + oop; }
>>
>> Though I do not see why it would fail to inline such a simple piece of code,
>> gcc is free not to honour the inline request.
>
>
> I think the differences here were done to support platforms that do not
> have the ability to do this.
>
>
>> On the other hand, MemoryAccess will always inline, as we asked the code
>> generator to (self inline: true).
>> It would be worth verifying whether any of the static functions is generated
>> in the executable (with nm -a or something).
>
>
> IIRC a few of the methods must provide static functions to support linking
> from some of the support code, but otherwise everything is fully inlined.
>
>
>>
>> But I also see other subtle differences like this:
>>
>> intAtPointer: ptr put: val
>>       self inline: true.
>>       self var: #ptr type: 'char *'.
>>       self var: #val type: 'unsigned int'.
>>       ^ self cCoerce:
>>                       ((self cCoerce: ptr to: 'unsigned int *')
>>                               at: 0
>>                               put: val)
>>               to: 'sqInt'
>>
>> while the header tells
>>   static inline sqInt intAtPointerput(char *ptr, int val)     { return (sqInt)(*((unsigned int *)ptr)= (int)val); }
>>
>> OK, you might think that casting int->unsigned int is a no-op on
>> two's-complement machines.
>> But that's a distraction: we can ignore the intermediate (*(unsigned int
>> *)) and just consider that the return value is assigned from the
>> parameter val.
>> So the header just copies an int->int, but MemoryAccess uses the
>> opposite cast, unsigned int->int.
>> It's also a no-op, except that:
>> - the cast can overflow, which would be UB.
>> - gcc has a license to presume you don't rely on UB and thus can
>> further assume the returned int is always >= 0
>> That assumption cannot be made in the case of sqMemoryAccess.h
>>
>> So all I see here has nothing to do with premature optimization.
>> It has to do with a lack of understanding of the modern C standards, and
>> the utterly casual attitude we take toward signed and unsigned
>> types.
>>
>> Nicolas
>
> I'm not sure I understand the point you are making. The intent of MemoryAccess
> was to reproduce the existing macros as closely as possible, while writing
> them in Smalltalk. So for your example, the existing C macro is this:
>
>   static inline sqInt longAtPointerput(char *ptr, sqInt val)    { return (sqInt)(*((sqInt *)ptr)= (sqInt)val); }
>
>
> The Smalltalk method that I wrote as a replacement is this:
>
>
> intAtPointer: ptr
>         "Answer the unsigned integer value at a machine address. The result is a signed
>         sqInt value with binary value in the range 0 through 16rFFFFFFFF. If the
>         size of sqInt is 8 bytes (64-bit object memory) and size of integer is 4, then the
>         high order 4 bytes of the result are zero."
>
>         "sqInt intAtPointer(char *ptr) { return (sqInt)(*((unsigned int *)ptr)) }"
>
>         self inline: true.
>         self var: #ptr type: 'char *'.
>         ^ self cCoerce:
>                         ((self cCoerce: ptr to: 'unsigned int *') at: 0)
>                 to: 'sqInt'
>

No, you looked at the wrong function: it's intAtPointerput, not intAtPointer.
One version has a signed argument val and the other an unsigned one.

But never mind, it's not my day: my guess was totally wrong in this case, because
- 1) (b=c) has the type of b, not the type of c
- 2) int = unsigned is not undefined behavior, it's just
implementation-defined, and the compiler cannot eliminate a later <0
test because the conversion is a bitwise no-op in this implementation.
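
A minimal C sketch of point 1, assuming a C11 compiler (for _Generic) and a
32-bit int. The function names are made up for illustration and are not VM
code. The first function asks the compiler which type the assignment
expression has; the second shows what that means for the value once it is
widened to 64 bits:

```c
#include <limits.h>   /* UINT_MAX, used when checking the result */

/* Returns 1 if (u = i) has type unsigned int, 2 if it has type int.
   _Generic does not evaluate its controlling expression, so no store
   actually happens here. */
static int type_of_assignment(void)
{
    unsigned int u = 0;
    int i = -1;
    (void)u; (void)i;   /* silence unused-variable warnings */
    return _Generic((u = i), unsigned int: 1, int: 2, default: 0);
}

/* Stores an int into an unsigned int slot and widens the value of the
   assignment expression. Because that expression is unsigned int, the
   widening zero-extends: -1 comes back as UINT_MAX, not as -1. */
static long long assign_and_widen(int val)
{
    unsigned int slot;
    return (long long)(slot = val);
}
```

Under those assumptions type_of_assignment() answers 1, and
assign_and_widen(-1) answers UINT_MAX rather than -1.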

Before giving lessons in C, one should always think twice before speaking ;)
http://stackoverflow.com/questions/15581037/precedence-of-chained-assignments-and-casts

By the way, I wonder about the utility of the (unsigned *) cast.
It means that a stored -1 will be read back as UINT_MAX (and INT_MIN as
INT_MAX+1) if sqInt is 64 bits and int is 32 bits long.
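
To make that concrete, here is a small C sketch, assuming a 32-bit int and
using long long as a stand-in for a 64-bit sqInt. The two helper names are
hypothetical. They read the same memory cell through an unsigned int pointer
and through an int pointer before widening:

```c
#include <limits.h>   /* INT_MIN, INT_MAX, UINT_MAX for checking results */

/* Read through unsigned int *, as the intAtPointer code does:
   the widening zero-extends, so the result is never negative. */
static long long read_as_unsigned(const char *ptr)
{
    return (long long)(*((const unsigned int *)ptr));
}

/* Read through int *: the widening sign-extends, so negative
   values stay negative. */
static long long read_as_signed(const char *ptr)
{
    return (long long)(*((const int *)ptr));
}
```

With an int cell holding -1, read_as_unsigned answers UINT_MAX while
read_as_signed answers -1; with INT_MIN, the unsigned read answers INT_MAX+1.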

> And the generated C code is this:
>
> /*      Answer the unsigned integer value at a machine address. The result is a signed
>         sqInt value with binary value in the range 0 through 16rFFFFFFFF. If the
>         size of sqInt is 8 bytes (64-bit object memory) and size of integer is 4, then the
>         high order 4 bytes of the result are zero. */
> /*      sqInt intAtPointer(char *ptr) { return (sqInt)(*((unsigned int *)ptr)) } */
>
> static sqInt intAtPointer(char *ptr) {
>         return ((sqInt) ((((unsigned int *) ptr))[0]));
> }
>
> The generated C code is different from the macro only insofar as it expresses
> the pointer dereference as an array offset, which of course is an equivalent
> expression.
>
> I was using MemoryAccess as an example, to illustrate that using C macros
> for performance is not necessarily a good idea unless you have actually
> measured the performance.
>

These are not the macros I had in mind; I was thinking of two things:
- named constants rather than hardcoded values (like INT_MAX)
- conditional directives with preprocessing-time macros
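
A rough C sketch of both uses. SKETCH_WORD_BITS and the other sketch_ names
are invented for this example and are not taken from the VM sources:

```c
#include <limits.h>

/* Hypothetical build-time setting; a real build would pass it with -D. */
#ifndef SKETCH_WORD_BITS
#define SKETCH_WORD_BITS 32
#endif

/* Conditional directive: pick the word type when preprocessing. */
#if SKETCH_WORD_BITS == 64
typedef long long sketch_word;
#define SKETCH_WORD_MIN LLONG_MIN
#define SKETCH_WORD_MAX LLONG_MAX
#else
typedef int sketch_word;
#define SKETCH_WORD_MIN INT_MIN
#define SKETCH_WORD_MAX INT_MAX
#endif

/* Named constants instead of hardcoded 0x7FFFFFFF-style literals. */
static int fits_in_word(long long v)
{
    return v >= SKETCH_WORD_MIN && v <= SKETCH_WORD_MAX;
}
```

With the default of 32, fits_in_word(INT_MAX) is true and one more than
INT_MAX is rejected; building with -DSKETCH_WORD_BITS=64 changes the limits
without touching the function.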

> MemoryAccess was written as a direct replacement for the corresponding
> C macros. I was attempting to make the generated code match the macros
> as closely as possible (right or wrong). I did this so I would be able
> to understand what the macros were doing, and so that I could see the
> actual code in a debugger and profile it at a low level with gprof. But
> perhaps most importantly, I did it in order to enable the compiler to
> issue warnings about type declaration problems that otherwise are hidden
> by the macros.
>
> I was really completely surprised to find that there was no performance
> penalty for doing this. In fact, it is actually faster by some measurements.
>
> I would like to think that perhaps this says more about the overall
> goodness of our Smalltalk CCodeGenerator and inliner than it says about
> the collective badness of our C programming skills :)
>
> Dave
>

Yeah, humility is the best approach with this kind of language ;)

Nicolas

>
>>
>> > But it again shows the importance of the 3rd rule of Optimization.
>> > http://c2.com/cgi/wiki?RulesOfOptimization
>> >
>> > - Bert -
>> >