[Vm-dev] usage of bitShift: in VMMaker & plugins

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Fri Jan 8 18:42:15 UTC 2010


2010/1/8 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
> 2010/1/8 David T. Lewis <lewis at mail.msen.com>:
>>
>> On Thu, Jan 07, 2010 at 08:07:03PM +0100, Nicolas Cellier wrote:
>>>
>>> Folks,
>>> browsing generated code I see ugly things like this one:
>>>
>>>       byte = jpegDecodeValueFromsize(dcTable, dcTableSize);
>>>       if (byte < 0) {
>>>               return interpreterProxy->primitiveFail();
>>>       }
>>>       if (byte != 0) {
>>>               bits = getBits(byte);
>>>               /* begin scaleAndSignExtend:inFieldWidth: */
>>>               if (bits < ((((byte - 1) < 0) ? ((usqInt) 1 >> -(byte - 1)) :
>>> ((usqInt) 1 << (byte - 1))))) {
>>>                       byte = (bits - (((byte < 0) ? ((usqInt) 1 >> -byte) : ((usqInt) 1
>>> << byte)))) + 1;
>>>                       goto l1;
>>>               } else {
>>>                       byte = bits;
>>>                       goto l1;
>>>               }
>>>       l1:     /* end scaleAndSignExtend:inFieldWidth: */;
>>>       }
>>>
>>> You would never write such code by hand because it costs a useless
>>> runtime test ()?:
>>> And you know very well that byte > 0 when engaging the bitShift:
>>> operation, so it should be:
>>>
>>>       byte = jpegDecodeValueFromsize(dcTable, dcTableSize);
>>>       if (byte < 0) {
>>>               return interpreterProxy->primitiveFail();
>>>       }
>>>       if (byte != 0) {
>>>               bits = getBits(byte);
>>>               /* begin scaleAndSignExtend:inFieldWidth: */
>>>               if (bits < ((usqInt) 1 << (byte - 1))) {
>>>                       byte = (bits - ((usqInt) 1 << byte)) + 1;
>>>                       goto l1;
>>>               } else {
>>>                       byte = bits;
>>>                       goto l1;
>>>               }
>>>       l1:     /* end scaleAndSignExtend:inFieldWidth: */;
>>>       }
>>>
>>> Of course, the generator cannot guess that easily when feeded with a
>>> bitShift: and a non literal argument...
>>> That's why we should use explicit << and >> directly in slang when we
>>> know the argument will be positive.
>>
>> Good tip on the difference in generated code for explicit << and >>.
>> However, I would change things like this with caution. The example you
>> give above is actually generated from this method:
>>
>>  scaleAndSignExtend: aNumber inFieldWidth: w
>>      self inline: true.
>>      aNumber < (1 >> (w - 1))
>>        ifTrue: [^aNumber - (1 bitShift: w) + 1]
>>        ifFalse: [^aNumber]
>>
>> There are no type declarations here, which means that the parameters
>> are treated as either signed 32 or 64 integers depending on image
>> format. This may or may not get mangled further by the slang inlining
>> process, but in any event you are operating on signed values and you
>> have no way of even knowing the size of the signed int. If in fact
>> the method is only ever used for small positive integer values, then
>> it would be fine to optimize it with the explicit shift operators.
>>
>
> 100% agree, this deserve care.
>
>> In this case there is no method comment, and the method name happens
>> to be #blahAndSignExtend, which certainly would not lead anyone to
>> expect that it is intended for use only on unsigned values. So it
>> might go faster if you optimize it for use for small positive values,
>> but you would definitely want to give it a different method name,
>> remove the conditional handling for negative values, add some type
>> declarations, and give it a method comment before doing so. (I think
>> that is exactly the case for this example, and the method actually
>> should be renamed and changed to take advantage of this. I'm just
>> trying to illustrate a point.)
>>
>> There are lots of issues like this throughout the VM and the plugins.
>> A few of them have been fixed, and many remain. Having personally
>> slogged through a few of them in order to get the interpreter and
>> a few plugins working on 64-bit machines, I have to say that I would
>> generally rather see code that is safe but a little bit slow, and
>> optimize it later in the hot spots.
>>
>
> Sure, hope this scale is universal:
> make it right > make it fast
> If in some cases we can make it both + simple, let's just do it :)
>

One example of this follows (rewrite is worth because method used in a
central loop):

JPEGReaderPlugin>>getBits: requestedBits
	| value |
	requestedBits > jsBitCount ifTrue:[
		self fillBuffer.
		requestedBits > jsBitCount ifTrue:[^-1]].
 ++   value := jsBitBuffer >> (jsBitCount - requestedBits).
 ++   jsBitBuffer := jsBitBuffer bitAnd: (1 << (jsBitCount - requestedBits)) -1.
 --    value := jsBitBuffer bitShift: (requestedBits - jsBitCount).
 --    jsBitBuffer := jsBitBuffer bitAnd: (1 bitShift: (jsBitCount -
requestedBits)) -1.
	jsBitCount := jsBitCount - requestedBits.
	^ value

Then it becomes obvious the last but one instruction can move two lines up:
getBits: requestedBits
	| value |
	requestedBits > jsBitCount ifTrue:[
		self fillBuffer.
		requestedBits > jsBitCount ifTrue:[^-1]].
	jsBitCount := jsBitCount - requestedBits.
	value := jsBitBuffer >> jsBitCount.
	jsBitBuffer := jsBitBuffer bitAnd: (1 << jsBitCount) -1.
	^ value

To my eyes, it is both simpler and more efficient.

Nicolas

> Maybe some of the 64bit uncompatibilities can be harvested by tracking
> compiler warnings like these:
> /Users/nicolas/Smalltalk/Squeak/trunk/src/vm/intplugins/SocketPlugin/SocketPlugin.c:288:0
> Implicit declaration of function 'sqResolverGetAddressInfoFamily'
> /Users/nicolas/Smalltalk/Squeak/trunk/src/vm/intplugins/AsynchFilePlugin/AsynchFilePlugin.c:237:0
> Passing argument 2 of 'asyncFileReadResult' makes integer from pointer
> without a cast
>
>>> I did that for example in http://bugs.squeak.org/view.php?id=7109 but
>>> see it could be generalized...
>>> ... Of course with due care!
>>
>> Now that looks like a worthwhile optimization for sure. I have not
>> looked at it, should it be pulled into the VMMaker package?
>>
>
> The speed up is around 30%, so I think it's worth.
> It works for me on a custom i386 linux and i386 mac VM.
> I can't see why it would not work on a big endian but I just can't test it.
>
> Nicolas
>
>>> Just my 2? tip of the day
>>
>> Good one, thanks.
>>
>> Dave
>>
>>
>


More information about the Vm-dev mailing list