[Vm-dev] usage of bitShift: in VMMaker & plugins
Nicolas Cellier
nicolas.cellier.aka.nice at gmail.com
Fri Jan 8 18:42:15 UTC 2010
2010/1/8 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
> 2010/1/8 David T. Lewis <lewis at mail.msen.com>:
>>
>> On Thu, Jan 07, 2010 at 08:07:03PM +0100, Nicolas Cellier wrote:
>>>
>>> Folks,
>>> browsing generated code I see ugly things like this one:
>>>
>>> byte = jpegDecodeValueFromsize(dcTable, dcTableSize);
>>> if (byte < 0) {
>>> return interpreterProxy->primitiveFail();
>>> }
>>> if (byte != 0) {
>>> bits = getBits(byte);
>>> /* begin scaleAndSignExtend:inFieldWidth: */
>>> if (bits < ((((byte - 1) < 0) ? ((usqInt) 1 >> -(byte - 1)) :
>>> ((usqInt) 1 << (byte - 1))))) {
>>> byte = (bits - (((byte < 0) ? ((usqInt) 1 >> -byte) : ((usqInt) 1
>>> << byte)))) + 1;
>>> goto l1;
>>> } else {
>>> byte = bits;
>>> goto l1;
>>> }
>>> l1: /* end scaleAndSignExtend:inFieldWidth: */;
>>> }
>>>
>>> You would never write such code by hand because it costs a useless
>>> runtime test ()?:
>>> And you know very well that byte > 0 when engaging the bitShift:
>>> operation, so it should be:
>>>
>>> byte = jpegDecodeValueFromsize(dcTable, dcTableSize);
>>> if (byte < 0) {
>>> return interpreterProxy->primitiveFail();
>>> }
>>> if (byte != 0) {
>>> bits = getBits(byte);
>>> /* begin scaleAndSignExtend:inFieldWidth: */
>>> if (bits < ((usqInt) 1 << (byte - 1))) {
>>> byte = (bits - ((usqInt) 1 << byte)) + 1;
>>> goto l1;
>>> } else {
>>> byte = bits;
>>> goto l1;
>>> }
>>> l1: /* end scaleAndSignExtend:inFieldWidth: */;
>>> }
>>>
>>> Of course, the generator cannot guess that easily when feeded with a
>>> bitShift: and a non literal argument...
>>> That's why we should use explicit << and >> directly in slang when we
>>> know the argument will be positive.
>>
>> Good tip on the difference in generated code for explicit << and >>.
>> However, I would change things like this with caution. The example you
>> give above is actually generated from this method:
>>
>> scaleAndSignExtend: aNumber inFieldWidth: w
>> self inline: true.
>> aNumber < (1 >> (w - 1))
>> ifTrue: [^aNumber - (1 bitShift: w) + 1]
>> ifFalse: [^aNumber]
>>
>> There are no type declarations here, which means that the parameters
>> are treated as either signed 32 or 64 integers depending on image
>> format. This may or may not get mangled further by the slang inlining
>> process, but in any event you are operating on signed values and you
>> have no way of even knowing the size of the signed int. If in fact
>> the method is only ever used for small positive integer values, then
>> it would be fine to optimize it with the explicit shift operators.
>>
>
> 100% agree, this deserve care.
>
>> In this case there is no method comment, and the method name happens
>> to be #blahAndSignExtend, which certainly would not lead anyone to
>> expect that it is intended for use only on unsigned values. So it
>> might go faster if you optimize it for use for small positive values,
>> but you would definitely want to give it a different method name,
>> remove the conditional handling for negative values, add some type
>> declarations, and give it a method comment before doing so. (I think
>> that is exactly the case for this example, and the method actually
>> should be renamed and changed to take advantage of this. I'm just
>> trying to illustrate a point.)
>>
>> There are lots of issues like this throughout the VM and the plugins.
>> A few of them have been fixed, and many remain. Having personally
>> slogged through a few of them in order to get the interpreter and
>> a few plugins working on 64-bit machines, I have to say that I would
>> generally rather see code that is safe but a little bit slow, and
>> optimize it later in the hot spots.
>>
>
> Sure, hope this scale is universal:
> make it right > make it fast
> If in some cases we can make it both + simple, let's just do it :)
>
One example of this follows (rewrite is worth because method used in a
central loop):
JPEGReaderPlugin>>getBits: requestedBits
| value |
requestedBits > jsBitCount ifTrue:[
self fillBuffer.
requestedBits > jsBitCount ifTrue:[^-1]].
++ value := jsBitBuffer >> (jsBitCount - requestedBits).
++ jsBitBuffer := jsBitBuffer bitAnd: (1 << (jsBitCount - requestedBits)) -1.
-- value := jsBitBuffer bitShift: (requestedBits - jsBitCount).
-- jsBitBuffer := jsBitBuffer bitAnd: (1 bitShift: (jsBitCount -
requestedBits)) -1.
jsBitCount := jsBitCount - requestedBits.
^ value
Then it becomes obvious the last but one instruction can move two lines up:
getBits: requestedBits
| value |
requestedBits > jsBitCount ifTrue:[
self fillBuffer.
requestedBits > jsBitCount ifTrue:[^-1]].
jsBitCount := jsBitCount - requestedBits.
value := jsBitBuffer >> jsBitCount.
jsBitBuffer := jsBitBuffer bitAnd: (1 << jsBitCount) -1.
^ value
To my eyes, it is both simpler and more efficient.
Nicolas
> Maybe some of the 64bit uncompatibilities can be harvested by tracking
> compiler warnings like these:
> /Users/nicolas/Smalltalk/Squeak/trunk/src/vm/intplugins/SocketPlugin/SocketPlugin.c:288:0
> Implicit declaration of function 'sqResolverGetAddressInfoFamily'
> /Users/nicolas/Smalltalk/Squeak/trunk/src/vm/intplugins/AsynchFilePlugin/AsynchFilePlugin.c:237:0
> Passing argument 2 of 'asyncFileReadResult' makes integer from pointer
> without a cast
>
>>> I did that for example in http://bugs.squeak.org/view.php?id=7109 but
>>> see it could be generalized...
>>> ... Of course with due care!
>>
>> Now that looks like a worthwhile optimization for sure. I have not
>> looked at it, should it be pulled into the VMMaker package?
>>
>
> The speed up is around 30%, so I think it's worth.
> It works for me on a custom i386 linux and i386 mac VM.
> I can't see why it would not work on a big endian but I just can't test it.
>
> Nicolas
>
>>> Just my 2? tip of the day
>>
>> Good one, thanks.
>>
>> Dave
>>
>>
>
More information about the Vm-dev
mailing list