Moving this to vm-dev for discussion of the BitBlt primitive.
On Fri, Dec 20, 2013 at 02:13:29AM +0100, Nicolas Cellier wrote:
2013/12/20 Bert Freudenberg bert@freudenbergs.de
On 20.12.2013, at 00:55, Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> wrote:
2013/12/13 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com
I feel like it would be better to change pixelValueAt: so as to always unhibernate, but I did not dare if ever someone had the idea of sending it in tight loops (one shouldn't, there is BitBlt for that purpose, but who knows...).
Maybe we could fail primitivePixelValueAt if receiver is Byte like, and unhibernate in fallback code. We would let every other BitBlt primitives accept a ByteArray to allow BitBlt tricks.
Thoughts?
No one raised a comment so far. Isn't it a good idea to fail the primitive?
The primitive must fail, yes. That is simply a bug. Before that primitive existed, pixelValueAt: used BitBlt which did the unhibernate.
If yes, then it's very simple to implement, just add:
(interpreterProxy isWords: bitmap) ifFalse:[interpreterProxy
primitiveFail].
IMHO the primitive should do the same check as BitBlt: verify that the size of the bits is exactly right given width, height, and depth.
- Bert -
I opened http://bugs.squeak.org/view.php?id=7799 just in case.
Thanks for opening the bugs.squeak.org issue. Your patch adds this:
bitsSize := interpreterProxy byteSizeOf: bitmap. ((interpreterProxy isWordsOrBytes: bitmap) and: [bitsSize = (stride * height * 4 "bytes per word")]) ifFalse: [^interpreterProxy primitiveFail].
Which translates as:
bitsSize = interpreterProxy->byteSizeOf(bitmap); if (!((interpreterProxy->isWordsOrBytes(bitmap)) && (bitsSize == ((stride * height) * 4)))) { interpreterProxy->primitiveFail(); return null; }
This looks like exactly what Bert was suggesting.
I think there are two things we should test:
1) Check if the extra computation has an effect on performance.
2) Check to make sure the logic works on the 64 bit image, to make sure that the 4 bytes per word does the right thing (I think that it will work, and I can follow up on this later to be sure).
Dave
2013/12/20 David T. Lewis lewis@mail.msen.com
Moving this to vm-dev for discussion of the BitBlt primitive.
On Fri, Dec 20, 2013 at 02:13:29AM +0100, Nicolas Cellier wrote:
2013/12/20 Bert Freudenberg bert@freudenbergs.de
On 20.12.2013, at 00:55, Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> wrote:
2013/12/13 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com
I feel like it would be better to change pixelValueAt: so as to always unhibernate, but I did not dare if ever someone had the idea of
sending it
in tight loops (one shouldn't, there is BitBlt for that purpose, but
who
knows...).
Maybe we could fail primitivePixelValueAt if receiver is Byte like,
and
unhibernate in fallback code. We would let every other BitBlt primitives accept a ByteArray to allow BitBlt tricks.
Thoughts?
No one raised a comment so far. Isn't it a good idea to fail the primitive?
The primitive must fail, yes. That is simply a bug. Before that
primitive
existed, pixelValueAt: used BitBlt which did the unhibernate.
If yes, then it's very simple to implement, just add:
(interpreterProxy isWords: bitmap) ifFalse:[interpreterProxy
primitiveFail].
IMHO the primitive should do the same check as BitBlt: verify that the size of the bits is exactly right given width, height, and depth.
- Bert -
I opened http://bugs.squeak.org/view.php?id=7799 just in case.
Thanks for opening the bugs.squeak.org issue. Your patch adds this:
bitsSize := interpreterProxy byteSizeOf: bitmap. ((interpreterProxy isWordsOrBytes: bitmap) and: [bitsSize = (stride * height * 4 "bytes per word")]) ifFalse: [^interpreterProxy primitiveFail].
Which translates as:
bitsSize = interpreterProxy->byteSizeOf(bitmap); if (!((interpreterProxy->isWordsOrBytes(bitmap)) && (bitsSize ==
((stride * height) * 4)))) { interpreterProxy->primitiveFail(); return null; }
This looks like exactly what Bert was suggesting.
I think there are two things we should test:
- Check if the extra computation has an effect on performance.
I don't think the difference will be measurable, that's very few ops overhead... Even if it was, it must be compared to in-image protection alternative. So I don't think that this is relevant in this case.
- Check to make sure the logic works on the 64 bit image, to make sure
that the 4 bytes per word does the right thing (I think that it will work, and I can follow up on this later to be sure).
Dave
The 4 bytes per word is also hardcoded in other BitBlt primitive because that's really what the bit blt expect, 32bit words. I'm a bit puzzled by 64bits image terminology, is variableWordSubclass: composed of 32bits words or 64 bits words (like pointers)?
On Fri, Dec 20, 2013 at 04:54:39PM +0100, Nicolas Cellier wrote:
2013/12/20 David T. Lewis lewis@mail.msen.com
Moving this to vm-dev for discussion of the BitBlt primitive.
Thanks for opening the bugs.squeak.org issue. Your patch adds this:
bitsSize := interpreterProxy byteSizeOf: bitmap. ((interpreterProxy isWordsOrBytes: bitmap) and: [bitsSize = (stride * height * 4 "bytes per word")]) ifFalse: [^interpreterProxy primitiveFail].
Which translates as:
bitsSize = interpreterProxy->byteSizeOf(bitmap); if (!((interpreterProxy->isWordsOrBytes(bitmap)) && (bitsSize ==
((stride * height) * 4)))) { interpreterProxy->primitiveFail(); return null; }
This looks like exactly what Bert was suggesting.
I think there are two things we should test:
- Check if the extra computation has an effect on performance.
I don't think the difference will be measurable, that's very few ops overhead... Even if it was, it must be compared to in-image protection alternative. So I don't think that this is relevant in this case.
You are probably right, although I would not mind checking to be sure. Can you suggest a simple test that I can run to exercise the primitive heavily? I will try to follow up later with a timeToRun comparison of a VM before and after the change.
- Check to make sure the logic works on the 64 bit image, to make sure
that the 4 bytes per word does the right thing (I think that it will work, and I can follow up on this later to be sure).
Dave
The 4 bytes per word is also hardcoded in other BitBlt primitive because that's really what the bit blt expect, 32bit words.
I am quite sure that you are right about this, I just like to double check it whenever I see a "4" entering the code.
I'm a bit puzzled by 64bits image terminology, is variableWordSubclass: composed of 32bits words or 64 bits words (like pointers)?
The terminology is confusing. A "word" for variableWordSubclass: is always 32 bits in size, for all object memory formats (so far). In the 64 bit object memory, "bytes per word" is 8 rather than 4, which leads to confusion with the 4 byte "word". Neither of these kinds of word is related in any way to the C pointer size.
In the 64 bit object memory (image format 68002 or 68003, I'm not sure about Spur), the 32 bit Smalltalk words are packed into the 64 bit object memory words. There are two Smalltalk words per object memory word, and eight bytes per object memory word. An object pointer (oop) is 8 bytes, so there is one oop per object memory word. This is all independent of C pointer size, and a 64-bit object memory will run (slowly) on a VM compiled as a 32 bit executable (so there is no reason in principle that you cannot run a 64-bit object memory on a 32-bit Cog VM, and this is currently possible with an interpreter VM).
Some of the terminology is explained at http://www.squeakvm.org/squeak64/faq.html
Dave
2013/12/20 David T. Lewis lewis@mail.msen.com
On Fri, Dec 20, 2013 at 04:54:39PM +0100, Nicolas Cellier wrote:
2013/12/20 David T. Lewis lewis@mail.msen.com
Moving this to vm-dev for discussion of the BitBlt primitive.
Thanks for opening the bugs.squeak.org issue. Your patch adds this:
bitsSize := interpreterProxy byteSizeOf: bitmap. ((interpreterProxy isWordsOrBytes: bitmap) and: [bitsSize = (stride * height * 4 "bytes per
word")])
ifFalse: [^interpreterProxy primitiveFail].
Which translates as:
bitsSize = interpreterProxy->byteSizeOf(bitmap); if (!((interpreterProxy->isWordsOrBytes(bitmap)) && (bitsSize
==
((stride * height) * 4)))) { interpreterProxy->primitiveFail(); return null; }
This looks like exactly what Bert was suggesting.
I think there are two things we should test:
- Check if the extra computation has an effect on performance.
I don't think the difference will be measurable, that's very few ops overhead... Even if it was, it must be compared to in-image protection alternative. So I don't think that this is relevant in this case.
You are probably right, although I would not mind checking to be sure. Can you suggest a simple test that I can run to exercise the primitive heavily? I will try to follow up later with a timeToRun comparison of a VM before and after the change.
most heavy use are probably Form>>floodFill2:at: and PNMReadWriter>>nextPutRGB:
For example:
AndreasSystemProfiler spyOn: [| prw | prw := PNMReadWriter new. prw stream: (ByteArray new: 1e5) writeStream. [prw nextPutImage: PaintBoxMorph paletteImage] bench].
- Check to make sure the logic works on the 64 bit image, to make sure
that the 4 bytes per word does the right thing (I think that it will
work,
and I can follow up on this later to be sure).
Dave
The 4 bytes per word is also hardcoded in other BitBlt primitive because that's really what the bit blt expect, 32bit words.
I am quite sure that you are right about this, I just like to double check it whenever I see a "4" entering the code.
I'm a bit puzzled by 64bits image terminology, is variableWordSubclass: composed of 32bits words or 64 bits words (like pointers)?
The terminology is confusing. A "word" for variableWordSubclass: is always 32 bits in size, for all object memory formats (so far). In the 64 bit object memory, "bytes per word" is 8 rather than 4, which leads to confusion with the 4 byte "word". Neither of these kinds of word is related in any way to the C pointer size.
In the 64 bit object memory (image format 68002 or 68003, I'm not sure about Spur), the 32 bit Smalltalk words are packed into the 64 bit object memory words. There are two Smalltalk words per object memory word, and eight bytes per object memory word. An object pointer (oop) is 8 bytes, so there is one oop per object memory word. This is all independent of C pointer size, and a 64-bit object memory will run (slowly) on a VM compiled as a 32 bit executable (so there is no reason in principle that you cannot run a 64-bit object memory on a 32-bit Cog VM, and this is currently possible with an interpreter VM).
Some of the terminology is explained at http://www.squeakvm.org/squeak64/faq.html
Dave
On Fri, Dec 20, 2013 at 06:39:23PM +0100, Nicolas Cellier wrote:
2013/12/20 David T. Lewis lewis@mail.msen.com
On Fri, Dec 20, 2013 at 04:54:39PM +0100, Nicolas Cellier wrote:
2013/12/20 David T. Lewis lewis@mail.msen.com
I think there are two things we should test:
- Check if the extra computation has an effect on performance.
I don't think the difference will be measurable, that's very few ops overhead... Even if it was, it must be compared to in-image protection alternative. So I don't think that this is relevant in this case.
You are probably right, although I would not mind checking to be sure. Can you suggest a simple test that I can run to exercise the primitive heavily? I will try to follow up later with a timeToRun comparison of a VM before and after the change.
most heavy use are probably Form>>floodFill2:at: and PNMReadWriter>>nextPutRGB:
For example:
AndreasSystemProfiler spyOn: [| prw | prw := PNMReadWriter new. prw stream: (ByteArray new: 1e5) writeStream. [prw nextPutImage: PaintBoxMorph paletteImage] bench].
As you predicted, performance does not seem to be negatively affected by the change.
I tested like this:
someForms := Form allSubInstances. testBlock := [someForms do: [:form | 10 timesRepeat: [ 1 to: form width do: [:x | 1 to: form height do: [:y | form primPixelValueAtX: x y: y ]]]]].
Results for an interpreter VM prior to making the change: Time millisecondsToRun: testBlock. ==> 10548 "interpreter before change" Time millisecondsToRun: testBlock. ==> 10596 "interpreter before change" Time millisecondsToRun: testBlock. ==> 10741 "interpreter before change" Time millisecondsToRun: testBlock. ==> 10703 "interpreter before change" Time millisecondsToRun: testBlock. ==> 10668 "interpreter before change"
Result for a Cog VM prior to making the change, to confirm that most time is being spent in the primitive in this test: Time millisecondsToRun: testBlock. ==> 7582 "cog before change"
The same interpreter VM with your change applied: Time millisecondsToRun: testBlock. ==> 10681 "interpreter after change" Time millisecondsToRun: testBlock. ==> 9849 "interpreter after change" Time millisecondsToRun: testBlock. ==> 9383 "interpreter after change" Time millisecondsToRun: testBlock. ==> 9405 "interpreter after change" Time millisecondsToRun: testBlock. ==> 9386 "interpreter after change"
Same as above with interpreter VM, and with my rearrangement to move the check for bitmap isWordsOrBytes ahead of any use of the bitmaps variable: Time millisecondsToRun: testBlock. ==> 10564 "interpreter after change" Time millisecondsToRun: testBlock. ==> 10097 "interpreter after change" Time millisecondsToRun: testBlock. ==> 10316 "interpreter after change" Time millisecondsToRun: testBlock. ==> 10295 "interpreter after change" Time millisecondsToRun: testBlock. ==> 9766 "interpreter after change"
Conclusion: The change does not hurt performance. In fact, performance is inexplicably better with the change (possibly related to some random effect on C compiler optimizer).
I updated Mantis with the final version of the method that I used for this test (also attached here). The change is to do the check for bitmap isWordsOrBytes immediately after taking bitmap from the stack, before any other use of the variable. The code will actually work fine as you wrote it (because byteSizeOf: bitmap will answer 0 if it is an integer object), but it seemed a bit safer on general principles to do the test as early as possible. On the other hand, the way you originally wrote it seems to be faster (I cannot say why). What do you think?
Dave
vm-dev@lists.squeakfoundation.org