<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Aug 31, 2015 at 11:35 AM, Chris Muller <span dir="ltr"><<a href="mailto:asqueaker@gmail.com" target="_blank">asqueaker@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>
Sometimes the number of bytes is only known in a variable, so would it<br>
be possible to do 4 primitives which accept the number of bits (or<br>
bytes) as an argument? (uint:at: uint:at:put:) * (big endian, little<br>
endian)<br></blockquote><div><br></div><div>Of course it's possible, but such an architecture can hardly be quick. If one needs the flexible primitives then use them, but don't hobble the system by providing only them. Having a real 64-bit VM means that using two 32-bit accesses is unnecessarily slow.</div><div><br></div><div>Which would you rather have, and which do you think would be faster (I don't know, but I have my suspicions):</div><div><br></div><div>Expand the existing flexible integerAt: prims to integerAt:put:bytes:signed:bigEndian: (yuck), or implement this in terms of a wrapper, something like<br></div><div><br></div><div>ByteArray>>integerAt: index bytes: numBytes signed: signed bigEndian: bigEndian</div><div><br></div><div> | value sign |</div><div> ^numBytes >= 4</div><div> ifTrue:</div><div> [numBytes = 8 ifTrue:</div><div> [value := self unsignedLong64At: index.</div><div> bigEndian ifTrue:</div><div> [value := self byteReverseEightBytes: value].</div><div> (signed and: [(sign := value bitShift: -63) ~= 0]) ifTrue: "if the VM is intelligent about left shift of zero then this test is unnecessary..."</div><div> [value := value - ((sign bitAnd: 1) bitShift: 64)].</div><div> ^value].</div><div> numBytes = 4 ifTrue:</div><div> [value := self unsignedLong32At: index.</div><div> bigEndian ifTrue:</div><div> [value := self byteReverseFourBytes: value].</div><div> (signed and: [(sign := value bitShift: -31) ~= 0]) ifTrue: "if the VM is intelligent about left shift of zero then this test is unnecessary..."</div><div> [value := value - ((sign bitAnd: 1) bitShift: 32)].</div><div> ^value].</div><div> ^self error: 'numBytes must be a power of two from 1 to 8']</div><div> ifFalse:</div><div>...</div><div><br></div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
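<div>For example, with such a wrapper one would expect something like the following (a hypothetical doit, assuming the unsignedLong64At:put:bigEndian: accessor under discussion in this thread):</div><div><br></div><div>| ba |</div><div>ba := ByteArray new: 8.</div><div>"store 2^64 - 1, i.e. all 64 bits set, little-endian"</div><div>ba unsignedLong64At: 1 put: 16rFFFFFFFFFFFFFFFF bigEndian: false.</div><div>ba integerAt: 1 bytes: 8 signed: true bigEndian: false "=> -1, after sign extension"</div><div><br></div>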
<div class=""><div class="h5"><br>
<br>
On Mon, Aug 31, 2015 at 12:25 PM, Eliot Miranda <<a href="mailto:eliot.miranda@gmail.com">eliot.miranda@gmail.com</a>> wrote:<br>
> Hi Chrises,<br>
><br>
> my vote would be to write these as 12 numbered primitives, (2,4 & 8<br>
> bytes) * (at: & at:put:) * (big & little endian) because they can be<br>
> performance critical and implementing them like this means the maximum<br>
> efficiency in both 32-bit and 64-bit Spur, plus the possibility of the JIT<br>
> implementing the primitives.<br>
><br>
> On Sun, Aug 30, 2015 at 10:01 PM, Chris Cunningham <<a href="mailto:cunningham.cb@gmail.com">cunningham.cb@gmail.com</a>><br>
> wrote:<br>
>><br>
>> Hi Chris,<br>
>><br>
>> I'm all for having the fastest version in the image that works. If you<br>
>> could make your version handle endianness, then I'm all for including it (at<br>
>> least in the 3 variants that are faster). My first use for this (an<br>
>> interface for Kafka) apparently requires big-endianness, so I really want<br>
>> that supported.<br>
>><br>
>> It might be best to keep my naming, though - it follows the naming pattern<br>
>> already used in the class. Or will yours also support 128-bit values?<br>
>><br>
>> -cbc<br>
>><br>
>> On Sun, Aug 30, 2015 at 2:38 PM, Chris Muller <<a href="mailto:asqueaker@gmail.com">asqueaker@gmail.com</a>> wrote:<br>
>>><br>
>>> Hi Chris, I think these methods belong in the image with the fastest<br>
>>> implementation we can do.<br>
>>><br>
>>> I implemented 64-bit unsigned access for Ma Serializer back in 2005.<br>
>>> I modeled my implementation after Andreas' original approach which<br>
>>> tries to avoid LI arithmetic. I was curious whether your<br>
>>> implementations would be faster, because if they are then it could<br>
>>> benefit Magma. After loading "Ma Serializer" 1.5 (or head) into a<br>
>>> trunk image, I used the following script to take comparison<br>
>>> measurements:<br>
>>><br>
>>> | smallN largeN maBa cbBa |<br>
>>> smallN := ((2 raisedTo: 13) to: (2 raisedTo: 14)) atRandom.<br>
>>> largeN := ((2 raisedTo: 63) to: (2 raisedTo: 64)) atRandom.<br>
>>> maBa := ByteArray new: 8.<br>
>>> cbBa := ByteArray new: 8.<br>
>>> maBa maUint: 64 at: 0 put: largeN.<br>
>>> cbBa unsignedLong64At: 1 put: largeN bigEndian: false.<br>
>>> self assert: (cbBa maUnsigned64At: 1) = (maBa unsignedLong64At: 1 bigEndian: false).<br>
>>> { 'cbc smallN write' -> [cbBa unsignedLong64At: 1 put: smallN bigEndian: false] bench.<br>
>>> 'ma smallN write' -> [cbBa maUint: 64 at: 0 put: smallN] bench.<br>
>>> 'cbc smallN access' -> [cbBa unsignedLong64At: 1 bigEndian: false] bench.<br>
>>> 'ma smallN access' -> [cbBa maUnsigned64At: 1] bench.<br>
>>> 'cbc largeN write' -> [cbBa unsignedLong64At: 1 put: largeN bigEndian: false] bench.<br>
>>> 'ma largeN write' -> [cbBa maUint: 64 at: 0 put: largeN] bench.<br>
>>> 'cbc largeN access' -> [cbBa unsignedLong64At: 1 bigEndian: false] bench.<br>
>>> 'ma largeN access' -> [cbBa maUnsigned64At: 1] bench.<br>
>>> }<br>
>>><br>
>>> Here are the results:<br>
>>><br>
>>> 'cbc smallN write'->'3,110,000 per second. 322 nanoseconds per run.' .<br>
>>> 'ma smallN write'->'4,770,000 per second. 210 nanoseconds per run.' .<br>
>>> 'cbc smallN access'->'4,300,000 per second. 233 nanoseconds per run.' .<br>
>>> 'ma smallN access'->'16,400,000 per second. 60.9 nanoseconds per run.' .<br>
>>> 'cbc largeN write'->'907,000 per second. 1.1 microseconds per run.' .<br>
>>> 'ma largeN write'->'6,620,000 per second. 151 nanoseconds per run.' .<br>
>>> 'cbc largeN access'->'1,900,000 per second. 527 nanoseconds per run.' .<br>
>>> 'ma largeN access'->'1,020,000 per second. 982 nanoseconds per run.'<br>
>>><br>
>>> It looks like your 64-bit access is 86% faster for accessing the<br>
>>> high end of the 64-bit range, but slower on the other three metrics.<br>
>>> Notably, it was only 14% as fast for writing the high end of the<br>
>>> 64-bit range, and similarly slower for small-number access.<br>
>>><br>
>>><br>
>>> On Fri, Aug 28, 2015 at 6:01 PM, Chris Cunningham<br>
>>> <<a href="mailto:cunningham.cb@gmail.com">cunningham.cb@gmail.com</a>> wrote:<br>
>>> > Hi.<br>
>>> ><br>
>>> > I've committed a change to the inbox that allows getting/putting<br>
>>> > 64-bit values to ByteArrays (similar to the 32- and 16-bit<br>
>>> > accessors). Could this be added to trunk?<br>
>>> ><br>
>>> > Also, first time I used the selective commit function - very nice! The<br>
>>> > changes I didn't want committed didn't, in fact, get committed. Just<br>
>>> > the desirable bits!<br>
>>> ><br>
>>> > -cbc<br>
>>> ><br>
>>> ><br>
>>> ><br>
>>><br>
>><br>
>><br>
>><br>
>><br>
><br>
><br>
><br>
> --<br>
> _,,,^..^,,,_<br>
> best, Eliot<br>
><br>
><br>
><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><span style="font-size:small;border-collapse:separate"><div>_,,,^..^,,,_<br></div><div>best, Eliot</div></span></div></div></div>
</div></div>