[Vm-dev] Primitive to crop a ByteArray?

Levente Uzonyi leves at elte.hu
Fri Nov 9 02:21:34 UTC 2012


On Thu, 8 Nov 2012, Mariano Martinez Peck wrote:

>
> On Thu, Nov 8, 2012 at 4:48 PM, Mariano Martinez Peck
> <marianopeck at gmail.com> wrote:
>> On Thu, Nov 8, 2012 at 4:38 PM, Bert Freudenberg <bert at freudenbergs.de> wrote:
>>>
>>> On 2012-11-08, at 16:22, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
>>>
>>>> Hi guys. I have the following scenario. I have a buffer (a ByteArray
>>>> of size N) that I pass via FFI to a function. This function puts data
>>>> into the array and answers M, the number of bytes it wrote, with
>>>> M <= N. Finally, I need to copy the array of size N down to the
>>>> actual size M. To do that, I am using #copyFrom:to:. If the ByteArray
>>>> is large (which could be the case), this takes significant time
>>>> because it needs to allocate space for the new, large resulting
>>>> array. So... is there a destructive primitive with which I can "crop"
>>>> the existing array, modify its size field and mark the remaining
>>>> bytes as "free space for the heap"?
>>>>
>>>> Do we have a primitive for that?
>>>
>>> We do not have any primitive that changes an object's size.
>>
>> :(
>>
>>>
>>> However, if your problem is indeed the time spent allocating the new array, then maybe there should be a primitive for that? E.g. one that copies a portion of one array into a new object. This would avoid having to initialize the memory of the new array - the initialization is what takes the time; otherwise allocation is normally constant in time.
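>>>
>>> You can get a feel for that initialization cost by timing a bare
>>> allocation (a rough sketch; absolute numbers vary by VM and machine):
>>>
>>>     [ ByteArray new: 100 * 1024 * 1024 ] timeToRun.
>>>
>>> No data is copied there at all - the time is allocation plus
>>> zero-filling the 100 MB.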
>>>
>>> OTOH it seems unusual that you would have to use extremely large buffers where initialization time matters. E.g. if you were to use this for reading from a file or stream then it might make more sense to use many smaller buffers rather than a single huge one, no?
>>>
>>
>> Hi Bert. I should have explained better. The library I am wrapping is
>> LZ4, a fast compressor: http://code.google.com/p/lz4/
>> The function to compress looks like this:
>>
>> int LZ4_compress   (const char* source, char* dest, int isize);
>>
>> /*
>> LZ4_compress() :
>>     Compresses 'isize' bytes from 'source' into 'dest'.
>>     Destination buffer must be already allocated,
>>     and must be sized to handle worst case situations (input data not
>>     compressible).
>>     Worst case size evaluation is provided by macro LZ4_compressBound().
>>
>>     isize  : the input size. Max supported value is ~1.9GB
>>     return : the number of bytes written in buffer 'dest'
>> */
>>
>>
>> So, from the image side, I am doing (summarized):
>>
>> dest := ByteArray new: (self funcCompressBound: aByteArray size) + 4.
>> bytesCompressedOrError := self
>>     funcCompress: aByteArray
>>     byteArrayDestination: dest
>>     isize: aByteArray size
>>     maxOutputSize: maxOutputSize.
>> compressed := dest copyFrom: 1 to: bytesCompressedOrError + 4.
>>
>> But a normal compression scenario is precisely one where the ByteArray
>> is quite large. The function requires that "dest" is already allocated,
>> and afterwards I need to do the #copyFrom:to:, which is what I was
>> trying to avoid.
>>
>>
>
> Maybe there is something faster than #copyFrom:to: for this case?

Do you think #copyFrom:to: is slow because it shows up in the profiler? If 
yes, then try running the same code without calling the compression 
function to see how "slow" #copyFrom:to: really is.
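
For example, time just the copy with buffers of the size you actually use 
(a minimal sketch; adjust the size to match your real data):

    | dest |
    dest := ByteArray new: 16 * 1024 * 1024.
    [ dest copyFrom: 1 to: dest size - 100 ] timeToRun.

If that number is negligible next to the compression call, then the copy 
is not your problem.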
Also, calling #funcCompressBound: via FFI is just silly. Reimplement it 
in Smalltalk; it's easy and probably more efficient.
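
LZ4_compressBound() is plain arithmetic, so the Smalltalk version is a 
one-liner (a sketch mirroring the macro in lz4.h; double-check it against 
the version you're wrapping):

    compressBound: isize
        "Worst-case compressed size for isize input bytes,
         like LZ4_compressBound(isize)."
        ^ isize + (isize // 255) + 16
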
If it turns out that #copyFrom:to: really is what's slow, then consider 
compressing your data in smaller chunks (at most a few MiB each). In that 
case you can even precalculate the upper bound.
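
Something along these lines (a sketch with hypothetical selectors 
modelled on your snippet above; you'll also want a per-chunk length 
header in the output so you can decompress it later):

    compressInChunks: aByteArray
        | chunkSize bound dest out |
        chunkSize := 1024 * 1024. "1 MiB per chunk"
        bound := chunkSize + (chunkSize // 255) + 16. "precalculated once"
        dest := ByteArray new: bound.
        out := WriteStream on: (ByteArray new: aByteArray size).
        1 to: aByteArray size by: chunkSize do: [ :start |
            | chunk written |
            chunk := aByteArray copyFrom: start
                to: (start + chunkSize - 1 min: aByteArray size).
            written := self funcCompress: chunk
                byteArrayDestination: dest
                isize: chunk size
                maxOutputSize: bound.
            "a real format would write 'written' as a length header here"
            out nextPutAll: (dest copyFrom: 1 to: written) ].
        ^ out contents

All the allocations and copies here are bounded by the chunk size. (With 
an FFI variant that takes an offset into the source buffer, the per-chunk 
#copyFrom:to: on the input could go away, too.)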


Levente

P.S.: And please don't propose VM changes for such a marginal issue.
P.P.S.: I still doubt that using LZ4 will give you any benefit.


>
>>> - Bert -
>>>
>>
>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>

