[Vm-dev] Primitive to crop a ByteArray?

Bert Freudenberg bert at freudenbergs.de
Thu Nov 8 22:51:12 UTC 2012


On 08.11.2012, at 22:42, Mariano Martinez Peck <marianopeck at gmail.com> wrote:

> 
> On Thu, Nov 8, 2012 at 4:48 PM, Mariano Martinez Peck
> <marianopeck at gmail.com> wrote:
>> On Thu, Nov 8, 2012 at 4:38 PM, Bert Freudenberg <bert at freudenbergs.de> wrote:
>>> 
>>> On 2012-11-08, at 16:22, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
>>> 
>>>> Hi guys. I have the following scenario. I have a buffer (ByteArray)
>>>> of size N that I pass via FFI to a function. This function puts data
>>>> into the array and answers M, the number of bytes it wrote, with
>>>> M <= N. Finally, I need to shrink the array from size N down to the
>>>> actual size M. To do that, I am using #copyFrom:to:. If the ByteArray
>>>> is large (which can be the case), this call takes significant time
>>>> because it needs to allocate space for the new, large resulting
>>>> array. So... is there a destructive primitive with which I can "crop"
>>>> the existing array, i.e. modify its size field and mark the remaining
>>>> bytes as free space for the heap?
>>>> 
>>>> Do we have a primitive for that?
>>> 
>>> We do not have any primitive that changes an object's size.
>> 
>> :(
>> 
>>> 
>>> However, if your problem is indeed the time spent allocating the new array, then maybe there should be a primitive for that? E.g. one that copies a portion of one array into a new object. This would avoid having to initialize the memory of the new array - the initialization is what takes the time; otherwise allocation is normally constant in time.
>>> 
>>> OTOH it seems unusual that you would have to use extremely large buffers where initialization time matters. E.g. if you were to use this for reading from a file or stream then it might make more sense to use many smaller buffers rather than a single huge one, no?
>> 
>> Hi Bert. I should have explained better. The library I am wrapping is
>> LZ4, a fast compressor: http://code.google.com/p/lz4/
>> The function to compress looks like this:
>> 
>> int LZ4_compress   (const char* source, char* dest, int isize);
>> 
>> /*
>> LZ4_compress() :
>>    Compresses 'isize' bytes from 'source' into 'dest'.
>>    Destination buffer must be already allocated,
>>    and must be sized to handle worst case situations (input data not
>>    compressible).
>>    Worst case size evaluation is provided by macro LZ4_compressBound().
>> 
>>    isize  : is the input size. Max supported value is ~1.9GB
>>    return : the number of bytes written in buffer dest
>> */
>> 
>> So, from the image side, I am doing (summarized):
>> 
>> dest := ByteArray new: (self funcCompressBound: aByteArray size) + 4.
>> bytesCompressedOrError := self
>>     funcCompress: aByteArray
>>     byteArrayDestination: dest
>>     isize: aByteArray size
>>     maxOutputSize: maxOutputSize.
>> compressed := dest copyFrom: 1 to: bytesCompressedOrError + 4.
>> 
>> But in a normal compression scenario the ByteArray is indeed quite
>> large. The function requires that "dest" is already allocated. And
>> then I need to do the #copyFrom:to:, which is what I was trying to
>> avoid.
> 
> maybe there is something faster than #copyFrom:to: for this case?

Not yet. There was a discussion some time ago about adding an "uninitialized allocation" primitive (i.e. a ByteArray/WordArray not initialized to 0), but I think we didn't follow up on that.
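
Just to illustrate the idea, a sketch of how it might look from the image side - note that #newUninitialized: is purely hypothetical, no such selector exists today, while #replaceFrom:to:with:startingAt: is the existing bulk-copy primitive:

"Hypothetical: allocate without zero-filling, then copy only the valid bytes."
compressed := ByteArray newUninitialized: bytesCompressedOrError + 4.
compressed replaceFrom: 1 to: compressed size with: dest startingAt: 1.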

Are you actually keeping that huge array in memory? If not, e.g. you are sending it to disk or the network, then just keeping a count around (like OrderedCollection does), or putting it into a read stream, would suffice.
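
For example, a sketch using the variables from your snippet above (aBinaryWriteStream just stands for whatever file or socket stream the bytes end up on):

compressedSize := bytesCompressedOrError + 4.
"Write only the valid prefix of dest - no trimmed copy is ever allocated."
aBinaryWriteStream next: compressedSize putAll: dest startingAt: 1.
"Or, for readers inside the image, hand out a bounded stream over the
same buffer instead of a cropped ByteArray."
compressedStream := ReadStream on: dest from: 1 to: compressedSize.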

- Bert -

