[Vm-dev] Primitive to crop a ByteArray?

Igor Stasenko siguctua at gmail.com
Fri Nov 9 05:42:40 UTC 2012


On 8 November 2012 19:51, Bert Freudenberg <bert at freudenbergs.de> wrote:
>
>
> On 08.11.2012, at 22:42, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
>
>>
>> On Thu, Nov 8, 2012 at 4:48 PM, Mariano Martinez Peck
>> <marianopeck at gmail.com> wrote:
>>> On Thu, Nov 8, 2012 at 4:38 PM, Bert Freudenberg <bert at freudenbergs.de> wrote:
>>>>
>>>> On 2012-11-08, at 16:22, Mariano Martinez Peck <marianopeck at gmail.com> wrote:
>>>>
>>>>> Hi guys. I have the following scenario: I have a buffer (ByteArray)
>>>>> of size N that I pass via FFI to a function. This function puts data
>>>>> into the array and answers the number of bytes M that it wrote, with
>>>>> M <= N. Finally, I need to shrink the array from size N to the actual
>>>>> size M. To do that, I am using #copyFrom:to:. If the ByteArray is
>>>>> large (which could be the case), this takes significant time because
>>>>> it needs to allocate space for the new, large resulting array.
>>>>> So... is there a destructive primitive with which I can "crop" the
>>>>> existing array, modifying its size field and marking the remaining
>>>>> bytes as "free space for the heap"?
>>>>>
>>>>> Do we have a primitive for that?
>>>>
>>>> We do not have any primitive that changes an object's size.
>>>
>>> :(
>>>
>>>>
>>>> However, if your problem is indeed the time for allocating the new array, then maybe there should be a primitive for that? E.g. one that copies a portion of one array to a new object. This would avoid having to initialize the memory of the new array - this is what's taking time, otherwise allocation is normally constant in time.
>>>>
>>>> OTOH it seems unusual that you would have to use extremely large buffers where initialization time matters. E.g. if you were to use this for reading from a file or stream then it might make more sense to use many smaller buffers rather than a single huge one, no?
>>>
>>> Hi Bert. I should have explained better. The library I am wrapping is
>>> LZ4, a fast compressor: http://code.google.com/p/lz4/
>>> The function to compress looks like this:
>>>
>>> int LZ4_compress   (const char* source, char* dest, int isize);
>>>
>>> /*
>>> LZ4_compress() :
>>>    Compresses 'isize' bytes from 'source' into 'dest'.
>>> Destination buffer must be already allocated,
>>> and must be sized to handle worst-case situations (input data not
>>> compressible).
>>> Worst-case size evaluation is provided by the macro LZ4_compressBound().
>>>
>>>    isize  : is the input size. Max supported value is ~1.9GB
>>>    return : the number of bytes written in buffer dest
>>> */
>>>
>>>
>>> So, from the image side, I am doing (summarized):
>>>
>>> dest := ByteArray new: (self funcCompressBound: aByteArray size) + 4.
>>> bytesCompressedOrError := self
>>>     funcCompress: aByteArray
>>>     byteArrayDestination: dest
>>>     isize: aByteArray size
>>>     maxOutputSize: maxOutputSize.
>>> compressed := dest copyFrom: 1 to: bytesCompressedOrError + 4.
>>>
>>> But a common compression scenario is precisely one where the ByteArray
>>> is quite large. The function requires that "dest" is already allocated,
>>> and then I need to do the #copyFrom:to:, which is what I was trying
>>> to avoid.
>>
>> maybe there is something faster than #copyFrom:to:  for this case?
>
> Not yet. There was a discussion some time ago to add an "uninitialized allocation" primitive (i.e. a ByteArray/WordArray not initialized to 0) but I think we didn't follow up on that.
>
> Are you actually keeping that huge array in memory? If not, e.g. you are sending it to disk or the network, then just keeping a count around (like OrderedCollection does), or putting it into a read stream, would suffice.
>

Right, I also said that to Mariano: usually you split the data into
smaller chunks and compress them one by one, piping the compressed
output to an output stream.
So yes, you can't avoid copying things around (because of the piping to
the stream), but there is not much more to squeeze out of it at this point.
And since you are using a stream, you don't really need to crop anything.

> - Bert -



-- 
Best regards,
Igor Stasenko.

