[Vm-dev] Issue 99 in cog: Link LZ4 Compression

Levente Uzonyi leves at elte.hu
Sat Oct 13 14:45:01 UTC 2012


On Sat, 13 Oct 2012, Camillo Bruni wrote:

>
>>>> I do not understand what this means. Is it a request for someone to
>>>> write a plugin?
>>> Link Cog with lz4 as a first step towards writing bindings with FFI/NativeBoost.
>>
>> An external plugin sounds a lot better to me.
> replace "link" with "bundle".
>
>> The larger the VM binary is, the slower it will be on today's CPUs.
>
> how come?

It's because of the cache hierarchy. You have no control over what ends up 
where in the binary, or over what cache sizes and levels a given CPU has. The 
smaller the binary is, the higher the chance that the part of the VM that 
needs to run right now is already in the CPU cache. I wrote about this a few 
years ago: I found that building all non-essential plugins as external ones 
gave ~4-5% better performance (using the Interpreter).

>
>> Make it work, make it right, make it fast. You don't have the system yet, but you want to make it fast already?
>
> I agree 100% if you implement everything from scratch by yourself.
> But in this case it's relying on an external project, which will give me a speedup for free ;).

So you already have a system that can be tested with compression via FFI, to 
find out whether it makes sense to use it at all. There's still no reason to 
add extra code to the VM yet, because no one knows whether it's worth it.

>
>
>>> Plus, by having a super-fast compression library at hand, decompression would
>>> essentially be a NOP.
>>>
>>
>> I didn't see any benchmarks where (de)compression is done on small chunks of data (a few kilobytes at most - which is your intended use case).
>
> see [1]

I don't see where that paper is "talking" about the size of the exported 
chunks, or where it contains (de)compression benchmarks done on small 
chunks of data. Please be more specific.
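
To be concrete, here's the kind of micro-benchmark I have in mind. This is only a 
rough C sketch against the current lz4 C API (LZ4_compress_default / 
LZ4_decompress_safe; older lz4 releases used different function names); the chunk 
size, iteration count and test data are placeholders I made up, not your actual 
export workload, so treat anything it prints as nothing more than a starting point.

/* lz4bench.c - rough small-chunk LZ4 benchmark sketch.
   Assumes liblz4 is installed; build with e.g.:  cc -O2 lz4bench.c -llz4 -o lz4bench */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <lz4.h>

#define CHUNK_SIZE 4096      /* "a few kilobytes" - set this to your real export size */
#define ITERATIONS 100000

int main(void)
{
    char src[CHUNK_SIZE];
    char compressed[LZ4_COMPRESSBOUND(CHUNK_SIZE)];
    char decompressed[CHUNK_SIZE];
    int i, compressedSize = 0;
    clock_t start;

    /* Mildly repetitive test data; real object graphs will compress differently. */
    for (i = 0; i < CHUNK_SIZE; i++)
        src[i] = (char)(i % 64);

    start = clock();
    for (i = 0; i < ITERATIONS; i++)
        compressedSize = LZ4_compress_default(src, compressed, CHUNK_SIZE,
            sizeof(compressed));
    if (compressedSize <= 0) {
        printf("compression failed\n");
        return 1;
    }
    printf("compress:   %.0f chunks/s, ratio %.2f\n",
        ITERATIONS / ((double)(clock() - start) / CLOCKS_PER_SEC),
        (double)CHUNK_SIZE / compressedSize);

    start = clock();
    for (i = 0; i < ITERATIONS; i++)
        LZ4_decompress_safe(compressed, decompressed, compressedSize, CHUNK_SIZE);
    printf("decompress: %.0f chunks/s\n",
        ITERATIONS / ((double)(clock() - start) / CLOCKS_PER_SEC));

    return memcmp(src, decompressed, CHUNK_SIZE) != 0;
}

Something like this, run with the chunk sizes and data you actually export, would tell 
us whether the (de)compression cost is negligible next to producing and consuming the 
data.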

>
>> And even though the (de)compression might not make much difference in runtime, it definitely will give higher CPU usage, which is unwelcome in some cases (e.g. mobile devices).
>
> well, it runs on multiple cores, while Cog runs on a single core, so wasting some CPU
> cycles on the unused cores won't hurt that much.
>
> For mobile devices you might simply not want the image to swap, hence you will pay
> a lot of attention to making sure it stays small. So yes, in this case you won't rely
> directly on such a feature.
>
> However, swapping out unused parts of the system and reloading them is still interesting
> on such a "limited" platform [1]. And in that case you really don't want to waste
> cycles on loading the data, so in-memory compression becomes interesting again.
>
>> It might result in lower overall CPU usage too, but the compression ratio of ~2 makes me think that it's unlikely.
>
>
> It does if it makes swapping out data cheaper; that's a win. But you don't show me
> benchmarks here either ;)

IIUC there's an unspecified (probably non-public/not open source) project. 
You'd like to do some experiments with it (because you have access to it), 
and therefore you want to add some extra code (not useful for most users) 
to the public VM (used by everyone) to support that experiment. Am I right?

Btw, I'm not showing any benchmark results, because I don't know what to measure.


Levente

>
>
> [1] http://rmod.lille.inria.fr/archives/papers/Mart11c-COMLAN-ObjectSwapping.pdf
>
>

