Adaptive PIC scheme (Re: [Vm-dev] InterpreterSimulator)

Eliot Miranda eliot.miranda at gmail.com
Wed Mar 16 06:54:56 UTC 2016


Hi Ben, Hi Levente,

> On Mar 15, 2016, at 11:09 PM, Ben Coman <btc at openinworld.com> wrote:
> 
> 
>> On Wed, Mar 16, 2016 at 10:50 AM, Clément Bera <bera.clement at gmail.com> wrote:
>> 
>> Hello,
>> 
>> I was considering counters in the PICs not necessarily to reorder the PIC but also to make the frequency information for each type available to the Sista optimizer, in a similar way to the Dart VM, where a lot of profiling information is available. Two cases then come to mind: specializing the code only for the most frequently used type and compiling a fallback for the least frequently used types, or generating optimized code with prefilled inline caches, including PICs in the correct order. Obviously that's quite some work, so we have to measure what we would really gain.
>> 
>> As Eliot mentioned, the counters cannot be inlined in the PIC: each time one modifies the machine code, the CPU instruction cache line is flushed, so inlined counters imply a huge slowdown. The divide-by-2 idea sounds cool though. I was also thinking that the housekeeping could happen during machine code garbage collection; as the PIC is moved anyway, reordering at that time might be cheaper.
> 
> Just some wild speculation.  I don't really need a reply - just to say
> them out loud so I can move on.
> 
> * If the data needs to be stored away from the PIC, then maybe it can
> even be queued for processing by a second CPU. A small gain
> possible(?) in spite of the main VM being not multi-processor.

At least on x86/x86_64 it's essential to keep the counters far away from the code, because the read-modify-write cycle for the counter update flushes the processor's internal CISC-to-RISC JIT instruction cache and results in truly abysmal performance.
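
To make that concrete, here is a minimal sketch in C of counters kept in an ordinary data table, well away from the PIC's machine code, combined with the divide-by-2 decay mentioned above. This is not how Cog does it; the names PicCounters, pic_count_hit and pic_hottest_case, and the constants MAX_PIC_CASES and COUNTER_LIMIT, are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define MAX_PIC_CASES 6
#define COUNTER_LIMIT 16u

/* Assumed to live in ordinary data memory, never in the machine-code zone,
   so that updating a counter does not write near executable code. */
typedef struct {
    uint32_t hits[MAX_PIC_CASES];
} PicCounters;

/* Record one hit for a PIC case. When a counter reaches COUNTER_LIMIT,
   halve every counter: the ordering is roughly preserved and old
   history decays. */
void pic_count_hit(PicCounters *c, size_t caseIndex)
{
    assert(caseIndex < MAX_PIC_CASES);
    if (++c->hits[caseIndex] >= COUNTER_LIMIT) {
        for (size_t i = 0; i < MAX_PIC_CASES; i++)
            c->hits[i] >>= 1;
    }
}

/* Index of the most frequently hit case among the first numCases. */
size_t pic_hottest_case(const PicCounters *c, size_t numCases)
{
    assert(numCases > 0 && numCases <= MAX_PIC_CASES);
    size_t best = 0;
    for (size_t i = 1; i < numCases; i++)
        if (c->hits[i] > c->hits[best])
            best = i;
    return best;
}
```

In a scheme like this the PIC itself would only carry the address of its PicCounters entry; any reordering based on pic_hottest_case could then be deferred to a point where the machine code is being rewritten anyway.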

> 
> * An obvious thing would be only running expensive counters during
> unit tests, but I guess that workload might be different to real life.
> 
> * I wonder also if such a mechanism for counting might be re-purposed
> for profiling execution time (except that existing solutions are
> sufficient)
> 
> cheers -ben
> 
> 
>> 
>> 2016-03-16 2:19 GMT+01:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>> 
>>> 
>>> 
>>>> On Tue, Mar 15, 2016 at 1:15 PM, Bert Freudenberg <bert at freudenbergs.de> wrote:
>>>> 
>>>> 
>>>>> On 15.03.2016, at 20:28, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>>>> #(1.0 #one 'one' #[1]. $1. #(1))
>>>> 
>>>> #(1.0 #one 'one' #[1]. $1. #(1)) size
>>>> ==> 8
>>>> 
>>>> (not 6 as intended)
>>>> 
>>>> - Bert -
>>> 
>>> 
>>> Thanks, Bert.  If one corrects that, there's no significant change, so the figures are still representative.  Reproducible times on a Mac would be nice to have... ;-)  I can get two congruent runs, but timing is all over the place by the third run :-(
>>> 
>>> _,,,^..^,,,_
>>> best, Eliot
>> 
>> 

