[squeak-dev] hydra vm update

Sat May 3 07:53:22 UTC 2008

Oops, I didn't notice that this message was sent only to John.

---------- Forwarded message ----------
From: Igor Stasenko <siguctua at gmail.com>
Date: 2008/5/3
Subject: Re: [squeak-dev] hydra vm update
To: johnmci at smalltalkconsulting.com

2008/5/3 John M McIntosh <johnmci at smalltalkconsulting.com>:

>
 > > Consider that my implementation was based on observations of what the
 > > Windows VM was doing. It already used a multimedia timer, set to
 > > trigger an interrupt check every 1 msec (or at the minimum period the
 > > OS can provide, but not less than 1 msec). These routines created their
 > > own timer thread, hidden from the developer's eyes. So what I did was
 > > simply replace this thread with my own implementation.
 > > Also, because the multimedia timer was in use, I saw an overlap: the
 > > optimistic interruptCheckCounter handling was doing the same job as
 > > the multimedia timer, which, IMHO, can't be considered a good
 > > algorithm.
 > >
 > > After refactoring, I got code that handles timers with better
 > > accuracy than the old VM. (See the thread from some time back about
 > > Delay accuracy.)
 > > In fact, I was surprised to see that my implementation provides
 > > more accurate timers; I expected it to be worse :)
 > >
 >
 >  Ok, I'll have to run some benchmarks. When I changed the Macintosh VM to
 > pound the interrupt delay logic 1000 times a second, back in the era of
 > 500 MHz machines, the impact on performance was noticeable.
 >  Maybe today no one cares. Maybe the folks chasing why opening windows
 > takes 2x as long can fix that, and then we'll consume part of the gain
 > back in overhead to improve Delay accuracy.
 >

 My benchmarks show the opposite: tinyBenchmarks runs faster with the
 new model than with the old checkForInterrupts.
 And Delay accuracy is improved. This may not hold on other
 platforms, but on Windows I got gains in both areas, at the cost of
 implementing my own timer thread routine. :)
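
A minimal sketch of the timer-thread scheme described above, in C, assuming POSIX threads and C11 atomics rather than the Windows multimedia-timer API the Windows VM actually uses. The names (interrupt_pending, timer_thread, maybe_check_for_interrupts, run_interpreter_for_ticks) are hypothetical, not HydraVM identifiers. A dedicated thread wakes about every 1 msec and raises a flag; the interpreter only tests that flag, so no separate interruptCheckCounter countdown duplicates the timer's job.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state shared by the timer thread and the interpreter. */
static atomic_bool interrupt_pending = false; /* set by timer, cleared by interpreter */
static atomic_bool timer_running = false;
static atomic_long timer_ticks = 0;           /* number of times the timer fired */

/* Timer thread: sleeps ~1 msec, then asks the interpreter to check for
   interrupts. One mechanism instead of a timer callback plus a counter. */
static void *timer_thread(void *arg) {
    (void)arg;
    const struct timespec period = { 0, 1000000L }; /* 1 msec */
    while (atomic_load(&timer_running)) {
        nanosleep(&period, NULL);
        atomic_fetch_add(&timer_ticks, 1);
        atomic_store(&interrupt_pending, true);
    }
    return NULL;
}

/* Called by the interpreter once per "bytecode". The common path is a single
   atomic load and no clock read. Returns true when an interrupt check ran. */
static bool maybe_check_for_interrupts(void) {
    if (atomic_load(&interrupt_pending) &&
        atomic_exchange(&interrupt_pending, false)) {
        /* a real VM would signal due Delay semaphores, poll events, ... */
        return true;
    }
    return false;
}

static int start_timer(pthread_t *tid) {
    atomic_store(&timer_running, true);
    return pthread_create(tid, NULL, timer_thread, NULL);
}

static void stop_timer(pthread_t tid) {
    atomic_store(&timer_running, false);
    pthread_join(tid, NULL);
}

/* Toy interpreter loop: runs dummy "bytecodes" until the timer has fired
   `ticks` times and returns how many interrupt checks ran meanwhile. */
static long run_interpreter_for_ticks(long ticks) {
    volatile long work = 0;
    long checks = 0;
    while (atomic_load(&timer_ticks) < ticks) {
        work++;                                    /* stand-in for a bytecode */
        if (maybe_check_for_interrupts()) checks++;
    }
    return checks;
}
```

Because the flag coalesces ticks, the interpreter performs at most one check per tick, no matter how many bytecodes it executes in between.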

 >  Maybe I can clock-watch before handleEvents() to avoid the overhead of a
 > timer routine.
 >
 >  I note we used to have clock watching on each primitive call in ages past,
 > but removed it because we found that under certain conditions one could
 > spend a noticeable percentage of the time just getting the clock; some
 > vestiges of that lurk via the non-existent lowres millisecond clock
 > function. Maybe it's not noticeable now.
 >
 >  Ya, benchmarking first...
 >
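
The "clock watching" alternative can be sketched as a countdown that reads the clock only at periodic check points. This is a hypothetical simplification in the spirit of interruptCheckCounter, not the actual Squeak VM code; CHECK_INTERVAL, now_usecs and run_until_delay_fires are made-up names, and a POSIX CLOCK_MONOTONIC clock is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical: read the clock once every CHECK_INTERVAL "bytecodes"
   instead of on every one, like an interruptCheckCounter countdown. */
#define CHECK_INTERVAL 1000

static long clock_reads = 0; /* how many times we paid for a clock read */

static int64_t now_usecs(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    clock_reads++;
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Toy interpreter loop: runs until a Delay due at `wakeup_usecs` would fire.
   Returns the number of "bytecodes" executed (a multiple of CHECK_INTERVAL). */
static long run_until_delay_fires(int64_t wakeup_usecs) {
    long bytecodes = 0;
    int counter = CHECK_INTERVAL;
    for (;;) {
        bytecodes++;                    /* stand-in for executing one bytecode */
        if (--counter > 0) continue;    /* cheap path: no clock read */
        counter = CHECK_INTERVAL;
        /* check point, e.g. just before handleEvents(): watch the clock */
        if (now_usecs() >= wakeup_usecs) return bytecodes;
    }
}
```

The trade-off argued about in this thread shows up in CHECK_INTERVAL: a smaller interval reads the clock more often (the overhead John remembers), while a larger one lets a due Delay fire up to one interval's worth of bytecodes late.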

 Also note that HydraVM is targeted at multicore CPUs. If the timer
 thread runs on a different core than the interpreter thread, there is
 no scheduler overhead for switching the active thread, which makes
 delay handling even more accurate.

 I just ran the test again on a quad-core box with the same image:

 |delay bag| delay := Delay forMilliseconds: 1.
       bag := Bag new.
       1000 timesRepeat:[bag add: [delay wait] timeToRun].
       bag sortedCounts

 HydraVM:

 a SortedCollection(932->2 67->1 1->3)
 a SortedCollection(932->2 68->1)
 a SortedCollection(932->2 68->1)

 Croquet VM:

  a SortedCollection(952->2 48->1)
  a SortedCollection(951->2 48->1 1->4)
  a SortedCollection(951->2 46->1 3->3)

 This can be interpreted in one of the following ways:
 - Hydra delays are more accurate, or
 - Hydra has similar (or even lower) accuracy, but spends less time
 signaling the semaphore, so we get a noticeably larger number of
 1 msec results.
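
The Bag/sortedCounts experiment can be repeated at the OS level to see how much of the spread comes from the operating system's own sleep rather than from the VM. A rough sketch, assuming POSIX nanosleep and clock_gettime; elapsed times are truncated to whole milliseconds, which only approximates how timeToRun rounds, so these buckets are not directly comparable with the numbers above. MAX_BUCKET and delay_histogram are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define MAX_BUCKET 16 /* waits of 16 ms or more are lumped into the last bucket */

/* Elapsed time between two monotonic readings, truncated to whole ms. */
static long elapsed_ms(const struct timespec *a, const struct timespec *b) {
    long long ns = (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
                 + (b->tv_nsec - a->tv_nsec);
    return (long)(ns / 1000000LL);
}

/* Sleep `n` times for 1 ms and count how long each wait actually took:
   counts[k] = number of waits that took k ms (k < MAX_BUCKET). */
static void delay_histogram(int n, long counts[MAX_BUCKET + 1]) {
    const struct timespec req = { 0, 1000000L }; /* 1 ms */
    for (int b = 0; b <= MAX_BUCKET; b++) counts[b] = 0;
    for (int i = 0; i < n; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        nanosleep(&req, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        long ms = elapsed_ms(&t0, &t1);
        counts[ms < MAX_BUCKET ? ms : MAX_BUCKET]++;
    }
}
```

Printing the non-zero entries of counts gives a table in the same spirit as sortedCounts: the index is the wait time in milliseconds, the value is how many waits took that long.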

 Not surprisingly, the benchmarks on the vanilla VM are faster:

 Croquet VM

 1 tinyBenchmarks
  '485768500 bytecodes/sec; 14611978 sends/sec'
  '486229819 bytecodes/sec; 14520007 sends/sec'
  '485768500 bytecodes/sec; 14554360 sends/sec'

 [ 1 tinyBenchmarks ] timeToRun
  5337
  5336
  5402

 HydraVM:

 1 tinyBenchmarks
 '454706927 bytecodes/sec; 13731345 sends/sec'
 '455516014 bytecodes/sec; 13363453 sends/sec'
 '456327985 bytecodes/sec; 13210400 sends/sec'
 '453900709 bytecodes/sec; 13373136 sends/sec'

 [ 1 tinyBenchmarks ] timeToRun
 5692
 5770
 5796

 Not sure if using timeToRun is fair here, because tinyBenchmarks itself
 uses timeToRun in its code to determine one of the parameters.

 This actually shows the overhead of passing the interpreter as an
 argument to each function.
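
To put a number on that overhead, the figures quoted above can be averaged and compared. A small sketch of the arithmetic, weighting every run equally; given the caveat about timeToRun, that row deserves some suspicion.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *xs, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += xs[i];
    return sum / (double)n;
}

/* Figures copied from the runs quoted above. */
static const double croquet_bytecodes[] = { 485768500, 486229819, 485768500 };
static const double hydra_bytecodes[]   = { 454706927, 455516014, 456327985, 453900709 };
static const double croquet_sends[]     = { 14611978, 14520007, 14554360 };
static const double hydra_sends[]       = { 13731345, 13363453, 13210400, 13373136 };
static const double croquet_ms[]        = { 5337, 5336, 5402 };
static const double hydra_ms[]          = { 5692, 5770, 5796 };

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Fraction by which a throughput figure dropped: 1 - hydra/croquet. */
static double throughput_loss(const double *c, size_t nc, const double *h, size_t nh) {
    return 1.0 - mean(h, nh) / mean(c, nc);
}

/* Fraction by which a run time grew: hydra/croquet - 1. */
static double time_increase(const double *c, size_t nc, const double *h, size_t nh) {
    return mean(h, nh) / mean(c, nc) - 1.0;
}
```

With the quoted runs this works out to roughly 6.3% fewer bytecodes/sec, 7.8% fewer sends/sec and about 7.4% longer timeToRun for HydraVM. Three or four runs per VM is a small sample, so these are ballpark figures.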

 However, it would be interesting to add a build option that uses
 thread-local storage instead, to minimize the impact of introducing
 multiple interpreter instances. As a bonus, we'd have full compatibility
 with old primitives, because we wouldn't need to pass the interpreter
 instance as an argument.
 But it was a design choice, and we decided to pass the interpreter as an
 extra argument instead of using thread-local storage.
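
The two designs can be contrasted in a few lines of C. This is a hypothetical, heavily simplified sketch: Interp, push_arg, push_tls, the toy add primitives and Job are invented for illustration and are not HydraVM's API. It assumes C11 _Thread_local, POSIX threads, and one interpreter per OS thread.

```c
#include <pthread.h>

/* Hypothetical, heavily simplified interpreter state. */
typedef struct Interp {
    long stack[64];
    int sp;
} Interp;

/* Style 1 (the choice described above): every VM function and primitive
   takes the interpreter explicitly. */
static void push_arg(Interp *vm, long v) { vm->stack[vm->sp++] = v; }
static long pop_arg(Interp *vm) { return vm->stack[--vm->sp]; }

/* Style 2 (the suggested build option): each OS thread finds its own
   interpreter through a thread-local pointer, so signatures stay as in
   the single-interpreter VM. */
static _Thread_local Interp *current_vm;
static void push_tls(long v) { current_vm->stack[current_vm->sp++] = v; }
static long pop_tls(void) { return current_vm->stack[--current_vm->sp]; }

/* A toy "primitive" in each style: replace the top two entries by their sum. */
static void prim_add_arg(Interp *vm) { push_arg(vm, pop_arg(vm) + pop_arg(vm)); }
static void prim_add_tls(void) { push_tls(pop_tls() + pop_tls()); }

/* One interpreter per thread; the same computation is run both ways. */
typedef struct { Interp vm; long base; long result_arg; long result_tls; } Job;

static void *run_job(void *p) {
    Job *job = p;
    job->vm.sp = 0;
    push_arg(&job->vm, job->base);
    push_arg(&job->vm, 1);
    prim_add_arg(&job->vm);
    job->result_arg = pop_arg(&job->vm);

    current_vm = &job->vm;   /* bind this thread's interpreter once */
    push_tls(job->base);
    push_tls(2);
    prim_add_tls();
    job->result_tls = pop_tls();
    return NULL;
}
```

prim_add_tls keeps the argument-free signature that old primitives were written against, which is the compatibility bonus mentioned above. The price is that every access goes through the thread-local pointer, whose cost varies by platform and by how the code is linked, so which style is faster would have to be measured rather than assumed.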

-- 
Best regards,
Igor Stasenko AKA sig.