Hi Vanessa, Hi Fabio, Hi David, Hi All,
On Dec 20, 2020, at 10:24 PM, commits@source.squeak.org wrote:
Vanessa Freudenberg uploaded a new version of System to project The Trunk: http://source.squeak.org/trunk/System-codefrau.1205.mcz
==================== Summary ====================
Name: System-codefrau.1205 Author: codefrau Time: 20 December 2020, 10:23:10.790782 pm UUID: f94486f3-3743-4300-a495-c2a89089e122 Ancestors: System-dtl.1204
Update platformName for SqueakJS 1.0
=============== Diff against System-dtl.1204 ===============
Item was changed:
  ----- Method: SmalltalkImage>>isLowerPerformance (in category 'system attributes') -----
  isLowerPerformance
  	"Some operations - TestCases for example - need an idea of the typical performance of the system on which they are being performed. For now we will simply assert that running on an ARM cpu or as a SqueakJS instance is enough of a discriminator. Options for the future might also involve whether the vm is a full Cog or Sista system, even actually measuring the performance at some point to be sure"
  	^ (self platformSubtype beginsWith: 'arm') "Raspberry PI for example"
- 		or: [self platformName = 'JS'] "SqueakJS"!
+ 		or: [self platformName = 'Web'] "SqueakJS"!
This is interesting. The method is so crude, but potentially we have a much more rational basis upon which to derive its result. I would expect the effective performance to be the product of processor speed (mips), core execution engine architecture, and object representation.
Mips varies hugely across the range from, e.g., the Raspberry Pi 2, 3, and 4 to various Intel (i5, i7, i9, etc.) and Apple Silicon processors. The range here is about one order of magnitude.
Execution architecture varies from pure context interpreter (the BTTF VM), to the Stack Interpreter, the SqueakJS interpreter, the SqueakJS generation-one JIT, subsequent-generation SqueakJS JITs (temps in JS vars, sends mapped to JS calls), the Cog JIT, and the Sista JIT.
Very crudely, Spur = 2 x v3 (actually about 1.7, varying according to workload).
Of the execution architectures, the Sista JIT is, for practical purposes, not yet implemented (it is a prototype), but may offer 2x to 4x over Cog. Of the SqueakJS JITs I think that the send mapping isn’t implemented (am I right?). But is the temp var mapping implemented? If so, what difference does it make? Context to Stack is about 1.5. Stack to Cog is about 6.
So the notion is that if we can come up with crude numbers that rank the execution architectures, and a measure of mips, we can compute a meaningful numeric estimate of likely Smalltalk execution speed, and answer isLowerPerformance when this number falls below a specific threshold. What we have now, based on platformName, is simply wrong; e.g. a Raspberry Pi 4 is way faster than a Pi 3.
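As a sketch only, the scheme described above might look like this; `vmArchitectureFactor`, `estimatedMips`, and the threshold are all hypothetical names and values, not anything that exists in the image today:

```smalltalk
SmalltalkImage >> isLowerPerformance
	"Sketch: rank the execution architecture (e.g. context interpreter = 1,
	 stack interpreter ~1.5, Cog ~9, using the crude ratios from this thread),
	 scale by an estimated mips figure, and compare against a threshold.
	 vmArchitectureFactor, estimatedMips and lowerPerformanceThreshold
	 are hypothetical accessors."
	^ self vmArchitectureFactor * self estimatedMips
		< self lowerPerformanceThreshold
```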
One thing I did for VisualWorks is estimate processor mips by timing the first invocation of the allInstances primitive and dividing by the number of objects. Basically the heuristic is that mips is roughly inversely proportional to how much time per object the first allInstances invocation spends. There is (almost) always an allInstances invocation at startup in VisualWorks (to clear font handles, IIRC), and there may be in a Squeak image. Alternatives are measuring how long it takes to load and/or swizzle the image on load, divided by the heap size. Basically we have the opportunity to introspect at startup, cheaply measuring the time some costly primitive takes to run; this result can be cached, accessed via a primitive or vmParameter, and perhaps updated as execution proceeds.
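The heuristic above could be sketched like this; the choice of CompiledMethod as the class to enumerate and the inversion into a "speed" figure are illustrative assumptions, not the VisualWorks implementation:

```smalltalk
"Sketch of the mips heuristic: time one allInstances call and divide by
 the number of objects returned. CompiledMethod is picked here only
 because any image has many instances; the resulting figure is an
 arbitrary inverse of microseconds-per-object, where larger means faster."
| start objects usecsPerObject |
start := Time utcMicrosecondClock.
objects := CompiledMethod allInstances.
usecsPerObject := (Time utcMicrosecondClock - start) / objects size.
^ 1.0 / usecsPerObject
```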
Does this sound like overkill? If not, what should we choose as our mips measurer? We want something that all VMs have to do somewhat similarly, fairly early in system startup, and that we can correlate with stopwatches and macro benchmarks such as the time taken for the Compiler package to recompile itself, etc.
Eliot _,,,^..^,,,_ (phone)
On Tue, Dec 22, 2020 at 1:08 AM Eliot Miranda eliot.miranda@gmail.com wrote:
> Of the execution architectures, the Sista JIT is, for practical purposes, not yet implemented (it is a prototype), but may offer 2x to 4x over Cog. Of the SqueakJS JITs I think that the send mapping isn’t implemented (am I right?). But is the temp var mapping implemented? If so, what difference does it make? Context to Stack is about 1.5. Stack to Cog is about 6.
None of that has been implemented in SqueakJS. The current JIT only gets rid of the generic bytecode decoding, plus it inlines small-int arithmetic.
However, that still gives an 8x increase in bytecode speed, which causes the send speed as measured by tinyBenchmarks to go up by 3.5x too. It also *feels* significantly faster with the JIT enabled.
See the comment on top of https://github.com/codefrau/SqueakJS/blob/main/jit.js
> Does this sound like overkill? If not, what should we choose as our mips measurer? We want something that all VMs have to do somewhat similarly, fairly early in system startup, and that we can correlate with stopwatches and macro benchmarks such as the time taken for the Compiler package to recompile itself, etc.
I like measuring all-over performance, and not adding any extra work.
Like, DateAndTime is pretty early in the startup list. It could remember the time its startup was invoked. Another class that comes later could set a LowPerformance flag if it took longer than x ms since DateAndTime was initialized.
I just tried that with ProcessorScheduler (see attachment). On Safari and a 5.3 image I get ImageStartMS = 133 ms, on Chrome 250 ms. On a fast VM I get 5 ms. So maybe if that takes longer than say 50 ms it could be considered low performance?
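The attachment isn't reproduced here, but the idea described might be sketched roughly as follows; `StartMillis`, `ImageStartMS`, and `LowPerformance` are illustrative class-variable names, and 50 ms is the arbitrary threshold from the discussion:

```smalltalk
"Sketch of the startup-timing idea. DateAndTime runs early in the
 startup list and records when it was started; ProcessorScheduler
 runs later and measures the elapsed time."
DateAndTime class >> startUp
	StartMillis := Time millisecondClockValue

ProcessorScheduler class >> startUp
	"ImageStartMS is the time the intervening startup work took;
	 a slow VM/browser takes noticeably longer over the same list."
	ImageStartMS := Time millisecondClockValue - DateAndTime startMillis.
	LowPerformance := ImageStartMS > 50
```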
Vanessa
On Tue, Dec 22, 2020 at 1:44 PM Vanessa Freudenberg vanessa@codefrau.net wrote:
> On Tue, Dec 22, 2020 at 1:08 AM Eliot Miranda eliot.miranda@gmail.com wrote:
> > Of the execution architectures, the Sista JIT is, for practical purposes, not yet implemented (it is a prototype), but may offer 2x to 4x over Cog. Of the SqueakJS JITs I think that the send mapping isn’t implemented (am I right?). But is the temp var mapping implemented? If so, what difference does it make? Context to Stack is about 1.5. Stack to Cog is about 6.
> None of that has been implemented in SqueakJS. The current JIT only gets rid of the generic bytecode decoding, plus it inlines small-int arithmetic.
> However, that still gives an 8x increase in bytecode speed, which causes the send speed as measured by tinyBenchmarks to go up by 3.5x too. It also *feels* significantly faster with the JIT enabled.
> See the comment on top of https://github.com/codefrau/SqueakJS/blob/main/jit.js
> > Does this sound like overkill? If not, what should we choose as our mips measurer? We want something that all VMs have to do somewhat similarly, fairly early in system startup, and that we can correlate with stopwatches and macro benchmarks such as the time taken for the Compiler package to recompile itself, etc.
> I like measuring all-over performance, and not adding any extra work.
> Like, DateAndTime is pretty early in the startup list. It could remember the time its startup was invoked. Another class that comes later could set a LowPerformance flag if it took longer than x ms since DateAndTime was initialized.
> I just tried that with ProcessorScheduler (see attachment). On Safari and a 5.3 image I get ImageStartMS = 133 ms, on Chrome 250 ms. On a fast VM I get 5 ms. So maybe if that takes longer than say 50 ms it could be considered low performance?
Works for me. I would record and provide an accessor for ImageStartUsecs (a class variable in SmalltalkImage, using microseconds :-) ). Then one can either use isLowerPerformance or the actual time for a more "nuanced" view.
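A sketch of that suggestion; the selectors and the 50 ms cutoff are illustrative, only the class-variable name ImageStartUsecs comes from the message above:

```smalltalk
"Sketch: ImageStartUsecs as a class variable in SmalltalkImage, with an
 accessor, so clients can use either the boolean or the raw figure."
SmalltalkImage >> recordImageStartUsecs: usecs
	ImageStartUsecs := usecs

SmalltalkImage >> imageStartUsecs
	"Answer the measured startup duration in microseconds."
	^ ImageStartUsecs

SmalltalkImage >> isLowerPerformance
	^ self imageStartUsecs > 50000  "50 ms, per the thread"
```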
_,,,^..^,,,_ best, Eliot
vm-dev@lists.squeakfoundation.org