[Vm-dev] Re: goto instruction with Cog VM
btc at openInWorld.com
Sun Nov 9 00:20:11 UTC 2014
Eliot Miranda wrote:
> Hi Ben,
> On Nov 8, 2014, at 3:35 PM, Ben Coman <btc at openInWorld.com> wrote:
>> Eliot Miranda wrote:
>>> On Sat, Nov 8, 2014 at 11:21 AM, Ralph Boland <rpboland at gmail.com> wrote:
>>> > Hi Ralph,
>>> > >
>>> > > I was aware of caseOf: in Squeak. I always found it awkward to
>>> > > use and felt a true case statement would be simpler. Alas, it's
>>> > > impossible to have a true case statement added to Smalltalk
>>> > > now, I think.
>>> > So what's a "true" case statement? For me, at least, the Squeak
>>> > one *is*, and is more general than one limited to purely integer
>>> > keys, as for example is C's switch statement. A number of
>>> > languages provide case statements that are like Squeak's. What do
>>> > you consider a "true" case statement?
>>> I mean that caseOf: is not part of the language itself but rather
>>> part of the standard library or set of packages that one finds in
>>> the IDE. To be part of the language it would need to be something
>>> the compiler is aware of.
>>>
>>> Ah OK. I see what you mean. But you're wrong on a few counts.
>>> First, there are *no* control structures in the language beyond
>>> closures and polymorphism. ifTrue:, to:do:, and:, whileTrue: et al
>>> are all defined in the library, not by the compiler. Second, these
>>> structures, /including/ caseOf:, are understood by the compiler and
>>> compiled to non-message-sending code. So for the optimized
>>> selectors (caseOf:, ifTrue:, and:, whileTrue: et al) none of the
>>> blocks are created; all are inlined by the compiler. So a) by your
>>> criterion of being known to the compiler, caseOf: *is* part of the
>>> language, but b) all control structures in Smalltalk are defined in
>>> the library, and some are optimized by the compiler.
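Eliot's first point can be seen directly in the image. The standard definitions look roughly like this (quoted from memory of the Squeak sources; details may differ slightly):

```smalltalk
"In class True:"
ifTrue: trueBlock ifFalse: falseBlock
	"Answer the value of trueBlock, since the receiver is true."
	^trueBlock value

"In class False:"
ifTrue: trueBlock ifFalse: falseBlock
	"Answer the value of falseBlock, since the receiver is false."
	^falseBlock value
```

These are ordinary library methods; the compiler simply short-circuits them by emitting conditional jumps instead of the send, which is why redefining them has no effect on already-inlined call sites.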
>> Reviewing the code for these selectors is enlightening, to see the
>> original implementation, but remembering that as an optimization
>> these are inlined, so that code is currently not executed.
>> Eliot, would I be right to presume that the Interpreter does execute
>> those methods without optimisation?
> The interpreter directly executes the bytecode produced by the
> compiler. Go look. So it depends on how the code base is compiled.
> Right now the interpreter does *not* execute those methods, because
> inlined blocks, conditional branches and jumps are much faster than
> closure creation and message sends. The interpreter benefits a lot
> from this; early Smalltalk implementations were interpreted, hence
> the optimisation in the first place. However, with adaptive
> optimisation one can allow the JIT to perform the optimisation in
> context, allowing alternative implementations of ifTrue: et al in
> classes other than the Booleans. In Sista we've chosen not to do
> that, keeping inlining and using conditional branches as our
> performance counters. But it may allow the compiler to be smart and
> optimize these forms in fewer cases.
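Taking up the "go look" suggestion: in a standard Squeak image one can print a method's bytecode (this assumes CompiledMethod>>#symbolic, and that Number>>#abs is still written with ifTrue:ifFalse:, as it has been in the images I've seen):

```smalltalk
"Print the bytecodes of a method whose source uses ifTrue:ifFalse:.
 The listing shows conditional jump instructions (jumpFalse: etc.)
 and no send of ifTrue:ifFalse:; the blocks have been inlined away."
Transcript show: (Number >> #abs) symbolic; cr
```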
Ahh. I was thinking about it the wrong way. To check: inlined means
inlined-bytecode, not inlined-machine-code? And the result of
compilation is the same bytecode, to run on the VM regardless of
whether that VM is the Interpreter or Cog?
(And indeed the compiler is itself running in-image on top of the VM).
>> cheers -ben
>>> That is to say, the Smalltalk language is not very much. Smalltalk
>>> (Squeak) the language would not include Sets or Dictionaries but
>>> would include (some) Array classes because some aspects of Arrays
>>> are dealt with directly by the compiler.
>>>
>>> There is a syntactic form for creating Arrays, but really the
>>> notion that the Smalltalk compiler defines the language is a
>>> limited one. It's fair to say that the language is defined by a
>>> small set of variables, return, blocks, an object representation
>>> (the ability to create classes that define a sequence of named
>>> inst vars and inherit from other classes), message lookup rules
>>> (normal sends and super sends), a small number of literal forms
>>> (Array, Integer, Float, Fraction, ByteArray, String and Symbol
>>> literals), and a method syntax. The rest is in the library. What
>>> this really means is that Smalltalk can't be reduced to a
>>> language, because the language doesn't define enough. Instead it
>>> is a small language and a large library.
>>> Selectors such as ifTrue: and to:do: are part of the language
>>> because they are inlined by the compiler.
>>>
>>> No. One can change the compiler to not inline them. This is merely
>>> an optimization.
>>> Put another way, if I could get my doBlockAt: method incorporated
>>> into the Squeak IDE
>>> it would nevertheless NOT be part of Squeak the language.
>>> The consequence of caseOf: not being part of the language is that
>>> the compiler/VM cannot perform optimizations when caseOf: is
>>> encountered but must treat it as user-written code.
>>> Squeak's caseOf: is more general than C's switch statement, but it
>>> could be more general still: the comparison message (=) is
>>> hard-coded. I would like to be able to replace the '=' message by
>>> an arbitrary binary operator such as includes: or '>'.
>>> I have to backtrack here: I looked at the code and it looks like
>>> the compiler inlines caseOf: and caseOf:otherwise:. If so then
>>> these selectors are part of the language by my definition.
>>> Well, live and learn :-)
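Ralph's wished-for generalization, replacing the hard-coded = with an arbitrary comparison selector, can at least be prototyped at the user level. A hypothetical sketch (caseOf:comparing:otherwise: is an invented selector, not an existing Squeak method), which of course forgoes the compiler's inlining:

```smalltalk
"Hypothetical method, e.g. in Object. Each case key is a block
 answering the value to test against; the first case whose test
 answers true wins. Every block here is a real closure and every
 test a real message send, unlike the compiler-inlined caseOf:."
caseOf: associations comparing: aBinarySelector otherwise: defaultBlock
	associations do: [:assoc |
		(assoc key value perform: aBinarySelector with: self)
			ifTrue: [^assoc value value]].
	^defaultBlock value
```

With this, hypothetically, `3 caseOf: { [#(1 2 3)] -> ['low'] . [#(7 8 9)] -> ['high'] } comparing: #includes: otherwise: ['?']` would answer 'low', since #(1 2 3) includes: 3.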
>>> > > But I wouldn't want to be forced to implement my FSMs this way.
>>> > > It might be acceptable for small FSMs.
>>> > > I want to avoid sequential search and
>>> > > even binary search might be rather expensive.
>>> > > I look at computed gotos as the solution but,
>>> > > as you pointed out, computed gotos pose problems for JIT.
>>> > > Admittedly, for large FSM's, it might be best or necessary to
>>> > > use a FSM simulator anyway, as I do now.
>>> > Nah. One should always be able to map it down somehow. This will
>>> > be easier with the Spur instruction set, which lifts the limits
>>> > on the number of literals and the length of branches.
>>> Good to hear.
>>> > > Again, for my FSM, case this would often be considered to be good.
>>> > > But if the state transition tables are sparse then Dictionaries
>>> > > might be preferable to Arrays.
>>> > Yes, but that's getting to the limit of what the VM can
>>> > reasonably deliver. Better would be an Array of value/pc pairs,
>>> > where the keys are the values the switch bytecode compares the
>>> > top of stack against, and the pcs are where to jump to on a
>>> > match. The JIT can therefore implement the table as it sees fit,
>>> > whereas the interpreter can just do a linear search through the
>>> > Array.
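To make Eliot's proposal concrete, here is a sketch of what the interpreter's case for such a switch bytecode might look like, in Slang-flavoured Smalltalk (the method and helpers such as jumpTo: are invented names for illustration, not actual VM code):

```smalltalk
"Hypothetical interpreter routine. switchTable is an Array of
 alternating values and bytecode pcs: (v1, pc1, v2, pc2, ...).
 Compare the top of stack against each value in turn; on a match,
 jump to the paired pc, otherwise fall through to defaultPC."
interpretSwitch: switchTable default: defaultPC
	| top |
	top := self stackTop.
	self pop: 1.
	1 to: switchTable size by: 2 do: [:i |
		(switchTable at: i) = top
			ifTrue: [^self jumpTo: (switchTable at: i + 1)]].
	^self jumpTo: defaultPC
```

The JIT, by contrast, is free to compile the very same table into a binary search or a dense jump table when the keys allow it.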
>>> I am looking at this from the point of view of a compiler
>>> writer/generator and consider your proposal inadequate for my
>>> needs. You, I think, are looking at this from the point of view of
>>> a VM writer and what can reasonably be delivered. I don't think
>>> what I want is overly difficult for the interpreter to deliver,
>>> but as you pointed out, and you know much better than I, what I
>>> want causes serious problems for the VM.
>>> > > My expectation is that at: would be sent to the collection object
>>> > > to get the address to go to. Knowing that the collection
>>> > > is an array though makes it easier for the compiler/VM to
>>> > > ensure that the addresses stored in the collection are valid.
>>> > > Actually, the compiler will be generating the addresses.
>>> > > Does the VM have absolute trust in the compiler to generate valid
>>> > > addresses?
>>> > Yes. Generate bad bytecode and the VM crashes.
>>> This is what I expected to hear, but I wanted it to be made clear
>>> for compilers generated by my parser generator tool, as you did.