[Vm-dev] Max number of method arguments

Eliot Miranda eliot.miranda at gmail.com
Tue Jan 8 22:31:33 UTC 2019


Hi Nicolas,

On Tue, Jan 8, 2019 at 1:51 PM Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

>
> Hi all,
> particularly Clement and Eliot,
>
> One of the most annoying limits of bytecode is the number of arguments (<16
> in V3), not so much annoying for pure Smalltalk, but certainly so for FFI
> (FORTRAN 77 lacks structures so existing code bases often have functions
> with many arguments).
> For scientific Smalltalk, some of those old FORTRAN libraries are still
> around nowadays (LAPACK is an example).
>

Agreed.  There are VW users out there with autogenerated code that requires
more than 15 arguments.  Clément and I already have a design in mind, which
is much more elegant than using the extra bit below.  However, it does
require that we change the maximum Context stack size, which is one reason
(the other being lack of time) why we haven't implemented this so far.

In 2008 my closure design introduced indirection vectors for closed-over
arguments, and among the five bytecodes added to implement it was the Create
Array bytecode that can do one of two things:

V3PlusClosures:
138   10001010 jkkkkkkk Push (Array new: kkkkkkk) (j = 0)
or Pop kkkkkkk elements into: (Array new: kkkkkkk) (j = 1)

SistaV1:
231 11100111 jkkkkkkk Push (Array new: kkkkkkk) (j = 0)
or Pop kkkkkkk elements into: (Array new: kkkkkkk) (j = 1)

This bytecode is used to create indirection vectors, and to create tuples
of size <= 8.  e.g. { thisContext method symbolic. 2. 3. 4. 5. 6. 7. 8 }
89 <52> pushThisContext:
90 <81> send: method
91 <80> send: symbolic
92 <E8 02> pushConstant: 2
94 <E8 03> pushConstant: 3
96 <E8 04> pushConstant: 4
98 <E8 05> pushConstant: 5
100 <E8 06> pushConstant: 6
102 <E8 07> pushConstant: 7
104 <E8 08> pushConstant: 8
106 <E7 88> pop 8 into (Array new: 8)
108 <5C> returnTop

We call this version of the bytecode the cons array bytecode.  The other
form, used to create indirection vectors, is the create array bytecode.

(cf. { thisContext method symbolic. 2. 3. 4. 5. 6. 7. 8. 9 } which
produces much more code but requires only 2 elements of stack depth).
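
As a rough illustration of the operand layout described above, here is a small
Python sketch that splits the jkkkkkkk operand byte into its two fields.  The
function names are invented for this example and are not part of any VM or
image code; only the bit layout is taken from the bytecode descriptions above.

```python
def decode_array_operand(operand: int) -> tuple[bool, int]:
    """Split a jkkkkkkk operand byte into (pop_into_array, size).

    Hypothetical helper: j is the top bit, kkkkkkk the low seven bits.
    """
    if not 0 <= operand <= 0xFF:
        raise ValueError("operand must fit in one byte")
    pop_into_array = bool(operand & 0x80)  # j = 1 selects the cons array form
    size = operand & 0x7F                  # kkkkkkk, so at most 127 elements
    return pop_into_array, size


def describe_array_bytecode(operand: int) -> str:
    """Render the operand in the style of the disassembly shown earlier."""
    pop_into_array, size = decode_array_operand(operand)
    if pop_into_array:
        return f"pop {size} into (Array new: {size})"
    return f"push (Array new: {size})"
```

With the operand 0x88 from the `<E7 88>` line of the disassembly, the sketch
yields the cons array form with 8 elements.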

There are also three bytecodes used to access indirection vectors:

V3PlusClosures:
140   10001100 kkkkkkkk jjjjjjjj Push Temp At kkkkkkkk In Temp Vector At:
jjjjjjjj
141   10001101 kkkkkkkk jjjjjjjj Store Temp At kkkkkkkk In Temp Vector At:
jjjjjjjj
142   10001110 kkkkkkkk jjjjjjjj Pop and Store Temp At kkkkkkkk In Temp
Vector At: jjjjjjjj

SistaV1:
251   11111011 kkkkkkkk sjjjjjjj Push Temp At kkkkkkkk In Temp Vector At:
jjjjjjj, s = 1 implies remote inst var access instead of remote temp vector
access
252   11111100 kkkkkkkk sjjjjjjj Store Temp At kkkkkkkk In Temp Vector At:
jjjjjjj, s = 1 implies remote inst var access instead of remote temp vector
access
253   11111101 kkkkkkkk sjjjjjjj Pop and Store Temp At kkkkkkkk In Temp
Vector At: jjjjjjj, s = 1 implies remote inst var access instead of remote
temp vector access
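
To make the SistaV1 operand layout concrete, the following Python sketch
decodes the two operand bytes kkkkkkkk sjjjjjjj.  The function name and the
dictionary keys are made up for illustration; only the bit layout comes from
the descriptions above.

```python
def decode_sista_remote_operands(first: int, second: int) -> dict:
    """Decode the operand bytes kkkkkkkk sjjjjjjj of bytecodes 251-253.

    Hypothetical helper, not VM code.
    """
    for byte in (first, second):
        if not 0 <= byte <= 0xFF:
            raise ValueError("operands must each fit in one byte")
    return {
        "index": first,                            # kkkkkkkk: slot in the vector
        "vector_temp": second & 0x7F,              # jjjjjjj: temp holding the vector
        "remote_inst_var": bool(second & 0x80),    # s = 1: remote inst var access
    }
```

Because s takes the top bit of the second byte, the temp holding the vector
is limited to the range 0 through 127 in this encoding.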

So the insight is that if we pass arguments beyond 14 in an indirection
vector we can have up to 15 + 127 = 142 arguments without needing any extra
bits in a CompiledMethod header or range in a bytecode.  We simply pop
arguments beyond the 14th into an indirection vector, using the cons array
bytecode.  Yes, this is slow compared to "native" support, but such methods
are extremely rare, and supporting them this way means we have less waste
elsewhere.  It will require some sophistication in the Decompiler, but
otherwise seems quite simple.
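
A minimal Python sketch of the caller-side idea, under my reading of the
design: methods with up to 15 arguments pass them normally, and beyond that
the first 14 are passed normally while the rest travel in one vector.  The
names and the use of a Python list for the vector are assumptions made for
illustration; the 127 limit is the seven-bit size field of the cons array
bytecode.

```python
DIRECT_ARGS = 14        # arguments passed normally once a vector is needed
MAX_NATIVE_ARGS = 15    # argument count as far as the VM is concerned
MAX_VECTOR_SIZE = 127   # seven-bit size field of the cons array bytecode


def marshal_arguments(args):
    """Return the argument slots a send would actually carry.

    Hypothetical model: with more than 15 arguments, everything beyond the
    14th is gathered into a single vector passed as the 15th argument.
    """
    args = list(args)
    if len(args) <= MAX_NATIVE_ARGS:
        return args                      # no indirection vector required
    excess = args[DIRECT_ARGS:]
    if len(excess) > MAX_VECTOR_SIZE:
        raise ValueError("too many arguments for one indirection vector")
    return args[:DIRECT_ARGS] + [excess]
```

For example, `marshal_arguments(range(20))` gives 15 slots, the last of which
is a list holding the six excess arguments.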

With this design, as far as the VM is concerned the maximum argument count
is still 15.  Only the image need bother with how to record the argument
count for a method that has 15 or more arguments, and indeed a method with
15 arguments can still use all 15 arguments without having to create an
indirection vector.  This isolates the effects to the compiler (arguments
beyond the 14th in methods with more than 15 arguments must be accessed
using the indirection vector bytecodes above), and the changes are otherwise
quite localized: indirection vector creation occurs immediately after normal
argument marshaling and immediately before the send bytecode.
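
On the callee side, the compiler has to decide how each argument is reached.
The Python sketch below shows one way to express that choice.  The function
name, the tuple results and the zero-based indices are assumptions made for
illustration, not the actual compiler interface.

```python
def argument_access(arg_index: int, num_args: int):
    """Say how a method would reach its argument (zero-based arg_index).

    Hypothetical model: returns ("temp", i) for a normal argument, or
    ("remote_temp", k, 14) for slot k of the vector held in temp 14.
    """
    if not 0 <= arg_index < num_args:
        raise IndexError("no such argument")
    if num_args <= 15 or arg_index < 14:
        return ("temp", arg_index)                # ordinary argument access
    return ("remote_temp", arg_index - 14, 14)    # via the indirection vector
```

A method with exactly 15 arguments still reads its last argument directly,
whereas with 16 arguments the 15th comes from slot 0 of the vector.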

Does this design appeal to you?  If it does, then we should discuss when
and how it should be implemented.  One thing would be to make the maximum
size of a Context, defined at the image level by CompiledCode's LargeFrame
class variable, but hard coded into the VM, some kind of VM parameter, e.g.
stored in the image header and read at start-up.  It would be quite easy to
add this.  If we did so we should also ensure the stack page size
calculation allows for a stack page big enough for one or two huge frames.
Note that the design also means that a large stack is needed only to
*marshal* arguments, not to activate a method with many arguments, since
the excess arguments are stored in an indirection vector.
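
The difference between marshaling depth and activation size can be seen in a
toy Python model of the caller's operand stack.  This is only a sketch under
the assumption that the receiver and all arguments are pushed before the
excess ones are popped into the vector; the function name is invented, and
the model ignores the 127-element limit on the vector.

```python
def marshaling_depths(num_args: int):
    """Return (peak caller stack depth, argument slots seen by the callee).

    Hypothetical model of marshaling followed by an optional cons array.
    """
    stack = ["receiver"]
    peak = len(stack)
    for i in range(num_args):
        stack.append(f"arg{i + 1}")      # normal argument marshaling
        peak = max(peak, len(stack))
    if num_args > 15:
        excess_count = num_args - 14
        vector = stack[-excess_count:]   # pop the excess arguments...
        del stack[-excess_count:]
        stack.append(vector)             # ...and push one vector instead
    return peak, len(stack) - 1          # receiver is not an argument
```

With 100 arguments the caller briefly needs 101 stack slots, while the callee
is activated with only 15 argument slots.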

P.S. Indeed we could use the scheme used for arbitrary sized tuples to
marshal extra arguments, but this would affect code generation much more.
Different code would have to be used to marshal each argument beyond 15;
whereas using the cons array bytecode

> I patched the old Squeak compiler in Smallapack to workaround this
> limitation (it was easy enough to pass a single Array, and invoke FFI with
> many args).
> In modern Pharo flavour, this is more involved with the new OpalCompiler
> (it does not seem to be designed for extensibility as it seems necessary
> to patch many pieces/subclasses for a single feature change...).
>
> But we now have Sista V1 bytecodes which removed a lot of limitations (#
> inst vars, #literals, max jump offset ...). Alas I don't see a modified
> limit for number of arguments (source:
> https://hal.inria.fr/hal-01088801/document a bytecode set for adaptive
> optimization): there is still a limit of 4 reserved bits in compiled method
> header documented in link above.
> Though, there is an adjacent unused bit now...
> In Squeak/Pharo, EncoderForSistaV1>>genSend:numArgs: suggests that the
> limit is 31 (sic)
>
>     (nArgs < 0 or: [nArgs > 31]) ifTrue:
>         [^self outOfRangeError: 'numArgs' index: nArgs range: 0 to: 31
> "!!"].
>
> or at least 2047 if we believe code below:
>
>     "234        11101010    i i i i i j j j    Send Literal Selector
> #iiiii (+ Extend A * 32) with jjj (+ Extend B * 8) Arguments"
>
>
> https://github.com/pharo-project/pharo/blob/50992c3e5fed790b7e660954aee983f4681da658/src/Kernel-BytecodeEncoders/EncoderForSistaV1.class.st
>
> Pharo also limits the numArgs to 15 whatever the encoding in
> CompiledMethod>>newBytes:trailerBytes:nArgs:nTemps:nStack:nLits:primitive:
>
> https://github.com/pharo-project/pharo/blob/50992c3e5fed790b7e660954aee983f4681da658/src/Kernel/CompiledMethod.class.st
>
> But Squeak does not limit nArgs at all in
>
> EncoderForSistaV1>>computeMethodHeaderForNumArgs:numTemps:numLits:primitive:
>
> So my questions:
> - is that doc up-to-date?
> - if so, couldn't we expand the limit to 31 args by using the unused bit?
>
> Note: there is another unused bit in V3 (not adjacent), and the double
> extended (send) byte code has room for 31 args in V3 too, since only the
> first 3 bits of second byte encode the type of operation...
>


-- 
_,,,^..^,,,_
best, Eliot