Dear Squeakers,
I have changed my mind a little bit after studying some Squeak sources.
Now I plan to implement the LargeInteger primitives (20-39) _without_ changing the representation of LargeIntegers, by extending the Interpreter arithmetic primitives using the CCodeGenerator. This avoids some problems in dealing with different platforms.
In a later step changing the representation to bigger 'digits' (16,32,.. bits?) _could_ be interesting.
Perhaps 16 bits, but unlikely 32. How, for example, would you handle integer overflow when doing, say, the LargeInteger add?
My first idea was to take one small and one big C integer type, represent digits with the small one, and perform the operations with the bigger one. If the bigger type is twice as wide as the smaller, there are no problems with multiplication.
I agree that would work for 16-bit digits, but you need stronger stuff for 32. A downside is that this would not work in the simulated Smalltalk, since SmallIntegers are not full 32-bit values. The resulting Smalltalk code would only be useful for compilation to C.
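For what it's worth, here is a rough sketch of that idea in C. The type and function names (Digit, Wide, digitAdd, digitMul) are my own inventions for illustration, not anything in the Squeak sources; the point is just that with a wide type twice the digit width, neither the carry in addition nor the product in multiplication can overflow:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint16_t Digit;   /* one LargeInteger digit (small type) */
typedef uint32_t Wide;    /* twice the digit width (big type) */

/* r = a + b, all arrays of n digits, least significant digit first.
   Returns the final carry (0 or 1). */
Digit digitAdd(const Digit *a, const Digit *b, Digit *r, size_t n) {
    Wide carry = 0;
    for (size_t i = 0; i < n; i++) {
        Wide sum = (Wide)a[i] + (Wide)b[i] + carry;  /* fits in 32 bits */
        r[i] = (Digit)sum;                           /* low 16 bits */
        carry = sum >> 16;                           /* next carry */
    }
    return (Digit)carry;
}

/* Product of two single digits: a 16x16-bit product always fits in
   the 32-bit wide type, so no overflow is possible. */
Wide digitMul(Digit a, Digit b) {
    return (Wide)a * (Wide)b;
}
```

With 32-bit digits one would need a 64-bit wide type for the same trick, which is exactly where the portability question starts.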
I think these conditions are met on all machines with ANSI C; at the _least_ we could take an 8-bit value for representation and do the computation in 16 bits, and at best - so far - representation in 32 bits and computation in 64.
Indeed, that's exactly what is done right now: bytes are accumulated by addition and multiplication into SmallIntegers.
But here we introduce platform-specific stuff, which has to be handled. By using the Smalltalk-to-C compiler without changing the current representation we are _totally_ platform independent (no #defines in the C at all) and, in spite of this, hopefully much faster than without compiling to C.
I don't understand. Of course we would use the Smalltalk-to-C compiler, but won't we still have to compile the C code? It does raise a byte-endianness issue: inside the C code endianness is irrelevant, even though different platforms will execute differently, but the remaining Smalltalk code that manipulates the digits of the object will need to address it. A simple solution is to implement a digitAt: function as well, so that your plugin or core code will work fine across platforms, big-endian or otherwise.
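To make the digitAt: idea concrete, here is a hypothetical sketch (digit16At is my own name, not an existing primitive): if a wider digit is always assembled explicitly from the stored bytes, the result is the same on big- and little-endian machines, because we never reinterpret memory through a wider pointer:

```c
#include <stdint.h>
#include <stddef.h>

/* Read the i-th 16-bit digit (0-based, least significant first) out of
   byte-ordered storage.  Assembling the digit from individual bytes,
   rather than casting to a uint16_t pointer, keeps the answer identical
   regardless of the host's byte order. */
uint16_t digit16At(const uint8_t *body, size_t i) {
    return (uint16_t)(body[2 * i] | (body[2 * i + 1] << 8));
}
```

The cost is a couple of extra shifts per access, which the C compiler can often fold away on the matching-endian platform anyway.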
I see no reason why any implementation need be platform specific. Done properly, the generated code should work fine with any conforming C compiler supporting 32-bit ints. (Note that Squeak does expect the C compiler to treat ints as 32-bitters.)
And how could you do the multiply without bit munging? Is it possible that a straight loop with carry would be much faster than all the masking and shifting necessary to do a carry written in C?
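Since the question of carries "written in C" comes up: one standard way to recover the carry in pure C, without any wider type and without assembler, is to compare the wrapped sum against an operand. A sketch, with names of my own choosing:

```c
#include <stdint.h>
#include <stddef.h>

/* r = a + b over n 32-bit limbs, least significant first, using only
   32-bit arithmetic.  Unsigned overflow wraps, so the carry out of each
   limb is recovered by comparison rather than by a wider type. */
uint32_t add32WithCarry(const uint32_t *a, const uint32_t *b,
                        uint32_t *r, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t sum = a[i] + b[i];    /* may wrap around */
        uint32_t c1 = sum < a[i];      /* did a + b overflow? */
        sum += carry;
        uint32_t c2 = sum < carry;     /* did adding the carry overflow? */
        r[i] = sum;
        carry = c1 | c2;               /* at most one can be set */
    }
    return carry;
}
```

This is portable ANSI C, but it is exactly the kind of extra comparing and branching that a single add-with-carry instruction does for free, which is why the straight loop with a wider type (where available) tends to win.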
That could be, but then you'd introduce assembler, which is _very_ platform dependent.
Because I've never compiled a Squeak so far, I think it's best for me first to take the straightforward approach and then to think about other representations...
Are you planning to rewrite these as numbered primitives for incorporation in the VM? Why not make a pluggable primitive instead, so that folks who don't need speedier longs (most users) won't need to carry the code in their VM footprint? Done right, the code should run identically whether or not the plugin is installed, albeit slower without it.