On Fri, May 8, 2009 at 2:26 PM, Jecel Assumpcao Jr <jecel@merlintec.com> wrote:
Sorry about the wrong attribution - Celeste makes a big mess of all of Eliot's emails and most replies to those emails. Unfortunately, fixing this is not currently near the top of my "to do" list so I'll just have to deal with this for a few more months.
I totally agree about the value of immediates being to speed up computations by avoiding allocations. My idea for symbols was not to avoid the costly mapping of strings to new instances but rather to speed up class lookup a little bit. This wouldn't help us now, but for a future modular Squeak that would be loading and unloading object graphs all the time, this could make a difference.
One of the things I think would really help here is to have a way of assigning the id-hash of a Symbol based on its string hash. Then MethodDictionaries and the like don't have to be rehashed on load. I could imagine a new:withHash: primitive that creates an object with a specified hash atomically, which is safer than adding a separate setIdHash: primitive; VW has the latter.
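As a rough illustration of why a string-derived id-hash avoids rehashing on load, here is a minimal Python sketch. Everything in it is an assumption made for the example: the Image, Symbol and MethodDictionary classes, the 22-bit hash width, and CRC32 as the string hash. It is not how Squeak or VW implement symbols. Image.intern stands in for the proposed new:withHash: primitive, since a symbol never exists without its final hash.

```python
import zlib

HASH_BITS = 22  # hypothetical width of the identity-hash field


def string_hash(name):
    """Deterministic string hash; CRC32 is only a stand-in."""
    return zlib.crc32(name.encode("utf-8")) & ((1 << HASH_BITS) - 1)


class Symbol:
    def __init__(self, name, id_hash):
        self.name = name
        self.id_hash = id_hash


class Image:
    """One running image with its own symbol table (hypothetical)."""

    def __init__(self):
        self.symbols = {}

    def intern(self, name):
        sym = self.symbols.get(name)
        if sym is None:
            # Stand-in for new:withHash: the hash is fixed at creation.
            sym = Symbol(name, string_hash(name))
            self.symbols[name] = sym
        return sym


class MethodDictionary:
    """Fixed-size open-addressing table probed by the selector's id_hash.
    It never grows, so it must not be filled completely."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot is None or (Symbol, method)

    def _probe(self, sym):
        i = sym.id_hash % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] is not sym:
            i = (i + 1) % len(self.slots)
        return i

    def put(self, sym, method):
        self.slots[self._probe(sym)] = (sym, method)

    def get(self, sym):
        entry = self.slots[self._probe(sym)]
        return None if entry is None else entry[1]

    def dump(self):
        """Serialize slot by slot, replacing symbols with their names."""
        return [None if e is None else (e[0].name, e[1]) for e in self.slots]

    @classmethod
    def load(cls, dumped, image):
        """Rebuild the table in the same slot order, with no re-insertion."""
        md = cls(len(dumped))
        md.slots = [None if e is None else (image.intern(e[0]), e[1])
                    for e in dumped]
        return md


source = Image()
md = MethodDictionary()
for selector in ("printOn:", "at:put:", "do:"):
    md.put(source.intern(selector), "method for " + selector)

# Load into a different image: the Symbol objects are new, but their
# id_hash values match, so the stored slot layout is still valid.
target = Image()
loaded = MethodDictionary.load(md.dump(), target)
```

If the id-hash were instead assigned arbitrarily at allocation time, the symbols interned in the target image would land in different slots and the table would have to be rebuilt entry by entry.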
Like VisualWorks, Self uses 30 bit integers with a 00 tag (which makes
detagging/retagging unnecessary for addition, subtraction and bitwise logical operations).
FYI, VW does not use 00 for SmallIntegers; it uses 00 for objects. So it does have to detag for certain operations. But of course it optimizes addition/subtraction by only detagging one of the two values so it doesn't have to retag.
Self, Strongtalk and V8 all do use 00.
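To make the difference concrete, here is a small Python sketch of both schemes on 32-bit words with two tag bits. The helper names are invented for the example, overflow checks are left out, and the nonzero integer tag INT_TAG is an assumption chosen for illustration; the thread does not say which tag value VW actually uses.

```python
WORD_MASK = 0xFFFFFFFF  # model 32-bit machine words
TAG_BITS = 2


def to_signed(word):
    """Interpret a 32-bit word as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word


def untag(word):
    """Arithmetic shift right drops the tag bits under either scheme."""
    return to_signed(word) >> TAG_BITS


# Scheme A: SmallIntegers carry tag 00.
def tag00(n):
    return (n << TAG_BITS) & WORD_MASK


def add00(a, b):
    # The low bits are 00 + 00, so the sum is already a valid tagged value.
    return (a + b) & WORD_MASK


def and00(a, b):
    # Bitwise operations also preserve the 00 tag.
    return a & b


# Scheme B: SmallIntegers carry a nonzero tag (value assumed here).
INT_TAG = 0b11


def tag_nz(n):
    return ((n << TAG_BITS) | INT_TAG) & WORD_MASK


def add_nz(a, b):
    # Detag only one operand; the other operand's tag survives the add,
    # so the result needs no retagging.
    return ((a - INT_TAG) + b) & WORD_MASK


sum_a = untag(add00(tag00(5), tag00(-7)))
sum_b = untag(add_nz(tag_nz(5), tag_nz(-7)))
```

Both sums decode to the same integer; the only difference is the single subtraction of the tag in add_nz.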
The other tag values represent 30 bit floats,
object pointers (you always use a constant offset with these anyway, so the detagging can be built into that constant) and object headers. The memory is divided into segments (generations) and each segment stores tagged data from the bottom and binary data from the top. ByteVectors are normal tagged objects with a SmallInteger pointing to the actual bytes.
The idea of a tag pattern for object headers is that you can "flatten" the memory scanning operations. You just scan from top to the limit until you find what you were looking for and then back up to the previous header to see what object contains that oop. This can be many times faster than an objects do: [ :obj | obj fieldsDo: [ :oop | ....]] nested loop.
Yes, this is neat. They use it in become operations which are very common as slots are added and removed, right?
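The flat scan can be sketched in a few lines of Python. This is only a model under stated assumptions: memory is a list of tagged words, an "address" is just a list index, the tag assignments below are chosen for the example, and the first word of the segment is assumed to be a header.

```python
TAG_MASK = 0b11
# Tag assignments assumed for this example only.
INT_TAG, OOP_TAG, FLOAT_TAG, HEADER_TAG = 0b00, 0b01, 0b10, 0b11


def header(info):
    return (info << 2) | HEADER_TAG


def oop(index):
    return (index << 2) | OOP_TAG


def small_int(n):
    return (n << 2) | INT_TAG


def find_referrers(memory, target):
    """Return the header index of every object holding the target oop.

    One linear pass over the words; object boundaries are only consulted
    after a hit, by walking back to the nearest header-tagged word.
    """
    owners = []
    for i, word in enumerate(memory):
        if word != target:
            continue
        j = i
        while memory[j] & TAG_MASK != HEADER_TAG:
            j -= 1
        if not owners or owners[-1] != j:  # skip repeated hits in one object
            owners.append(j)
    return owners


memory = [
    header(0),      # 0: object A
    small_int(42),  # 1
    oop(3),         # 2: A refers to B
    header(0),      # 3: object B
    oop(6),         # 4: B refers to C
    oop(0),         # 5: B refers to A
    header(0),      # 6: object C
    oop(3),         # 7: C refers to B
    small_int(7),   # 8
]

referrers_of_b = find_referrers(memory, oop(3))
```

The inner loop compares raw words and never decodes sizes or field counts, which is where the speedup over the nested objects/fields iteration would come from.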
For my old RISC42 design I came up with the idea of having the top two bits the same to indicate SmallIntegers. This is hard to check in software, but in hardware is just a two input XOR gate. This allows you to avoid detagging not only for the operations I mentioned above, but also for multiplies, divides, left shift and signed right shift too. A few weeks ago I found out that the old Swamp Smalltalk computer from 1986 used exactly the same scheme (the two patterns where the top two bits were different were used for oops and for pointers to context objects).
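A minimal Python model of the "top two bits equal" test on 32-bit words follows. The function names are made up for the example, and to_signed only models the two's-complement interpretation that hardware gives for free; it is not a detagging step. Under this scheme a SmallInteger is simply any word in the range -2^30 to 2^30 - 1, stored unshifted.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1
SMALL_MIN = -(1 << (WORD_BITS - 2))
SMALL_MAX = (1 << (WORD_BITS - 2)) - 1


def is_small_int(word):
    """Software version of the two-input XOR on the top two bits."""
    top = (word >> (WORD_BITS - 1)) & 1
    second = (word >> (WORD_BITS - 2)) & 1
    return (top ^ second) == 0


def to_signed(word):
    """Read a word as a two's-complement integer."""
    return word - (1 << WORD_BITS) if word >> (WORD_BITS - 1) else word


def encode(n):
    """Store n unshifted; values outside the 31-bit range overflow."""
    if not SMALL_MIN <= n <= SMALL_MAX:
        raise OverflowError("result does not fit in a SmallInteger")
    return n & WORD_MASK


def multiply(a, b):
    # The raw words are used directly as operands: no shifting of either
    # input and no retagging of the product.
    return encode(to_signed(a) * to_signed(b))


def shift_right(a, count):
    # A signed right shift keeps the top two bits equal.
    return encode(to_signed(a) >> count)


product = to_signed(multiply(encode(-6), encode(7)))
```

With a low-bit tag, a multiply would first need one operand shifted down so the product is not scaled twice; here the only extra work is the range check on the result.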
-- Jecel
Eliot Miranda wrote on Fri, 8 May 2009 15:18:32 -0700
One of the things I think would really help here is to have a way of assigning the id-hash of a Symbol based on its string hash. Then MethodDictionaries and the like don't have to be rehashed on load. I could imagine a new:withHash: primitive that creates an object with a specified hash atomically, which is safer than adding a separate setIdHash: primitive; VW has the latter.
This would be great for merging binary packages with the current Symbols that map 1 to 1 to Strings. I have to confess that this has not been a priority for me in my designs since I have always been interested in schemes which map a single Symbol to multiple Strings. The idea was to allow Smalltalk to be written in languages other than English. I know that a lot of people are very much against this idea, but in my experience it will be an important factor in the future.
[header tag for quick memory scans]
Yes, this is neat. They use it in become operations which are very common as slots are added and removed, right?
Yes, though adding and removing slots is not common when normal applications are running, only during development. And only "data slots" require this, as constant slots and methods cause the object to get a new "map" (hidden class, roughly) rather than changing the object's size.
-- Jecel
vm-dev@lists.squeakfoundation.org