There was an interesting discussion in the "fonc" list a short time ago about trying to make the system as independent as possible from the representation of "tags".
I suppose that all of you know that objects in dynamically typed languages are tagged so that the system can know the type of each one at run time. There are several ways of doing this. A simple solution is to have the first word in each object point to some other object (a class, for example) which indicates that type. Another solution is to divide up the memory space into regions and only allow objects of a given type in each region. A variation of this is to reserve some bits in the object pointer to indicate the type. In practice a combination of these is almost always used.
For Squeak, for example, there are three systems. The first is using the lowest bit in the object reference to separate small integers (1) from all other objects (0). The second, which applies to the "all other objects" case, uses some bits in a header word as an index into a table which then points to the class. In the third option the header includes a full extra word which is a direct pointer to the class.
This method works well because objects are aligned on multiple of four bytes, so all normal pointers have the lowest two bits as 00 and can never be confused with small integers. The problem is that if you add two integers then the result will be wrong: you have to clear the tags, do the addition and then set the tag (just clearing one of the tags has the same effect for addition). Note that we lost one bit of precision for small integers, but pointers are no worse off due to this scheme (we can still address 4GB).
For Self the two lowest bits are 00 for small integers (so we lose two bits), 01 for object pointers, 10 for floating point and 11 for header words. This would seem to make access to an object's fields more costly, but in practice most of the time you have a constant offset (or an index register plus a constant) added to a base address and can adjust the constant so no extra operations are needed. The small integers can now be added and subtracted very simply and you just have to watch out for overflow. The floating point numbers got the short end of the stick with a serious loss in precision, but you get better performance than the "boxed" (full objects rather than encoded in object references) floats like in Self. And you can have boxed doubles if needed. The special tags for headers allow you to scan though memory at very high speeds until you find some reference you want and then back up a little to see what object it is in. Without this you have to scan by objects (which normally implies recursion and other complications) to get the same result.
For RISC42 I changed the object reference format a while back so that the top two bits are 00 for positive small integers, 11 for negative small integers and 01 or 10 for object references (two sizes of task/object division). Note that the precision of small integers is 31 bits like in Squeak. With this format not only do 32 bit addition and subtraction instructions work just fine, but also signed multiplication and division, shifts and logical operations too. Only rotation instructions don't work. Of course, the result won't be a valid small integer for all inputs. One solution would be to add a second overflow flag to detect this. Another would be to redefine the old overflow flag, but in some cases we would like the old definition.
The interesting thing about this scheme is how different the costs are for a software versus hardware implementation. It is a bit complicated to check these flags in software (you need a shift, an exclusive or and a masking operation) and detecting non small integer results is much worse. For the hardware a single exclusive or gate will to the job and detecting overflow adds only a little more logic.
So a system that has both a hardware and software-only implementations would be improved by allowing different tagging schemes. This would make sharing images more complicated (but not that much).
-- Jecel
hardware@lists.squeakfoundation.org