Tagging

Sat Aug 4 10:03:21 UTC 2007

Jason Johnson writes:
 > Hello all,
 > 
 > As I have mentioned before, I am thinking about creating a new VM for
 > testing some of my ideas.  One thing I noticed in the blue book was the
 > tagging they use on the first (least significant) bit.  By doing this you
 > always have to shift the number before you can work with it.

You don't need to shift the numbers for addition and subtraction, just
make sure the bottom bits are 0. Even with the tag being 1, you
only need to detag one argument and if that's a constant the
detagging can be done at compile time.

With multiplication and division only one argument needs to be
shifted.

 > My thought was moving the tag to the last bit (most significant) and making
 > 0 mean SmallPositiveInteger and 1 for OOP's.  This way, once the number is
 > identified to be a number nothing more need be done to use it and it doesn't
 > need to be converted to be put back.
 > 
 > As far as negative numbers, if I use a 31 bit header for my OOPS then the
 > first bit of the header can be a flag.  If the flag is 1 it's actually a
 > negative number, otherwise it's a normal header.
 > 
 > To clarify what I mean, here are some examples (first half byte in binary,
 > rest in hex)
 > 
 > 0000-0 00 00 ff - SmallPositiveInteger(255)
 > 0111-f ff ff ff - SmallPositiveInteger(2147483647)
 > 
 > 1xxx-x xx xx 01 - OOP(random memory address)
 > 1xxx-x xx xx 02 - OOP(random memory address)
 > 1xxx-x xx xx 03 - OOP(random memory address)
 > 
 > (at address 0xxx-x xx xx 01)
 > 1000-0 00 00 00 - SmallNegativeInteger(-2147483647)
 > 
 > (at address 0xxx-x xx xx 02)
 > 1111-f ff ff ff - SmallNegativeInteger(-1)
 > 
 > (at address 0xxx-x xx xx 03)
 > 0xxx-x xx xx xx - OO Header for a regular object
 > 
 > So does anyone know any problems with this approach that make it
 > unusable/unpractical?  The only things I see are:

It was used by some Lisp systems in the mid 80's. I think CMU Common
Lisp from memory.

There are two disadvantages of using the top bit:
1) You need to control the address space to make sure all objects
live. Controlling the address space to that degree is unlikely to 
be something that can be done portably.
2) You loose half your address space.

 > 1) This representation is tied to 2's compliment, but are there any system
 > that don't use that?
 > 2) This system can only deal with 2 gigs worth of address pointers (can be
 > hard coded to high or low but not both).  How does any Smalltalk get around
 > this?

Alignment. Words are aligned to 4 byte boundaries which is always
required for performance and required by some CPU architectures.

 > 3) Negative numbers require indirect memory access.  Are negative numbers
 > used so much this would be a problem?  If so, one option would be to switch
 > from 31 bits to 30 and use 2 bits for the tag (00 = SmallPositiveInteger, 11
 > = SmallNegativeInteger, 01 = Float?)  I think VW is using two tag bits.
 > 4) By tagging on the high side I have no option of using a value less then
 > 32 bits (e.g. an 8-byte char).  Perhaps characters can be done as my initial
 > negative number solution as well by having a special OO header to represent
 > them.  Since the pointer points to the top, not the bottom maybe the header
 > could be variable size (i.e. less then 32 bits).
 > 
 > So what do you all think?  Any big problems I didn't mention?  Or solutions
 > to the problems I mentioned?  Or papers about this kind of thing (since my
 > google-fu has not found too much on the subject so far)?

There is a large literature on garbage collection. These day's I'd try
citeseer as well as Google. Scanning PLDI and OOPSLA proceedings from
the late 70s and 80s will provide plenty of information if you've
either got ACM membership or access to a university library.

Bryce