counting matching bits / bitXnor

Paolo Bonzini bonzini at gnu.org
Mon Dec 17 12:34:48 UTC 2007


Cees de Groot wrote:
> Hi,
> 
> I'm fiddling with some little project where I need to count the number
> of bits that are the same between two bitstrings. My first
> approximation would be to bitXnor: both integers and count bits in the
> result. However, there's no bitXnor: and the standard implementation:
> 
> a XNOR b = (a AND b) OR (NOT a AND NOT b)
> 
> won't fly because there's not bitNot in Integer, as far as I can tell
> (negate draws in all the ugliness of 2-complement arithmetic :)).

There's bitInvert, I think (I have no Squeak image here).  If you don't 
have it, bitInvert is just "-1 - self".

> It's probably a stupid question that only shows how rusty my bit
> fiddling is - but what would be a *good* performing implementation of
> something that counts matching bits?

You need to give the width, as (if both numbers have the same sign, at 
least) you can always add a zero to the left and get one more matching 
bit. :-)   So the method would be something like this:

      bitMatch: b width: n
          ^((self bitXor: b) bitInvert bitAnd: (1 bitShift: n) - 1)
              bitCount

      bitCount
          | n cnt |
          n := self. cnt := 0.
          [ n = 0 ] whileFalse: [
              cnt := cnt + 1.
              n := n bitAnd: n - 1 ].
          ^cnt

 > 0101 / 0001 -> 3

      ^2r0101 bitMatch: 2r0001 width: 4

> integers are unequal in size, it's a pattern match so the smaller int
> needs to be shifted along the larger int and the largest count is the
> one that's returned):
> 
> 011101 / 101 -> 3
> 011111 / 101 -> 2

I don't know the answer for this, but I know of an efficient algorithm 
for a similar problem, that is to *find* a bitstring in another.  For 
this, you can adapt the Knuth-Morris-Pratt string searching algorithm.

KMP works by computing the longest prefix corresponding to all the 
partial match (e.g. if you've matched six chars in 0101011 and fail on 
the seventh char, you can try again with a four-character partial 
match).  You can modify the algorithm in two ways:

1) you can optimize the prefix table because you only have two values. 
In the previous case, if you failed to match the seventh char in 
0101011, since you wanted a one you know that the target bitstring had a 
zero.  So you can continue with a five-character partial match.

2) when you find a match, if you want overlapping matches you can use 
the prefix table to find them.  For example, if you are looking for is 
0101, the KMP prefix table (not the one modified according to 
optimization 1) says that after failing to match the fourth character 
you can try matching against the second.  So, that's what you do even 
after finding a full match.  Optimization 1 can also be applied to this 
table, though in a slightly different way.

You can find code for KMP inside VisualWorks or GNU Smalltalk (see 
Stream>>#nextPutAll:).

Paolo



More information about the Squeak-dev mailing list