练习有益健康,拿这书的解法去面人就比较苛刻
2015-05-20
Hacker's delight is a interesting book. The only problem is it skiped many steps and hard to follow. For example, one of the topic is how to cout the number of binary 1s for a unsigned interger.
1. Easy answer start from here
unsigned int CountBitOne(unsigned int x)
{
unsigned int count = 0;
while(x)
{
count += x & 1;
x>>= 1;
}
return count;
}
Note: don't use for loop! That's not efficient as while loop. Think about the sparse 1 case.
2. A well know trick.
unsigned int CountBitOne(unsigned int x)
{
unsigned int count = 0;
while(x)
{
++count;
x &= (x -1);
}
return count;
}
It is easy to prove x&(x-1) reset the least significiant bit of 1 of value. Suppose we have 8bits interger value
x b'XXXX1000 (Capital X means either 1 or 0)
x-1 b'XXXX0111
x&(x-1) b'XXXX0000
3. Can you do more optimization? Look up table is a good, and probbaly the fast solution, just make sure your lookup table is always in cache.
4. Method 1 and 2 count one bit at one time. Can we do more bits at the same time?
Divide and Conquer Algorithm. Suppose there is a 8bit integer 213(11010101 in binary), the algorithm works like this(each time merge two neighbor blocks):
+-------------------------------+
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | <- x
| 1 0 | 0 1 | 0 1 | 0 1 | <- first time merge
| 0 0 1 1 | 0 0 1 0 | <- second time merge
| 0 0 0 0 0 1 0 1 | <- third time ( answer = 00000101 = 5)
+-------------------------------+
uint32_t CountBitOne(uint32_t x)
{
x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
x = (x & 0x0000FFFF) + ((x >> 16)& 0x0000FFFF);
return x;
}
I changed the parameter's type to uint32_t sicne the introduction of magic binary numbers.
5. Divide and conqure method provided in Hacker's delight
https://books.google.com/books?id=iBNKMspIlqEC&pg=PA66&hl=en#v=onepage&q&f=false
uint32_t pop(uint32_t x)
{
x = x - ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x + (x >> 4)) & 0x0F0F0F0F;
x = x + (x >> 8);
x = x + (x >> 16);
return x & 0x0000003F;
}
The first line of this algorith has the same effect as method 4, proved by truth table
---------------------------------------------
| v | (v >> 1) & 0b0101 | v - x |
---------------------------------------------
0b00 0b00 0b00
0b01 0b00 0b01
0b10 0b01 0b01
0b11 0b01 0b10
6. Last one is the so called "fastest way—without using lookup tables and popcount.", copied from
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
It counts the set bits with just 12 operations.
uint32_t popcount(uint32_t v) {
v = v - ((v >> 1) & 0x55555555); // put count of each 2 bits into those 2 bits
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // put count of each 4 bits into those 4 bits
return ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}
The trick is to multiply the result by 0b10101010 which has an interesting property. If our number has four bytes, A B C D, it will result in a new number with these bytes A+B+C+D B+C+D C+D D. A 4 byte number can have maximum of 32 bits set, which can be represented as 0b00100000.
All we need now is the first byte which has the sum of all set bits in all the bytes, and we get it by >> 24.
7. Intel x86 CPU has a POPCOUNT in SSE instruction set to count the number of 1s. On the GNU compiler you can just use:
int __builtin_popcount (unsigned int x);
In the worst case the compiler will generate a call to a function. In the best case the compiler will emit a cpu instruction to do the same job faster. Unfortunately there is no equivelant on ARM yet.