Weird And Fun Bithacks

Table of Contents

    Background

    Bithacks, bit-twiddling hacks, or bitwise hacks, are tricks or techniques in programming that use bitwise operations to accomplish some task more efficiently than a naive method. This usually takes advantage of the fact that bitwise operations are very fast in every modern CPU, whereas other instructions or methods may be significantly slower.

    Note

    This is still a work in progress, I will continue adding algorithms and methods.

    For the purposes of this page, I will be writing all the bit hacks in C/C++ syntax, however, I will not be doing anything special like inline functions or macros (besides where they are common or necessary.) That should be done on a case-by-case basis in your own code. By that I also mean look at alternatives or possible builtins. Most CPUs nowadays probably have quite a bit of the functionality on this page as instructions.

    This page assumes a default 32 bit integer for functions where that ends up mattering. You may also have to specify some integers as unsigned so the compiler doesn't complain, though these all worked with my (admittedly limited) test program.

    This page also assumes a two's complement system. This means that ~x == -x - 1, and therefore that -x == ~x + 1.

    This page may also make use of some common and not so common functions, all of which will either be part of most standard libraries, or defined here (sometimes in multiple ways). min, max, abs, and sign all have branchless variants farther down the page. Here are the overviews of their behaviors:

    
    max(a, b); // Returns a if a > b else b
    
    min(a, b); // Returns a if a < b else b
    
    abs(x); // Returns x if x > 0 else -x
    
    sign(x); //If x is > 0, it returns 1, if x < 0, it returns -1, otherwise (x == 0) it returns 0
    
    ctz(x); // Counts the number of consecutive trailing zeroes, or the number of zeroes below the lowest set bit.
    
    clz(x); // Counts the number of consecutive leading zeroes, or the number of zeroes about the highest set bit.
    
    popcount(x), popcnt(x); // Counts the number of bits set in an integer
            

    Other Resources

    There are quite a few other sites with information about bit hacks. Here are a few of them:

    Example

    The following is an example of a very famous and commonly used bitwise method for swapping two integers without using extra memory.

    It works using four simple properties of the XOR operation

    
    void swap1(int* a, int* b) {
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
    }
    		

    The "Hacks"

    Count Trailing/Leading Zeroes (binary search)

    The CTZ function, or "count trailing zeroes", is a function that computes the number of consecutive zeroes in a binary integer starting from the LSB. The simplest method is simply looping through the bits and keeping count, but an unrolled binary search is another method that seems to predate a fair bit of modern computing. There's no reason to use the iterative version.

    The CLZ, or count leading zeros, is the same except it starts from the MSB.

    Also provided here are the algorithms for computing one from the other, which is done by isolating the MSB or LSB.

    Note that in modern processors there are instructions for this, and as far as I am aware they are much faster on any generation of CPU. I'd recommend using that unless you can't guarantee its existence for portability, or you're working with a fairly limited microprocessor.

    
    int ctz_binary(int x) {
        if (x == 0) { return 32; }
        int n = 0;
        if ((x & 0x0000FFFF) == 0) { n += 16; x >>= 16; }
        if ((x & 0x000000FF) == 0) { n += 8; x >>= 8; }
        if ((x & 0x0000000F) == 0) { n += 4; x >>= 4; }
        if ((x & 0x00000003) == 0) { n += 2; x >>= 2; }
        if ((x & 0x00000001) == 0) { n += 1; }
        return n;
    }
    
    int clz_binary(int x) {
        if (x == 0) { return 32; }
        int n = 0;
        if ((x & 0xFFFF0000) == 0) { n += 16; x <<= 16; }
        if ((x & 0xFF000000) == 0) { n += 8; x <<= 8; }
        if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
        if ((x & 0xC0000000) == 0) { n += 2; x <<= 2; }
        if ((x & 0x80000000) == 0) { n += 1; }
        return n;
    }
    
    
    int ctz_with_clz(int x) {
        return x == 0 ?
            32 :
            32 - (clz_binary(x & -x) + 1);
            // AND x with it's two's complement (-x) clears everything but LSB
            // Thus isolating LSB
    }
    
    int clz_with_ctz(unsigned int x) {
        if (x == 0) { return 32; }
        // We have to isolate MSB, which is more difficult than isolating LSB
        // First we fill in all the lower bits
        x |= x >> 32;
        x |= x >> 16;
        x |= x >>  8;
        x |= x >>  4;
        x |= x >>  2;
        x |= x >>  1;
        // After filling in lower bits we can finally isolate MSB
        x = (x >> 1) + 1;
        // And compute our CLZ function!
        return 32 - (ctz_binary(x) + 1);
    }
    		
    CTZ (perfect hashing table lookup)

    Using a lookup table, we can compute the CTZ using only a few operations. We use a perfect hashing function, which just so happens to have a very convenient modulus for us.

    
    // mod 37 gives us a perfect hash into the table
    // The AND with negation clears everything but the least significant bit.
    int ctz_lookup(unsigned int x) {
    
        // 37 is coprime with all of our single bit values (2^n for n from [0, 31])
        static int hash_table37[] = {
            32, 0, 1, 26, 2, 23, 27, 0,  3,  16, 24, 30, 28, 11, 0,
            13, 4, 7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6,  0,  21,
            14, 9, 5, 20, 8, 19, 18
        };
    
        return hash_table37[(x & -x) % 37];
    }
    		
    CTZ (De Brujin table lookup)

    Using a De Brujin sequence of bits as a constant and multiplying, we will have unique values for every LSB in the higher bits of our result, which we then shift down. We can actually bring the size of our table down to the minimum possible, and we can remove the need for a modulus, which could be considerably slower on certain architectures.

    
    
    
    int ctz_lookup_mul(unsigned int x) {
        static int debrujin_hash_table[] = {
            0,  1,  28, 2,  29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4,  8,
            31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6,  11, 5,  10, 9
        };
    
        // 0x077CB531 is our De Brujin sequence/magic number
        return x == 0 ?
            32 :
            debrujin_hash_table[((x & -x) * 0x077CB531U) >> 27];
    }
    		
    Popcounts

    The population count, as explained above, is a function which returns the number of bits set in it's argument.

    Like CTZ/CLZ, it has modern CPU support. I would recommend using these if you've got that, and I would recommend never using the iterative/slow version. It's awful.

    
    // These ones are basically magic
    // The idea is to build up from interleaving bits
    
    /*
    Constants:
    0x55555555 = 01010101 01010101 01010101 01010101
    0x33333333 = 00110011 00110011 00110011 00110011
    0x0F0F0F0F = 00001111 00001111 00001111 00001111
    0x01010101 = 00000001 00000001 00000001 00000001
    */
    unsigned int popcount(unsigned int x) {
        x -= (x >> 1) & 0x55555555;
        x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
        x = (x + (x >> 4)) & 0x0f0f0f0f;
        return (x * 0x01010101) >> 24;
    }
    
    unsigned int popcount64(unsigned long x) {
        x -= (x >> 1) & 0x5555555555555555;
        x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333);
        x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0f;
        return (x * 0x0101010101010101) >> 56;
    }
    
    // Loops an amount of times equal to the number of bits set
    // Never use this it's so bad
    unsigned int slow_popcount(int x) {
        int c;
        for (c = 0; x; c++) {
            x &= x - 1; // Clear LSB
        }
        return c;
    }
            
    Bit Fills

    Some simple functions to fill everything above the MSB and below the LSB.

    Filling from LSB will keep the LSB set, likewise with MSB. Filling from LSB will clear everything above the LSB, and filling from MSB will clear everything below the MSB.

    
    unsigned int fill_below(unsigned int x) {
        // OR our LSB with everything set below the LSB
        if (x == 0) return 0;
        return ((x & -x) - 1) | (x & -x);
    }
    
    unsigned int fill_above(unsigned int x) {
        if (x == 0) return 0;
        x |= x >> 32;
        x |= x >> 16;
        x |= x >>  8;
        x |= x >>  4;
        x |= x >>  2;
        x |= x >>  1;
        // After filling in lower bits we can finally isolate MSB
        x = (x >> 1) + 1;
    
        // Again we OR our MSB with everything set above it.
        return x | (x ^ -x);
    }
            
    Branchless Versions of Functions

    With modern CPUs, branch prediction can improve speed enormously on situations where the predictor hardware can get good predictions. Most arithmetic, however, is not easily predictable by the branch predictor, and a function that commonly misses branches can have serious performance impacts if it gets evaluated frequently.

    Branchless functions offer fairly simple ways to represent these functions without branches, just by using some intelligent arithmetic tricks.

    
    int max(int x, int y) {
        int mask = x - y;
        return x + ((y - x) & (mask >> 31));
    }
    
    int min(int x, int y) {
        int mask = x - y;
        return y - ((y - x) & (mask >> 31));
    }
    
    int abs(int x) {
        int mask = x >> 31;
        return (x + mask) ^ mask;
    }
    
    int sign(int x) {
        // -1, 0, or +1
        // Comparisons are not branching
        return (x > 0) - (x < 0);
    }
            
    Assorted Arithmetical Equivalencies

    Various combinations of boolean and arithmetic operators combine to simplify to a single operator. This could also be used to expand out and obfuscate some of your math in code or whatever.

    Some of these are rather straightforward, such as the two's complement and negate operators, and some expansions directly into NAND form or the like, but the addition equivalence x + y == 2 * (x | y) - (x ^ y), AND equivalance x & y == (x | y) - (x ^ y), and the XOR equivalence x ^ y == x - y + 2 * (~x & y) are much more interesting.

    
    x + y == 2 * (x | y) - (x ^ y)
    x + y == (x | y) + (x & y)
    
    x - y == 2 * (x | ~y + 1) - (x ^ ~y + 1)
    x - y == (x | ~y + 1) + (x & ~y + 1)
    
    x | y == (x ^ y) | (x & y)
    x | y == ~(~(x & x) & ~(y & y))
    
    x & y == (x | y) - (x ^ y)
    x & y == ~(~(x & y) & ~(x & y))
    
    x ^ y == x - y + 2 * (~x & y)
    x ^ y == (x | y) & ~(x & y)
    x ^ y == ~(x & y) & ~(~x & ~y)
    x ^ y == (~x & y) | (x & ~y)
    
    ~x == ~(x & x)
    ~x == ~(x & (x ^ ~x)) & ~(~x & 0)
    ~x == x ^ (x ^ ~x)
    
    -x == ~(x & x) + 1
    -x == (~(x & (x ^ ~x)) & ~(~x & 0)) + 1
    -x == (x ^ (x ^ ~x)) + 1
    
    // Additionally, addition is easily expressed as a recursive function:
    // a ^ b is addition without carry (addition mod 2) and (a & b) << 1 is carry, which can also be (a & b) * 2
    
    unsigned int obfus_add(unsigned int a, unsigned int b) {
        return ((b == 0) ? a : obfus_add(a ^ b, (a & b) << 1));
    }
    
    // There are a bunch of other cool ones, but there are far too many to list here in any detail.
            
    Modulus 3 Identity and Functions

    Performing modulus 3 on a number has a rather nice identity (From Hacker's Delight, 2nd Edition) which allows you simplify it down a lot.

    For any number $$ 2^k \; \text{mod} \; 3 \equiv \begin{cases} 1 & k \; \text{is even} \\ -1 & k \; \text{is odd} \\ \end{cases} $$

    Since all of our numbers are in binary, the representation of an integer can be transformed from this: $$ \text{unsigned char} \; n = b_7 \cdot 2^7 + b_6 \cdot 2^6 + b_5 \cdot 2^5 + b_4 \cdot 2^4 + b_3 \cdot 2^3 + b_2 \cdot 2^2 + b_1 \cdot 2^1 + b_0 \cdot 2^0 $$ Into this: $$ \text{unsigned char} \; n \; \text{mod} \; 3 = -b_7 + b_6 - b_5 + b_4 - b_3 + b_2 - b_1 + b_1 $$ And since all our bits are either one or 0, it turns out that you can total the number of set odd bits vs even bits and get a result that way. $$ n \; \text{mod} \; 3 \equiv (popcnt(n \wedge \text{0x55555555}) - popcnt(n \wedge \text{0xAAAAAAAA})) \; \text{mod} \; 3 $$ Where the \(\wedge\) symbol is logical and, &. This means that since each binary digit corresponds to a power \(2^n\) for some \(n), you can take the number of even bits with the mask value 0x55555555, and subtract the number of the odd bits (1, 4, 16) with the mask value 0xAAAAAAAA and it will have the same value modulus 3 but be in a much smaller range. Using some identities, you can simplify it further to what we have here.

    
    // n (mod 3) = pop(n ^ 0xAAAAAAAA) - 2 (mod 3)
    // Reduces n drastically
    // It also works mod 11
    
    unsigned int remainder3(unsigned int x) {
        x = popcount(x ^ 0xAAAAAAAA) + 23;
        x = popcount(x ^ 0x2A) - 3;
        // Get it from the range [-3, 2] to [0, 2]
        return x + (((int) x >> 31) & 3);
    }
    
    // Digit summing
    unsigned int remainder3_table(unsigned int x) {
        static char modulus_table_63[62] = {
            0,1,2, 0,1,2, 0,1,2, 0,1,2, 0,1,2, 0,1,2, 0,1,2, 0,1,2,
            0,1,2, 0,1,2, 0,1,2, 0,1,2, 0,1,2, 0,1,2, 0,1,2, 0,1,2,
            0,1,2, 0,1,2, 0,1,2, 0,1,2, 0,1
        };
        x = (x >> 16) + (x & 0xFFFF);
        x = (x >>  8) + (x & 0x00FF);
        x = (x >>  4) + (x & 0x000F);
        return modulus_table_63[x];
    }
            

    More To Come

    I'll be adding to this page periodically, and if there's anything specific you'd like to see a solution for, then contact me on my GitHub, which can be found at the bottom of the page.