Find first set

In software, find first set (ffs) or find first one is a bit operation that, given an unsigned machine word, identifies the least significant index or position of the bit set to one in the word. A nearly equivalent operation is count trailing zeros (ctz) or number of trailing zeros (ntz), which counts the number of zero bits following the least significant one bit. The complementary operation that finds the index or position of the most significant set bit is log base 2, so called because it computes the binary logarithm $\lfloor \log_2 x\rfloor$ .^[1] This is closely related to count leading zeros (clz) or number of leading zeros (nlz), which counts the number of zero bits preceding the most significant one bit. These four operations also have negated versions:

find first zero (ffz), which identifies the index of the least significant zero bit;
count trailing ones, which counts the number of one bits following the least significant zero bit;
count leading ones, which counts the number of one bits preceding the most significant zero bit;
the operation that finds the index of the most significant zero bit, which is a rounded version of the binary logarithm.

There are two common variants of find first set. The first is the POSIX definition, which starts indexing bits at 1^[2] and is labelled ffs here. The second starts indexing bits at zero, which makes it equivalent to ctz, so it is referred to by that name.

Examples

Given the following 32-bit word:

00000000000000001000000000001000

The count trailing zeros operation would return 3, while the count leading zeros operation returns 16. The count leading zeros operation depends on the word size: if this 32-bit word were truncated to a 16-bit word, count leading zeros would return zero. The find first set operation would return 4, indicating the 4th position from the right. The log base 2 is 15.

Similarly, given the following 32-bit word, the bitwise negation of the above word:

11111111111111110111111111110111

The count trailing ones operation would return 3, the count leading ones operation would return 16, and the find first zero operation ffz would return 4.

If the word is zero (no bits set), count leading zeros and count trailing zeros both return the number of bits in the word, while ffs returns zero. Both log base 2 and zero-based implementations of find first set generally return an undefined result for the zero word.
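The conventions above can be pinned down with a minimal portable C sketch for 32-bit words. The function names (ctz32, clz32, ffs32, lg32) are hypothetical, and the loops are deliberately simple rather than fast. As described in the text, ctz32 and clz32 return 32 for a zero word and ffs32 returns 0. The result of lg32 is undefined for zero, and this sketch assumes a return value of -1 to signal that case.

```c
#include <stdint.h>

/* Count trailing zeros: number of 0 bits below the least significant 1 bit.
   Returns 32 for x == 0. */
static int ctz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while ((x & 1u) == 0) { x >>= 1; n++; }
    return n;
}

/* Count leading zeros: number of 0 bits above the most significant 1 bit.
   Returns 32 for x == 0. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while ((x & 0x80000000u) == 0) { x <<= 1; n++; }
    return n;
}

/* POSIX-style find first set: 1-based index of the lowest 1 bit, 0 for x == 0. */
static int ffs32(uint32_t x)
{
    return x == 0 ? 0 : ctz32(x) + 1;
}

/* Log base 2: index of the highest 1 bit. Undefined for 0; this sketch returns -1. */
static int lg32(uint32_t x)
{
    return x == 0 ? -1 : 31 - clz32(x);
}
```

For the example word 0x00008008 (00000000000000001000000000001000), these functions return 3, 16, 4 and 15 respectively. The second example word is 0xFFFF7FF7. Applying ctz32, clz32 and ffs32 to its complement gives the count trailing ones (3), count leading ones (16) and ffz (4) values stated above.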

Hardware support

Many architectures include instructions to rapidly perform find first set and/or related operations, listed below. The most common operation is count leading zeros (clz), likely because all other operations can be implemented efficiently in terms of it (see Properties and relations).

Platform	Mnemonic	Name	Word sizes	Operation	Result on zero input
ARM (ARMv5T architecture and later)	clz^[3]	Count Leading Zeros	32	clz	32
ARM (ARMv8-A architecture)	clz	Count Leading Zeros	32, 64	clz	input size
AVR32	clz^[4]	Count Leading Zeros	32	clz	32
DEC Alpha	ctlz^[5]	Count Leading Zeros	64	clz	64
DEC Alpha	cttz^[5]	Count Trailing Zeros	64	ctz	64
Intel 80386 and later	bsf^[6]	Bit Scan Forward	16, 32, 64	ctz	destination is unchanged^[7], sets zero flag
Intel 80386 and later	bsr^[6]	Bit Scan Reverse	16, 32, 64	log base 2	destination is unchanged^[8], sets zero flag
x86 supporting ABM	lzcnt^[9]	Count Leading Zeros	16, 32, 64	clz	input size, sets carry flag
x86 supporting BMI1	tzcnt^[10]	Count Trailing Zeros	16, 32, 64	ctz	input size, sets carry flag
Itanium	clz^[11]	Count Leading Zeros	64	clz	64
MIPS	clz^[12]^[13]	Count Leading Zeros in Word	32, 64	clz	input size
MIPS	clo^[12]^[13]	Count Leading Ones in Word	32, 64	clo	input size
Motorola 68020 and later	bfffo^[14]	Find First One in Bit Field	arbitrary	log base 2	field offset + field width
PDP-10	jffo	Jump if Find First One	36	clz	Do not jump
POWER/PowerPC/Power ISA	cntlz/cntlzw/cntlzd^[15]	Count Leading Zeros	32, 64	clz	input size
Power ISA 3.0 and later	cnttzw/cnttzd^[16]	Count Trailing Zeros	32, 64	ctz	input size
SPARC Oracle Architecture 2011 and later	lzcnt (synonym: lzd) ^[17]	Leading Zero Count	64	clz	64
VAX	ffs^[18]	Find First Set	0-32	ctz	input size, sets zero flag
z/Architecture	vclz^[19]	Vector Count Leading Zeroes	8, 16, 32, 64	clz	input size
z/Architecture	vctz^[20]	Vector Count Trailing Zeroes	8, 16, 32, 64	ctz	input size

Notes: On some Alpha platforms CTLZ and CTTZ are emulated in software.

Tool and library support

A number of compiler and library vendors supply compiler intrinsics or library functions to perform find first set and/or related operations, which are frequently implemented in terms of the hardware instructions above:

Tool/library	Name	Type	Input type(s)	Notes	Result for zero input
POSIX.1-compliant libc, 4.3BSD libc, OS X 10.3 libc^[2]^[21]	`ffs`	Library function	int	Includes glibc. POSIX does not supply the complementary log base 2 / clz.	0
FreeBSD 5.3 libc, OS X 10.4 libc^[22]	`ffsl` `fls` `flsl`	Library function	int, long	fls ("find last set") computes (log base 2) + 1.	0
FreeBSD 7.1 libc^[23]	`ffsll` `flsll`	Library function	long long		0
GCC	`__builtin_ffs[l,ll,imax]`	Built-in functions	int, long, long long, intmax_t		0
GCC 3.4.0,^[24]^[25] Clang 5.x^[26]^[27]	`__builtin_clz[l,ll,imax]` `__builtin_ctz[l,ll,imax]`	Built-in functions	unsigned int, unsigned long, unsigned long long, uintmax_t		undefined
Visual Studio 2005	`_BitScanForward`^[28] `_BitScanReverse`^[29]	Compiler intrinsics	unsigned long, unsigned __int64	Separate return value to indicate zero input	0
Visual Studio 2008	`__lzcnt`^[30]	Compiler intrinsic	unsigned short, unsigned int, unsigned __int64	Requires hardware support for the lzcnt instruction	Input size in bits
Intel C++ Compiler	`_bit_scan_forward` `_bit_scan_reverse`^[31]	Compiler intrinsics	int		undefined
NVIDIA CUDA^[32]	`__clz`	Functions	32-bit, 64-bit	Compiles to fewer instructions on the GeForce 400 Series	32
NVIDIA CUDA^[32]	`__ffs`	Functions	32-bit, 64-bit	Compiles to fewer instructions on the GeForce 400 Series	0
LLVM	`llvm.ctlz.` `llvm.cttz.`^[33]	Intrinsic	8, 16, 32, 64, 256	LLVM assembly language	Input size if arg 2 is 0, else undefined
GHC 7.10 (base 4.8), in `Data.Bits`	`countLeadingZeros` `countTrailingZeros`	Library function	`FiniteBits b => b`	Haskell programming language	Input size in bits
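The following C sketch illustrates the zero-input column. It assumes GCC or Clang (for `__builtin_ctz` and `__builtin_clz`), a POSIX libc (for `ffs` in `<strings.h>`) and a 32-bit `unsigned int`. The builtins are undefined for zero, so the hypothetical wrappers `ctz_or_32` and `clz_or_32` test for zero first.

```c
#include <strings.h>  /* POSIX ffs(): 1-based, returns 0 for a zero argument */

/* GCC/Clang builtins give an undefined result for 0, so guard that case
   and return the word size instead (assumes 32-bit unsigned int). */
static int ctz_or_32(unsigned int x)
{
    return x == 0 ? 32 : __builtin_ctz(x);
}

static int clz_or_32(unsigned int x)
{
    return x == 0 ? 32 : __builtin_clz(x);
}
```

On targets whose native instruction already returns the word size for zero (for example tzcnt or lzcnt on x86), compilers may fold the guard away. Whether they do depends on the compiler and its flags.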

Properties and relations

If bits are labeled starting at 1 (which is the convention used in this article), then count trailing zeros and find first set operations are related by $ctz(x) = ffs(x) - 1$ (except when the input is zero). If bits are labeled starting at 0, then count trailing zeros and find first set are exactly equivalent operations.

Given w bits per word, the log base 2 is easily computed from the clz and vice versa by $lg(x) = w - 1 - clz(x)$ .

As demonstrated in the example above, the find first zero, count leading ones, and count trailing ones operations can be implemented by negating the input and using find first set, count leading zeros, and count trailing zeros. The reverse is also true.
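The relations in this section can be checked with a small C sketch on 32-bit words (w = 32). It covers ctz(x) = ffs(x) − 1, lg(x) = w − 1 − clz(x), and the negated operations obtained by complementing the input. All function names are hypothetical, and the loop-based ctz and clz stand in for hardware instructions.

```c
#include <stdint.h>

/* Simple reference versions; 32 is returned for a zero input. */
static int ctz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* 1-based find first set, so ctz(x) = ffs(x) - 1 for nonzero x. */
static int ffs32(uint32_t x) { return x ? ctz32(x) + 1 : 0; }

/* lg(x) = w - 1 - clz(x) with w = 32; only meaningful for nonzero x. */
static int lg32(uint32_t x) { return 32 - 1 - clz32(x); }

/* Negated operations: complement the input, then reuse the zero-bit versions. */
static int ffz32(uint32_t x) { return ffs32(~x); }  /* find first zero (1-based) */
static int cto32(uint32_t x) { return ctz32(~x); }  /* count trailing ones */
static int clo32(uint32_t x) { return clz32(~x); }  /* count leading ones */
```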

On platforms with an efficient log base 2 operation, such as the Motorola 68020 and later, ctz can be computed by:

ctz(x) = lg(x & (−x))

where "&" denotes bitwise AND and "−x" denotes the negative of x treating x as a signed integer in two's complement arithmetic. The expression x & (−x) clears all but the least-significant 1 bit, so that the most- and least-significant 1 bit are the same.

On platforms with an efficient count leading zeros operation such as ARM and PowerPC, ffs can be computed by:

ffs(x) = w − clz(x & (−x)).
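Both identities can be checked with this minimal C sketch for 32-bit words. Here a loop-based clz32, and lg32 built on it, stand in for native instructions. Unsigned negation in C wraps modulo 2^32, so `0u - x` produces the same bit pattern as the two's-complement −x described above. The function names are hypothetical.

```c
#include <stdint.h>

/* Reference clz: returns 32 for zero. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* lg from clz; stands in for a native log base 2 instruction. */
static int lg32(uint32_t x) { return 31 - clz32(x); }

/* Isolate the lowest set bit: x & (-x), using wrapping unsigned negation. */
static uint32_t lowest_bit(uint32_t x) { return x & (0u - x); }

/* ctz(x) = lg(x & -x); only valid for nonzero x (returns -1 for 0 here). */
static int ctz_via_lg(uint32_t x) { return lg32(lowest_bit(x)); }

/* ffs(x) = w - clz(x & -x); gives 0 for x == 0 because clz32(0) == 32. */
static int ffs_via_clz(uint32_t x) { return 32 - clz32(lowest_bit(x)); }
```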

Conversely, clz can be computed using ctz by first rounding up to the nearest power of two using shifts and bitwise ORs,^[34] as in this 32-bit example (note that this example depends on ctz returning 32 for the zero input):

function clz(x):
    for each y in {1, 2, 4, 8, 16}: x ← x | (x >> y)
    return 32 − ctz(x + 1)
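A direct C translation of this pseudocode is shown below, as a sketch assuming 32-bit unsigned arithmetic. Its reference ctz returns 32 for zero, which the method depends on. That case arises when bit 31 is set and x + 1 wraps to 0. The names are hypothetical.

```c
#include <stdint.h>

/* Reference ctz; must return 32 for zero, which clz_via_ctz relies on. */
static int ctz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

static int clz_via_ctz(uint32_t x)
{
    /* Smear the highest set bit downward so x becomes 2^(lg x + 1) - 1. */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    /* x + 1 is now a power of two, or wraps to 0 when bit 31 was set. */
    return 32 - ctz32((uint32_t)(x + 1u));
}
```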

On platforms with an efficient Hamming weight (population count) operation such as SPARC's POPC^[35]^[36] or Blackfin's ONES,^[37] ctz can be computed using the identity:^[38]^[39]

ctz(x) = pop((x & (−x)) − 1),

ffs can be computed using:^[40]

ffs(x) = pop(x ^ (~(−x)))

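where "^" denotes bitwise XOR and "~" denotes bitwise NOT. For x = 0 this expression yields w rather than 0, so zero must be handled separately if the ffs convention is required. clz can be computed by: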

function clz(x):
    for each y in {1, 2, 4, 8, 16}: x ← x | (x >> y)
    return 32 − pop(x)
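The three population-count identities are shown here in C, as a sketch for 32-bit words. A portable SWAR bit count, pop32, stands in for an instruction such as POPC or ONES. All names are hypothetical.

```c
#include <stdint.h>

/* Portable population count (parallel bit summation within the word). */
static int pop32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (int)((uint32_t)(x * 0x01010101u) >> 24);
}

/* ctz(x) = pop((x & -x) - 1); yields 32 for x == 0. */
static int ctz_pop(uint32_t x)
{
    return pop32((uint32_t)((x & (0u - x)) - 1u));
}

/* ffs(x) = pop(x ^ ~(-x)); correct for nonzero x, yields 32 for x == 0. */
static int ffs_pop(uint32_t x)
{
    return pop32(x ^ (uint32_t)~(0u - x));
}

/* clz: smear the top bit downward, then count the ones. */
static int clz_pop(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return 32 - pop32(x);
}
```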

The inverse problem (given i, produce an x such that ctz(x)=i) can be computed with a left-shift (1 << i).

Find first set and related operations can be extended to arbitrarily large bit arrays in a straightforward manner by starting at one end and proceeding until a word that is not all-zero (for ffs/ctz/clz) or not all-one (for ffz/clo/cto) is encountered. A tree data structure that recursively uses bitmaps to track which words are nonzero can accelerate this.
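A minimal C sketch of the word-by-word scan follows. It assumes the bit array is stored as 32-bit words with word 0 holding bits 0 to 31. The function `bitarray_ctz` and its convention of returning a 0-based index, or −1 when every word is zero, are hypothetical choices for this illustration. No tree acceleration is included.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference ctz for a nonzero word. */
static int ctz32_nonzero(uint32_t x)
{
    int n = 0;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

/* Index of the lowest set bit in the whole array, or -1 if all bits are zero. */
static long bitarray_ctz(const uint32_t *words, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (words[i] != 0)  /* skip all-zero words */
            return (long)(i * 32) + ctz32_nonzero(words[i]);
    }
    return -1;
}
```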

Algorithms

FFS

Find first set can also be implemented in software. A simple loop implementation:

function ffs (x)
    if x = 0 return 0
    t ← 1
    r ← 1
    while (x & t) = 0
        t ← t << 1
        r ← r + 1
    return r

where "<<" denotes left-shift. Similar loops can be used to implement all the related operations. On modern architectures this loop is inefficient due to a large number of conditional branches. A lookup table can eliminate most of these:

table[0..2ⁿ-1] = ffs(i) for i in 0..2ⁿ-1
function ffs_table (x)
    if x = 0 return 0
    r ← 0
    loop
        if (x & (2ⁿ-1)) ≠ 0
            return r + table[x & (2ⁿ-1)]
        x ← x >> n
        r ← r + n

The parameter n is fixed (typically 8) and represents a time–space tradeoff. The loop may also be fully unrolled.
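The table-driven pseudocode can be written in C as follows. This sketch fixes n at 8, so the table has 256 entries. The names `ffs_table8`, `init_ffs_table8` and `ffs_table` are hypothetical, and the table must be filled before the first lookup.

```c
#include <stdint.h>

/* ffs_table8[i] = ffs(i) for every 8-bit value i (entry 0 is never used). */
static uint8_t ffs_table8[256];

static void init_ffs_table8(void)
{
    ffs_table8[0] = 0;
    for (int i = 1; i < 256; i++) {
        int r = 1;
        while (((i >> (r - 1)) & 1) == 0) r++;  /* 1-based index of lowest 1 */
        ffs_table8[i] = (uint8_t)r;
    }
}

/* Table-driven ffs with n = 8: examine one byte per iteration. */
static int ffs_table(uint32_t x)
{
    int r = 0;
    if (x == 0) return 0;
    for (;;) {
        if ((x & 0xFFu) != 0)
            return r + ffs_table8[x & 0xFFu];
        x >>= 8;   /* low byte is zero: skip it */
        r += 8;
    }
}
```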

CTZ

Count Trailing Zeros (ctz) counts the number of zero bits succeeding the least significant one bit. For example, the ctz of 0x00000F00 is 8, and the ctz of 0x80000000 is 31.

An algorithm for 32-bit ctz by Leiserson, Prokop, and Randall uses de Bruijn sequences to construct a minimal perfect hash function that eliminates all branches.^[41]^[42] The hash uses only the low 32 bits of the product. A 32-bit multiply instruction with a 32-bit result is therefore sufficient, such as the one in the low-cost ARM Cortex-M0 / M0+ / M1 cores:^[43]

table[0..31] initialized by: for i from 0 to 31: table[ ( 0x077CB531 * ( 1 << i ) ) >> 27 ] ← i
function ctz_debruijn (x)
    return table[((x & (-x)) * 0x077CB531) >> 27]

The expression (x & (-x)) again isolates the least-significant 1 bit. There are then only 32 possible words, which the unsigned multiplication and shift hash to the correct position in the table. (Note: this algorithm does not handle the zero input.) A similar algorithm works for log base 2, but rather than isolate the most-significant bit, it rounds up to the nearest integer of the form 2ⁿ−1 using shifts and bitwise ORs:^[44]

table[0..31] = {0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
                8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31}
function lg_debruijn (x)
    for each y in {1, 2, 4, 8, 16}: x ← x | (x >> y)
    return table[(x * 0x07C4ACDD) >> 27]
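The two de Bruijn routines above can be rendered in C as follows. This sketch assumes 32-bit unsigned arithmetic that wraps modulo 2^32. The constants and the lg table come from the text, and the ctz table is filled at startup as the pseudocode's initializer describes. Neither function gives a meaningful result for x = 0, matching the pseudocode. The names are hypothetical.

```c
#include <stdint.h>

/* ctz table indexed by the top 5 bits of (power of two) * 0x077CB531. */
static uint8_t ctz_debruijn_table[32];

static void init_ctz_debruijn_table(void)
{
    for (int i = 0; i < 32; i++)
        ctz_debruijn_table[(uint32_t)(0x077CB531u * (1u << i)) >> 27] = (uint8_t)i;
}

/* Branch-free ctz for nonzero x; only the low 32 bits of the product matter. */
static int ctz_debruijn(uint32_t x)
{
    uint32_t lowest = x & (0u - x);   /* isolate the lowest set bit */
    return ctz_debruijn_table[(uint32_t)(lowest * 0x077CB531u) >> 27];
}

/* lg table for the constant 0x07C4ACDD, as given in the text. */
static const uint8_t lg_debruijn_table[32] = {
    0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
    8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};

/* Branch-free log base 2 for nonzero x. */
static int lg_debruijn(uint32_t x)
{
    /* round up to the form 2^n - 1 */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return lg_debruijn_table[(uint32_t)(x * 0x07C4ACDDu) >> 27];
}
```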

Another approach is a binary search, which takes a logarithmic number of operations and branches. It can also be assisted by a table: a 256-entry lookup table indexed by the final byte can replace the bottom three "if" statements. A 32-bit version without the table:^[45]^[46]

function ctz (x)
    if x = 0 return 32
    n ← 0
    if (x & 0x0000FFFF) = 0: n ← n + 16, x ← x >> 16
    if (x & 0x000000FF) = 0: n ← n +  8, x ← x >>  8
    if (x & 0x0000000F) = 0: n ← n +  4, x ← x >>  4
    if (x & 0x00000003) = 0: n ← n +  2, x ← x >>  2
    if (x & 0x00000001) = 0: n ← n +  1
    return n
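The table-assisted variant is not shown as pseudocode, so here is a C sketch. It keeps the 16-bit and 8-bit tests and replaces the last three tests with a lookup in a 256-entry table indexed by the low byte. The names are hypothetical, and the table must be initialized before the first call.

```c
#include <stdint.h>

/* ctz_table_8bit[i] = trailing zeros of the nonzero byte i. */
static uint8_t ctz_table_8bit[256];

static void init_ctz_table_8bit(void)
{
    ctz_table_8bit[0] = 8;   /* never read by ctz_binary_table */
    for (int i = 1; i < 256; i++) {
        int n = 0;
        while (((i >> n) & 1) == 0) n++;
        ctz_table_8bit[i] = (uint8_t)n;
    }
}

/* Binary search on the 16- and 8-bit levels, then one table lookup. */
static int ctz_binary_table(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    if ((x & 0x0000FFFFu) == 0) { n += 16; x >>= 16; }
    if ((x & 0x000000FFu) == 0) { n += 8;  x >>= 8; }
    return n + ctz_table_8bit[x & 0xFFu];   /* low byte is now nonzero */
}
```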

CLZ

Count Leading Zeros (clz) counts the number of zero bits preceding the most significant one bit. For example, the clz of 0x00000F00 is 20, and the clz of 0x00000001 is 31.

Count leading zeros is useful for software floating-point implementations, and the relationship also works in reverse. On platforms that provide hardware conversion of integers to floating point, the exponent field of the converted value can be extracted and subtracted from a constant to compute the count of leading zeros. Corrections are needed when the conversion rounds, that is, when the integer has more significant bits than the floating-point significand can hold.^[45]^[47]
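A minimal C sketch of this technique follows. It assumes `double` is IEEE 754 binary64 (52-bit fraction, exponent bias 1023), which is common but not guaranteed by the C standard. Every 32-bit unsigned value converts to such a double exactly, so this version needs no rounding correction. A 32-bit `float` would need one. The name `clz_via_double` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* clz via integer-to-double conversion (assumes IEEE 754 binary64 doubles). */
static int clz_via_double(uint32_t x)
{
    double d;
    uint64_t bits;
    if (x == 0) return 32;              /* 0.0 has no usable exponent */
    d = (double)x;                      /* exact for every uint32_t value */
    memcpy(&bits, &d, sizeof bits);     /* reinterpret the bits safely */
    int exponent = (int)((bits >> 52) & 0x7FF) - 1023;  /* floor(log2 x) */
    return 31 - exponent;
}
```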

The following C examples require the standard header <stdint.h> for the definitions of uint8_t, uint16_t and uint32_t. It is included once here rather than in each example.

/* uint8_t is an 8-bit unsigned integer, uint16_t is a 16-bit unsigned integer, uint32_t is a 32-bit unsigned integer */
#include <stdint.h> /* exists in C99 compatible C/C++ compilers */

The unoptimized approach examines one bit at a time until a set bit is found, as in this C example. It is slowest for an input of 1, which requires 31 iterations of the loop.

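int clz1( uint32_t x )
{
  int n;
  if (x == 0) return 32;  /* no bit set: all 32 bits are leading zeros */
  /* shift left one bit at a time, counting, until the top bit is 1 */
  for (n = 0; ((x & 0x80000000) == 0); n++, x <<= 1);
  return n;
}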

A refinement of the looping approach examines four bits at a time and then uses a lookup table for the final four bits, as shown here. A faster variant would examine eight bits at a time and use a 256-entry lookup table.

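/* clz_table_4bit[i] is the number of leading zeros of the 4-bit value i */
static const uint8_t clz_table_4bit[16] = { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };
int clz2( uint32_t x )
{
  int n;
  if (x == 0) return 32;
  /* skip zero nibbles from the top, four bits per iteration */
  for (n = 0; ((x & 0xF0000000) == 0); n += 4, x <<= 4);
  /* the top nibble is now nonzero: look up its leading zeros */
  n += (int)clz_table_4bit[x >> (32-4)];
  return n;
}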

A binary search is faster than the looping method. It takes a logarithmic number of operations and branches, as in this 32-bit version:^[45]^[46]

function clz3(x)
    if x = 0 return 32
    n ← 0
    if (x & 0xFFFF0000) = 0: n ← n + 16, x ← x << 16
    if (x & 0xFF000000) = 0: n ← n +  8, x ← x <<  8
    if (x & 0xF0000000) = 0: n ← n +  4, x ← x <<  4
    if (x & 0xC0000000) = 0: n ← n +  2, x ← x <<  2
    if (x & 0x80000000) = 0: n ← n +  1
    return n

The binary search algorithm can also be assisted by a table. The version shown here replaces the bottom two "if" statements with a 16-entry lookup table indexed by the top nibble (4 bits) of the shifted word. An alternative replaces the bottom three "if" statements with a 256-entry lookup table indexed by the top byte (8 bits). Both methods drop the initial check for zero, because the final table lookup handles that case.

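static const uint8_t clz_table_4bit[16] = { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };
int clz4( uint32_t x )
{
  int n;
  /* each test halves the remaining range and moves the unknown bits to the top */
  if ((x & 0xFFFF0000) == 0) {n  = 16; x <<= 16;} else {n = 0;}
  if ((x & 0xFF000000) == 0) {n +=  8; x <<=  8;}
  if ((x & 0xF0000000) == 0) {n +=  4; x <<=  4;}
  /* the top nibble finishes the count; clz_table_4bit[0] == 4 makes x == 0 return 32 */
  n += (int)clz_table_4bit[x >> (32-4)];
  return n;
}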

The fastest practical approach to simulate clz uses a precomputed 64KB lookup table, as shown in this C language example.

/* The table MUST be calculated before calling this function */
static uint8_t clz_table_16bit[65536];
int clz5( uint32_t x )
{
  if ((x & 0xFFFF0000) == 0)
    return (int)clz_table_16bit[x] + 16;
  else
    return (int)clz_table_16bit[x >> 16];
}

Apart from a dedicated CLZ instruction, the fastest way to compute CLZ is to read a precomputed value from a lookup table. The examples above use a 4-bit lookup table, clz_table_4bit[16]. The following C examples compute CLZ for 8-bit, 16-bit and 32-bit input values, and their tables must be precomputed before the functions are called. An alternative 8-bit approach could pack two results into each table entry and so need a 128-entry table instead of a 256-entry one, because each count (0 to 8) fits in a 4-bit nibble.

/* Note: Tables MUST be calculated before calling each function or macro that reads from it */

uint8_t clz_table_8bit[256];
int clz8f( uint8_t x )
{
  return (int)clz_table_8bit[x];
}
#define clz8d(x) (clz_table_8bit[x])

uint8_t clz_table_16bit[65536];
int clz16f( uint16_t x )
{
  return (int)clz_table_16bit[x];
}
#define clz16d(x) (clz_table_16bit[x])

uint8_t clz_table_32bit[4294967296]; /* conceptual, but 4GB array is not practical */
int clz32f( uint32_t x )
{
  return (int)clz_table_32bit[x];
}
#define clz32d(x) (clz_table_32bit[x])
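A minimal way to precompute these tables is shown in the C sketch below. It fills a table of 2^bits entries, where entry i holds the leading-zero count of i treated as a bits-wide value. Entry 0 is set to bits, so for example clz5 above returns 32 for a zero input. The helper `fill_clz_table` is hypothetical. It suits the 8-bit and 16-bit tables, but not the impractical 32-bit one.

```c
#include <stdint.h>

/* Fill table[i] = clz of i as a `bits`-wide value, for i in 0 .. 2^bits - 1. */
static void fill_clz_table(uint8_t *table, unsigned bits)
{
    uint32_t size = (uint32_t)1 << bits;
    for (uint32_t i = 0; i < size; i++) {
        unsigned n = bits;
        uint32_t v = i;
        while (v != 0) {   /* one shift per significant bit of i */
            v >>= 1;
            n--;
        }
        table[i] = (uint8_t)n;
    }
}
```

For example, `fill_clz_table(clz_table_8bit, 8)` and `fill_clz_table(clz_table_16bit, 16)` would prepare the tables used above.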

Applications

The count leading zeros (clz) operation can be used to efficiently implement normalization, which encodes an integer as m × 2^e, where m has its most significant bit in a known position (such as the highest position). This can in turn be used to implement Newton-Raphson division, perform integer to floating point conversion in software, and other applications.^[45]^[48]
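A minimal C sketch of normalization for 32-bit words follows. It assumes that the target position for the most significant bit is bit 31 and that the caller never passes zero. The function `normalize` is a hypothetical name.

```c
#include <stdint.h>

/* Reference clz: returns 32 for zero. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* Write nonzero x as m * 2^e with the top bit of m in bit 31; returns m, stores e. */
static uint32_t normalize(uint32_t x, int *e)
{
    int shift = clz32(x);   /* x != 0 is assumed, so shift <= 31 */
    *e = -shift;
    return x << shift;
}
```

For instance, 1 normalizes to m = 0x80000000 with e = −31.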

Count leading zeros (clz) can be used to compute the 32-bit predicate "x = y" (one if true, zero if false) via the identity clz(x − y) >> 5, where ">>" is unsigned right shift. This relies on clz(0) = 32.^[49] It can be used to perform more sophisticated bit operations like finding the first string of n 1 bits.^[50] The expression 1 << (16 − clz(x − 1)/2) is an effective initial guess for computing the square root of a 32-bit integer using Newton's method.^[51] CLZ can efficiently implement null suppression, a fast data compression technique that encodes an integer as the number of leading zero bytes together with the nonzero bytes.^[52] It can also efficiently generate exponentially distributed integers by taking the clz of uniformly random integers.^[45]
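Two of these uses are sketched below in C for 32-bit unsigned words, with a loop-based clz that returns 32 for zero. `eq_via_clz` returns 1 when x equals y and 0 otherwise. `isqrt32` runs Newton's method for the floor of the square root, starting from the guess 1 << (16 − clz(x − 1)/2), which is never below the true root. Both names are hypothetical.

```c
#include <stdint.h>

/* Reference clz: returns 32 for zero. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* "x = y" predicate: clz(x - y) >> 5 is 1 only when x - y == 0. */
static int eq_via_clz(uint32_t x, uint32_t y)
{
    return clz32((uint32_t)(x - y)) >> 5;
}

/* Floor of sqrt(x) by Newton's method from the clz-based initial guess. */
static uint32_t isqrt32(uint32_t x)
{
    uint32_t g0, g1;
    int s;
    if (x <= 1) return x;
    s = 16 - clz32(x - 1) / 2;          /* 2^s >= sqrt(x) */
    g0 = (uint32_t)1 << s;
    g1 = (g0 + (x >> s)) >> 1;          /* first step: x / g0 == x >> s */
    while (g1 < g0) {                   /* iterate while the estimate decreases */
        g0 = g1;
        g1 = (g0 + x / g0) >> 1;
    }
    return g0;
}
```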

The log base 2 can be used to anticipate whether a multiplication will overflow, since $\lceil\log_2 xy\rceil \leq \lceil\log_2 x\rceil + \lceil\log_2 y\rceil$ .^[53]
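The same idea can be expressed with leading-zero counts, as in this C sketch for unsigned 32-bit operands. A nonzero x with a leading zeros satisfies 2^(31−a) ≤ x < 2^(32−a), so the counts alone settle every case except a + b = 31. This sketch resolves that borderline case with an exact 64-bit multiply, a choice of this illustration. The name `mul_overflows32` is hypothetical.

```c
#include <stdint.h>

/* Reference clz: returns 32 for zero. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* Returns 1 if the 32-bit product x * y overflows, 0 otherwise. */
static int mul_overflows32(uint32_t x, uint32_t y)
{
    int zeros = clz32(x) + clz32(y);
    if (zeros >= 32) return 0;   /* x * y < 2^(64 - zeros) <= 2^32 */
    if (zeros <= 30) return 1;   /* x * y >= 2^(62 - zeros) >= 2^32 */
    return ((uint64_t)x * y) >> 32 != 0;   /* borderline: check exactly */
}
```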

Count leading zeros and count trailing zeros can be used together to implement Gosper's loop-detection algorithm,^[54] which can find the period of a function of finite range using limited resources.^[46]

The binary GCD algorithm spends many cycles removing trailing zeros; this can be replaced by a count trailing zeros (ctz) followed by a shift. A similar loop appears in computations of the hailstone sequence.
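A sketch of the binary GCD (Stein's algorithm) in C, in which each run of trailing zeros is removed by one ctz and one shift rather than a bit-at-a-time loop. The loop-based ctz32 stands in for a hardware instruction and is only called with nonzero arguments. The names are hypothetical.

```c
#include <stdint.h>

/* Reference ctz for nonzero x (every call below passes a nonzero value). */
static int ctz32(uint32_t x)
{
    int n = 0;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

static uint32_t gcd_binary(uint32_t u, uint32_t v)
{
    int shift;
    if (u == 0) return v;
    if (v == 0) return u;
    shift = ctz32(u | v);      /* common factors of two */
    u >>= ctz32(u);            /* make u odd */
    do {
        v >>= ctz32(v);        /* make v odd */
        if (u > v) { uint32_t t = u; u = v; v = t; }
        v -= u;                /* odd - odd is even (or zero) */
    } while (v != 0);
    return u << shift;
}
```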

A bit array can be used to implement a priority queue. In this context, find first set (ffs) is useful in implementing the "pop" or "pull highest priority element" operation efficiently. The Linux kernel real-time scheduler internally uses sched_find_first_bit() for this purpose.^[55]
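The C sketch below shows a hypothetical 32-level priority bitmap. Bit p is set while level p has pending work, and a lower index means higher priority, so one find-first-set returns the most urgent nonempty level. This is an illustration only, not the Linux kernel's data structure.

```c
#include <stdint.h>

/* Bit p set means priority level p is nonempty; lower p = higher priority. */
typedef struct {
    uint32_t nonempty;
} prio_bitmap;

/* Reference ctz for nonzero x. */
static int ctz32(uint32_t x)
{
    int n = 0;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

static void prio_mark(prio_bitmap *b, int p)  { b->nonempty |= (uint32_t)1 << p; }
static void prio_clear(prio_bitmap *b, int p) { b->nonempty &= ~((uint32_t)1 << p); }

/* Highest-priority nonempty level, or -1 if every level is empty. */
static int prio_highest(const prio_bitmap *b)
{
    return b->nonempty ? ctz32(b->nonempty) : -1;
}
```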

The count trailing zeros operation gives a simple optimal solution to the Tower of Hanoi problem: the disks are numbered from zero, and at move k, disk number ctz(k) is moved the minimum possible distance to the right (circling back around to the left as needed). It can also generate a Gray code by taking an arbitrary word and flipping bit ctz(k) at step k.^[46]
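Both uses are shown in this small C sketch. Starting from the word 0 and flipping bit ctz(k) at each step k ≥ 1 produces the binary reflected Gray code k ^ (k >> 1). The disk moved at Tower of Hanoi move k is simply ctz(k). The function names are hypothetical.

```c
#include <stdint.h>

/* Reference ctz for nonzero x. */
static int ctz32(uint32_t x)
{
    int n = 0;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

/* Write the Gray code words for steps 0 .. count-1 into out[]. */
static void gray_by_ctz(uint32_t *out, uint32_t count)
{
    uint32_t g = 0;
    for (uint32_t k = 0; k < count; k++) {
        if (k > 0)
            g ^= (uint32_t)1 << ctz32(k);   /* flip exactly one bit per step */
        out[k] = g;
    }
}

/* Tower of Hanoi: disk (numbered from 0) moved at move k, for k >= 1. */
static int hanoi_disk(uint32_t k) { return ctz32(k); }
```

With three disks, moves 1 to 7 move disks 0, 1, 0, 2, 0, 1, 0.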

References

[1] Anderson, Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way).
[2] "FFS(3)". Linux Programmer's Manual. The Linux Kernel Archives. Retrieved 2 January 2012.
[3] "ARM Instruction Reference > ARM general data processing instructions > CLZ". ARM Developer Suite Assembler Guide. ARM. Retrieved 3 January 2012.
[4] "AVR32 Architecture Document" (PDF). Atmel. Retrieved 2016-10-22.
[5] Alpha Architecture Reference Manual (PDF). Compaq. 2002. pp. 4–32, 4–34.
[6] Intel 64 and IA-32 Architectures Software Developer Manual. Volume 2A. Intel. pp. 3-92–3-97. Order number 325383.
[7] AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions (PDF).
[8] AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions (PDF).
[9] AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions (PDF). AMD. 2011. pp. 204–205.
[10] "AMD64 Architecture Programmer's Manual, Volume 3: General-Purpose and System Instructions" (PDF). amd.com. AMD. October 2013. Retrieved 2014-01-02.
[11] Intel Itanium Architecture Software Developer's Manual. Volume 3: Intel Itanium Instruction Set. Intel. 2010. p. 3:38.
[12] MIPS Architecture For Programmers. Volume II-A: The MIPS32 Instruction Set (Revision 3.02 ed.). MIPS Technologies. 2011. pp. 101–102.
[13] MIPS Architecture For Programmers. Volume II-A: The MIPS64 Instruction Set (Revision 3.02 ed.). MIPS Technologies. 2011. pp. 105, 107, 122, 123.
[14] M68000 Family Programmer's Reference Manual (PDF). Motorola. 1992. pp. 4-43–4-45.
[15] Frey, Brad. PowerPC Architecture Book (Version 2.02 ed.). 3.3.11 Fixed-Point Logical Instructions. IBM. p. 70.
[16] Power ISA Version 3.0B. 3.3.13 Fixed-Point Logical Instructions, 3.3.13.1 64-bit Fixed-Point Logical Instructions. IBM. pp. 95, 98.
[17] Oracle SPARC Architecture 2011. Oracle.
[18] VAX Architecture Reference Manual (PDF). DEC. 1987. pp. 70–71.
[19] IBM z/Architecture Principles of Operation (PDF) (Eleventh ed.). Chapter 22. Vector Integer Instructions. IBM. March 2015. p. 22-10.
[20] IBM z/Architecture Principles of Operation (PDF) (Eleventh ed.). Chapter 22. Vector Integer Instructions. IBM. March 2015. p. 22-10.
[21] "FFS(3)". Mac OS X Developer Library. Apple, Inc. 1994-04-19. Retrieved 4 January 2012.
[22] "FFS(3)". Mac OS X Developer Library. Apple. 2004-01-13. Retrieved 4 January 2012.
[23] "FFS(3)". FreeBSD Library Functions Manual. The FreeBSD Project. Retrieved 4 January 2012.
[24] "Other built-in functions provided by GCC". Using the GNU Compiler Collection (GCC). Free Software Foundation, Inc. Retrieved 14 November 2015.
[25] "GCC 3.4.0 ChangeLog". GCC 3.4.0. Free Software Foundation, Inc. Retrieved 14 November 2015.
[26] "Clang Language Extensions, chapter Builtin Functions". The Clang Team. Retrieved 9 April 2017. "Clang supports a number of builtin library functions with the same syntax as GCC."
[27] "Source code of Clang". LLVM Team, University of Illinois at Urbana-Champaign. Retrieved 9 April 2017.
[28] "_BitScanForward, _BitScanForward64". Visual Studio 2008: Visual C++: Compiler Intrinsics. Microsoft. Retrieved 21 May 2018.
[29] "_BitScanReverse, _BitScanReverse64". Visual Studio 2008: Visual C++: Compiler Intrinsics. Microsoft. Retrieved 21 May 2018.
[30] "__lzcnt16, __lzcnt, __lzcnt64". Visual Studio 2008: Visual C++: Compiler Intrinsics. Microsoft. Retrieved 3 January 2012.
[31] Intel C++ Compiler for Linux Intrinsics Reference. Intel. 2006. p. 21.
[32] NVIDIA CUDA Programming Guide (PDF) (Version 3.0 ed.). NVIDIA. 2010. p. 92.
[33] "'llvm.ctlz.*' Intrinsic, 'llvm.cttz.*' Intrinsic". LLVM Language Reference Manual. The LLVM Compiler Infrastructure. Retrieved 23 February 2016.
[34] Anderson, Round up to the next highest power of 2.
[35] SPARC International, Inc. (1992). "A.41: Population Count. Programming Note". The SPARC architecture manual: version 8 (PDF) (Version 8 ed.). Englewood Cliffs, New Jersey, USA: Prentice Hall. p. 231. ISBN 0-13-825001-4. Archived from the original (PDF) on 2012-01-18.
[36] Warren, Jr., Henry S. (2013) [2002]. Hacker's Delight (2nd ed.). Addison Wesley - Pearson Education, Inc. ISBN 978-0-321-84268-8.
[37] Blackfin Instruction Set Reference (Preliminary ed.). Analog Devices. 2001. pp. 8–24. Part Number 82-000410-14.
[38] Dietz, Henry Gordon. "The Aggregate Magic Algorithms". University of Kentucky.
[39] Isenberg, Gerd. "BitScanProtected": forward-Index of LS1B by Popcount. Chess Programming Wiki. Retrieved 3 January 2012.
[40] SPARC International, Inc. (1992). "A.41: Population Count. Programming Note". The SPARC architecture manual: version 8 (PDF) (Version 8 ed.). Englewood Cliffs, N.J.: Prentice Hall. p. 231. ISBN 0-13-825001-4. Archived from the original (PDF) on 2012-01-18.
[41] Leiserson, Charles E.; Prokop, Harald; Randall, Keith H. (1998). Using de Bruijn Sequences to Index a 1 in a Computer Word (PDF).
[42] Busch, Philip (2009). Computing Trailing Zeros HOWTO (PDF).
[43] Cortex-M0 r0p0 Technical Reference Manual. ARM Holdings.
[44] Anderson, Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup.
[45] Warren, Section 5-3: Counting Leading 0's.
[46] Warren, Section 5-4: Counting Trailing 0's.
[47] Anderson, Find the integer log base 2 of an integer with a 64-bit IEEE float.
[48] Sloss, Andrew N.; Symes, Dominic; Wright, Chris (2004). ARM system developer's guide: designing and optimizing system software (1st ed.). San Francisco, CA: Morgan Kaufmann. pp. 212–213. ISBN 1-55860-874-5.
[49] Warren, Section 2-11: Comparison Predicates.
[50] Warren, Section 6-2: Find First String of 1-Bits of a Given Length.
[51] Warren, Section 11-1: Integer Square Root.
[52] Schlegel, Benjamin; Gemulla, Rainer; Lehner, Wolfgang (June 2010). "Fast integer compression using SIMD instructions". Proceedings of the Sixth International Workshop on Data Management on New Hardware (DaMoN 2010): 34–40. doi:10.1145/1869389.1869394.
[53] Warren, Section 2-12: Overflow Detection.
[54] Gosper, Bill (1972). "Loop detector". HAKMEM (239): Item 132.
[55] Aas, Josh (2005). Understanding the Linux 2.6.8.1 CPU Scheduler (PDF). Silicon Graphics, Inc. p. 19.

External links

Intel Intrinsics Guide
Chess Programming Wiki: BitScan: A detailed explanation of a number of implementation methods for ffs (called LS1B) and log base 2 (called MS1B).

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
