bit.e 1.40

Versatile and fast bit operations on 32-bit integers

Copyrighted Freeware
© Jürgen Lüthje  –  2007, May 3
Standard disclaimer: Use at your own risk!

Euphoria 2.4 Official Release or later required (because parameters are passed to machine code).
Successfully tested with the Euphoria 3.0.2 DOS, Windows and Linux interpreters.



Contents

Introduction

Global functions

Global constants

Version history

Contact



Introduction

The arguments to several functions in this library are checked by user-defined types. After having tested your program, insert "without type_check" into your code somewhere before the statement "include bit.e", in order to gain speed.

Special note:
Observant readers of the source code will notice, that the machine code in this library changes the values of the registers EAX, ECX and EDX without saving and restoring them. This is not by mistake, but it's allowed to do so and is done here deliberately.
More information on this subject is provided e.g. on http://www.agner.org/optimize/.



Global functions

unsigned32

Syntax: x = unsigned32(x)
Description: This function ensures that the returned values have only 32 bits, any bits over that are truncated. It returns unsigned numbers, i.e. numbers in the closed interval [0, #FFFFFFFF].
The argument may be an atom or a sequence.
Examples: ? unsigned32(-1)
? unsigned32({-2, 5, -7})


signed32

Syntax: x = signed32(x)
Description: This function ensures that the returned values have only 32 bits, any bits over that are truncated. It returns signed numbers, i.e. numbers in the closed interval [-#80000000, #7FFFFFFF].
The argument may be an atom or a sequence.
Examples: ? signed32(#FFFFFFFF)
? signed32({#FFFFFFFE, 5, #FFFFFFF9})


and_all

Syntax: x = and_all(s)
Description: Apply the built-in function 'and_bits()' to all elements of sequence 's'.
Returns signed numbers.
Example: ? and_all({3, 5, 7})     -- prints 1


or_all

Syntax: x = or_all(s)
Description: Apply the built-in function 'or_bits()' to all elements of sequence 's'.
Returns signed numbers.
Example: ? or_all({3, 5, 7})      -- prints 7


xor_all

Syntax: x = xor_all(s)
Description: Apply the built-in function 'xor_bits()' to all elements of sequence 's'.
Returns signed numbers.
Example: ? xor_all({3, 5})        -- prints 6


crc32

Syntax: crc = crc32(seed, buffer)
Description: Compute the CRC-32 (this is something like a "fingerprint") of a byte sequence. Since CRC-32 can easily be cracked, it is not meant for cryptographic purposes, but it's good for quick data integrety check. So it's for instance used in ZIP files to be sure that the archive is not corrupted.
If length(buffer)=0, the function returns the given seed value. Pre- and post-conditioning (one's complement) is performed within this function so it shouldn't be done by the application.
Normally 0 is used as seed value. However, the nice thing about this function is that it can not only calculate the 32-bit checksum of a single sequence. By calling the function repeatedly, using its previous result as seed for the next call, it can also calculate the CRC-32 of a data stream. This is e.g. useful for getting the checksum of a huge file, because it can be calculated without loading the complete huge file into memory (see example).
Returns a signed number.
Example: include get.e
include bit.e

constant MAX_CHUNK = 32*1024

function file_crc32 (sequence filename)
   sequence chunk
   atom crc
   integer fn

   fn = open(filename, "rb")
   if fn = -1 then
      return sprintf("File '%s' not found.", {filename})
   end if

   crc = 0              -- always use 0 as initial value
   while 1 do
      chunk = get_bytes(fn, MAX_CHUNK)
      crc = crc32(crc, chunk)
      if length(chunk) < MAX_CHUNK then
         exit
      end if
   end while
   close(fn)
   return crc
end function

object result
sequence filename

filename = "bigfile.dat"
result = file_crc32(filename)
if atom(result) then            -- OK
   printf(1, "CRC-32 of file '%s': #%08x", {filename, result})
else                            -- error
   puts(1, result)
end if
abort(getc(0))


The arguments to the following functions may be atoms or sequences. If a function takes two arguments, and both arguments are sequences, their lengths must be the same. Otherwise the function will abort with an appropriate error message. The rules for operations on sequences apply (see Euphoria docs, refman_2.htm#26).

Arguments denoted by 'x' must be representable as (sequences of) 32-bit numbers, either signed or unsigned. If you intend to manipulate full 32-bit values, you should declare your variables as atom, rather than integer, because Euphoria's integer type is limited to 31 bits.

shift_left

Syntax: x = shift_left(x, count)
Description: Shift the bits in (each element of) 'x' to the left 'count' times, with zeroes shifted in on the right.
Returns signed numbers.
Example 1: atom x
x = 20                   --           (= binary 00010100)
? shift_left(x, 2)       -- prints 80 (= binary 01010000)
Example 2: sequence x
x = {10, 20}             --                 (= binary {00001010, 00010100})
? shift_left(x, 2)       -- prints {40, 80} (= binary {00101000, 01010000})


shift_right

Syntax: x = shift_right(x, count)
Description: Shift the bits in (each element of) 'x' to the right 'count' times, with zeroes shifted in on the left.
Returns signed numbers.
Example: atom x
x = #80000014            --                  (= binary 10000000000000000000000000010100)
x = shift_right(x, 2)
printf(1, "#%08x", {x})  -- prints #20000005 (= binary 00100000000000000000000000000101)


shift_arithmetic_right

Syntax: x = shift_arithmetic_right(x, count)
Description: Shift the bits in (each element of) 'x' to the right 'count' times, with the current sign bit replicated in the leftmost bit.
Returns signed numbers.
Example: atom x
x = #80000014             --                  (= binary 10000000000000000000000000010100)
x = shift_arithmetic_right(x, 2)
printf(1, "#%08x", {x})   -- prints #E0000005 (= binary 11100000000000000000000000000101)


rotate_left

Syntax: x = rotate_left(x, count)
Description: Rotate the bits in (each element of) 'x' to the left 'count' times, with all bits pushed out the left side re-entering on the right.
Returns signed numbers.
Example: atom x
x = #80000014             --                  (= binary 10000000000000000000000000010100)
x = rotate_left(x, 2)
printf(1, "#%08x", {x})   -- prints #00000052 (= binary 00000000000000000000000001010010)


rotate_right

Syntax: x = rotate_right(x, count)
Description: Rotate the bits in (each element of) 'x' to the right 'count' times, with all bits pushed out the right side re-entering on the left.
Returns signed numbers.
Example: atom x
x = 20                    --           (= binary 00000000000000000000000000010100)
? rotate_right(x, 31)     -- prints 40 (= binary 00000000000000000000000000101000)


The following functions let you treat 32-bit integers as "bit arrays". Please note that the rightmost bit is the lowest order bit, and is indexed by 0. The highest order bit is indexed by 31.

bit_set

Syntax: x = bit_set(x, number)
Description: Set the bit in (each element of) 'x', that is indexed by 'number', to 1.
Returns signed numbers.
Example: atom x
x = 20                    --           (= binary 00010100)
? bit_set(x, 0)           -- prints 21 (= binary 00010101)


bit_reset

Syntax: x = bit_reset(x, number)
Description: Set the bit in (each element of) 'x', that is indexed by 'number', to 0.
Returns signed numbers.
Example: atom x
x = 20                     --           (= binary 00010100)
? bit_reset(x, 2)          -- prints 16 (= binary 00010000)


bit_test

Syntax: i = bit_test(x, number)
Description: Return the value (either 0 or 1) of the bit in (each element of) 'x', that is indexed by 'number'.
Example: atom x
x = 20                     -- (= binary 00010100)
? bit_test(x, 2)           -- prints 1


bit_scan_forward

Syntax: i = bit_scan_forward(x)
Description: Return the number of the lowest order bit in (each element of) 'x', that is set (or return -1, if (an element of) x = 0).
Example: atom x
x = 20                     -- (= binary 00010100)
? bit_scan_forward(x)      -- prints 2


bit_scan_reverse

Syntax: i = bit_scan_reverse(x)
Description: Return the number of the highest order bit in (each element of) 'x', that is set (or return -1, if (an element of) x = 0).
Example: atom x
x = 20                     -- (= binary 00010100)
? bit_scan_reverse(x)      -- prints 4


Global constants

Machine code

In order to calculate the CRC-32 of bytes at a given address in memory rather than of a Euphoria sequence, use this:

crc = c_func( CRC32_MEM, {seed, address, number_of_bytes} )

If all arguments to several functions are atoms, you can gain speed by directly calling the appropriate machine code rather than the normal Euphoria function:

a = c_func( SHIFT_LEFT, {a, count} )
a = c_func( SHIFT_RIGHT, {a, count} )
a = c_func( SHIFT_ARITHMETIC_RIGHT, {a, count} )
a = c_func( ROTATE_LEFT, {a, count} )
a = c_func( ROTATE_RIGHT, {a, count} )

a = c_func( BIT_SET, {a, number} )
a = c_func( BIT_RESET, {a, number} )
i  = c_func( BIT_TEST, {a, number} )
i  = c_func( BIT_SCAN_FORWARD, {a} )
i  = c_func( BIT_SCAN_REVERSE, {a} )

Other constants

MIN_32BIT_S     -- least signed 32-bit integer
MAX_32BIT_S    -- greatest signed 32-bit integer
MIN_32BIT_U     -- least unsigned 32-bit integer
MAX_32BIT_U    -- greatest unsigned 32-bit integer


Version history

o 2004, January 5 -- v1.00
  * First public release.

o 2004, January 8 -- v1.10
  * changed: - some internal details

o 2004, May 31 -- v1.20
  + new: - enhanced documentation
              - function unsigned32()
              - function signed32()
              - function and_all()
              - function xor_all()
              - function bit_scan_forward()
              - function bit_scan_reverse()

o 2004, Aug 29 -- v1.30
  + new: - function crc32()

o 2004, Sep 4 -- v1.31
  * changed: - Simplified some machine code.

o 2007, Apr 22 -- v1.32
  * changed: - Simplified some machine code.
                   - Slightly improved documentation.

o 2007, May 3 -- v1.40
  * changed: - crc32() function implemented in machine code. So the function is now about 3 times as
                      fast as before for short byte sequences, and 30 times or more for long byte sequences.
                   - When crc32() was called with an empty buffer, it used to return 0. Now it returns the given seed value.
                   - enhanced documentation


Contact

Please send questions, suggestions, and bug reports to <support {AT} luethje {DOT} eu>.