### Dealing with Bits and Bytes

In its role as a high level assembler used for writing operating systems, C is often used to access memory locations and change individual bits. You might for example need to access individual bits in an int. It can sometimes be useful to conserve memory by using a byte to hold 8 flags though with an abundance of ram it's common to just use one byte per flag or even one per int. Knowing how to extract or alter individual bits is still worthwhile knowing; you may never have to use it but when you have to maintain code that does you'll know how to.

If you need to remind yourself about binary this article What is Software may help.

An int holds 32 bits. It's conventional to number them with the highest bit on the left (31) to right (0) for 32 bit CPUs and 63 to 0 for 64 bit CPUs.

```
33222222-22221111-11111100-00000000
10987654-32109876-54321098-76543210
```

The dash shows where the break between bytes is located. To access individual bits we need to work on whole bytes or ints and then isolate the particular bit. This is where the binary operations, "and", "or" and "xor" are used. Don't confuse these with the same name logical operations. Logical operations use two characters, binary use one. A "Logical and" is like this

```
if (a && b )
{
c
}
```

and means that if expression a is non zero and expression b is non zero then do statement c;

A "Binary and" is like this, with one & character.

```
c= a & b;
```

c = the result of "binary anding" the two numbers a, b together. If a,b and c are all bytes then this "binary ands" all 8 bits of a with the corresponding bits of b to get c. Each bit of c is 1 if the corresponding bits of a and b are both 1. If either a or b is zero (or both) then the bit in c is also 0.

` a 10010110`_{2}
b 01111100_{2}
--------------------
a & b 00010100_{2}

**On the next Page** - Or and Xor.

### Manipulating Individual Bits with Binary Operations

If byte b is "binary anded" with a byte m where all the bits of m are 0 except for the bit in b that we are interested in, then we can isolate that bit. If the value of the expression is non zero then that bit is set, if it zero it isn't.

So

```
if (a & b)
printf("a & b was non zero") ;
else
printf("a & b was zero") ;
```

### Beware - Trap Ahead!

You might think from this that there seems little difference between (a && b) and (a & b). Sometimes this is true, but not always. E.g. if a is 4 and b is 2, then (a && b) is true as both are non zero but (a & b) is false!

` a 00000100`_{2}
b 00000010_{2}
-------------------
a & b 00000000_{2}

### Setting Bits

Use the "binary or" operation for this which sets any bits to 1 in the result if either (or both) of the two numbers has a 1 in its corresponding bits.

` a 10010110`_{2}
b 01111100_{2}
--------------------
a | b 11111110_{2}

So to set the bit in a (below) at position 2 to a 1 we or with b = 00000100

` a 10010010`_{2}
b 00000100_{2}
--------------------
a | b 10010110_{2}

### Toggle that bit!

The 3rd binary operation is "exclusive or" or xor which in C uses the ^ operator. If two numbers a and b are xored together then each bit in the result is 0 If the corresponding bits are identical, 1 if they are different. If it helps, think of xor as a "Not the same" function.

To toggle any bit, simply "binary xor" the number with a 1 set in the appropriate bit position. If the corresponding bit is 1, the result bit is 0 etc.

` a 10010010`_{2}
b 00000100_{2}
--------------------
a ^ b 10010110_{2}

**On the next page** - An example of printing binary numbers.

### Example 1 - Using a Binary And to Extract Bits

### Example 1

This example prints out ints in binary.

Download Example 1

```
// ex1.c
//
#include <stdio.h>
void PrintBinary(int x,int d)
{
char buffer[33];
int index;
index=0;
while (d >0) {
if (x & 1)
buffer[index++]='1';
else
buffer[index++]='0';
x >>= 1;
d--;
}
while (index >0 )
printf("%c",buffer[--index]) ;
printf("B\n") ;
return;
}
int main(int argc, char* argv[])
{
PrintBinary(10,6) ;
PrintBinary(8,4) ;
PrintBinary(32765,16) ;
return 0;
}
```

This processes an int from right to left. It uses a "binary and" to get the rightmost bit, putting '1' for non zero and '0' in a buffer. This continues for as many digits as were specified in the second

to PrintBinary().

` x >>= 1; `

This uses the right shift operator >> to rotate x by one bit to the right. This is exactly the same as dividing by 2 and many

use shift operations to optimize integer division. You could have used

```
x /= 2;
```

instead but I think >>= is more true to the meaning, and the compiler will probably generate right shift anyway.

The buffer is filled out in reverse. so PrintBinary(10,6) for example puts 010100 in the buffer then prints it out in reverse in the while loop. This makes it very easy to add leading zeros.

**On the next page** More about this example

### More about Example 1

The image shows 10_{10} in decimal which is the same as 1010_{2} in binary. 10 is made up of 8 + 2, so there are 1s in the binary positions for 8 and 2 and 0s elsewhere. There are only 4 bits shown but if there were 8 then the values of those bit positions would be 128 64 32 16. The value of any bit position n where n varies from 0 to 31 or 63 is 2^{n}. Anything raised to the power of 0 is always 1.

Because the value of d (the number of digits to output) is 6, this loops 6 times. Steps 5 and 6 are identical and just output 0 because x is already 0 after step 4. The red box highlights the output and shows how 010100 is output.

If you feel confident you can simplify this further with two changes. Turn the while into a for loop (with no initalizing part!) and condense the if into a one line statement. The brackets around the x & 1 are needed as + has a higher precedence than binary & so if you remove them, it will generate rubbish. This works because you can use ints and chars together in this way. The longer if statement is easier to understand and more portable and would be my preference

```
void PrintBinary(int x,int d)
{
char buffer[33];
int index=0;
for (;d>0;d--)
{
buffer[index++] = '0'+(x & 1) ;
x >>= 1;
}
while (index >0 )
printf("%c",buffer[--index]) ;
printf("B\n") ;
return;
}
```

**On the next page** : An example of and, or and xor.

### Another Example that Demonstrates all three Binary Operations

Example 2, listed below uses this PrintBinary() function to demonstrate the three binary operations.

Download Example 2

```
int a =0x45;
int b= 0x67;
int c;
printf("This is a & b\n") ;
printf("a ") ;PrintBinary(a,8) ;
printf("b ") ;PrintBinary(b,8) ;
c= a & b;
printf(" ---------\n") ;
printf("a&b ") ;PrintBinary(c,8) ;
printf("\nThis is a | b\n") ;
printf("a ") ;PrintBinary(a,8) ;
printf("b ") ;PrintBinary(b,8) ;
c= a | b;
printf(" ---------\n") ;
printf("a|b ") ;PrintBinary(c,8) ;
printf("\nThis is a ^ b\n") ;
printf("a ") ;PrintBinary(a,8) ;
printf("b ") ;PrintBinary(b,8) ;
c= a ^ b;
printf(" ---------\n") ;
printf("a^b ") ;PrintBinary(c,8) ;
```

Just as >> shifts bits to the right then << shifts them to the left and is the same as multiplying by 2 for each position you shift them left.

X << n is the same as X* 2^{n}. 5 << 4 is 80. So is 5 *2^{4}

### Flipping Bits

Another bit operator, most commonly seen in if statements is ~, known as the not operator, it flips every bit. It's the same as xor with -1 but not quite as intuitive. These two statements do exactly the same.

```
c =~c;
c^=-1;
```

Some caution should be exercised with using ~ on

numbers or characters.

numbers, those that start at 0 are fine but if applied to a negative number, it will become positive and vice versa. The CPU doesn't care whether the contents of a memory location hold a character, a signed number (or unsigned) or even a

. It will happily let you flip bits, regardless of whether it is a meaningful operation. So what exactly does ~'A' mean? When assigned to an int, C happily compiles it and outputs -66.

**On the next page** About Hexadecimal and Binary

### A look at Xor in Lightweight Encryption

Counting 1s and 0s in binary is error prone - try 1001101011 how big is that? Instead use decimal (base 10), base 8 or base 16 to specify numbers. Each hexadecimal digit is 4 bits. Digits 0-9 are the same as their decimal equivalents in binary (0000-1001) and hexadecimal digits A-F are represented by 1010 (A) through to 1111 (F). Every byte is represented by two hexadecimal digits, a 32 bit int by 8 hexadecimal digits. Here are some values, showing decimal, hexadecimal and binary.

- 7, 0x7, 00000111
- 246,0xd6, 11010110
- 1050,0x41a 010000011010
- 98765,0x181cd, 00010100000111001101

### The magical property of Xor ^

If we xor two ints a and b to get c then a ^ c gives b and b ^ c gives a.

To create an encrypted text file, "binary xor" a file of text with a value, say 255 to get an encrypted file. If we xor this with 255, we get back our original unencrypted text. As encryption schemes go, this is light weight and probably the most reinvented wheel by novice programmers. It can be made tougher by using values to xor that have at least 3 bits set instead of just 255.

In 8 bits, there are 256 different possible values- 0 to 255. There are only 36 values with 0,1, OR 2 set bits so that leaves 219 values which can be used. Stick these in an array and xor the first byte of the file against the first byte of the table, 2nd against the 2nd array value and so on.

Although light weight it is useful for hiding text strings in applications. A binary file viewer/editor lets anyone look for text strings, hidden commands, registry keys, even secret passwords. By encrypting these it makes life harder for them.

Example 3 displays the 219 values with three or more bits set, plus the remainder. It uses a similar mechanism to that PrintBinary() used in example 1.

Download Example 3.

**On the next page** - A look at Unions

### An example using Simple Xor Encryption

### Simple Encryption

Example 4 uses this technique in a simple example. The function

**hideandshow(char *,char *)**

processes the text in one buffer into another using 8 values for the xor. I picked 8 values from the output of example 3 with 5 or more bits set. With a higher number of bits set there is a greater overall transformation. Using only 8 values though means the range of transformations is lower.

As this is a light weight scheme, you should realize that it is not suitable for anything stronger than hiding text. Every developer thinks that their encryption scheme is uncrackable but that protection usually relies on hiding how the data was encrypted, this is known as "security through obscurity" and is not regarded every highly.

The strongest encryption schemes have their algorithms published and it is the strength of those algorithms that protects the data. If you write your own scheme, it is quite difficult to prevent your code being reverse engineered and if the prize is high enough, it will be.

The 8 values are indexed by the variable **eindex** which is repeatedly incremented but constrained within the range 0..7 by the line

```
eindex % = 8;
```

This uses the built in C modulus operator. This line is the same as

```
if (eindex>=8)
eindex=0;
```

As can be seen from the output, the text displayed by the last printf is identical to the text supplied to the first call of

**hideandshow()**

in the output listing below.

```
The hidden text is
89
1a
e3
15
c8
93
cc
d0
b3
13
eb
57
The unhidden text is Hello world.
```

Download Example 4.

**On the next page** All about Unions.

### Unions- Holding Data in the Same Place

### Unions

In the past we've looked at

. To remind you a struct is a way of managing several variables in one complete "super" variable which can be assigned or copied in one statement. The definition and use of a struct is this

```
struct s {
int a;
char * b;
int c[10];
};
typedef struct s sblock; // avoids needing the word struct!
sblock sval; // declare an instance
sval.c[4] =10; // assign a value
```

Download Example 5.

In a struct like the type s above, the three fields, a b and c are laid out in memory one after another. In a union though, all three fields occupy the same space.

If you know Borland/Turbo Pascal or Delphi then you may know of a variant record (nothing to do with Variant types). This is similar.

Unions are a bit of an oddity- not something you will use in everyday programming but when you need them, they are handy. Syntactically, take a struct and replace the word with union. That's it!

```
union u {
int a;
char * b;
nt c[10];
};
```

Within the union u all declared fields start at the same location and the size allocated is big enough to take the largest field. The array c is ten times the size of the int a or the char * b so the structure has to big enough to take it. If the ints are 32 bits then sizeof(u) will be 40 bytes, as big as c. In any variable of type union u, assigning a value to a is the same as assigning it to b or c[0]. The space occupied by c[1] to c[9] is inaccessible and thus wasted space as far as a and b are concerned.

**On the next page** - Uses of unions

### Where would you use a Union?

The uses of union are few and far between. On most computers, the size of a pointer and an int are usually the same- this is because both usually fit into a register in the CPU. So if you want to do a quick and dirty cast of a pointer to an int or the other way, declare a union.

```
union intptr {
int i;
int * p;
};
union intptr x;
x.i = 1000;
*(x.p)=90; /* puts 90 at location 1000 */
```

Another use of a union is in a command or message protocol where different size messages are sent and received. Each message type will hold different information but each will have a fixed part (probably a

) and a variable part bit. This is how you might implement it..

```
struct head {
int id;
int response;
int size;
};
struct msgstring50 {
struct head fixed;
char message[50];
}
struct struct msgstring80 {
struct head fixed;
char message[80];
}
struct msgint10 {
struct head fixed;
int message[10];
}
struct msgack {
struct head fixed;
int ok;
}
union messagetype {
struct msgstring50 m50;
struct msgstring80 m80;
struct msgint10 i10;
struct msgack ack;
}
```

In practice, although the unions are the same size, it makes sense to only send the meaningful data and not wasted space. A msgack is just 16 bytes in size while a msgstring80 is 92 bytes. So when a messagetype variable is initialized, it has its size field set according to which type it is. This can then be used by other functions to transfer the correct number of bytes.

```
union messagetype ackmsg;
ackmsg.fixed.size = sizeof(msgack) ; /* 16 */
```

**On the next page** - Packing Bits with Bit Fields

### Packing Bits with Bit Fields

Packing Bits C lets you pack bits tightly without requiring a lot of anding, and shifting. Lets assume we want to pack 4 values into one byte. These are a,b,c and d, defined like this.

```
ddddcbaa
```

If we call this byte x then the value of d is

```
d = (x >> 4) &0xf;
```

Which is a bit convoluted. What if we could just access the value of d and let the compiler do all the messy bit manipulation? Well we can but we have to define the bits using a bit field. To define our byte, we use a struct as follows.

```
struct flagtype
{
unsigned char a: 2;
unsigned char b: 1;
unsigned char c : 1;
unsigned char d : 4;
};
struct flagtype f;
int x;
x=f.d; // fetch value
```

The values such as : 4 define how big each field is in bits. The type makes it a bit more complicated as it affects the position of the bits in the struct. Change b to an int and flagtype increases in size from 1 byte to 12! Change a as as well to an int though and flagtype shrinks to 8 bytes! This suggests that it is organizing the bits to start on a 4 byte boundary, presumably for optimal access speed. My guess is that it is something like this when a is a char and b an int:

```
00000000
00000000
00000000
000ddddc
00000000
00000000
00000000
0000000b
00000000
00000000
00000000
000000aa
```

Change a to an int as well and it all fits in 8 bytes like this:

```
00000000
00000000
00000000
000ddddc
00000000
00000000
00000000
00000baa
```

The actual layout implementation is probably

dependant. On Visual C++ 6, flagtype as declared is exactly one byte in size. Example 6 uses a union to access a

of type flagtype. By assigning values to a,b,c etc, you can identify the layout of the bits.

Download Example 6.

That completes this tutorial. The next one will be on file I/O.