C Programming Tutorial on Random Access File Handling

Programming Random Access File I/O in C

Apart from the simplest of applications, most programs have to read or write files. It may be just for reading a config file, or for a text parser or something more sophisticated. This tutorial focuses on using random access files in C. The basic file operations are:

  • fopen - open a file, specifying how it's opened (read/write) and its type (binary/text)
  • fclose - close an opened file
  • fread - read from a file
  • fwrite - write to a file
  • fseek/fsetpos - move a file pointer to somewhere in a file
  • ftell/fgetpos - tell you where the file pointer is located

The two fundamental file types are text and binary. Of these two, binary files are usually the simpler to deal with. For that reason and the fact that random access on a text file isn't something you need to do often, this tutorial is limited to binary files. The first four operations listed above apply to both text and random access files; the last two are only for random access.

Random access means you can move to any part of a file and read or write data from it without having to read through the entire file. Years ago, data was stored on large reels of computer tape. The only way to get to a point on the tape was by reading all the way through the tape. Then disks came along and now you can read any part of a file directly.
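As a rough illustration of that idea, the sketch below jumps straight to a byte offset and reads a single byte without touching anything before it. The function names write_alphabet and read_byte_at are made up for this example.

```c
#include <stdio.h>

/* Write the 26 lowercase letters to `path` so there is something to seek into.
   Returns 0 on success, -1 on failure. */
int write_alphabet(const char *path)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite("abcdefghijklmnopqrstuvwxyz", 1, 26, f);
    if (fclose(f) != 0 || written != 26)
        return -1;
    return 0;
}

/* Jump directly to `offset` and read one byte.
   Returns the byte value (0 to 255), or -1 if the seek or read fails. */
int read_byte_at(const char *path, long offset)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    int result = EOF;
    if (fseek(f, offset, SEEK_SET) == 0)
        result = fgetc(f);      /* fgetc returns EOF at the end of the file */
    fclose(f);
    return result == EOF ? -1 : result;
}
```

After write_alphabet("alpha.bin"), a call such as read_byte_at("alpha.bin", 25) fetches the last letter without reading the 25 bytes in front of it.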

Programming With Binary Files

A binary file is a file of any length that holds bytes with values in the range 0 to 255. These bytes have no other meaning, unlike in a text file where a value of 13 means carriage return, 10 means line feed and, on some systems, 26 means end of file. Software reading text files has to deal with these other meanings.

Binary files are a stream of bytes, and modern languages tend to work with streams rather than files. The important part is the data stream rather than where it came from. In C, you can think about the data either as files or streams. With random access, you can read or write to any part of the file or stream. With sequential access, you have to loop through the file or stream from the start like a big tape.

This code sample shows a simple binary file being opened for writing, with a text string (char *) being written into it. Normally you see this with a text file, but you can write text to a binary file.

 // ex1.c
 #include <stdio.h>
 #include <string.h>
 int main(int argc, char * argv[])
 {
   const char * filename="test.txt";
   const char * mytext="Once upon a time there were three bears.";
   size_t byteswritten=0;
   FILE * ft= fopen(filename, "wb") ;
   if (ft) {
     /* fwrite returns the number of items (here, bytes) written */
     byteswritten = fwrite(mytext,sizeof(char),strlen(mytext), ft) ;
     fclose( ft ) ;
   }
   printf("len of mytext = %zu, bytes written = %zu\n",strlen(mytext),byteswritten) ;
   return 0;
 }

This example opens a binary file for writing and then writes a char * (string) into it. The FILE * variable is returned from the fopen() call. If the call fails (the file might exist and be open or read-only, or there could be a fault with the filename), fopen() returns NULL, which compares equal to 0.

The fopen() command attempts to open the specified file. In this case, it's test.txt in the same folder as the application. If the filename includes a path, then all the backslashes must be doubled up, because a single backslash starts an escape sequence in a C string literal. "c:\folder\test.txt" is incorrect; you must use "c:\\folder\\test.txt".

As the file mode is "wb", this code is writing to a binary file. The file is created if it doesn't exist, and if it does, whatever was in it is deleted. If the call to fopen fails, perhaps because the file was open or the name contains invalid characters or an invalid path, fopen returns the value 0.

Although you could just check for ft being non-zero (success), this example has a FileSuccess() function to do this explicitly. On Windows, it outputs the success/failure of the call and the filename. It's a little onerous if you are after performance, so you might limit this to debugging. On Windows, there is little overhead outputting text to the system debugger.
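FileSuccess() itself isn't listed here, so the sketch below uses a hypothetical stand-in called open_or_report to show the general idea: wrap fopen() and report the reason when it fails. It assumes fopen() sets errno on failure, which is usual (POSIX requires it) but is not guaranteed by the C standard.

```c
#include <stdio.h>

/* Try to open a file; on failure print the system's reason to stderr.
   Returns the FILE * (NULL on failure), exactly as fopen does.
   open_or_report is an illustrative name, not part of any library. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        perror(path);   /* prints the path followed by the error text */
    return f;
}
```

The caller still has to test the returned pointer before using it; the helper only adds the diagnostic message.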

 fwrite(mytext,sizeof(char),strlen(mytext), ft) ;

The fwrite() call outputs the specified text. The second and third parameters are the size of each character and the length of the string. Both are defined as being size_t, which is an unsigned integer type. The result of this call is to write count items of the specified size; the return value is the number of items actually written. Note that with binary files, even though you are writing a string (char *), it does not append any carriage return or line feed characters. If you want those, you must explicitly include them in the string.
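Because fwrite() returns the number of items it wrote, you can compare that against what you asked for. Here is a minimal sketch; write_text is a name invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Write `text` (without its terminating '\0') to a binary file.
   Returns the number of bytes written, or -1 if the file cannot be
   opened or closed. Nothing is appended to the text. */
long write_text(const char *path, const char *text)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t len = strlen(text);
    size_t written = fwrite(text, sizeof(char), len, f);
    if (fclose(f) != 0)
        return -1;
    return (long)written;
}
```

Writing "abc" puts three bytes in the file. To end the line you would have to pass "abc\r\n" (or "abc\n") yourself, which writes five (or four) bytes.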

File Modes for Reading and Writing Files

When you open a file, you specify how it is to be opened—whether to create it from new or overwrite it and whether it's text or binary, read or write and if you want to append to it. This is done using one or more file mode specifiers that are single letters "r", "b", "w", "a" and "+" in combination with the other letters.

  • r - Opens the file for reading. This fails if the file does not exist or cannot be found.
  • w - Opens the file as an empty file for writing. If the file exists, its contents are destroyed.
  • a - Opens the file for writing at the end of the file (appending) without removing the EOF marker before writing new data to the file; this creates the file first if it doesn't exist.

Adding "+" to the file mode creates three new modes:

  • r+ - Opens the file for both reading and writing. (The file must exist.)
  • w+ - Opens the file as an empty file for both reading and writing. If the file exists, its contents are destroyed.
  • a+ - Opens the file for reading and appending; the appending operation includes the removal of the EOF marker before new data is written to the file, and the EOF marker is restored after writing is complete. It creates the file first if it doesn't exist.
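The difference between truncating ("w") and appending ("a") is easy to see with a small experiment. The helpers put_text and get_text below are hypothetical names used only for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Open `path` with the given mode, write `text`, and close the file.
   Returns 0 on success, -1 on failure. */
int put_text(const char *path, const char *mode, const char *text)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    size_t len = strlen(text);
    int ok = fwrite(text, 1, len, f) == len;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read up to cap - 1 bytes of the file into `buf` and terminate it.
   Returns the number of bytes read, or -1 on failure. */
long get_text(const char *path, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    fclose(f);
    return (long)n;
}
```

Calling put_text with "wb" and "AAA", then with "ab" and "BB", leaves AAABB in the file. A further call with "wb" and "C" throws the old contents away and leaves just C.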

File Mode Combinations

This table shows file mode combinations for both text and binary files and what you can do with each combination. Generally, you either read from or write to a text file, but not both at the same time. With a binary file, you can both read and write to the same file.

  • r text - read
  • rb binary - read
  • r+ text - read, write
  • r+b binary - read, write
  • rb+ binary - read, write
  • w text - write, create, truncate
  • wb binary - write, create, truncate
  • w+ text - read, write, create, truncate
  • w+b binary - read, write, create, truncate
  • wb+ binary - read, write, create, truncate
  • a text - write, create
  • ab binary - write, create
  • a+ text - read, write, create
  • a+b binary - read, write, create
  • ab+ binary - read, write, create

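Unless you are just creating a file (use "wb") or only reading one (use "rb"), you can get away with using "w+b". Bear in mind that "w+b" truncates an existing file, so use "r+b" when you need to update a file without losing its contents.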
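Since the "w" modes truncate, updating part of an existing file calls for "r+b" from the table above. The following is a minimal sketch of an in-place update; patch_bytes is an illustrative name.

```c
#include <stdio.h>

/* Overwrite `len` bytes at `offset` in an existing file, leaving the
   rest of the file untouched. Returns 0 on success, -1 on failure. */
int patch_bytes(const char *path, long offset, const void *data, size_t len)
{
    FILE *f = fopen(path, "r+b");   /* fails if the file does not exist */
    if (f == NULL)
        return -1;
    int ok = fseek(f, offset, SEEK_SET) == 0
          && fwrite(data, 1, len, f) == len;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

If a file holds 0123456789, then patch_bytes(path, 3, "XY", 2) changes it to 012XY56789 and the length stays the same.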

Some implementations also allow other letters. Microsoft, for example, allows:

  • t - text mode 
  • c - commit
  • n - non-commit 
  • S - optimizing caching for sequential access 
  • R - caching non-sequential (random access) 
  • T - temporary
  • D - delete/temporary, which kills the file when it's closed.

These aren't portable so use them at your own peril.

Example of Random Access File Storage

The main reason for using binary files is the flexibility that allows you to read or write anywhere in the file. Text files are normally read or written sequentially. The prevalence of inexpensive or free databases such as SQLite and MySQL reduces the need to use random access on binary files. Random access to file records is a little old-fashioned, but it is still useful.

Examining an Example

Assume the example shows an index and data file pair storing strings in a random access file. The strings are of differing lengths and are indexed by position 0, 1 and so on.

There are two void functions: CreateFiles() and ShowRecord(int recnum). CreateFiles uses a char * buffer of size 1100 to hold a temporary string made up of the format string msg followed by n asterisks where n varies from 5 to 1004. Two FILE * are created both using wb filemode in the variables ftindex and ftdata. After creation, these are used to manipulate the files. The two files are

  • index.dat
  • data.dat

The index file holds 1000 records of type indextype; this is the struct indextype, which has the two members pos (of type fpos_t) and size. The first part of the loop:

 for ( j=0; j<i+5; j++)

populates the string msg like this.

 This is string 0 followed by 5 asterisks :*****
 This is string 1 followed by 6 asterisks :******

and so on. Then this:

 index.size = (int)strlen(text) ;
 fgetpos(ftdata, &index.pos ) ;

populates the struct with the length of the string and the point in the data file where the string will be written.

At this point, both the index file struct and the data file string can be written to their respective files. Although these are binary files, they are written sequentially. In theory, you could write records to a position beyond the current end of file, but it's not a good technique to use and probably not at all portable.
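Putting those steps together, a CreateFiles-style routine might look roughly like the sketch below. This is a reconstruction from the description, not the original listing: the function name create_files, its parameters and the use of snprintf and memset to build the string are all assumptions.

```c
#include <stdio.h>
#include <string.h>

/* One index record: where a string starts in the data file and its length. */
typedef struct {
    fpos_t pos;
    int size;
} indextype;

/* Write `count` variable-length strings to data_path and one index record
   per string to index_path. Returns 0 on success, -1 on any failure. */
int create_files(const char *index_path, const char *data_path, int count)
{
    char text[1100];
    indextype index;
    int ok = 1;
    FILE *ftindex = fopen(index_path, "wb");
    FILE *ftdata = fopen(data_path, "wb");

    if (ftindex == NULL || ftdata == NULL)
        ok = 0;
    memset(&index, 0, sizeof index);    /* avoid writing stray padding bytes */

    for (int i = 0; ok && i < count; i++) {
        int n = snprintf(text, sizeof text,
                         "This is string %d followed by %d asterisks :", i, i + 5);
        /* stop if the message plus i + 5 asterisks would not fit */
        if (n < 0 || (size_t)n + (size_t)i + 5 >= sizeof text) {
            ok = 0;
            break;
        }
        memset(text + n, '*', (size_t)i + 5);
        text[n + i + 5] = '\0';

        index.size = (int)strlen(text);
        if (fgetpos(ftdata, &index.pos) != 0)
            ok = 0;
        /* index record first, then the string itself, both sequentially */
        if (ok && fwrite(&index, sizeof index, 1, ftindex) != 1)
            ok = 0;
        if (ok && fwrite(text, 1, (size_t)index.size, ftdata) != (size_t)index.size)
            ok = 0;
    }
    if (ftindex != NULL && fclose(ftindex) != 0)
        ok = 0;
    if (ftdata != NULL && fclose(ftdata) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling create_files("index.dat", "data.dat", 1000) would produce the two files described above. Note that an fpos_t written to disk is only meaningful to the same C library that produced it.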

The final part is to close both files. This ensures that the last part of the file is written to disk. During file writes, many of the writes don't go directly to disk but are held in fixed-sized buffers. After a write fills the buffer, the entire contents of the buffer are written to disk.

The fflush() function forces flushing, and you can also specify file flushing strategies (with setvbuf(), for example), but those are intended for text files.
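If you need buffered data to leave the program before the file is closed, a call to fflush() after the write does it. A small sketch follows; write_and_flush is a hypothetical helper name.

```c
#include <stdio.h>

/* Write `len` bytes and push them out of the stdio buffer straight away.
   The stream stays open. Returns 0 on success, -1 on failure. */
int write_and_flush(FILE *f, const void *data, size_t len)
{
    if (fwrite(data, 1, len, f) != len)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

fflush() hands the data to the operating system; whether it has physically reached the disk at that point is up to the OS.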

ShowRecord Function

To test that any specified record from the data file can be retrieved, you need to know two things: where it starts in the data file and how big it is.

This is what the index file does. The ShowRecord function opens both files, seeks to the appropriate point (recnum * sizeof(indextype)) and fetches a number of bytes = sizeof(index).

 fseek( ftindex, sizeof(index)*(recnum) ,SEEK_SET ) ;
 fread( &index,1,sizeof(index),ftindex) ;

SEEK_SET is a constant that specifies where the fseek is done from. There are three such constants, including SEEK_SET itself.

  • SEEK_CUR - seek relative to the current position
  • SEEK_END - seek relative to the end of the file
  • SEEK_SET - seek relative to the start of the file (an absolute position)

You could use SEEK_CUR to move the file pointer forward by sizeof(index).

 fseek( ftindex, sizeof(index) ,SEEK_CUR ) ;
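SEEK_END is handy for finding out how much is in a file. The sketch below counts fixed-size records and reads the last one by seeking backwards from the end. The names record_count and read_last_int are invented for the example, and it assumes the C library supports SEEK_END on binary streams, which mainstream ones do although the C standard doesn't strictly require it.

```c
#include <stdio.h>

/* Count fixed-size records by seeking to the end of the file and dividing
   the byte offset by the record size. Returns -1 on failure. */
long record_count(const char *path, size_t record_size)
{
    if (record_size == 0)
        return -1;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long bytes = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        bytes = ftell(f);
    fclose(f);
    return bytes < 0 ? -1 : bytes / (long)record_size;
}

/* Read the final int in the file using a negative offset from the end.
   Returns 0 on success, -1 on failure. */
int read_last_int(const char *path, int *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    int ok = fseek(f, -(long)sizeof(int), SEEK_END) == 0
          && fread(out, sizeof(int), 1, f) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

For a file containing the three ints 10, 20 and 30, record_count(path, sizeof(int)) gives 3 and read_last_int stores 30.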

Having obtained the size and position of the data, it just remains to fetch it.

 fsetpos( ftdata, &index.pos ) ;
 fread( text,index.size, 1, ftdata) ;
 text[ index.size ]='\0';

Here, use fsetpos() because of the type of index.pos, which is fpos_t. An alternative way is to use ftell instead of fgetpos and fseek instead of fsetpos. The pair fseek and ftell work with long int offsets, whereas fgetpos and fsetpos use fpos_t.

After reading the record into memory, a null character \0 is appended to turn it into a proper C string. Don't forget it or you'll get a crash. As before, fclose is called on both files. Although you won't lose any data if you forget fclose (unlike with writes), you will have a memory leak.
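For comparison, here is a rough sketch of the ftell/fseek alternative, where the index stores a long offset rather than an fpos_t. It is not the original ShowRecord code: the struct longindex and the functions append_record and read_record are names made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Index record using a long offset (as returned by ftell) instead of fpos_t. */
typedef struct {
    long pos;
    int size;
} longindex;

/* Append one string to the data file and its index record to the index file.
   Both streams must be open for binary writing. Returns 0 on success. */
int append_record(FILE *ftindex, FILE *ftdata, const char *text)
{
    longindex index;
    memset(&index, 0, sizeof index);
    index.size = (int)strlen(text);
    index.pos = ftell(ftdata);          /* where this string will start */
    if (index.pos < 0)
        return -1;
    if (fwrite(&index, sizeof index, 1, ftindex) != 1)
        return -1;
    if (fwrite(text, 1, (size_t)index.size, ftdata) != (size_t)index.size)
        return -1;
    return 0;
}

/* Fetch record `recnum` into `text` (capacity `cap`) as a C string.
   Returns the string length, or -1 on failure. */
int read_record(const char *index_path, const char *data_path,
                long recnum, char *text, size_t cap)
{
    longindex index;
    int result = -1;
    FILE *ftindex = fopen(index_path, "rb");
    FILE *ftdata = fopen(data_path, "rb");

    if (ftindex != NULL && ftdata != NULL && recnum >= 0
        && fseek(ftindex, recnum * (long)sizeof(longindex), SEEK_SET) == 0
        && fread(&index, sizeof index, 1, ftindex) == 1
        && index.size >= 0 && (size_t)index.size < cap
        && fseek(ftdata, index.pos, SEEK_SET) == 0
        && fread(text, 1, (size_t)index.size, ftdata) == (size_t)index.size) {
        text[index.size] = '\0';        /* make it a proper C string */
        result = index.size;
    }
    if (ftindex != NULL)
        fclose(ftindex);
    if (ftdata != NULL)
        fclose(ftdata);
    return result;
}
```

The size check against cap guards the buffer before the read, and asking for a record past the end of the index simply fails because the index fread comes back short.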