readfrag - read large fragmented files quickly

(Note: This is from 2002 and has not been touched since. The problem has largely disappeared since then, as the filesystem code in Linux has improved.)

What is it?

This tool will read large fragmented files quickly. It uses a Linux-only ioctl to do its work, so it's limited to Linux, and that ioctl can only be run as root, so it's limited to the superuser. What could it possibly deliver to make up for these disadvantages?

If you use a download accelerator or a file sharing tool like mldonkey, the pieces of a file are downloaded from all over the place and written to disk in an unpredictable order.

Under Unix and Linux, a large file is not preallocated when you write somewhere in the middle of it; the unwritten ranges are kept as "holes". This has the advantage that no disk space is spent on blocks that have not yet been written to (originally an obscure hack for old database systems), but it has the disadvantage that once all blocks have eventually been written, they are spread all over the disk. If you then read such a file sequentially from start to end, the hard disk is busy moving the heads instead of reading data.
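To make the "holes" concrete, here is a minimal sketch in C that writes a single byte far into a new file and then compares the file's logical size with the space actually allocated for it. This is not part of readfrag; the helper names write_byte_at and file_usage are made up for this illustration, and the 512-byte unit for st_blocks is the Linux convention.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and write one byte at `offset`,
 * leaving a hole in front of it. Returns 0 on success, -1 on error. */
int write_byte_at(const char *path, off_t offset)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int rc = (pwrite(fd, "x", 1, offset) == 1) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: report the logical size of `path` and the number
 * of bytes actually allocated for it. Returns 0 on success, -1 on error. */
int file_usage(const char *path, long long *logical, long long *allocated)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    *logical = (long long)st.st_size;
    /* On Linux, st_blocks counts 512-byte units. */
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

On a filesystem that supports holes, calling write_byte_at with an offset of, say, 1 MB and then file_usage on the same path should report a logical size of a little over 1 MB but only a few kilobytes allocated.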

How dramatic is this effect?

When trying to burn such a file to CD, I could only set the burner speed to 4 instead of 16 to get reliable burning. With burn proof enabled, the burning process stalled all the time. Even worse, merely watching a big movie file with mplayer keeps the hard drive busy moving the heads, severely limiting the performance of other tasks on the system.

In one test on my desktop system, copying a 700 MB file to another IDE disk with cp took 9 minutes, while copying the same file to the same target disk with readfrag took only 5 minutes. Both times are wall-clock times.

Please note that this depends on many factors, including your disks, the file system block size (I use 4k), and how exactly the blocks are spread, which varies with each file. I still think that this tool is worth it.

So how does it work?

It uses the "LILO ioctl" to find the physical block number for each logical block number of the file, and builds a look-up table from these values. Then it iterates over the file in chunks (hard-coded to 128 MB each), reading all the blocks of a chunk in an order that allows the disk head to move in only one direction.
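The idea can be sketched in C as follows. This is not the code from readfrag.c, just a minimal illustration under some assumptions: that the "LILO ioctl" is FIBMAP from <linux/fs.h>, that reading in ascending physical block order is what keeps the head moving in one direction, and that the caller supplies a buffer of count * blocksize bytes. The struct and function names are invented for this example.

```c
#include <assert.h>
#include <linux/fs.h>    /* FIBMAP (assumed to be the "LILO ioctl") */
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

struct blockmap {
    unsigned long logical;   /* block index within the file */
    unsigned long physical;  /* block number on the device, 0 for a hole */
};

/* Fill map[0..count) for logical blocks first..first+count-1.
 * FIBMAP needs root. Returns 0 on success, -1 on error. */
int map_blocks(int fd, unsigned long first, size_t count, struct blockmap *map)
{
    for (size_t i = 0; i < count; i++) {
        int blk = (int)(first + i);   /* in: logical block, out: physical */
        if (ioctl(fd, FIBMAP, &blk) < 0)
            return -1;
        map[i].logical = first + i;
        map[i].physical = (unsigned long)blk;
    }
    return 0;
}

/* Order by physical block; ties (e.g. holes) fall back to logical order. */
static int by_physical(const void *a, const void *b)
{
    const struct blockmap *x = a, *y = b;
    if (x->physical != y->physical)
        return x->physical < y->physical ? -1 : 1;
    return (x->logical > y->logical) - (x->logical < y->logical);
}

void sort_by_physical(struct blockmap *map, size_t count)
{
    qsort(map, count, sizeof *map, by_physical);
}

/* Read the blocks in the order given by map, storing each one at its
 * logical position in buf (relative to `first`). Returns 0 or -1. */
int read_in_disk_order(int fd, const struct blockmap *map, size_t count,
                       unsigned long first, size_t blocksize, char *buf)
{
    for (size_t i = 0; i < count; i++) {
        assert(map[i].logical >= first);
        off_t off = (off_t)map[i].logical * (off_t)blocksize;
        char *dst = buf + (map[i].logical - first) * blocksize;
        if (pread(fd, dst, blocksize, off) < 0)
            return -1;
    }
    return 0;
}
```

A caller would run map_blocks, sort_by_physical and read_in_disk_order once per chunk and then write the buffer out sequentially. Sorting only within a chunk bounds the memory needed for the buffer, at the cost of one sweep across the disk per chunk.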

Tip: have vmstat 1 running in another window and watch the bi and bo columns.


Just download this readfrag.c file and compile it with gcc -o readfrag readfrag.c.