We're working on a project at work that uses Xen to run virtual machines (VMs). These VMs are running Oracle, which eats up disk space like a maniac. So the users asked us to add some 300GB chunks to the virtual file system. This is done by creating 300GB files and attaching them to the VMs. The guys created the first one by using dd (the Unix "convert and copy" command, so named because "cc" was already taken by the C compiler) to copy 300GB of zeroes from the /dev/zero pseudo device to a file. Then they made more by copying that file to additional locations. The /dev/zero trick is at least reasonably efficient, as the kernel just zero-fills chunks of memory as needed. But copying that file is a lose, as the system has to suck all 300GB off the disk drives, and write it back out.
However, there exists a command (on Solaris and BSD) tailor-made for the purpose. It's called mkfile(8), and its sole purpose is to make files. And I remembered that it did so much faster than copying stuff from /dev/zero. But Linux doesn't seem to have it. I really thought it would, but a quick scan of the RPMs on the install media didn't reveal anything likely. A little scripting (and rpm2cpio) produced a list of every file in the whole distribution, but no mkfile.
So I tried to dredge up memories of how mkfile worked. I vaguely recalled it hinged on creative use of mmap() or lseek(), so I read those manual pages, and found this:
The lseek() function allows the file offset to be set beyond the end of the file (but this does not change the size of the file). If data is later written at this point, subsequent reads of the data in the gap (a "hole") return null bytes ('\0') until data is actually written into the gap.
Aha! All I have to do is write a short program that parses command line arguments for the file name and size (with optional units), opens the desired file, lseek()s off to the size (minus one), and writes a single null byte. Voila!
So I did. Sure enough, my home-rolled mkfile was faster than dd. On local drives, it was nearly instant, even for huge files. On the OCFS2 volumes used by Oracle, it was rather slower (journaling, coordination, and all), but still outran the next-fastest method 2:1. Unfortunately, it wouldn't run on our VM servers, as the Oracle VM Server (OVS) distribution was 32-bit, and I had compiled it on a 64-bit VM that had gcc installed. So I went and rebuilt it for 32 bits and tried again. No joy: as far as I could tell, the 32-bit OS only supports file sizes up to 2GB. A nice research exercise, but ultimately, it didn't end up helping me.
Note that the version I wrote is on the customer's closed network and I don't have access to it, but in case someone needs it, I found another person's version here. It's a little wonkily-written, but should serve.