You are viewing madbodger

Welcome to my nightmare - mkfile(8)
mkfile(8) [Jun. 24th, 2009|11:31 pm]
[Current Location |Purcellville, VA]
[mood |geeky]

We're working on a project at work that uses Xen to run virtual machines (VMs). These VMs are running Oracle, which eats up disk space like a maniac. So the users asked us to add some 300GB chunks to the virtual file system. This is done by creating 300GB files and attaching them to the VMs. The guys created the first one by using dd (the Unix "convert and copy" command, so named because "cc" was already taken by the C compiler) to copy 300GB of zeroes from the /dev/zero pseudo device to a file. Then they made more by copying that file to additional locations. The /dev/zero trick is at least reasonably efficient, as the kernel just zero-fills chunks of memory as needed. But copying that file is a lose, as the system has to suck all 300GB off the disk drives, and write it back out.

However, there exists a command (on Solaris and BSD) tailor-made for the purpose. It's called mkfile(8) and its sole purpose is to make files. And I remembered that it did so much faster than copying stuff from /dev/zero. But Linux doesn't have that command.

I really thought it would, but a quick scan of the RPMs on the install media didn't reveal anything likely. A little scripting (and rpm2cpio) produced a list of every file in the whole distribution, but no mkfile.

So I tried to dredge up memories of how mkfile worked. I vaguely recalled it hinged on creative use of mmap() or lseek(), so I read those manual pages, and found this:

The lseek() function allows the file offset to be set beyond the end of the file (but this does not change the size of the file). If data is later written at this point, subsequent reads of the data in the gap (a "hole") return null bytes ('\0') until data is actually written into the gap.

Aha! All I have to do is write a short program that parses command line arguments for the file name and size (with optional units), opens the desired file, does an lseek() off to the size (minus one), and writes a single null byte, and voila!
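The recipe above can be sketched in a few lines. This is not the program from the post (that one is not available); it is a minimal Python illustration, using os.lseek() and os.write() because they wrap the same system calls. The names parse_size and make_sparse_file, and the k/m/g/t unit suffixes, are assumptions made for the example.

```python
import os

# Hypothetical unit suffixes; the original program's exact syntax is not known.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Turn a string such as '300g' or '4096' into a byte count."""
    text = text.strip().lower()
    suffix = text[-1] if text and text[-1].isalpha() else ""
    number = text[: len(text) - len(suffix)]
    if suffix not in _UNITS or not number.isdigit():
        raise ValueError(f"bad size: {text!r}")
    return int(number) * _UNITS[suffix]


def make_sparse_file(path, size):
    """Create `path` with an apparent length of `size` bytes by seeking to
    the last byte and writing a single null byte there."""
    if size < 1:
        raise ValueError("size must be at least 1 byte")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.lseek(fd, size - 1, os.SEEK_SET)  # move past the (empty) end of file
        os.write(fd, b"\0")                  # this write fixes the file length
    finally:
        os.close(fd)
```

Calling make_sparse_file("disk01.img", parse_size("300g")) would then produce a file that reports 300GB in ls while occupying almost no disk blocks, provided the filesystem supports holes and offsets that large.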

So I did. Sure enough, my home-rolled mkfile was faster than dd. On local drives, it was nearly instant, even for huge files. On the OCFS2 volumes used by Oracle, it was rather slower (journaling, coordination, and all), but still outran the next-fastest method 2:1. Unfortunately, it wouldn't run on our VM servers, as the Oracle VM Server (OVS) distribution was 32-bit, and I had compiled it on a 64-bit VM that had gcc installed. So I went and rebuilt it for 32 bits and tried again. No joy, the 32-bit OS only supports file sizes up to 2GB. A nice research exercise, but ultimately, it didn't end up helping me.

Note that the version I wrote is on the customer's closed network and I don't have access to it, but in case someone needs it, I found another person's version here. It's a little wonkily-written, but should serve.


Comments:
From: achinhibitor
2009-06-25 03:38 am (UTC)

Yeah, but be careful -- Un*x OSs have the trick of not allocating a disk block to a file if that particular bit of the file has never been written to. The missing block is implicitly all-zeros, but a real disk block is not allocated until it is written to. So when you seek a zillion bytes out and write a byte, the recorded length of the file is set, and the block at the very end is allocated, and the byte written into it, but all the intermediate blocks *are not* allocated. You can verify this by looking at the output of df for the filesystem, or asking du how much space the file *really* occupies. Un-allocated sections of a file are called "holes".
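The du check mentioned above can also be done from a program by comparing the recorded length with the blocks actually allocated. A small Python sketch follows; the function name space_report is made up for the example, and the 512-byte unit for st_blocks is what Linux uses (other systems may differ).

```python
import os


def space_report(path):
    """Return (apparent_bytes, allocated_bytes) for `path`."""
    st = os.stat(path)
    # st_size is the recorded length; st_blocks counts 512-byte units on
    # Linux, regardless of the filesystem's own block size.
    return st.st_size, st.st_blocks * 512
```

For a holey file the second number comes out far smaller than the first, which is the same thing du shows.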
From: blaisepascal
2009-06-25 04:47 am (UTC)

This isn't a problem until you start filling in the holes, though. Then you better have the disk space for the space you are actually using.

Making the holes is the sole reason the lseek method works, and cp, tar, mv, etc should, with the right arguments, respect the holes (i.e., copy, archive, or move the file without filling in the holes in the process).

Actually, I'm surprised that dd doesn't allow you to make files with holes already. Or maybe it does, somewhere in its commands. Commands people don't memorize because they don't use holey files...
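As an illustration of what "respecting the holes" means in a copy, here is a rough Python sketch of one common approach: read the source in blocks and, whenever a block is entirely null bytes, seek forward in the destination instead of writing. This is not how any particular cp or tar is implemented; copy_sparse and its block_size default are assumptions for the example.

```python
import os


def copy_sparse(src, dst, block_size=64 * 1024):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            if chunk.count(0) == len(chunk):
                # Nothing but null bytes: skip ahead and leave a hole.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # If the source ended in a skipped block, set the final length
        # explicitly so the copy has the same size as the original.
        fout.truncate(fin.tell())
```

Note that this turns any long run of zeros into a hole, including zeros the source had actually written to disk, so the copy can end up sparser than the original.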
From: achinhibitor
2009-07-04 04:43 pm (UTC)

dd can't make holes because it is a piping program, whose main function is to reformat data.
From: madbodger
2009-06-25 04:55 am (UTC)

Yeah, that can be a problem in some cases, but it's just fine for ours. As the space is used, it gets populated; until then, it doesn't matter anyway. Basically, I'm shifting the time required to allocate all that storage from the initial phase (when impatient people are glaring at me) to the usage phase (when it just slows down their process by a tiny amount).
From: achinhibitor
2009-07-04 04:43 pm (UTC)

That works as long as (1) you know you will have the blocks when needed, and (2) you don't mind the overheads of allocating those blocks at that time.
From: blaisepascal
2009-06-25 04:57 am (UTC)

dd can do it too.

blaisepascal@circumflex:~$ dd of=testfile if=/dev/zero bs=1k count=1 seek=1G
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000105784 s, 9.7 MB/s
blaisepascal@circumflex:~$ ls -Flags testfile
16 -rw-r--r-- 1 blaisepascal 1099511628800 2009-06-25 00:52 testfile
blaisepascal@circumflex:~$ du testfile
16 testfile


The "seek=N" argument to dd makes it do a seek of N (obs) blocks before writing. In this case, that creates a holey file pretty quick. (I created a 1G file in 12s without the seek, and 0.004s with the seek.)
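The size ls reports in that transcript follows directly from the dd arguments, and the arithmetic is quick to check. The snippet below assumes GNU dd, where the G suffix means 1024^3.

```python
# dd skips `seek` output blocks of `bs` bytes, then writes `count` blocks.
bs = 1024         # bs=1k
count = 1         # count=1
seek = 1024**3    # seek=1G (GNU dd: G = 1024*1024*1024)

apparent_size = seek * bs + count * bs
print(apparent_size)  # 1099511628800, the length shown by ls in the transcript
```

So the test file is a 1 TiB hole followed by a single 1 KiB block of real data, which is why du reports only a handful of blocks for it.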

From: ronebofh
2009-06-25 05:34 am (UTC)

Maybe you can take the mkfile source from OpenSolaris and port it to Linux?

Too bad 32-bit OVS doesn't have large-file support.
From: madbodger
2009-06-25 12:53 pm (UTC)

On their closed network, I'd have had to apply for permission to transfer software, so it was quicker (and a fun diversion) to just write it myself. I actually did so while one of the 300GB dd's was in progress. You're right, though: anyone else who wants mkfile can just grab the OpenSolaris sources. It's probably pretty portable.

I think Oracle is about to release (or just did) their 5.4 version of OVS; maybe it will have large file support.