9. Directories

Contents:
Introduction
Getting and Setting Timestamps
Deleting a File
Copying or Moving a File
Recognizing Two Names for the Same File
Processing All Files in a Directory
Globbing, or Getting a List of Filenames Matching a Pattern
Processing All Files in a Directory Recursively
Removing a Directory and Its Contents
Renaming Files
Splitting a Filename into Its Component Parts
Program: symirror
Program: lst

Unix has its weak points but its file system is not one of them.

- Chris Torek

9.0. Introduction

To fully understand directories, you need to be acquainted with the underlying mechanics. The following explanation is slanted towards the Unix filesystem, for whose system calls and behavior Perl's directory access routines were designed, but it is applicable to some degree to most other platforms.

A filesystem consists of two parts: a set of data blocks where the contents of files and directories are kept, and an index to those blocks. Each entity in the filesystem has an entry in the index, be it a plain file, a directory, a link, or a special file like those in /dev. Each entry in the index is called an inode (short for index node). Since the index is a flat index, inodes are addressed by number.

A directory is a specially formatted file, whose inode entry marks it as a directory. A directory's data blocks contain a set of pairs. Each pair consists of the name of something in that directory and the inode number of that thing. The data blocks for /usr/bin might contain:

Name	Inode
bc	17
du	29
nvi	8
pine	55
vi	8

Every directory is like this, even the root directory (/). To read the file /usr/bin/vi, the operating system reads the inode for /, reads its data blocks to find the entry for /usr, reads /usr's inode, reads its data block to find /usr/bin, reads /usr/bin's inode, reads its data block to find /usr/bin/vi, reads /usr/bin/vi's inode, and then reads the data from its data block.

The name in a directory entry isn't fully qualified. The file /usr/bin/vi has an entry with the name vi in the /usr/bin directory. If you open the directory /usr/bin and read entries one by one, you get filenames like patch, rlogin, and vi instead of fully qualified names like /usr/bin/patch, /usr/bin/rlogin, and /usr/bin/vi.
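The name/inode pairs can be observed from Perl. The following is a minimal sketch, assuming a Unix-like system that supports hard links. It populates a scratch directory (from File::Temp) with a few hypothetical names borrowed from the table above, then reads the entries back. Notice that readdir hands back bare names, so we have to build full paths ourselves before calling lstat:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Scratch directory filled with hypothetical names; the readdir loop
# below works unchanged on any directory, such as /usr/bin.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(bc du vi)) {
    my $path = File::Spec->catfile($dir, $name);
    open(my $fh, '>', $path) or die "Couldn't create $path: $!";
    close($fh);
}
# A second hard link: "nvi" is just another name for vi's inode.
link(File::Spec->catfile($dir, 'vi'), File::Spec->catfile($dir, 'nvi'))
    or die "Couldn't link: $!";

my %inode_of;    # bare entry name => inode number
opendir(my $dh, $dir) or die "Couldn't open $dir: $!";
while (defined(my $name = readdir($dh))) {
    next if $name eq '.' or $name eq '..';
    # readdir returns only the name; build the full path before lstat.
    my $path  = File::Spec->catfile($dir, $name);
    my @entry = lstat($path) or die "Couldn't lstat $path: $!";
    $inode_of{$name} = $entry[1];
}
closedir($dh);

printf "%-5s %d\n", $_, $inode_of{$_} for sort keys %inode_of;
```

As with vi and nvi in the table, the two hard-linked names print the same inode number.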

The inode has more than a pointer to the data blocks. Each inode also contains the type of thing it represents (directory, plain file, etc.), the size of the thing, a set of permission bits, owner and group information, the time the thing was last modified, the number of directory entries that point to this inode, and so on.

Some operations on files change the contents of the file's data blocks; some change just the inode. For instance, appending to or truncating a file updates its inode by changing the size field. Other operations change the directory entry that points to the file's inode. Changing a file's name changes only the directory entry; it updates neither the file's data nor its inode.

Three fields in the inode structure contain the last access, change, and modification times: atime, ctime, and mtime. The atime field is updated each time the pointer to the file's data blocks is followed and the file's data is read. The mtime field is updated each time the file's data changes. The ctime field is updated each time the file's inode changes. The ctime is not creation time; there is no way under standard Unix to find a file's creation time.

Reading a file changes its atime only. Changing a file's name doesn't change its atime or mtime, because only the directory entry changed; traditionally it doesn't change ctime either, although some systems do update the renamed file's ctime. A rename does change the mtime and ctime of the directory the file is in, though. Truncating a file doesn't change its atime (because we haven't read it; we've just changed the size field in its inode), but it does change its ctime because we changed its size field, and its mtime because we changed its contents (even though we didn't follow the pointer to do so).
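These rules can be checked directly. Here is a minimal sketch, assuming a Unix-like system and a scratch file from File::Temp. It back-dates the file's mtime with utime, renames the file, and then truncates it. The rename leaves mtime alone, while the truncation moves it to the current time. We deliberately don't test atime, because many systems mount filesystems with options (such as noatime or relatime) that change when atime is updated.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $old = File::Spec->catfile($dir, 'before');
my $new = File::Spec->catfile($dir, 'after');

open(my $fh, '>', $old) or die "Couldn't create $old: $!";
print $fh "some data\n";
close($fh) or die "Couldn't close $old: $!";

# Back-date atime and mtime by a day so later changes are easy to spot.
my $past = time - 86_400;
utime($past, $past, $old) or die "Couldn't utime $old: $!";

# Renaming touches only the directory entry: the file's mtime survives.
rename($old, $new) or die "Couldn't rename: $!";
my $mtime_after_rename = (stat $new)[9];

# Truncating changes the contents, so mtime moves to "now".
truncate($new, 0) or die "Couldn't truncate $new: $!";
my $mtime_after_truncate = (stat $new)[9];

printf "mtime after rename:   %s\n", scalar localtime $mtime_after_rename;
printf "mtime after truncate: %s\n", scalar localtime $mtime_after_truncate;
```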

We can access a file or directory's inode by calling the built-in function stat on its name. For instance, to get the inode for /usr/bin/vi, say:

@entry = stat("/usr/bin/vi") or die "Couldn't stat /usr/bin/vi : $!";

To get the inode for the directory /usr/bin, say:

@entry = stat("/usr/bin")    or die "Couldn't stat /usr/bin : $!";

You can stat filehandles, too:

@entry = stat(INFILE)        or die "Couldn't stat INFILE : $!";

The stat function returns a list of the values of the fields in the inode. If it couldn't get this information (for instance, if the file doesn't exist), it returns an empty list. It's this empty list we test for with the or die construct. Be careful about using || die: because || binds more tightly than =, it puts the stat call into scalar context, where stat only reports whether it worked and doesn't return the list of values. The _ cache referred to below will still be updated, though.
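The difference between or and || is easy to see by counting what ends up in the array. Here is a minimal sketch, assuming a scratch file from File::Temp. It stats the same file three ways: by name with or, by name with ||, and through an open filehandle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);

# List assignment first, then "or": stat runs in list context.
my @entry = stat($path) or die "Couldn't stat $path: $!";

# "||" binds tighter than "=", so stat sees scalar context here and
# yields only a single true/false value.
my @flag = stat($path) || die "Couldn't stat $path: $!";

# Filehandles work too, and describe the same inode.
my @via_handle = stat($fh) or die "Couldn't stat handle: $!";

printf "list context: %d fields; scalar context: %d value\n",
    scalar(@entry), scalar(@flag);
```

The first array holds all thirteen fields. The second holds just one true value, which is almost never what was intended.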

The values returned by stat are listed in the following table.

Element	Abbreviation	Description
0	dev	Device number of filesystem
1	ino	Inode number (the "pointer" field)
2	mode	File mode (type and permissions)
3	nlink	Number of (hard) links to the file
4	uid	Numeric user ID of file's owner
5	gid	Numeric group ID of file's owner
6	rdev	The device identifier (special files only)
7	size	Total size of file, in bytes
8	atime	Last access time, in seconds since the Epoch
9	mtime	Last modification time, in seconds since the Epoch
10	ctime	Inode change time, in seconds since the Epoch
11	blksize	Preferred block size for filesystem I/O
12	blocks	Actual number of system-specific blocks allocated (often, but not always, 512 bytes each)
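In practice it helps to give these fields names. The following sketch, assuming a Unix-like system, unpacks stat's return value into variables named after the Abbreviation column. It then decodes the mode field with the S_IMODE and S_ISDIR helpers that the standard Fcntl module exports under its :mode tag. The directory being examined is a scratch one from File::Temp.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);          # S_IMODE, S_ISDIR, S_ISREG, ...
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
    $size, $atime, $mtime, $ctime, $blksize, $blocks) = stat($dir)
    or die "Couldn't stat $dir: $!";

my $perms  = S_IMODE($mode);             # permission bits only
my $is_dir = S_ISDIR($mode);             # file-type bits, decoded
my $owner  = getpwuid($uid) // $uid;     # login name if known, else number

printf "%s: inode %d, mode %04o, %s, owner %s, modified %s\n",
    $dir, $ino, $perms, ($is_dir ? 'directory' : 'not a directory'),
    $owner, scalar localtime $mtime;
```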

The standard File::stat module provides a named interface to these values. It overrides the built-in stat and lstat functions, so instead of returning the preceding list, they return an object with a method for each attribute:

use File::stat;

$inode = stat("/usr/bin/vi");
$ctime = $inode->ctime;
$size  = $inode->size;
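The object's methods correspond one-to-one to the list positions in the table. This sketch, which uses a scratch file from File::Temp, calls the overridden stat and the original built-in (still reachable as CORE::stat) on the same file and prints a few attributes by name:

```perl
use strict;
use warnings;
use File::stat;                  # overrides stat() and lstat() in this scope
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "hello\n";
close($fh) or die "Couldn't close $path: $!";

my $inode = stat($path) or die "Couldn't stat $path: $!";
my @raw   = CORE::stat($path);   # the built-in list-returning version

printf "size %d bytes, mtime %s, inode %d\n",
    $inode->size, scalar localtime($inode->mtime), $inode->ino;
```

Here $inode->size matches $raw[7], $inode->mtime matches $raw[9], and so on down the table.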

In addition, Perl provides a set of operators that call stat and return one value only. These are collectively referred to as the -X operators because they all take the form of a dash followed by a single character. They're modelled on the shell's test operators:

-X	Stat field	Meaning
`-r`	mode	File is readable by effective UID/GID
`-w`	mode	File is writable by effective UID/GID
`-x`	mode	File is executable by effective UID/GID
`-o`	uid	File is owned by effective UID

`-R`	mode	File is readable by real UID/GID
`-W`	mode	File is writable by real UID/GID
`-X`	mode	File is executable by real UID/GID
`-O`	uid	File is owned by real UID

`-e`		File exists
`-z`	size	File has zero size
`-s`	size	File has nonzero size (returns size)

`-f`	mode	File is a plain file
`-d`	mode	File is a directory
`-l`	mode	File is a symbolic link
`-p`	mode	File is a named pipe (FIFO)
`-S`	mode	File is a socket
`-b`	mode	File is a block special file
`-c`	mode	File is a character special file
`-t`	N/A	Filehandle is opened to a tty

`-u`	mode	File has setuid bit set
`-g`	mode	File has setgid bit set
`-k`	mode	File has sticky bit set

`-T`	N/A	File is a text file
`-B`	N/A	File is a binary file (opposite of `-T`)

`-M`	mtime	Age of file in days when script started
`-A`	atime	Same for access time
`-C`	ctime	Same for inode change time (not creation)
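The ages reported by -M, -A, and -C are measured from the moment the script started (the $^T variable), not from the current time. This sketch uses a scratch file from File::Temp. It back-dates the file to 36 hours before $^T, then compares -M with the same age computed by hand from stat's mtime field:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
close($fh);

# Pretend the file was last modified 36 hours before the script started.
my $mtime = $^T - 36 * 60 * 60;
utime($mtime, $mtime, $path) or die "Couldn't utime $path: $!";

my $age_days = -M $path;                          # built-in operator
my $by_hand  = ($^T - (stat $path)[9]) / 86_400;  # same age, from stat

printf "-M says %.2f days; computed by hand: %.2f days\n", $age_days, $by_hand;
```

Because the reference point is $^T, a long-running program sees -M values that stay fixed relative to its own start time. They do not grow as the program runs.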

The stat function and the -X operators cache the values that the stat(2) system call returned. If you then call stat or a -X operator with the special filehandle _ (a single underscore), it won't call stat(2) again but will instead return information from its cache. This lets you test many properties of a single file without calling stat(2) many times or introducing a race condition:

open(F, "<", $filename)
    or die "Opening $filename: $!\n";
unless (-s F && -T _) {
    die "$filename doesn't have text in it.\n";
}
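The cache makes it cheap to ask a whole series of questions about one file. Below is a minimal sketch of a hypothetical helper, describe, which makes a single lstat call and then answers every -X test from _. It uses lstat rather than stat because -l _ is only allowed when the cached call was an lstat. The sketch assumes a Unix-like system that supports symbolic links, and all of the files live in a scratch directory.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: one lstat(2) call, then every test reads the _ cache.
sub describe {
    my ($path) = @_;
    lstat($path) or return 'missing';   # lstat so that -l _ is permitted
    return 'symbolic link' if -l _;
    return 'directory'     if -d _;
    return 'named pipe'    if -p _;
    return 'socket'        if -S _;
    return 'block device'  if -b _;
    return 'char device'   if -c _;
    if (-f _) {
        my $size = -s _;                # undef or '' for an empty file
        return $size ? 'plain file' : 'empty plain file';
    }
    return 'something else';
}

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'data.txt');
open(my $fh, '>', $file) or die "Couldn't create $file: $!";
print $fh "text\n";
close($fh) or die "Couldn't close $file: $!";

my $link = File::Spec->catfile($dir, 'data.lnk');
symlink($file, $link) or die "Couldn't symlink: $!";

printf "%s: %s\n", $_, describe($_) for $dir, $file, $link, "$dir/nope";
```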

The stat call just returns the information in one inode, though. How do we get a list of the contents of a directory? For that, Perl provides opendir, readdir, and closedir:

opendir(DIRHANDLE, "/usr/bin") or die "couldn't open /usr/bin : $!";
while ( defined ($filename = readdir(DIRHANDLE)) ) {
    print "Inside /usr/bin is something called $filename\n";
}
closedir(DIRHANDLE);

These directory reading functions are designed to look like the file open and close functions. Where open takes a filehandle, though, opendir takes a directory handle. They look the same (a bareword), but they are different: you can open(BIN, "/a/file") and opendir(BIN, "/a/dir") and Perl won't get confused. You might, but Perl won't. Because filehandles and directory handles are different, you can't use the < > operator to read from a directory handle.

The filenames in a directory aren't necessarily stored alphabetically. If you want to get an alphabetical list of files, you'll have to read all the entries and sort them yourself.
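Sorting is easiest if you call readdir in list context, which returns all remaining entries at once. The following sketch uses a scratch directory holding a few hypothetical names. It drops the . and .. entries, sorts the rest with Perl's default string ordering, and then turns the bare names into usable paths:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(zeta alpha Mid)) {
    my $path = File::Spec->catfile($dir, $name);
    open(my $fh, '>', $path) or die "Couldn't create $path: $!";
    close($fh);
}

opendir(my $dh, $dir) or die "Couldn't open $dir: $!";
# readdir in list context returns every remaining entry, in whatever
# order the filesystem happens to store them.
my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);

# readdir gives bare names; prefix the directory to get usable paths.
my @paths = map { File::Spec->catfile($dir, $_) } @names;
print "$_\n" for @paths;
```

The default sort compares strings character by character, so "Mid" sorts before "alpha" because uppercase letters come before lowercase ones in ASCII. Use a custom comparison such as sort { lc($a) cmp lc($b) } if you want case-insensitive order.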

The separation of directory information from inode information can create some odd situations. Operations that change only the directory require write permission on the directory, not on the file. Most operations that change the file's data require write permission on the file. Operations that alter the file's permissions require that the caller be the file's owner or the superuser. This can lead to the interesting situation of being able to delete a file you can't read, or write to a file you can't remove.
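Here is a minimal sketch of the first case, assuming a Unix-like system and a scratch directory we can write to. It removes every permission bit from a file and then unlinks the file anyway. The unlink succeeds because it edits only the directory. (If you run it as the superuser, the file will still be readable, but the unlink behaves the same.)

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # a directory we can write to
my $file = File::Spec->catfile($dir, 'secret');
open(my $fh, '>', $file) or die "Couldn't create $file: $!";
print $fh "private\n";
close($fh) or die "Couldn't close $file: $!";

chmod(0000, $file) or die "Couldn't chmod $file: $!";   # no access for anyone

my $readable = -r $file ? 1 : 0;      # false unless running as root

# unlink edits the directory, so the directory's permissions are what count.
unlink($file) or die "Couldn't unlink $file: $!";

printf "readable before unlink: %s; still exists: %s\n",
    ($readable ? 'yes' : 'no'), (-e $file ? 'yes' : 'no');
```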

Although these situations make the filesystem structure seem odd at first, they're actually the source of much of Unix's power. Links, two filenames that refer to the same file, are now extremely simple: the two directory entries just list the same inode number. Because both entries share one inode, the operating system stores and maintains only one copy of the modification times, size, and other file attributes, and the inode also records how many directory entries refer to it (nlink in the values returned by stat). When one directory entry is unlinked, the data blocks are freed only if that entry was the last one referring to the file's inode and no process still has the file open. You can unlink an open file, but its disk space won't be released until the last close.
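The link count and the "freed on last close" rule can both be watched from Perl. This sketch assumes a Unix-like system that supports hard links and uses scratch files from File::Temp. It gives a file a second name, checks nlink, removes the names one at a time, and finally reads from a handle that was opened before the last name disappeared.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $first = File::Spec->catfile($dir, 'first');
my $other = File::Spec->catfile($dir, 'other');

open(my $out, '>', $first) or die "Couldn't create $first: $!";
print $out "shared data\n";
close($out) or die "Couldn't close $first: $!";

link($first, $other) or die "Couldn't link: $!";   # second directory entry
my @via_first = stat($first);
my @via_other = stat($other);
my $nlink_with_two_names = $via_first[3];          # both entries counted

# Remove one name: the inode and its data survive under the other.
unlink($first) or die "Couldn't unlink $first: $!";
my $nlink_with_one_name = (stat $other)[3];

# Keep a handle open, then remove the last name as well.
open(my $in, '<', $other) or die "Couldn't open $other: $!";
unlink($other) or die "Couldn't unlink $other: $!";
my $still_readable = <$in>;                        # blocks freed only at close
close($in);

printf "same inode: %s; nlink %d -> %d; read after unlink: %s",
    ($via_first[1] == $via_other[1] ? 'yes' : 'no'),
    $nlink_with_two_names, $nlink_with_one_name, $still_readable;
```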

Links come in two forms. The kind described above, where two directory entries list the same inode number (like vi and nvi in the earlier table), are called hard links. The operating system cannot tell the first directory entry of a file (the one created when the file was created) from any subsequent hard links to it. The other kind, soft or symbolic links, are very different. A soft link is a special type of file whose data block stores the filename the file is linked to. Soft links have a different mode value, indicating they're not regular files. The operating system, when asked to open a soft link, instead opens the filename contained in the data block.
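Perl's symlink and readlink functions, together with lstat, let you look at a soft link itself instead of the file it names. This sketch assumes a Unix-like system that supports symbolic links and uses a scratch directory. It shows that the link stores a filename, that lstat and stat report different inodes, and that removing the target leaves a dangling link behind.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $target = File::Spec->catfile($dir, 'real');
my $link   = File::Spec->catfile($dir, 'soft');

open(my $fh, '>', $target) or die "Couldn't create $target: $!";
close($fh);

symlink($target, $link) or die "Couldn't symlink: $!";

my $stored   = readlink($link);     # the filename kept in the link
my $link_ino = (lstat $link)[1];    # the link's own inode
my $file_ino = (stat $link)[1];     # stat follows the link to the target
my $is_link  = -l $link ? 1 : 0;

printf "%s -> %s (link inode %d, target inode %d)\n",
    $link, $stored, $link_ino, $file_ino;

# Removing the target leaves a dangling link: the name is still there,
# but following it fails.
unlink($target) or die "Couldn't unlink $target: $!";
my $dangling = (-l $link && !-e $link) ? 1 : 0;
```

A hard link could never dangle like this, because every hard link is a full directory entry for the inode itself.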

Executive Summary

Filenames are kept in a directory, separate from the size, protections, and other metadata kept in an inode.

The stat function returns the inode information (metadata).

opendir, readdir, and friends provide access to filenames in a directory through a directory handle.

Directory handles look like filehandles, but they are not the same. In particular, you can't use < > on directory handles.

The permissions on a directory determine whether you can read and write the list of filenames. The permissions on a file determine whether you can read or change its contents; only the file's owner or the superuser can change its permissions.

Three different times are stored in an inode. None of them is the file's creation time.