Those of you who have used a UNIX system before are probably familiar with the /proc filesystem. This directory provides a view of processes running on the system. Before getting into the gory details of the Solaris implementation (see proc(4) if you’re curious), I thought I would go over some of the different variants over the years. You’ll have to excuse any inaccuracies presented here; this is a rather quick blog entry that probably doesn’t do the subject justice. Hopefully you’ll be inspired to go investigate some of this on your own.
Eighth Edition UNIX
Tom Killian wrote the first implementation of /proc, described in his paper [1] published in 1984. It was designed to replace the venerable ptrace system call, which until then had been used for primitive process tracing. Each process appeared as a file in /proc whose contents were the process’s memory image, so a debugger could read and write it directly with ordinary file I/O rather than through ptrace’s cumbersome word-at-a-time transfers.
SVR4
The definitive /proc implementation was written by Roger Faulkner [2] and Ron Gomes, and described in their paper [3] published in 1991. This was a port of the Eighth Edition /proc with some enhancements. The directory was still flat, and each file supported the read(), write(), and ioctl() interfaces. There were 37 ioctls in total, covering basic process control, signal/fault/syscall tracing, register manipulation, and status information. This gave tools such as ps a powerful base to build on without needing specialized system calls. Although useful, the system was not very user friendly, nor was it particularly extensible. It was brought into Solaris 2.0 with the move to an SVR4 base.
Solaris 2.6
The birth of the modern Solaris /proc, first conceived in 1992 and not fully implemented until 1996. This was a massive restructuring of /proc, the most important change being that each process was now represented by a directory. Each directory is populated with a multitude of files, which removed the need for the ioctl interface. Process mappings, open files, and mapped objects can all be examined through calls to readdir() and read(). Each LWP (thread) also has its own directory. The files are all binary, designed to be consumed by programs. The most interesting is /proc/<pid>/ctl, which provides functionality similar to the old ioctl interface. There were originally 27 control commands, and they were much more powerful than their ioctl forebears. Very little has changed since this original implementation: only two new entries have been added to the directory, and only four new commands. The tools built upon this interface, however, have changed dramatically.
4.4 BSD
The BSD kernel implements a version of procfs somewhere between the SVR4 version and the Solaris 2.6 version. Each process has its own directory, but there are only 8 entries in each directory, providing access to memory, registers, and current status. The control commands available are fairly primitive, allowing only for attach, detach, step, run, wait, and signal posting. In later derivatives such as FreeBSD 4.10, the number of directory entries has grown slightly to 12, though the control interface seems to have remained the same.
Linux
Linux takes a very different approach to /proc. First, the Linux /proc contains a number of files and directories that don’t directly relate to processes. Some of these files are migrating to the /sys directory (sysfs) in the 2.6 kernel, but /proc is still a dumping ground for all sorts of device and system-wide files. Second, all the files are plain text, a major departure from the historical use of /proc. A good amount of information is available in each /proc/<pid> directory, but most process control is still done through interfaces such as ptrace. I will certainly spend some time looking at how tools like gdb and strace interact with processes.
We in Solaris designed /proc as a foundation for developers to build innovative solutions on, not as an end-user interface. The Linux community believes that ‘cat /proc/self/maps’ is the best user interface, while we believe that pmap(1) is the right answer. The reason is that mdb(1), truss(1), dtrace(1M), and a host of other tools all make use of this same information. It would be a waste of time to take binary information in the kernel, convert it to text, and then have each userland component write its own (error-prone) parsing routine to convert that text back into a custom binary form. Plus, we can change the options and output format of pmap without breaking other applications that depend on the contents of /proc. There are some very interesting ways in which we leverage this information, which I’ll cover in future posts.
[1] T. J. Killian. "Processes as Files." Proceedings of the USENIX Software Tools Users Group Summer Conference, pp. 203-207, June 1984.
[2] Roger Faulkner was then at Sun, and continues to work here in a one-man race to be the oldest kernel hacker on the planet. These days you can find him in Michigan, pounding away at the amd64 port.
[3] R. Faulkner and R. Gomes. "The Process File System and Process Model in UNIX System V." USENIX Conference Proceedings, Dallas, Texas, January 1991.