One of the most visible features that I have integrated into Solaris 10 is the ability to store pathnames with each open file [1]. This opens up avenues of observability that were previously inaccessible. First, the paths are exposed as symbolic links in /proc/<pid>/path:
$ ls -l /proc/`pgrep Firebird`/path | cut -b 55-
0 -> /devices/pseudo/mm@0:null
1 -> /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
10 -> /usr/local/MozillaFirebird/chrome/comm.jar
11 -> /usr/local/MozillaFirebird/chrome/en-US.jar
12 -> /usr/local/MozillaFirebird/chrome/embed-sample.jar
13 -> /usr/local/MozillaFirebird/chrome/pipnss.jar
14 -> /usr/local/MozillaFirebird/chrome/pippki.jar
15 -> /usr/local/MozillaFirebird/chrome/US.jar
16 -> /usr/local/MozillaFirebird/chrome/en-unix.jar
17 -> /usr/local/MozillaFirebird/chrome/classic.jar
18 -> /usr/local/MozillaFirebird/chrome/toolkit.jar
19 -> /usr/local/MozillaFirebird/chrome/browser.jar
2 -> /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
20
21
22 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_MAP_
23 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_001_
24 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_002_
25 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_003_
26
27 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/formhistory.dat
28 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/history.dat
29 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/cert8.db
3
30 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/key3.db
4 -> /var/run/name_service_door
5 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/XUL.mfasl
6
7
8
9
a.out -> /usr/local/MozillaFirebird/MozillaFirebird-bin
cwd -> /home/eschrock
root -> /
ufs.102.0.11082 -> /usr/lib/iconv/646%UTF-16LE.so
ufs.102.0.11521 -> /usr/lib/iconv/UTF-16LE%646.so
[ ... output elided ... ]
$
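Each entry in /proc/<pid>/path is an ordinary symbolic link, so any program should be able to recover the same information with readlink(2). Below is a minimal C sketch of that idea. It is not the source of ls or of any Solaris tool, and the function name print_path_links is hypothetical. The directory is passed in as a parameter so the sketch can be pointed at /proc/<pid>/path. Entries that readlink() rejects, like the bare numbers in the listing above (20, 21, 26 and so on), are printed without a target.

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX.1-2008 interfaces */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: list a /proc/<pid>/path-style directory much like
 * the "ls -l ... | cut" pipeline does. Symbolic links are printed as
 * "name -> target"; entries that readlink() rejects (no path information)
 * are printed as a bare name. readdir() order is not sorted.
 * Returns the number of entries that resolved, or -1 if the directory
 * cannot be opened.
 */
int print_path_links(const char *dir, FILE *out)
{
    assert(dir != NULL && out != NULL);
    DIR *dp = opendir(dir);
    if (dp == NULL)
        return -1;

    int resolved = 0;
    struct dirent *de;
    while ((de = readdir(dp)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        char entry[1024], target[1024];
        int len = snprintf(entry, sizeof entry, "%s/%s", dir, de->d_name);
        if (len < 0 || (size_t)len >= sizeof entry)
            continue;                    /* name too long for this sketch */

        /* readlink() does not NUL-terminate, so reserve a byte for it. */
        ssize_t n = readlink(entry, target, sizeof target - 1);
        if (n >= 0) {
            target[n] = '\0';
            fprintf(out, "%s -> %s\n", de->d_name, target);
            resolved++;
        } else {
            fprintf(out, "%s\n", de->d_name);
        }
    }
    closedir(dp);
    return resolved;
}
```

On Solaris 10, calling print_path_links("/proc/self/path", stdout) should list the calling process's own open files, its a.out, cwd and root.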
As usual, Mozilla Firebird has lots of interesting stuff open. You may notice that some of the file descriptors have no path information. This is most likely because they refer to a socket or FIFO (there is a small chance they refer to a file that has since been moved). The pfiles(1) command has been modified to use this information, so you can now see the path along with the rest of the goodies:
$ pfiles `pgrep Firebird`
286670: /usr/local/MozillaFirebird/MozillaFirebird-bin
  Current rlimit: 512 file descriptors
   0: S_IFCHR mode:0666 dev:200,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/mm@0:null
   1: S_IFREG mode:0644 dev:210,1281 ino:346 uid:138660 gid:10 size:4164
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
   2: S_IFREG mode:0644 dev:210,1281 ino:346 uid:138660 gid:10 size:4164
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
   3: S_IFIFO mode:0666 dev:209,0 ino:9 uid:0 gid:1 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   4: S_IFDOOR mode:0444 dev:209,0 ino:52 uid:0 gid:0 size:0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[100253]
      /var/run/name_service_door
   5: S_IFREG mode:0644 dev:210,1281 ino:744 uid:138660 gid:10 size:747398
      O_RDONLY|O_LARGEFILE
      /home/eschrock/.phoenix/default/7pkwqbju.slt/XUL.mfasl
   6: S_IFIFO mode:0000 dev:203,0 ino:119094 uid:138660 gid:10 size:0
      O_RDWR|O_NONBLOCK
   7: S_IFIFO mode:0000 dev:203,0 ino:119094 uid:138660 gid:10 size:0
      O_RDWR|O_NONBLOCK
[ ... output elided ... ]
$
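pfiles(1) begins each descriptor with its file type (S_IFCHR, S_IFREG, S_IFIFO and S_IFDOOR in the output above). In this excerpt, the descriptors with no path line are the S_IFIFO ones, which have no name in the filesystem. Below is a minimal C sketch of that classification step, using fstat(2) and the standard S_IS* macros. It is not pfiles' implementation, and the function name fd_type_name is hypothetical. The door check is compiled in only where the system defines S_ISDOOR, as Solaris does.

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX.1-2008 interfaces */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: name the file type of an open descriptor using the
 * same S_IF* labels that pfiles(1) prints. Returns NULL if fd is not open.
 */
const char *fd_type_name(int fd)
{
    struct stat st;

    assert(fd >= 0);
    if (fstat(fd, &st) != 0)
        return NULL;

    if (S_ISREG(st.st_mode))  return "S_IFREG";
    if (S_ISDIR(st.st_mode))  return "S_IFDIR";
    if (S_ISCHR(st.st_mode))  return "S_IFCHR";
    if (S_ISBLK(st.st_mode))  return "S_IFBLK";
    if (S_ISFIFO(st.st_mode)) return "S_IFIFO";   /* pipes land here too */
    if (S_ISSOCK(st.st_mode)) return "S_IFSOCK";
#ifdef S_ISDOOR
    if (S_ISDOOR(st.st_mode)) return "S_IFDOOR";  /* Solaris doors */
#endif
    return "unknown";
}
```

Both ends of a pipe(2) should report S_IFIFO, and a file opened by name should report S_IFREG.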
This should be enough to get most savvy sysadmins drooling. But wait, there’s more! This feature allowed the new DTrace io provider (integrated into build 60, aka Beta 5, aka SX 07/04) to get pathname information for arbitrary files in the system. This lets you do neat stuff like:
# cat iohog.d
#!/usr/sbin/dtrace -s

io:::start
{
        @[execname, args[2]->fi_pathname] = sum(args[0]->b_bcount);
}
# ./iohog.d
^C
  sched     /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0       4096
  xlp       /var/adm/utmpx                                          4096
  fsflush   /export/iso/solaris_4.iso                              73728
  sched     <none>                                                 82432
  cp        <none>                                                114688
  fsflush   <none>                                                177152
  cp        /export/iso/solaris_4.iso                          238936064
  cp        /export/iso/solaris_1.iso                          239910912
#
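The aggregation @[execname, args[2]->fi_pathname] = sum(args[0]->b_bcount) keeps one running byte total per distinct (execname, pathname) pair. Events whose file is unknown are grouped under <none>. The minimal C sketch below makes that bookkeeping concrete outside DTrace, using hypothetical I/O events. The struct and function names (agg_add, agg_get, agg_print) are invented for illustration. The fixed-size table with linear search keeps the sketch short and is not how DTrace implements aggregations. The print order mirrors the output above, which lists totals smallest first.

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX.1-2008 interfaces */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEYS 64

/* One aggregation entry: the (execname, pathname) key and its running sum. */
struct agg_entry {
    char execname[32];
    char pathname[256];
    unsigned long long bytes;
};

struct agg {
    struct agg_entry e[MAX_KEYS];
    int n;
};

/* Index of the entry for this key, or -1 if the key has not been seen. */
static int agg_find(const struct agg *a, const char *execname, const char *pathname)
{
    for (int i = 0; i < a->n; i++)
        if (strcmp(a->e[i].execname, execname) == 0 &&
            strcmp(a->e[i].pathname, pathname) == 0)
            return i;
    return -1;
}

/* Like "@[execname, pathname] = sum(bcount);" for one event.
 * Returns 0 on success, -1 if the key does not fit or the table is full. */
int agg_add(struct agg *a, const char *execname, const char *pathname,
            unsigned long long bcount)
{
    assert(a != NULL && execname != NULL && pathname != NULL);
    int i = agg_find(a, execname, pathname);
    if (i < 0) {
        if (a->n == MAX_KEYS ||
            strlen(execname) >= sizeof a->e[0].execname ||
            strlen(pathname) >= sizeof a->e[0].pathname)
            return -1;
        i = a->n++;
        strcpy(a->e[i].execname, execname);
        strcpy(a->e[i].pathname, pathname);
        a->e[i].bytes = 0;
    }
    a->e[i].bytes += bcount;
    return 0;
}

/* Current total for a key (0 if it was never seen). */
unsigned long long agg_get(const struct agg *a, const char *execname, const char *pathname)
{
    int i = agg_find(a, execname, pathname);
    return i < 0 ? 0 : a->e[i].bytes;
}

static int by_bytes(const void *x, const void *y)
{
    const struct agg_entry *p = x, *q = y;
    return (p->bytes > q->bytes) - (p->bytes < q->bytes);
}

/* Sort in place, smallest total first, then print one line per key. */
void agg_print(struct agg *a, FILE *out)
{
    qsort(a->e, (size_t)a->n, sizeof a->e[0], by_bytes);
    for (int i = 0; i < a->n; i++)
        fprintf(out, "  %-10s %-50s %10llu\n",
                a->e[i].execname, a->e[i].pathname, a->e[i].bytes);
}
```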
For years we’ve had the iostat(1M) utility. It’s great to know that someone is hammering away on sd0, but that’s not really the question you want answered. What you really want to know is who is hammering away on your disks. With the DTrace io provider, we’ve taken it one step further by giving you the means to answer why someone is hammering away on your disks. All of a sudden, one of the most opaque problems becomes completely transparent. So head on over and check it out (the io provider is not yet available in Solaris Express, but its documentation is already on the DTrace page).
[1] For the curious: Solaris implements a Virtual File System (VFS) layer, which includes the notion of a vnode to represent an arbitrary file. The filesystem-dependent part is stored in a format private to the filesystem implementation (think of it in terms of inheritance if it helps). To illustrate with crude ASCII art:
  USERLAND        KERNEL VFS                          KERNEL FS

  fd ----+----> file_t -----+----> vnode_t ------> inode_t
         |                  |                      prnode_t
  fd ----+                  |                      etc
                            |
  fd ---------> file_t -----+
When a file is looked up, we store a (char *) pointer to its pathname at the end of the vnode_t. This gives us path information for every open file in the kernel, including files implicitly mapped into a process’s address space without an associated file_t. There are some subtleties with hard links and with files being moved around, but it works perfectly 99% of the time, which is all we can hope for in this case.
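The structures in the diagram and the path caching described above can be modeled in a few dozen lines of C. This is a simplified sketch, not the Solaris source:

- The real file_t and vnode_t carry many more fields.
- The kernel reaches filesystem code through an operations vector rather than by touching v_data directly.
- The helper names (vn_cache_path, vn_free_path, fd_to_path) and the inode/prnode stand-ins are hypothetical.
- The hard-link and rename behavior is this sketch's own simple policy. It shows how a cached name can go stale, not what the kernel does in every case.

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX.1-2008 interfaces */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Filesystem-private node types (simplified, hypothetical stand-ins). */
struct inode  { unsigned long i_number; };   /* what UFS would hang off v_data */
struct prnode { int pr_pid; };               /* what procfs would hang off v_data */

/* Filesystem-independent part: the "base class". */
typedef struct vnode {
    void *v_data;   /* filesystem-private node: the "subclass" part */
    char *v_path;   /* pathname cached at lookup; NULL if never named */
} vnode_t;

/* One per open(2); descriptors created with dup(2) share it. */
typedef struct file {
    vnode_t *f_vnode;
} file_t;

#define NFDS 16
struct fdtable { file_t *fd[NFDS]; };   /* an fd is an index into this table */

/*
 * Record the name used to look the file up. In this sketch the first
 * lookup wins and nothing refreshes the name later, so a second hard link
 * or a later rename(2) leaves the cached path showing the original name.
 */
void vn_cache_path(vnode_t *vp, const char *path)
{
    assert(vp != NULL && path != NULL);
    if (vp->v_path == NULL)
        vp->v_path = strdup(path);   /* stays NULL if allocation fails */
}

void vn_free_path(vnode_t *vp)
{
    free(vp->v_path);
    vp->v_path = NULL;
}

/*
 * Follow fd -> file_t -> vnode_t -> v_path. Returns NULL for an unused fd
 * or for a vnode that was never reached by name (e.g. a pipe or socket).
 */
const char *fd_to_path(const struct fdtable *t, int fd)
{
    assert(t != NULL);
    if (fd < 0 || fd >= NFDS || t->fd[fd] == NULL)
        return NULL;
    return t->fd[fd]->f_vnode->v_path;
}
```

Because the name lives on the vnode rather than on the file_t, any route that reaches the vnode can report it. That includes a memory mapping with no file_t, which is why the a.out and mapped libraries show up in /proc/<pid>/path.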
3 Responses
This is EXTREMELY kool! Previously my only hope was to truss the process, watch for the open() syscalls, and then trace the FD through all the accesses. That not only sucked, but wasn’t possible if the FD had already been opened and wasn’t being closed/reopened frequently… all you knew was that FD 4 was concerning, but who knows what that is. The /proc updates are kool, the pfiles update is even kooler, but the DTrace updates for it are just pure gold! This should be especially helpful for tuning/monitoring large allocations such as Oracle datafiles at the file level instead of per filesystem.
I humbly bow to you, sir. Glad you blogged this… this is exactly the sort of stuff that just gets buried under all the other larger features in a release as massive as Solaris 10.
benr
I’m so glad I read this! And more so that Sol10 now has this.
Really nice feature. The kind that, as soon as you read about it, makes you realize how much you missed it.