In the footnote a few days ago, I commented on the fact that the history of Solaris debugging could roughly be divided into three ‘eras’. As someone interested in UNIX history, I decided to dig through the Solaris archives and put together a chronology of Solaris debuggability and observability tools. For fun, I divided it into eras to parallel Earth’s history. And I swear I’m not out to make anyone feel like a dinosaur (or a prokaryote, for that matter).
I’ve only been around for one of these “dawn of a new era” arrivals, DTrace¹. When one of these revolutionary tools arrives, it’s amazing to see how quickly engineers avoid their own past. Try asking Bryan to debug a performance problem on Solaris 9, and you’ll probably get some choice phrases politely explaining that while he appreciates the importance of your problem, he would rather throw himself down a slide of broken glass and into a vat of rubbing alcohol. Being the neophyte that I am, I’ve only ventured into the ‘Paleozoic era’ on one occasion. After an MDB session on a Solaris 8 crashdump (paraphrased slightly):
$ mdb 0
> ::print
mdb: invalid command '::print': unknown dcmd name
> ::help print
mdb: unknown command: print
> ::please print
mdb: invalid command '::please': unknown dcmd name
> ::ihateyou
$
I quickly ran away screaming, never to return. I think I ended up hiding in a corner of my office for two hours, cradling my DTrace answerbook and whispering “there’s no place like home” over and over. I’m still a spoiled brat, but at least I have respect and admiration for those Solaris veterans who crawled through debugging hell so that I could live a comfortable life². It’s also made me feel sorry for the Linux (and Windows) developers out there. Not in the Nelson Muntz “Ha ha! You don’t have DTrace!” sense. More like “Poor little guy. It’s not his fault his species never evolved opposable thumbs.” There are a lot of brilliant Linux developers out there, stuck in a movement that doesn’t embrace debugging or observability as fundamental goals. But this post is supposed to be about history, not Linux. So without further ado, my brief history of Solaris (soon to be available in refrigerator magnet form):
| Year | Era | Milestone |
|---|---|---|
| <1989 | HADEAN | SunOS 4.X |
| | | adb, ptrace, crash |
| 1990 | ARCHAEAN | SVr4 merge |
| | | /proc |
| | | truss(1) |
| 1991 | | vtrace |
| | | vmstat(1M) |
| | | iostat(1M) |
| 1992 | | SOLARIS 2.0 |
| 1993 | | mpstat |
| | | SOLARIS 2.2 |
| 1994 | | Kernel slab allocator |
| | | TNF |
| | | basic ptools |
| | | SOLARIS 2.4 |
| 1995 | | |
| 1996 | | SOLARIS 2.5.1 |
| | PROTEROZOIC | Next generation /proc |
| | | Userland watchpoints |
| 1997 | | lockstat(1M) |
| | | pkill and pgrep |
| | | libproc |
| 1998 | | savecore on by default |
| | | SOLARIS 7 |
| 1999 | | libproc for corefiles |
| | | coreadm(1M) |
| | | prstat(1M) |
| | | lockstat kernel profiling |
| | PALEOZOIC | MDB(1) |
| | | ::findleaks |
| 2000 | | SOLARIS 8 |
| | | EOL of crash(1M) |
| 2001 | | live process control for MDB |
| | | EOL of adb(1) |
| | | pargs and preap |
| | MESOZOIC | kernel CTF data |
| | | trapstat(1M) |
| 2002 | | SOLARIS 9 |
| | | libumem(3LIB) and umem_debug(3MALLOC) |
| | | ::typegraph for mdb(1) |
| 2003 | | Userland CTF |
| | | coreadm(1M) content control |
| | CENOZOIC | DTrace(1M) |
| | | intrstat(1M) |
| 2004 | | DTrace pid provider for x86 |
| | | pfiles with pathnames |
| | | DTrace sched, proc providers |
| | | CTF for core libraries |
| | | DTrace I/O provider |
| | | KMDB(1) |
| | | DTrace MIB, fpuinfo providers |
| | | Per-thread ptools |
These are my choices based on SCCS histories and putback logs. Obviously, I’ve failed to include some things. Leave a comment or email if you think something’s not getting the recognition it deserves (keeping in mind this is a blog post, not a book).
1 I actually started exactly one day before DTrace integrated. But I had some experience (albeit limited) as an intern the previous year.
2 In all seriousness, it’s not that I don’t have to ever debug anything, or that the problems we have today are somehow orders of magnitude simpler than those in the past. What these tools provide is a dramatic reduction in time to root-cause. You still need the same inquisitive and logical mind to debug hard problems; it’s just that good tools let you form questions and get answers faster than you could before. Really good tools (like DTrace) let you ask the previously unanswerable questions. You may have been able to debug the problem before, but you would have ended up running around in circles trying to get data that’s now immediately available thanks to DTrace.
5 Responses
The lack of those commands in earlier Solarises (Solarii?) is one of the reasons that we find Solaris CAT so useful in addition to the other tools available. I too love the debugging features available in Solaris 10, however I don’t have the luxury of not looking at problems in earlier systems (given that it’s what they pay me for). Solaris CAT probably belongs in that timeline somewhere. I know that we were using it internally long before 4.0 was released.
That being said, it’s a rare day when you don’t hear me say something like, “if we could dtrace this, we’d have the answer by now”.
I’m just relieved to see that crash(1M) (deceased) didn’t make the list. Or perhaps Mike ripping out crash(1M) in June 2000 should be marked as progress? Regarding CAT: it’s fair to keep it (and its ilk) off the list; Eric’s history is only of tools in the OS, not of tools layered on top of it.
One milestone that probably should be added, however, is Bonwick’s kmem debugging facilities circa 1994. (And likewise, umem_debug(3MALLOC) in 2002.) And were I not so modest and self-effacing, I would probably also argue for ::findleaks (1998, shipped 2000) and ::typegraph (2003). 😉
The EOL of crash(1M) was in my notes but never made it to the final list; I’ve added it along with the EOL of adb for good measure. I totally forgot about libumem (and the original kmem allocator). I also threw in ::typegraph since it’s saved me a few times.
As Bryan points out, there are many useful tools like CAT, dbx, and lsof which won’t make this list simply because they’re not part of the OS.
Good point, Bryan. Having started out on crashdumps with adb and kadb, I can certainly appreciate mdb/kmdb, and especially, not to forget, dtrace.
I’ll lose the modesty: you’ve got to add ::findleaks. ::findleaks has found more bugs in the system than any single one of those technologies — over 250 bugs to date! On the one hand, it’s not nearly as radical as something like DTrace — but it has made a substantial, quantifiable difference in the quality of software that we deliver. And thanks to libumem (and to the work of people like Alan to document its capabilities), ::findleaks is now finding lots of bugs outside Solaris as well. It may be hard for us to remember, but before ::findleaks, memory leaks were actually difficult to track down.
Okay, now back to my regular modest, self-effacing self… 😉