Per-process statistics and the future of Solaris observability

June 30, 2004

So a few posts ago I asked for suggestions on improving observability in Solaris, specifically with respect to lsof. I thought I’d summarize the responses, which fell into two basic groups:

  1. Socket and process visibility. Something along the lines of lsof -i or netstat -p on Linux.
  2. Per-process mpstat, vmstat, and iostat.

I’ll defer the first suggestion for the moment. The second is straightforward, thanks to the mystical powers of DTrace. As you can see from my previous post, it’s simple to aggregate I/O on a per-process basis. Thanks to the vminfo and sysinfo DTrace providers, we can do the same for almost any interesting statistic. The problem with traditional kstats[1] is that they present static state after the fact: you cannot tell why or when a counter was incremented. But for every kstat reported by vmstat and mpstat, there is a DTrace probe at each point where that counter is incremented. Throw in some predicates and aggregations, and we’re talking instant observability.
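DTrace scripts are written in D, and the real work happens in the vminfo and sysinfo providers inside the kernel. Purely as a language-neutral illustration, here is a toy Python model of the difference described above: a kstat-style counter keeps one system-wide total per statistic, while a sum() aggregation keyed by process, recorded at the moment of each increment, keeps per-process totals and can be filtered by a predicate. The Firing type, the function names, and the sample data are hypothetical; the statistic names merely mirror real probe names (vminfo:::maj_fault, sysinfo:::xcalls). This simulates the bookkeeping, not DTrace itself.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple, Tuple


class Firing(NamedTuple):
    """One hypothetical probe firing: a statistic being bumped."""
    stat: str       # statistic name, e.g. "maj_fault"
    pid: int        # process on CPU when the counter was bumped
    execname: str   # that process's executable name
    inc: int        # amount added (modeled on arg0 in vminfo/sysinfo)


def kstat_view(firings: Iterable[Firing]) -> Dict[str, int]:
    """kstat-style: one system-wide total per statistic.
    Which process caused each increment is discarded."""
    totals: Dict[str, int] = defaultdict(int)
    for f in firings:
        totals[f.stat] += f.inc
    return dict(totals)


def aggregate_by_process(
    firings: Iterable[Firing],
    predicate: Callable[[Firing], bool] = lambda f: True,
) -> Dict[Tuple[int, str, str], int]:
    """DTrace-style (toy): a sum keyed by (pid, execname, stat), recorded
    at each increment and filtered by an optional predicate."""
    agg: Dict[Tuple[int, str, str], int] = defaultdict(int)
    for f in firings:
        if predicate(f):
            agg[(f.pid, f.execname, f.stat)] += f.inc
    return dict(agg)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    firings = [
        Firing("maj_fault", 101, "dbserver", 1),
        Firing("maj_fault", 101, "dbserver", 1),
        Firing("maj_fault", 202, "httpd", 1),
        Firing("xcalls", 202, "httpd", 1),
    ]
    print(kstat_view(firings))  # {'maj_fault': 3, 'xcalls': 1}
    print(aggregate_by_process(firings))
    # Analogous to a D predicate such as /pid == 101/:
    print(aggregate_by_process(firings, lambda f: f.pid == 101))
```

In D itself, the rough equivalent would be a clause on the relevant vminfo or sysinfo probes containing an aggregation such as @[pid, execname, probename] = sum(arg0), optionally guarded by a predicate.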

I envision two forms of these tools. The first, as suggested in previous comments, would present prstat(1)-style output, sorted according to the user’s choice of statistic. This would be aimed at administrators trying to understand systemic problems. The second form would take a pid and show all the relevant statistics for just that process. This would be aimed at developers trying to understand their application’s behavior.
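The two forms differ mainly in how the same per-process totals are presented. Here is a minimal Python sketch, assuming the totals have already been collected into a hypothetical mapping from (pid, execname) to per-statistic counts: one function sorts every process by a chosen statistic, and the other returns everything recorded for a single pid. The function names, the Totals layout, and the sample data are illustrative only and do not describe any existing Solaris tool.

```python
from typing import Dict, List, Tuple

# Hypothetical collected totals: (pid, execname) -> {statistic: count}.
Totals = Dict[Tuple[int, str], Dict[str, int]]


def top_view(totals: Totals, sort_stat: str, n: int = 10) -> List[Tuple[int, str, int]]:
    """Administrator view: all processes sorted by the chosen statistic
    (highest first), truncated to the top n rows, prstat-style."""
    rows = [(pid, exe, stats.get(sort_stat, 0)) for (pid, exe), stats in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]


def process_view(totals: Totals, pid: int) -> Dict[str, int]:
    """Developer view: every statistic recorded for a single pid
    (empty if the pid was never seen)."""
    for (p, _exe), stats in totals.items():
        if p == pid:
            return dict(stats)
    return {}


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    totals: Totals = {
        (101, "dbserver"): {"maj_fault": 40, "xcalls": 3},
        (202, "httpd"): {"maj_fault": 5, "xcalls": 90},
    }
    print(top_view(totals, "xcalls"))  # [(202, 'httpd', 90), (101, 'dbserver', 3)]
    print(process_view(totals, 101))   # {'maj_fault': 40, 'xcalls': 3}
```

Keying by (pid, execname) rather than pid alone is just a convenience here, so the sorted view can display a name without a second lookup; a real tool would also refresh the totals on an interval, as prstat(1) does.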

Today, anyone can write D scripts to do this. But there’s something to be said for having a canned tool to jumpstart analysis. It doesn’t have to be very powerful; once you get beyond these basic questions, you’ll need to write custom D scripts anyway. I’m sure the DTrace team has given this far more thought than I have, but I wanted to let you know that your comments aren’t descending into some kind of black hole. Blogging provides a unique forum for customer conversations, somewhere between a face-to-face meeting (which doesn’t scale well) and a newsgroup posting (which lacks organization and personal attention). Many thanks to those at Sun who pushed for this new forum, and to those of you out there reading and taking advantage of it.


[1] The statistics used by these tools are part of the kstat(1M) facility. The kernel exports a large number of statistics from its various subsystems, which user applications can extract through a library interface (libkstat) and process.

2 Responses

  1. Even more than that, Eric: the mib provider that went back into build 63 lets you probe any point at which a counter you might access through SNMP gets modified.

  2. It’s fantastic people can get suggestions, tips and answers right from the people that design the kernel. And you’re all incredibly responsive.
    BTW: nice “black hole” link. Real subtle 🙂
