Eric Schrock's Blog

Month: February 2005

A long time ago I described a debugging problem where it was necessary to determine which threads owned a reader lock. In particular, I used the heuristic that if the address of the rwlock is in a particular thread’s stack, then it is most likely held by that thread (which can be verified by examining the thread’s stack). This works 99% of the time, because you typically have the following:

rw_enter(lock, RW_READER);
/* ... do something ... */
rw_exit(lock);

The compiler has to preserve the address of the lock across all the junk in the middle, so it almost always ends up getting pushed on the stack. As described in the previous post, finding the threads in question meant a combination of ::kgrep and ::whatis, plus some hand-pruning. At the time, I mentioned how nice it would be to have a dedicated command do this dirty work. Now that Solaris 10 has shipped, I finally sat down and gave it a try. In a testament to MDB’s well-designed interfaces, I was able to write the entire command in under 5 minutes with just 50 lines of code. On top of that, it runs in a fraction of the time: rather than searching the entire address space, we only have to look at the stack for each thread. For example:

> c8d45bb6::kgrep | ::whatis
c8d45ae4 is c8d45aa0+44, allocated as a thread structure
cae92ed8 is in thread c8d45aa0's stack
cae92ee4 is in thread c8d45aa0's stack
cae92ef8 is in thread c8d45aa0's stack
cae92f24 is in thread c8d45aa0's stack
> c8d45bb6::whatthread

The simple output allows it to be piped to ::findstack to quickly locate questionable threads. There have been discussions about maintaining a very small set of held reader locks in the thread structure, but it’s a difficult problem to solve definitively (without introducing massive performance regressions).
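To make the stack-scanning idea concrete, here is a minimal sketch in plain C. It is not the actual ::whatthread source and does not use the MDB module API. The names sim_thread, stack_contains, and whatthread_sketch are hypothetical, and each thread's stack is modeled as a simple array of pointer-sized words rather than being read from a live kernel or crash dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel thread: an id plus a stack snapshot. */
struct sim_thread {
    uintptr_t        id;      /* plays the role of the thread structure address */
    const uintptr_t *stack;   /* stack contents, one pointer-sized word per slot */
    size_t           nwords;  /* number of words in the snapshot */
};

/* Return 1 if any word on the thread's stack equals addr, else 0. */
int stack_contains(const struct sim_thread *t, uintptr_t addr)
{
    for (size_t i = 0; i < t->nwords; i++) {
        if (t->stack[i] == addr)
            return 1;
    }
    return 0;
}

/*
 * Store the ids of threads whose stacks mention addr in out (at most max
 * entries) and return the total number of matching threads.
 */
size_t whatthread_sketch(const struct sim_thread *threads, size_t nthreads,
                         uintptr_t addr, uintptr_t *out, size_t max)
{
    size_t found = 0;

    assert(threads != NULL || nthreads == 0);
    for (size_t i = 0; i < nthreads; i++) {
        if (!stack_contains(&threads[i], addr))
            continue;
        if (found < max)
            out[found] = threads[i].id;
        found++;
    }
    return found;
}
```

The work is proportional to the total size of the thread stacks rather than the whole address space, which is why this approach beats a full ::kgrep. A match is still only a hint: a stale copy of the lock address can linger on a stack, so each candidate thread still deserves a look at its stack trace.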

This demonstrates an oft-overlooked benefit of MDB. Though very few developers exist outside of the Solaris group, developing MDB modules is extremely simple and powerful (there are more than 500 commands and walkers in MDB today). Over time, I think I’ve almost managed to suppress all the painful GDB memories from my college years…

Today, I thought I’d share a real-world experience that might portray DTrace in a slightly different light than you’re used to. The other week, I was helping a customer with the following question:

Why is automountd constantly taking up 1.2% of CPU time?

The first thought that came to mind was a broken automountd. But if that were the case, you’d be more likely to see it spinning and stealing 100% of the CPU. Just to be safe, I asked the customer to send truss -u a.out:: output for the automountd process. As expected, I saw automountd chugging away, happily servicing each request as it came in. Automountd was doing nothing wrong – some process was indirectly sending millions of requests a day to the automounter. Taking a brief look at the kernel code, I responded with the following D script:

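#!/usr/sbin/dtrace -s
/* Probe name is an assumption: args[0]->fi_path implies an fninfo_t, as passed to autofs's auto_lookup_request(). */
fbt::auto_lookup_request:entry
{ @lookups[execname, stringof(args[0]->fi_path)] = count(); }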

The customer gave it a shot, and found a misbehaving program that was continuously restarting and causing loads of automount activity. Without any further help from me, the customer could easily see exactly which application was the source of the problem, and quickly fixed the misconfiguration.

Afterwards, I reflected on how simple this exchange was, and how difficult it would have been in the pre-Solaris 10 days. Now, I don’t expect customers to be able to come up with the above D script on their own (though industrious admins will soon be able to wade through OpenSolaris code). But I was able to resolve their problem in just 2 emails. I was reminded of the infamous gtik2_applet2 fiasco described in the DTrace USENIX paper: automountd was just a symptom of an underlying problem, part of an interaction that was prohibitively difficult to trace to its source. One could turn on automountd debug output, but you’d still only see the request itself, not where it came from. To top it off, the offending processes were so short-lived that they never showed up in prstat(1) output, hiding from traditional system-wide tools.

After a little thought, I imagined a few Solaris 9 scenarios where I’d either set a kernel breakpoint via kadb, or set a user breakpoint in automountd and use mdb -k to see which threads were waiting for a response. But these (and all other solutions I came up with) were:

  • Disruptive to the running system
  • Not guaranteed to isolate the particular problem
  • Difficult for the customer to understand and execute

It really makes me feel the pain our customer support staff must go through now to support Solaris 8 and Solaris 9. DTrace is such a fundamental change in the debugging and observability paradigm that it changes not only the way we kernel engineers work, but also the way people develop applications, administer machines, and support customers. Too bad we can’t EOL Solaris 8 and Solaris 9 next week for the benefit of Sun support…
