Eric Schrock's Blog

Month: July 2004

As Adam noted in the Solaris Top 11-20, watchpoints are now much more useful in Solaris 10. Before I go into specific details regarding Solaris 10 improvements, I thought I’d give a little technical background on how watchpoints actually work. This will be my second highly technical entry in as many days; in my next post I promise to tie this into some real-world applications and noticeable improvements in S10.

The idea of watchpoints has been around for a long time. The basic idea is to allow a debugger to set a watchpoint on a region of memory within a process. When that region of memory is accessed, the debugger is notified in order to take appropriate action. This typically serves two purposes. First, it’s useful for interactive debuggers when determining when a region of memory gets modified. Second, it can be used as a protection mechanism to avoid buffer overflows (more on this later).

As with most modern operating systems, Solaris implements a virtual memory system. A complete explanation of how virtual memory works is beyond the scope of a single blog post. The simplest way to explain it is that each process refers to memory by a virtual address, which corresponds to a physical piece of memory. Each piece of memory is called a page, which can be either mapped (resident in RAM) or unmapped (possibly stored on disk). The operating system has control over when and how pages get mapped in or out of memory. If a program tries to access memory that is unmapped, the OS will map in the necessary pages as needed. Once pages are mapped, accesses will be handled directly in hardware until the OS decides to unmap the memory [1]. There are many benefits of this, including the ability for processes to see a unified flat memory space, inability to access other processes’ memory, and the ability to store unused pages on disk until needed.
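To make the page idea a little more concrete, here is a minimal sketch of how an address relates to the page that contains it. The 4096-byte page size and the function names are assumptions chosen for illustration; real page sizes depend on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size for illustration; must be a power of two. */
#define PAGE_SIZE 4096u

/* Base address of the page containing a virtual address. */
uintptr_t page_base(uintptr_t vaddr)
{
    return vaddr & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* Offset of the address within its page. */
uintptr_t page_offset(uintptr_t vaddr)
{
    return vaddr & (uintptr_t)(PAGE_SIZE - 1);
}

/* Number of pages touched by an access of len bytes starting at vaddr. */
size_t pages_spanned(uintptr_t vaddr, size_t len)
{
    assert(len > 0);
    uintptr_t first = page_base(vaddr);
    uintptr_t last = page_base(vaddr + len - 1);
    return (size_t)((last - first) / PAGE_SIZE) + 1;
}
```

An 8-byte access that starts 4 bytes before a page boundary spans two pages, which is why a single instruction can end up touching more than one page.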

To implement watchpoints, we need a way for the operating system to intercept accesses to a specific virtual page within a process. If we leave pages mapped, then accesses will be handled in hardware and the OS will have no say in the matter. So we keep pages with watchpoints unmapped until they are actually accessed. When the process tries to read/write/execute from the watched page, the OS gets notified via a trap [2]. At this point, we temporarily map in the page and single step over the instruction that triggered the trap. If the instruction touches a watched area (note that there can be more than one watched area within a page), then we notify the debugger through a SIGTRAP signal. Otherwise, the instruction executes normally and the process continues.
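The decision made after the trap can be sketched roughly as follows. This is a simplified model, not the kernel's code: the structure, the access-kind constants, and the function name are all hypothetical, and the real flags are described in proc(4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Access kinds; these names are hypothetical, not the proc(4) flags. */
enum { ACC_READ = 1, ACC_WRITE = 2, ACC_EXEC = 4 };

/* One watched area: [addr, addr + size), watched for the given kinds. */
struct watched_area {
    uintptr_t addr;
    size_t size;
    int modes;
};

/*
 * Conceptually called after the trap on a watched page. Returns 1 if the
 * access [addr, addr + size) of kind `mode` overlaps any watched area,
 * meaning the debugger should be notified. Returns 0 if the access only
 * landed on the same page, in which case the process simply continues.
 */
int access_hits_watchpoint(const struct watched_area *areas, size_t nareas,
                           uintptr_t addr, size_t size, int mode)
{
    assert(size > 0);
    for (size_t i = 0; i < nareas; i++) {
        const struct watched_area *w = &areas[i];
        int overlaps = addr < w->addr + w->size && w->addr < addr + size;
        if (overlaps && (w->modes & mode))
            return 1;
    }
    return 0;
}
```

Because the trap fires for any access to the page, most of the cost comes from accesses that return 0 here: they touch the watched page but not a watched area.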

Things become a little more complicated in a multithreaded program. If we map in a page for a single thread, then all other threads in the process will be able to access that memory without OS intervention. If another thread accesses the memory while we’re stepping over the instruction, we can miss triggering a watchpoint. To avoid this, we have to stop every thread in the process while we step over the faulting instruction. This can be very expensive; we’re looking into more efficient methods. I won’t spend too much time discussing how the debugger communicates with the OS when setting and reacting to watchpoints. Most of this information can be found in the proc(4) manpage.

With my next post I’ll examine some of the specific enhancements made to watchpoints in Solaris 10.

[1] This is obviously a very simplistic view of virtual memory. Curious readers should try a good OS textbook or two for more detailed information.

[2] Traps are quite an interesting subject by themselves. On Solaris SPARC, you can see what traps are occurring with the very cool trapstat(1M) utility that Bryan wrote.

I’m finally back from vacation, and I’m here to help out with Adam’s Top 11-20 Solaris Features. I’ll be going into some details regarding one of the features I integrated into Solaris 10, pfiles with pathnames (which was edged out by libumem for the #11 spot by a lean at the finish line). This will be a technical discussion; for a good overview of why it’s so useful, see my previous entry.

There were several motivations for this project:

  1. Provide path information for MDB and DTrace.
  2. Make pathname information available for pfiles(1).
  3. Improve the performance of getcwd(3C).

First of all, we needed to record the information in the kernel somewhere. In Solaris, we have what’s known as the Virtual File System (VFS) layer. This is an abstract interface, where each file system fills in the implementation details so that no other consumer has to know them. Each file is represented by a vnode, which can be thought of as a superclass if you’re familiar with inheritance. The end result of this is that we can open a UFS file in the same way we open a /proc file, and the only one who knows the difference is the underlying filesystem. We can also change things at the VFS layer and not have to worry about each individual filesystem.

To address concerns over performance and the difficulty of bookkeeping, it was necessary to adjust the constraints of the problem appropriately. It is extremely difficult, if not impossible, to ensure that the path is always correct (consider hard links, unlinked files, and directory restructuring). To make the problem easier, we make no claim that the path is currently correct, only that it was correct at one time. Whenever we translate from a path to a vnode (known as a lookup) for the first time, we store the path information within the vnode. The performance hit is negligible (a memory allocation and a few string copies) and it only occurs when first looking up the vnode. We must be prepared for situations where no pathname is available, as some files have no meaningful path (sockets, for example).
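The "store it once, on first lookup" rule can be sketched like this. It is a toy model rather than the actual vnode code; the structure, field, and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a vnode; the field name is hypothetical. */
struct toy_vnode {
    char *cached_path;   /* NULL until the first lookup, or if none applies */
};

/*
 * Record the path used for a lookup, but only the first time. Later
 * lookups (say, through a hard link) leave the original name alone, so the
 * stored path is "correct at one time" rather than guaranteed current.
 * Returns 1 if the path was stored, 0 if one was already present or the
 * allocation failed; callers must always tolerate a missing path.
 */
int toy_vnode_set_path(struct toy_vnode *vp, const char *path)
{
    assert(vp != NULL && path != NULL);
    if (vp->cached_path != NULL)
        return 0;
    size_t len = strlen(path) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return 0;
    memcpy(copy, path, len);
    vp->cached_path = copy;
    return 1;
}
```

The cost is the one allocation and copy mentioned above, paid once per vnode; every later lookup takes the early return.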

With the magic of CTF, MDB and DTrace need no modification. Crash dumps now have pathnames for every open file, and with a little translator magic we end up with a stable DTrace interface like the io provider. We also use this to improve getcwd performance. Normally, we would have to look up “..”, iterate over each entry until we find the matching vnode, record the entry name, lather, rinse, repeat. Now, we take a first stab at it by doing a forward lookup of the cached pathname, and if it’s the same vnode, then we simply return the pathname. getcwd has very stringent correctness requirements, so we have to fall back to the old method when our shortcut fails.
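A userland analogue of the shortcut looks something like the sketch below. It is only an approximation of the idea: the kernel compares vnodes directly, whereas here the device and inode numbers from stat(2) stand in for vnode identity, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Fill buf with the current directory. Try the cached path first: if it
 * still names the same file as ".", use it; otherwise fall back to the
 * slow but always-correct getcwd(). Returns 1 if the shortcut was used,
 * 0 if the fallback succeeded, and -1 on error.
 */
int cwd_with_shortcut(const char *cached, char *buf, size_t len)
{
    struct stat here, there;

    assert(buf != NULL && len > 0);
    if (cached != NULL && strlen(cached) < len &&
        stat(".", &here) == 0 && stat(cached, &there) == 0 &&
        here.st_dev == there.st_dev && here.st_ino == there.st_ino) {
        strcpy(buf, cached);
        return 1;
    }
    return getcwd(buf, len) != NULL ? 0 : -1;
}
```

A stale or missing cached path costs one extra lookup before the fallback runs, so the shortcut never makes the answer less correct, only sometimes slower.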

The only remaining difficulty was exporting this information to userland for programs like pfiles to use. For those of you familiar with /proc, this is exactly the type of problem it was designed to solve. We added symbolic links in /proc/<pid>/path for the current working directory, the root directory, each open file descriptor, and each object mapped in the address space. This allows you to run ls -l in the directory and see the pathname for each file. More importantly, the modifications to pfiles become trivial. The only tricky part is security. Because a vnode can have only one name, and there can be hard links to files or changing permissions, it’s possible for the user to be unable to access the path as it was originally saved. To avoid this, we do the equivalent of a resolvepath(2) in the kernel, and reject any paths which cannot be accessed or do not map to the same vnode. The end result of this is that we may lose this information in some exceptional circumstances (the directory layout of a filesystem is relatively static), but as Bart is fond of reminding us: performance is a goal, correctness is a constraint.
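Since the entries are ordinary symbolic links, a consumer only needs readlink(2) to get at the pathname. Here is a minimal sketch of such a helper; the function name is hypothetical, and this is not how pfiles itself is written.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the target of a symbolic link into buf as a NUL-terminated string.
 * readlink(2) does not terminate its result, so we do that here.
 * Returns 0 on success, -1 on failure or if the target may not have fit.
 */
int link_target(const char *link, char *buf, size_t len)
{
    assert(link != NULL && buf != NULL && len > 0);
    ssize_t n = readlink(link, buf, len - 1);
    if (n < 0 || (size_t)n >= len - 1)
        return -1;
    buf[n] = '\0';
    return 0;
}
```

On a system with this feature you would pass it a link under /proc/<pid>/path; the exact entry names are documented in proc(4). A failure is not necessarily an error, since some files have no pathname to report.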

In a departure from my usual Solaris propaganda, I thought I’d try a little bit of history. This entry is aimed at all of you C programmers out there that enjoy the novelty of Obfuscated C. If you think you’re a real C hacker, and haven’t heard of the obfuscated C contest, then you need to spend a few hours browsing their archives of past winners [1].

If you’ve been reading manpages on your UNIX system, you’ve probably been using some form of troff [2]. This is an early typesetting language processor, dating back to pre-UNIX days. You can find some history here. The nroff and troff commands are essentially the same; they are built largely from the same source and differ only in their options and output formats.

The original troff was written by Joe F. Ossanna in assembly language for the PDP-11 in the early 70s. Along came this whizzy portable language known as C, so Ossanna rewrote his formatting program. However, it was less of a rewrite and more of a direct translation of the assembly code. The result is a truly incomprehensible tangle of C code, almost completely uncommented. To top it off, Ossanna was tragically killed in a car accident in 1977. Rumour has it that attempts were made to enhance troff, before Brian Kernighan caved in and rewrote it from scratch as ditroff.

If you’re curious just how incomprehensible 7000 lines of uncommented C code can be, you can find a later version of it from The Unix Tree, an invaluable resource for the nostalgic among us. To begin with, the files are named n1.c, n2.c, etc. To quote from ‘n6.c’:

register i,*j,k;
extern int chtab[];
if((i = getrq()) == 0)return(0);
for(j=chtab;*j != i;j++)if(*(j++) == 0)return(0);
k = *(++j) | chbits;

int i,j[];
register k;
if(((k = i-'0') >= 1) && (k <= 4) && (k != smnt))return(--k);
for(k=0; j[k] != i; k++)if(j[k] == 0)return(-1);

If this doesn’t convince you to write well-structured, well-commented code, I don’t know what will. The scary thing is that there are at least 18 bugs in our database open against nroff or troff; one of the side-effects of promising full backwards compatibility. Anyone who has the courage to putback nroff changes earns a badge of honor here – it is a dark place that has claimed the free time of a few brave programmers [3]. Whenever an open bug report includes such choice phrases as this, you know you’re in trouble:

I’ve seen this problem on non-Sun Unix as well, like Ultrix 3.1 so the problem likely came from Berkeley. The System V version of *roff (ditroff ?) doesn’t have this problem.

[1] One of my personal favorites is this little gem, a 2000 winner ‘natori’. It should be a full moon tomorrow night…

#include <stdio.h>
#include <math.h>
double l;main(_,o,O){return putchar((_--+22&&_+44&&main(_,-43,_),_&&o)?(main(-43,
405859.-4.7+acos(l/2))<1.57))[" #"])):10);}

[2] On Solaris, most manpages are written in SGML, and can be found in /usr/share/man/sman*.

[3] I’d like to think that the x86 disassembler is a close second, but maybe that’s just because I’m a survivor.

During S10 development, there have been numerous enhancements to the ptools (see proc(1)). Here are two recent additions that may have slipped through the cracks with all the hype surrounding Solaris 10. They’re not quite as groundbreaking as DTrace or Zones, but well-suited for some blog exposure.

pargs -l

The pargs command has a new option to display the command and all its arguments on a single line. This makes it possible to cut and paste to restart running commands with the same set of arguments.

$ pargs -l `pgrep sleep`
/usr/bin/sleep 10 here are some args

java support in pstack

This one won’t hit the streets until build 59, which is due out as the next Solaris Express build. Thanks to the JVM guys, we’ve added support for pstack to display java frames. If you’re using the latest java release (1.5, err… 5.0) and you run pstack on a java process, you’ll get to see all the java functions, including line numbers. Note the java frames with an asterisk in the example below.

$ cat
public class Main {
    public static int go(int a) {
        if (a == 0) {
            for (;;)
                ;
        }
        return (1 + go(a - 1));
    }
    public static void main(String[] argv) {
        go(10);
    }
}
$ pstack `pgrep java`
144381: /usr/jdk/instances/jdk1.5.0/bin/java Main
-----------------  lwp# 1 / thread# 1  --------------------
9940dfae * Main.go(I)I+0
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.go(I)I+11 (line 10)
99402a3f * Main.main([Ljava/lang/String;)V+10 (line 15)
9f4dbbe4 * StubRoutines (1)
9f4dbbe4 __1cCosUos_exception_wrapper6FpFpnJJavaValue_pnMmethodHandle_pnRJavaCa
llArguments_pnGThread__v2468_v_ (8047130, 8047038, 8047068, 8074538, 804702c, 9f
4dbee8) + 14
9f4dbbe4 __1cCosUos_exception_wrapper6FpFpnJJavaValue_pnMmethodHandle_pnRJavaCa
llArguments_pnGThread__v2468_v_ (9f4dbc90, 8047130, 8047038, 8047068, 8074538) +
9f4dbee8 __1cJJavaCallsEcall6FpnJJavaValue_nMmethodHandle_pnRJavaCallArguments_
pnGThread__v_ (8047130, 8074ac4, 8047068, 8074538) + 28
9f6ee200 __1cRjni_invoke_static6FpnHJNIEnv__pnJJavaValue_pnI_jobject_nLJNICallT
ype_pnK_jmethodID_pnSJNI_ArgumentPusher_pnGThread__v_ (80745f4, 8047130, 0, 0, 8
072681, 804713c) + 180
9f59f7af jni_CallStaticVoidMethod (80745f4, 80750a0, 8072681, 80750b0) + 10f
080526ee main     (0, 806fbf8, 8047a04) + a4c
08051c0a ???????? (2, 8047ae0, 8047b05, 0, 8047b0a, 8047b31) + 8051c0a
-----------------  lwp# 2 / thread# 2  --------------------
9fb53e3c lwp_cond_wait (8160740, 8160728, 0, 0)
9f4b6182 __1cHMonitorEwait6Mil_i_ (81208c8, 1, 0) + 432
9f6bfcad __1cNGCTaskManagerIget_task6MI_pnGGCTask__ (81606c0, 0) + 90
[ ... ]
