Eric Schrock's Blog

Category: OpenSolaris

When I first started in the Solaris group, I was faced with two equally
difficult tasks: learning the development model, and understanding the source
code. For both these tasks, the recommended method is usually picking a small
bug and working through the process. For the curious, the first bug I putback
to ON was 4912227 (ptree call returns zero on failure), a simple bug with
near-zero risk. It was the first step down a very long road.

As another first step, someone suggested adding a very simple system call to the
kernel. This turned out to be a whole lot harder than one would expect, and has
so many subtle aspects that experienced Solaris engineers (myself included)
still miss some of the necessary changes. With that in mind, I thought a
reasonable first OpenSolaris blog would be describing exactly how to add a new
system call to the kernel.

For the purposes of this post, we will assume that it’s a simple system call
that lives in the generic kernel code, and we’ll put the code into an existing
file to avoid having to deal with Makefiles. The goal is to print an arbitrary
message to the console whenever the system call is issued.

1. Picking a syscall number

Before writing any real code, we first have to pick a number that will
represent our system call. The main source of documentation here is
syscall.h,
which describes all the available system call numbers, as well as which ones are
reserved. The maximum number of syscalls is currently 256 (NSYSCALL), which
doesn’t leave much space for new ones. This could theoretically be extended – I
believe the hard limit is in the size of sysset_t, whose 16 integers
must be able to represent a complete bitmask of all system calls. This puts our
actual limit at 16*32, or 512, system calls. But for the purposes of our
tutorial, we’ll pick system call number 56, which is currently unused. For my
own amusement, we’ll name our (my?) system call ‘schrock’. So first we add the
following line to syscall.h, between its neighbors 55 and 57:

#define SYS_uadmin      55
#define SYS_schrock     56
#define SYS_utssys      57
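
The 16*32 arithmetic above can be checked with a small userland sketch. The mock_sysset_t type and the mock_sysset_* helpers are hypothetical names made up for this example; they are not the real sysset_t or its macros, whose exact bit layout may differ.

```c
#include <limits.h>
#include <stdint.h>

#define	MOCK_SYSSET_WORDS	16	/* the post says sysset_t holds 16 integers */

/* Hypothetical stand-in for sysset_t: one bit per system call number. */
typedef struct {
	uint32_t	word[MOCK_SYSSET_WORDS];
} mock_sysset_t;

/* Total number of system call numbers the bitmask can represent. */
static int
mock_sysset_capacity(void)
{
	return (MOCK_SYSSET_WORDS * (int)(sizeof (uint32_t) * CHAR_BIT));
}

/* Mark syscall number `num` in the set; returns -1 if it does not fit. */
static int
mock_sysset_add(mock_sysset_t *set, int num)
{
	if (num < 0 || num >= mock_sysset_capacity())
		return (-1);
	set->word[num / 32] |= (uint32_t)1 << (num % 32);
	return (0);
}

/* Returns 1 if syscall number `num` is in the set, 0 otherwise. */
static int
mock_sysset_has(const mock_sysset_t *set, int num)
{
	if (num < 0 || num >= mock_sysset_capacity())
		return (0);
	return ((int)((set->word[num / 32] >> (num % 32)) & 1u));
}
```

With 16 words of 32 bits each, the capacity comes out to 512, and number 56 lands in word 1, bit 24.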

2. Writing the syscall handler

Next, we have to actually add the function that will get called when we
invoke the system call. What we should really do is add a new file
schrock.c to usr/src/uts/common/syscall,
but I’m trying to avoid Makefiles. Instead, we’ll just stick it in getpid.c:

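#include <sys/cmn_err.h>

int
schrock(void *arg)
{
	char	buf[1024];	/* scratch space for the user's string */
	size_t	len;

	/* Copy the NUL-terminated string in from user space. */
	if (copyinstr(arg, buf, sizeof (buf), &len) != 0)
		return (set_errno(EFAULT));

	cmn_err(CE_WARN, "%s", buf);	/* print it on the console */
	return (0);
}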

Note that declaring a buffer of 1024 bytes on the stack is a very bad
thing to do in the kernel. We have limited stack space, and a stack overflow
will result in a panic. We also don’t check that the length of the string was
less than our scratch space. But this will suffice for illustrative purposes.
The cmn_err() function is the simplest way to display messages from the kernel.
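
To make the length question concrete, here is a userland sketch of a bounded string copy that tells a bad pointer apart from a string that is too long. Both mock_copystr() and mock_schrock() are hypothetical helpers written for this example; they only imitate the shape of the handler above and are not the kernel's copyinstr(), whose exact semantics should be checked against the kernel source.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userland stand-in for a bounded string copy.  Copies at most
 * `max` bytes, including the terminating NUL, and stores the number of bytes
 * copied in *lenp.  Returns 0 on success or an errno value on failure.
 */
static int
mock_copystr(const char *src, char *dst, size_t max, size_t *lenp)
{
	size_t i;

	if (src == NULL || dst == NULL)
		return (EFAULT);
	if (max == 0)
		return (ENAMETOOLONG);

	for (i = 0; i < max; i++) {
		dst[i] = src[i];
		if (src[i] == '\0') {
			*lenp = i + 1;	/* length includes the NUL */
			return (0);
		}
	}

	dst[max - 1] = '\0';		/* keep dst terminated on overflow */
	*lenp = max;
	return (ENAMETOOLONG);
}

/*
 * Handler-shaped wrapper: passes the copy's own error code back to the
 * caller instead of folding every failure into EFAULT.
 */
static int
mock_schrock(const char *arg, char *buf, size_t bufsz)
{
	size_t len;

	return (mock_copystr(arg, buf, bufsz, &len));
}
```

Passing the error through unchanged is a design choice for this sketch; the handler in the post reports every failure as EFAULT.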

3. Adding an entry to the syscall table

We need to place an entry in the system call table. This table lives in sysent.c,
and makes heavy use of macros to simplify the source. Our system call takes a
single argument and returns an integer, so we’ll need to use the
SYSENT_CI macro. We need
to add a prototype for our syscall, and add an entry to the sysent and
sysent32 tables:

/* prototypes */
int     rename();
void    rexit();
int     schrock();
int     semsys();
int     setgid();
/* ... */
/* sysent table */
	/* 54 */ SYSENT_CI("ioctl",             ioctl,          3),
	/* 55 */ SYSENT_CI("uadmin",            uadmin,         3),
	/* 56 */ SYSENT_CI("schrock",           schrock,        1),
	/* 57 */ IF_LP64(
			SYSENT_2CI("utssys",    utssys64,       4),
			SYSENT_2CI("utssys",    utssys32,       4)),
/* ... */
/* sysent32 table */
	/* 54 */ SYSENT_CI("ioctl",             ioctl,          3),
	/* 55 */ SYSENT_CI("uadmin",            uadmin,         3),
	/* 56 */ SYSENT_CI("schrock",           schrock,        1),
	/* 57 */ SYSENT_2CI("utssys",           utssys32,       4),
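
For readers who have not seen a system call table before, the idea can be sketched in userland as an array indexed by syscall number. Everything here (struct mock_sysent, mock_table, mock_dispatch() and the two toy handlers) is hypothetical and written for this example; the real table is built by the SYSENT_* macros and carries more information than this.

```c
#include <stddef.h>

#define	MOCK_NSYSCALL	256	/* table size quoted in the post */

typedef long (*mock_handler_t)(const long *args);

/* Hypothetical, simplified table entry. */
struct mock_sysent {
	const char	*name;		/* name, for diagnostics only */
	mock_handler_t	handler;	/* NULL marks an unused slot */
	int		nargs;		/* number of arguments expected */
};

/* Toy handlers: the first ignores its arguments, the second echoes one. */
static long
mock_sys_uadmin(const long *args)
{
	(void) args;
	return (0);
}

static long
mock_sys_schrock(const long *args)
{
	return (args[0]);
}

/* Slots that are not named here are zero-filled, so their handler is NULL. */
static const struct mock_sysent mock_table[MOCK_NSYSCALL] = {
	[55] = { "uadmin",	mock_sys_uadmin,	3 },
	[56] = { "schrock",	mock_sys_schrock,	1 },
};

/* Look up `num` and call its handler; returns -1 on any mismatch. */
static long
mock_dispatch(int num, const long *args, int nargs)
{
	const struct mock_sysent *ent;

	if (num < 0 || num >= MOCK_NSYSCALL)
		return (-1);
	ent = &mock_table[num];
	if (ent->handler == NULL || ent->nargs != nargs)
		return (-1);
	return (ent->handler(args));
}
```

The point of the sketch is only that the number picked in step 1 is the index into this table, which is why the entry has to sit in exactly the right slot.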

4. /etc/name_to_sysnum

At this point, we could write a program to invoke our system call, but the
point here is to illustrate everything that needs to be done to integrate
a system call, so we can’t ignore the little things. One of these little things
is /etc/name_to_sysnum, which provides a mapping between system call
names and numbers, and is used by dtrace(1M), truss(1), and
friends. Of course, there is one version for x86 and one for SPARC, so you will
have to add the following lines to both the intel and SPARC versions:

ioctl                   54
uadmin                  55
schrock                 56
utssys                  57
fdsync                  58
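
As a rough illustration of how a tool might consume this file, here is a userland sketch that parses "name number" lines and looks a name up. It assumes the file contains nothing but lines shaped like the excerpt above; mock_parse_sysnum_line() and mock_lookup_sysnum() are hypothetical helpers, not the code that dtrace(1M) or truss(1) actually use.

```c
#include <stdio.h>
#include <string.h>

#define	MOCK_NAME_MAX	64

/*
 * Parse one "name number" line.  `name` must have room for MOCK_NAME_MAX
 * bytes.  Returns 0 on success, -1 if the text does not have that shape.
 */
static int
mock_parse_sysnum_line(const char *line, char *name, int *num)
{
	/* %63s limits the name to MOCK_NAME_MAX - 1 characters plus a NUL. */
	if (sscanf(line, "%63s %d", name, num) != 2)
		return (-1);
	return (0);
}

/*
 * Scan newline-separated text for the syscall called `want`.
 * Returns its number, or -1 if it is not listed.
 */
static int
mock_lookup_sysnum(const char *text, const char *want)
{
	char name[MOCK_NAME_MAX];
	int num;

	while (*text != '\0') {
		if (mock_parse_sysnum_line(text, name, &num) == 0 &&
		    strcmp(name, want) == 0)
			return (num);
		text = strchr(text, '\n');	/* advance to the next line */
		if (text == NULL)
			break;
		text++;
	}
	return (-1);
}
```

If the schrock line is missing from the file, a lookup like this simply fails, which is why tools that depend on the file cannot name the new call until it is added.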

5. truss(1)

Truss does fancy decoding of system call arguments. In order to do this, we
need to maintain a table in truss that describes the type of each argument for
every syscall. This table is found in systable.c.
Since our syscall takes a single string, we add the following entry:

{"ioctl",       3, DEC, NOV, DEC, IOC, IOA},                    /*  54 */
{"uadmin",      3, DEC, NOV, DEC, DEC, DEC},                    /*  55 */
{"schrock",     1, DEC, NOV, STG},                              /*  56 */
{"utssys",      4, DEC, NOV, HEX, DEC, UTS, HEX},               /*  57 */
{"fdsync",      2, DEC, NOV, DEC, FFG},                         /*  58 */

Don’t worry too much about the different constants. But be sure to read up
on the truss source code if you’re adding a complicated system call.
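
To give a feel for what the per-argument constants are for, here is a userland sketch of formatting one argument according to a type code. The MOCK_DEC, MOCK_HEX and MOCK_STG codes only borrow their names from the table above; the enum, the union and mock_format_arg() are hypothetical, and the output format is an assumption rather than what truss really prints.

```c
#include <stdio.h>

/* Hypothetical argument type codes, named after the truss constants. */
enum mock_argtype {
	MOCK_DEC,	/* print as a decimal integer */
	MOCK_HEX,	/* print as a hexadecimal integer */
	MOCK_STG	/* print as a quoted string */
};

/* An argument is either a number or a pointer to a string. */
union mock_argval {
	long		num;
	const char	*str;
};

/*
 * Format one argument into `out` according to `type`.
 * Returns the snprintf() result, or -1 for an unknown type.
 */
static int
mock_format_arg(char *out, size_t outsz, enum mock_argtype type,
    union mock_argval val)
{
	switch (type) {
	case MOCK_DEC:
		return (snprintf(out, outsz, "%ld", val.num));
	case MOCK_HEX:
		return (snprintf(out, outsz, "0x%lX", (unsigned long)val.num));
	case MOCK_STG:
		return (snprintf(out, outsz, "\"%s\"", val.str));
	}
	return (-1);
}
```

Declaring the single argument of schrock as a string is what lets a tracer show the message text instead of a raw pointer value.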

6. proc_names.c

This is the file that gets missed the most often when adding a new syscall.
Libproc uses the table in proc_names.c
to translate between system call numbers and names. Why it doesn’t make use of
/etc/name_to_sysnum is anybody’s guess, but for now you have to update
the systable array in this file:

"ioctl",                /* 54 */
"uadmin",               /* 55 */
"schrock",              /* 56 */
"utssys",               /* 57 */
"fdsync",               /* 58 */
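
The number-to-name direction can be sketched in userland like this. The mock_systable array holds only the five entries from the excerpt, and mock_sysname() is a hypothetical helper; in particular, the "sys#N" fallback for unknown numbers is an assumption made for this example, not a statement about what libproc prints.

```c
#include <stdio.h>

/* Hypothetical excerpt of a name table indexed by syscall number. */
static const char *const mock_systable[] = {
	[54] = "ioctl",
	[55] = "uadmin",
	[56] = "schrock",
	[57] = "utssys",
	[58] = "fdsync",
};

#define	MOCK_NSYS	(sizeof (mock_systable) / sizeof (mock_systable[0]))

/*
 * Write the name of syscall `num` into `buf` and return `buf`.
 * Unknown or unnamed numbers fall back to "sys#N".
 */
static const char *
mock_sysname(int num, char *buf, size_t bufsz)
{
	if (num >= 0 && (size_t)num < MOCK_NSYS && mock_systable[num] != NULL)
		(void) snprintf(buf, bufsz, "%s", mock_systable[num]);
	else
		(void) snprintf(buf, bufsz, "sys#%d", num);
	return (buf);
}
```

A missing entry does not break anything outright; it just means the new call shows up under a placeholder name, which is probably part of why this file is so easy to forget.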

7. Putting it all together

Finally, everything is in place. We can test our system call with a simple
program:

#include <sys/syscall.h>

int
main(int argc, char **argv)
{
	syscall(SYS_schrock, "OpenSolaris Rules!");
	return (0);
}

If we run this on our system, we’ll see the following output on the
console:

June 14 13:42:21 halcyon genunix: WARNING: OpenSolaris Rules!

Because we did all the extra work, we can actually observe the behavior using
truss(1), mdb(1), or dtrace(1M). As you can see,
adding a system call is not as easy as it should be. One of the ideas that has
been floating around for a while is the Grand Unified Syscall(tm) project, which
would centralize all this information as well as provide type information for
the DTrace syscall provider. But until that happens, we’ll have to deal with
this process.


The last day of FISL has come and gone, thankfully. I’m completely drained, both physically and mentally. As you can probably tell from the comments on yesterday’s blog entry, we had quite a night out last night in Porto Alegre. I didn’t stay out quite as late as some of the Brazil guys, but Ken and I made it back in time to catch about 4 hours of sleep before heading off to the conference. Thankfully I remembered to set my alarm, otherwise I probably would have ended up in bed until the early afternoon. The full details of the night are better told in person…

This last day was significantly quieter than previous days. With the conference winding down, I assume that many people took off early. Most of our presentations today were to an audience of 2 or 3 people, and we even had to cancel some of the early ones as no one was there. I managed to give presentations for Performance, Zones, and DTrace, despite my complete lack of sleep. The DTrace presentation was particularly rough because it’s primarily demo-driven, with no set plan. This turns out to be rather difficult after a night of no sleep and a few too many caipirinhas.

The highlight of the day was when a woman (stunningly beautiful, of course) came up to me while I was sitting in one of the chairs and asked to take a picture with me. We didn’t talk at all, and I didn’t know who she was, but she seemed psyched to be getting her picture taken with someone from Sun. I just keep telling myself that it was my stunning good looks that resulted in the picture, not my badge saying “Sun Microsystems”. I can dream, can’t I?

Tomorrow begins the 24 hours of travelling to get me back home. I can’t wait to get back to my own apartment and a normal lifestyle.

The exhaustion continues to increase. Today I did 3 presentations: DTrace, Zones, and FMA (which turned into OpenSolaris). Every one took up the full hour allotted. And tomorrow I’m going to add a Solaris performance presentation, to bring the grand total to 4 hours of presentations. Given how bad the acoustics are on the exposition floor, my goal is to lose my voice by the end of the night. So far, I’ve settled into a schedule: wake up around 7:00, check email, work on slides, eat breakfast, then get to the conference around 8:45. After a full day of talking and giving presentations, I get back to the hotel around 7:45 and do about an hour of work/email before going out to dinner. We get back from dinner around 11:30, at which point I get to blogging and finishing up some work. Eventually I get to sleep around 1:00, and then I have to do the whole thing again the next day. Thank god tomorrow is the end; I don’t know how much more I can take.

Today’s highlight was when Dimas (from Sun Brazil) began an impromptu Looking Glass demo towards the end of the day. He ended up overflowing our booth with at least 40 people for a solid hour before the commotion started to die down. Those of us sitting in the corner were worried we’d have to leave to make room. Our Solaris presentations hit 25 or so people, but never so many for so long. The combination of cool eye candy and a native Portuguese speaker really helped out (though most people probably couldn’t hear him anyway).

Other highlights included hanging out with the folks at CodeBreakers, who really seem to dig Solaris (Thiago had S10 installed on his laptop within half a day). We took some pictures with them (which Dave should post soon), and are going out for barbeque and drinks tonight with them and 100+ other open source Brazil folks. I also helped a few other people get Solaris 10 installed on their laptops (mostly just the “disable USB legacy support” problem). It’s unbelievably cool to see the results of handing out Solaris 10 DVDs before even leaving the conference. The top Solaris presentations were understandably DTrace and Zones, though the booth was pretty well packed all day.

Let’s hope the last day is as good as the rest. Here’s to Software Livre!

Another day at FISL, another day full of presentations. Today we did mini-presentations every hour on the hour, most of which were very well attended. When we overlapped with the major keynote sessions, turnout tended to be low, but other than that it was very successful. We covered OpenSolaris, DTrace, FMA, SMF, Security, as well as a Java presentation (by Charlie, not Dave or myself). As usual, lots of great questions from the highly technical audience.

The highlight today was a great conversation with a group of folks very interested in starting an OpenSolaris users group in Brazil. Extremely nice group of guys, very interested in technology and helping OpenSolaris build a greater presence in Brazil (both through user groups and Solaris attendance at conferences). I have to say that after experiencing this conference and seeing the enthusiasm that everyone has for exciting technology and open source, I have to agree that Brazil is a great place to focus our OpenSolaris presence. Hopefully we’ll see user groups pop up here as well as the rest of the world. We’ll be doing everything we can to help from within Sun.

The other, more amusing, highlight of the day was during my DTrace demonstration. I needed an interesting Java application to demonstrate the jstack() DTrace action, so I started up the only Java application (apart from some internal Sun tools) that I use on a regular basis: Yahoo! Sports Fantasy Baseball StatTracker (the classic version, not the new Flash one). I tried to explain that maybe I was trying to debug why the app was lying to me about Tejada going 0-2 so far in the Sox/Orioles game; really he should have hit two homers and I should be dominating this week’s scores [1]. I was rather amused, but I think the cultural divide was a little too wide. Not only baseball, but fantasy baseball: I don’t blame the audience at all.


1 This is clearly a lie. Despite any dreams of fantasy baseball domination, I would never root for my players in a game over the Red Sox. In the end, Ryan’s 40.5 ERA was worth the bottom of the ninth comeback capped by Ortiz’s 3-run shot.

Dave Powell and I have arrived at FISL, an open source conference in Brazil, along with a crowd of other Sun folks. Dave and I (with an introduction from Sun VP Tom Goguen) will be hosting a 4-hour OpenSolaris pre-event tomorrow, June 1st. We’ll be talking about all the cool features available in OpenSolaris, as well as how Solaris development works today and how we hope it will work in the future. If you’re attending the conference, be sure to stop by to learn about OpenSolaris, and what makes Solaris (and Solaris developers) tick. We’ll also be hanging around the Sun booth during the rest of the conference, giving mini-presentations and demos, answering questions, and chatting with anyone who will listen. We’re happy to talk about OpenSolaris, Solaris, Sun, or your favorite scenes from Monty Python and the Holy Grail. Oh yeah, there will be lots of T-shirts and Solaris 10 DVDs as well.

So it looks like my blog made it over to the frontpage of news.com in this article about slipping Solaris 10 features. Don’t get your hopes up – I’m not going to refute Genn’s claims; we certainly are not scheduled for a specific update at the moment. But pay attention to the details: ZFS and Janus will be available in an earlier Solaris Express release. I also find it encouraging that engineers like myself have a voice that actually gets picked up by the regular press (without being blown out of proportion or slashdotted).

I would like to point out that I putback the last major chunk of command redesign to the ZFS gate yesterday 😉 There are certainly some features left to implement, but the fact that I re-whacked all of the userland components (within six weeks, no less) should not be interpreted as any statement of schedule plans. Hopefully I can get into some of the details of what we’re doing but I don’t want to be seen as promoting vaporware (even though we have many happy beta customers) or exposing unfinished interfaces which are subject to change.

I also happen to be involved with the ongoing Janus work, but that’s another story altogether. I swear there’s no connection between myself and slipping products (at least not one where I’m the cause).

Update: So much for not getting blown out of proportion. Leave it to the second tier news sites to turn “not scheduled for an update” into “delayed indefinitely over deficiencies”. Honestly, rewriting 5% of the code should hardly be interpreted as “delayed indefinitely” – so much for legitimate journalism. Please keep in mind that all features will hit Software Express before a S10 Update, and OpenSolaris even sooner.
