Eric Schrock's Blog

How to add a system call to OpenSolaris

June 14, 2005

When I first started in the Solaris group, I was faced with two equally
difficult tasks: learning the development model, and understanding the source
code. For both these tasks, the recommended method is usually picking a small
bug and working through the process. For the curious, the first bug I putback
to ON was 4912227 (ptree call returns zero on failure), a simple bug with near
zero risk. It was the first step down a very long road.

As another first step, someone suggested adding a very simple system call to the
kernel. This turned out to be a whole lot harder than one would expect, and has
so many subtle aspects that experienced Solaris engineers (myself included)
still miss some of the necessary changes. With that in mind, I thought a
reasonable first OpenSolaris blog would be describing exactly how to add a new
system call to the kernel.

For the purposes of this post, we will assume that it’s a simple system call
that lives in the generic kernel code, and we’ll put the code into an existing
file to avoid having to deal with Makefiles. The goal is to print an arbitrary
message to the console whenever the system call is issued.

1. Picking a syscall number

Before writing any real code, we first have to pick a number that will
represent our system call. The main source of documentation here is
syscall.h,
which describes all the available system call numbers, as well as which ones are
reserved. The maximum number of syscalls is currently 256 (NSYSCALL), which
doesn’t leave much space for new ones. This could theoretically be extended – I
believe the hard limit is in the size of sysset_t, whose 16 integers
must be able to represent a complete bitmask of all system calls. This puts our
actual limit at 16*32, or 512, system calls. But for the purposes of our
tutorial, we’ll pick system call number 56, which is currently unused. For my
own amusement, we’ll name our (my?) system call ‘schrock’. So first we add the
following line to syscall.h:

#define SYS_uadmin      55
#define SYS_schrock     56
#define SYS_utssys      57
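
The 16*32 arithmetic above can be sketched in user-space C. This is a toy model, not the real sysset_t definition: the names model_sysset_t, MODEL_WORDS, and the helper functions are invented for illustration, and the zero-based bit layout is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 16-word syscall bitmask; not the kernel's sysset_t. */
#define	MODEL_WORDS	16
#define	MODEL_BITS	32
#define	MODEL_MAXSYS	(MODEL_WORDS * MODEL_BITS)	/* 16 * 32 = 512 */

typedef struct model_sysset {
	uint32_t	word[MODEL_WORDS];
} model_sysset_t;

/* Clear every bit in the set. */
void
model_empty(model_sysset_t *sp)
{
	memset(sp, 0, sizeof (*sp));
}

/* Mark syscall number 'num' as a member of the set. */
void
model_add(model_sysset_t *sp, unsigned int num)
{
	assert(num < MODEL_MAXSYS);
	sp->word[num / MODEL_BITS] |= (uint32_t)1 << (num % MODEL_BITS);
}

/* Return 1 if 'num' is in the set, 0 otherwise (including out of range). */
int
model_ismember(const model_sysset_t *sp, unsigned int num)
{
	if (num >= MODEL_MAXSYS)
		return (0);
	return ((int)((sp->word[num / MODEL_BITS] >> (num % MODEL_BITS)) & 1));
}
```

With this layout, number 56 lands in word 1, bit 24, and anything at or above 512 simply has no bit to occupy.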

2. Writing the syscall handler

Next, we have to actually add the function that will get called when we
invoke the system call. What we should really do is add a new file
schrock.c to usr/src/uts/common/syscall,
but I’m trying to avoid Makefiles. Instead, we’ll just stick it in getpid.c:

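#include <sys/cmn_err.h>

int
schrock(void *arg)
{
	char	buf[1024];
	size_t	len;

	/* Copy the NUL-terminated string in from user space. */
	if (copyinstr(arg, buf, sizeof (buf), &len) != 0)
		return (set_errno(EFAULT));

	/* Display the message as a kernel warning. */
	cmn_err(CE_WARN, "%s", buf);
	return (0);
}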

Note that declaring a buffer of 1024 bytes on the stack is a very bad
thing to do in the kernel. We have limited stack space, and a stack overflow
will result in a panic. We also don’t check that the length of the string was
less than our scratch space. But this will suffice for illustrative purposes.
The cmn_err()
function is the simplest way to display messages from the kernel.
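
To make the length caveat concrete, here is a user-space model of a bounded string copy. It is only a sketch: copyinstr() itself is kernel-only, model_copystr() is a hypothetical name, and the model assumes the kernel routine reports ENAMETOOLONG when the string does not fit in the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of a bounded string copy. Copies at most
 * 'max' bytes (including the terminating NUL) from src to dst and stores
 * the number of bytes copied in *lenp. Returns 0 on success, or
 * ENAMETOOLONG if no NUL was found within the first 'max' bytes.
 */
int
model_copystr(const char *src, char *dst, size_t max, size_t *lenp)
{
	size_t i;

	assert(src != NULL && dst != NULL && lenp != NULL);
	for (i = 0; i < max; i++) {
		dst[i] = src[i];
		if (src[i] == '\0') {
			*lenp = i + 1;
			return (0);
		}
	}
	/* Ran out of room; dst is not NUL-terminated in this case. */
	*lenp = max;
	return (ENAMETOOLONG);
}
```

A more careful handler could pass the specific error back to the caller instead of reporting every failure as EFAULT.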

3. Adding an entry to the syscall table

We need to place an entry in the system call table. This table lives in sysent.c,
and makes heavy use of macros to simplify the source. Our system call takes a
single argument and returns an integer, so we’ll need to use the
SYSENT_CI macro. We need
to add a prototype for our syscall, and add an entry to the sysent and
sysent32 tables:

int     rename();
void    rexit();
int     schrock();
int     semsys();
int     setgid();
/* ... */
/* 54 */ SYSENT_CI("ioctl",             ioctl,          3),
/* 55 */ SYSENT_CI("uadmin",            uadmin,         3),
/* 56 */ SYSENT_CI("schrock",           schrock,        1),
/* 57 */ IF_LP64(
                SYSENT_2CI("utssys",    utssys64,       4),
                SYSENT_2CI("utssys",    utssys32,       4)),
/* ... */
/* 54 */ SYSENT_CI("ioctl",             ioctl,          3),
/* 55 */ SYSENT_CI("uadmin",            uadmin,         3),
/* 56 */ SYSENT_CI("schrock",           schrock,        1),
/* 57 */ SYSENT_2CI("utssys",           utssys32,       4),
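
If the role of the table isn’t obvious, the following user-space toy shows the general idea of dispatching through an array indexed by syscall number. It is not the real struct sysent or the SYSENT_CI macro; every name here (model_sysent_t, model_table, model_dispatch, and so on) is hypothetical, and handlers are limited to one argument to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a syscall table entry. */
typedef struct model_sysent {
	const char	*name;		/* syscall name */
	int		narg;		/* number of arguments */
	long		(*callp)(long);	/* handler (one argument only) */
} model_sysent_t;

#define	MODEL_NSYSCALL	64	/* arbitrary size for this model */

/* Fallback for unused slots. */
long
model_nosys(long arg)
{
	(void) arg;
	return (-1);
}

/* Toy handler: just hands its argument back. */
long
model_schrock(long arg)
{
	return (arg);
}

/* Slots that are not listed stay zeroed, so their callp is NULL. */
const model_sysent_t model_table[MODEL_NSYSCALL] = {
	[56] = { "schrock", 1, model_schrock },
};

/* Look up the entry for 'num' and call its handler. */
long
model_dispatch(unsigned int num, long arg)
{
	if (num >= MODEL_NSYSCALL || model_table[num].callp == NULL)
		return (model_nosys(arg));
	assert(model_table[num].narg == 1);
	return (model_table[num].callp(arg));
}
```

The point is simply that the number picked in step 1 is an index: a missing or misplaced entry means the call lands on the wrong handler or on none at all.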

4. /etc/name_to_sysnum

At this point, we could write a program to invoke our system call, but the
point here is to illustrate everything that needs to be done to integrate
a system call, so we can’t ignore the little things. One of these little things
is /etc/name_to_sysnum, which provides a mapping between system call
names and numbers, and is used by dtrace(1M), truss(1), and
friends. Of course, there is one version for x86 and one for SPARC, so you will
have to add the following lines to both the intel and SPARC versions:

ioctl                   54
uadmin                  55
schrock                 56
utssys                  57
fdsync                  58
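
As a rough illustration of how a tool might consume this format, here is a small lookup over text laid out as name/number pairs like the lines above. This is my own sketch, not the code that dtrace(1M) or truss(1) actually use; model_name_to_num() is a hypothetical name, and the parser assumes every entry is exactly a name followed by a decimal number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical lookup over text in "name number" form. Returns the
 * number paired with 'name', or -1 if the name is not found. Parsing
 * stops at the first entry that does not match the expected form.
 */
int
model_name_to_num(const char *text, const char *name)
{
	char entry[64];
	int num, used;

	assert(text != NULL && name != NULL);
	/* %n records how many characters this entry consumed. */
	while (sscanf(text, " %63s %d%n", entry, &num, &used) == 2) {
		if (strcmp(entry, name) == 0)
			return (num);
		text += used;
	}
	return (-1);
}
```

Given the five lines above as input, looking up schrock yields 56, and an unknown name yields -1.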

5. truss(1)

Truss does fancy decoding of system call arguments. In order to do this, we
need to maintain a table in truss that describes the type of each argument for
every syscall. This table is found in systable.c.
Since our syscall takes a single string, we add the following entry:

{"ioctl",       3, DEC, NOV, DEC, IOC, IOA},                    /*  54 */
{"uadmin",      3, DEC, NOV, DEC, DEC, DEC},                    /*  55 */
{"schrock",     1, DEC, NOV, STG},                              /*  56 */
{"utssys",      4, DEC, NOV, HEX, DEC, UTS, HEX},               /*  57 */
{"fdsync",      2, DEC, NOV, DEC, FFG},                         /*  58 */

Don’t worry too much about the different constants. But be sure to read up
on the truss source code if you’re adding a complicated system call.
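
For a feel of what a per-argument type table buys you, here is a toy formatter driven by such a table. It is a sketch under loud assumptions: the MODEL_* codes only borrow the spirit of DEC, HEX, and STG, the struct layout and model_format() are hypothetical, and string arguments are read from the caller’s own address space rather than from a traced process as truss must do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical argument-type codes; not the constants truss defines. */
enum model_argtype { MODEL_DEC, MODEL_HEX, MODEL_STG };

typedef struct model_systable {
	const char		*name;		/* syscall name */
	int			nargs;		/* number of arguments */
	enum model_argtype	arg[4];		/* how to print each one */
} model_systable_t;

/*
 * Format one call into buf according to the entry's argument types.
 * Returns the number of characters written (excluding the NUL), or -1
 * if the output did not fit.
 */
int
model_format(const model_systable_t *ent, const uintptr_t *args,
    char *buf, size_t buflen)
{
	size_t off;
	int i;

	assert(ent != NULL && ent->nargs <= 4);
	off = (size_t)snprintf(buf, buflen, "%s(", ent->name);
	for (i = 0; i < ent->nargs && off < buflen; i++) {
		const char *sep = (i == 0) ? "" : ", ";

		switch (ent->arg[i]) {
		case MODEL_STG:
			off += (size_t)snprintf(buf + off, buflen - off,
			    "%s\"%s\"", sep, (const char *)args[i]);
			break;
		case MODEL_HEX:
			off += (size_t)snprintf(buf + off, buflen - off,
			    "%s0x%lx", sep, (unsigned long)args[i]);
			break;
		default:
			off += (size_t)snprintf(buf + off, buflen - off,
			    "%s%ld", sep, (long)args[i]);
			break;
		}
	}
	if (off < buflen)
		off += (size_t)snprintf(buf + off, buflen - off, ")");
	return (off < buflen ? (int)off : -1);
}
```

An entry with a single string argument renders the call with its message in quotes; mark the same argument as hexadecimal and you would see only a pointer value.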

6. proc_names.c

This is the file that gets missed the most often when adding a new syscall.
Libproc uses the table in proc_names.c
to translate between system call numbers and names. Why it doesn’t make use of
/etc/name_to_sysnum is anybody’s guess, but for now you have to update
the systable array in this file:

"ioctl",                /* 54 */
"uadmin",               /* 55 */
"schrock",              /* 56 */
"utssys",               /* 57 */
"fdsync",               /* 58 */

7. Putting it all together

Finally, everything is in place. We can test our system call with a simple
program:

#include <sys/syscall.h>

int
main(int argc, char **argv)
{
	syscall(SYS_schrock, "OpenSolaris Rules!");
	return (0);
}

If we run this on our system, we’ll see the following output on the
console:

June 14 13:42:21 halcyon genunix: WARNING: OpenSolaris Rules!

Because we did all the extra work, we can actually observe the behavior using
truss(1), mdb(1), or dtrace(1M). As you can see,
adding a system call is not as easy as it should be. One of the ideas that has
been floating around for a while is the Grand Unified Syscall(tm) project, which
would centralize all this information as well as provide type information for
the DTrace syscall provider. But until that happens, we’ll have to deal with
this process.
