
A lot of comparisons have been done, and will continue to be done, between ZFS and other filesystems. People tend to focus on performance, features, and CLI tools, as they are easier to compare. I thought I’d take a moment to look at differences in code complexity between UFS and ZFS. It is well known within the kernel group that UFS is about as brittle as code can get. Twenty years of ongoing development, with feature after feature being bolted on, tends to result in a rather complicated system. Even the smallest changes can have wide-ranging effects, resulting in a huge amount of testing and inevitable panics and escalations. And while SVM is considerably newer, it is a huge beast with its own set of problems.

Since ZFS is both a volume manager and a filesystem, the fair comparison is against UFS and SVM together. We can use this script written by Jeff to count the lines of source code in each component. Not a true measure of complexity, but a reasonable approximation to be sure. Running it on the latest version of the gate yields:

-------------------------------------------------
UFS: kernel= 46806   user= 40147   total= 86953
SVM: kernel= 75917   user=161984   total=237901
TOTAL: kernel=122723   user=202131   total=324854
-------------------------------------------------
ZFS: kernel= 50239   user= 21073   total= 71312
-------------------------------------------------

The numbers are rather astounding. Having written most of the ZFS CLI, I found the most horrifying number to be the 162,000 lines of userland code to support SVM. This is more than twice the size of all the ZFS code (kernel and user) put together! And in the end, ZFS is about 1/5th the size of UFS and SVM combined. I wonder what those ZFS numbers will look like in 20 years…
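For anyone who wants to check the ratios quoted above, here is a small Python sketch that recomputes them from the table. The figures are copied from the output; the variable and function names are only illustrative.

```python
# Line counts copied from the table above.
counts = {
    "UFS": {"kernel": 46806, "user": 40147},
    "SVM": {"kernel": 75917, "user": 161984},
    "ZFS": {"kernel": 50239, "user": 21073},
}


def total(name):
    """Kernel plus userland lines for one component."""
    return counts[name]["kernel"] + counts[name]["user"]


ufs_plus_svm = total("UFS") + total("SVM")
zfs_total = total("ZFS")

# SVM userland alone, measured against all of ZFS.
svm_user_vs_zfs = counts["SVM"]["user"] / zfs_total
# ZFS as a fraction of the combined UFS + SVM code.
zfs_fraction = zfs_total / ufs_plus_svm

print(f"UFS+SVM total: {ufs_plus_svm}")
print(f"SVM userland / ZFS total: {svm_user_vs_zfs:.2f}")
print(f"ZFS / (UFS+SVM): {zfs_fraction:.2f}")
```

The second figure comes out a little above 2, and the third a little above 0.2, which matches the "more than twice" and "about 1/5th" claims.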

Well, I’m back. I’ve been holding off blogging for a while due to ZFS. Now that it’s been released, I’ve got tons of stuff lined up to talk about in the coming weeks.

I first started working on ZFS about nine months ago, and my primary task from the beginning was to redesign the CLI for managing storage pools and filesystems. The existing CLI at the time had evolved rather organically over the previous 3 years, each successive feature providing some immediate benefit but lacking in long-term or overarching design goals. To be fair, it was entirely capable in the job it was intended to do – it just needed to be rethought in the larger scheme of things.

I have some plans for detailed blog posts about some of the features, but I thought I’d make the first post a little more general and describe how I approached the CLI design, and some of the major principles behind it.

Simple but powerful

One of the hardest parts of designing an effective CLI is to make it simple enough for new users to understand, but powerful enough so that veterans can tweak everything they need to. With that in mind, we adopted a common design philosophy:

“Simple enough for 90% of the users to understand, powerful enough for the other 10% to use”

A good example of this philosophy is the ‘zfs list’ command. I plan to delve into some of the history behind its development at a later point, but you can quickly see the difference between the two audiences. Most users will just use ‘zfs list’:

$ zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
tank                  55.5K  73.9G   9.5K  /tank
tank/bar                 8K  73.9G     8K  /tank/bar
tank/foo                 8K  73.9G     8K  /tank/foo

But a closer look at the usage reveals a lot more power under the hood:

list [-rH] [-o property[,property]...] [-t type[,type]...]
[filesystem|volume|snapshot] ...

In particular, you can ask questions like ‘what is the amount of space used by all snapshots under tank/home?’ We made sure that sufficient options existed so that power users could script whatever custom tools they wanted.
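As a rough illustration of that kind of scripting, the Python sketch below adds up the USED column from output such as 'zfs list -rH -o name,used -t snapshot tank/home' (built from the options in the usage above, with -H taken to mean header-less, tab-separated output). The helper names and the sample snapshot names are made up for the example, and the sample text stands in for real command output so the sketch runs without a pool.

```python
# Multipliers for the suffixes seen in human-readable sizes (powers of 1024).
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a value such as '55.5K' or '73.9G' to a byte count."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1].upper()])
    return int(float(text))  # bare number: treat as bytes


def total_used(listing):
    """Sum the second column of tab-separated 'name<TAB>used' lines."""
    total = 0
    for line in listing.splitlines():
        if not line.strip():
            continue
        _name, used = line.split("\t")[:2]
        total += parse_size(used)
    return total


# Hypothetical sample; real values would come from the zfs command above.
sample = "tank/home@monday\t8K\ntank/home@tuesday\t9.5K\n"
print(total_used(sample))
```

In a real script the sample string would be replaced by the captured output of the command, for instance via subprocess.run with capture_output=True.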

Solution driven error messages

Having good error messages is a requirement for any reasonably complicated system. The Solaris Fault Management Architecture has proved that users understand and appreciate error messages that tell them exactly what is wrong in plain English, along with how it can be fixed.

A great example of this is the ‘zpool status’ output. Once again, I’ll go into some more detail about the FMA integration in a future post, but you can quickly see how basic FMA integration really allows the user to get meaningful diagnostics on their pool:

$ zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool online' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0s0  ONLINE       0     0     3
            c0d0s0  ONLINE       0     0     0

Consistent command syntax

When it comes to command line syntax, everyone seems to have a different idea of what makes the most sense. When we started redesigning the CLI, we took a look at a bunch of other tools in Solaris, focusing on some of the more recent ones which had undergone a more rigorous design. In the end, our primary source of inspiration was the SMF (Service Management Facility) commands. To that end, every zfs(1M) and zpool(1M) command has the following syntax:

<command> <verb> <options> <noun> ...

There are no “required options”. We tried to avoid positional parameters at all costs, but there are certain subcommands (zfs get, zfs clone, zpool replace, etc.) that fundamentally require multiple operands. In these cases, we try to direct the user with informative error messages indicating that they may have forgotten a parameter:

# zpool create c1d0 c0d0
cannot create 'c1d0': pool name is reserved
pool name may have been omitted

If you mistype something and find that the error message is confusing, please let us know – we take error messages very seriously. We’ve already had some feedback for certain commands (such as ‘zfs clone’) that we’re working on.
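The hinting behavior in the ‘zpool create’ example above can be sketched in a few lines of Python. This is not how zpool itself is implemented. The reserved-name rule (vdev keywords and names that look like disk devices), the function names, and the "missing pool name argument" text are all assumptions made for illustration; only the two message lines in the reserved-name case are taken from the transcript.

```python
import re

# Assumption for illustration: vdev keywords and names that look like disk
# devices (c0d0, c1t0d0, ...) are treated as reserved pool names.
RESERVED_WORDS = {"mirror", "raidz"}
DEVICE_LIKE = re.compile(r"^c[0-9]")


def check_pool_name(name):
    """Return None if the name is acceptable, else a list of message lines."""
    if name in RESERVED_WORDS or DEVICE_LIKE.match(name):
        return [
            f"cannot create '{name}': pool name is reserved",
            # The hint points at the likely mistake, not just the symptom.
            "pool name may have been omitted",
        ]
    return None


def create(operands):
    """Hypothetical front end for 'create <pool> <device> ...'."""
    if not operands:
        return ["missing pool name argument"]  # made-up wording
    problem = check_pool_name(operands[0])
    if problem:
        return problem
    return []  # a real tool would go on to build the pool here


for line in create(["c1d0", "c0d0"]):
    print(line)
```

The point of the design is that the first operand is checked against what the user most plausibly meant, so a forgotten pool name produces a suggestion instead of a bare rejection.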

Modular interface design

On a source level, the initial code had some serious issues around interface boundaries. The problem is that the user/kernel interface is managed through ioctl(2) calls to /dev/zfs. While this is a perfectly fine solution, we wound up with multiple consumers all issuing these ioctl() calls directly, making it very difficult to evolve this interface cleanly. Since we knew that we were going to have multiple userland consumers (zpool and zfs), it made much more sense to construct a library (libzfs) which was responsible for managing this direct interface, and have it present a unified object-based access method for consumers. This allowed us to centralize logic in one place, and the commands themselves became little more than glorified argument parsers around this library.
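The layering described above can be shown with a toy Python sketch. None of these class or method names come from libzfs, and the in-memory Transport is a stand-in for the ioctl(2) calls so the example can run anywhere. It only illustrates the shape: one component owns the kernel interface, and consumers see objects.

```python
class Transport:
    """Stand-in for the single place that would talk to the kernel.

    Hypothetical: a real library would issue ioctl(2) calls on /dev/zfs here.
    This fake keeps state in memory instead.
    """

    def __init__(self):
        self.datasets = {}

    def call(self, op, **args):
        if op == "create":
            self.datasets[args["name"]] = {}
            return None
        if op == "set":
            self.datasets[args["name"]][args["prop"]] = args["value"]
            return None
        if op == "get":
            return self.datasets[args["name"]].get(args["prop"])
        raise ValueError(f"unknown operation: {op}")


class Dataset:
    """Object-style handle that consumers use instead of raw kernel calls."""

    def __init__(self, transport, name):
        self._transport = transport
        self.name = name

    def set_property(self, prop, value):
        self._transport.call("set", name=self.name, prop=prop, value=value)

    def get_property(self, prop):
        return self._transport.call("get", name=self.name, prop=prop)


class Library:
    """Hypothetical library entry point shared by every command-line tool."""

    def __init__(self, transport=None):
        self._transport = transport or Transport()

    def create_dataset(self, name):
        self._transport.call("create", name=name)
        return Dataset(self._transport, name)


# A command becomes an argument parser plus a few library calls.
lib = Library()
fs = lib.create_dataset("tank/foo")
fs.set_property("mountpoint", "/export/foo")
print(fs.get_property("mountpoint"))
```

Because only Transport knows how requests reach the kernel, that interface can change without touching any of the consumers.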
