I’ve been meaning to get around to blogging about these features that I
putback a while ago, but have been caught up in a few too many things.
In any case, the following new ZFS features were putback to build 48 of
Nevada, and should be available in the next Solaris Express.
Create Time Properties
An old RFE has been to provide a way to specify properties at create
time. For users, this simplifies administration by reducing the number
of commands which need to be run. It also allows some race conditions
to be eliminated. For example, if you want to create a new dataset with
a mountpoint of ‘none’, you first have to create it and the underlying
inherited mountpoint, only to remove it later by invoking ‘zfs set
mountpoint=none’.
From an implementation perspective, this allows us to unify our
implementation of the ‘volsize’ and ‘volblocksize’ properties, and pave
the way for future create-time only properties. Instead of having a
separate ioctl() to create a volume and passing in the two size
parameters, we simply pass them down as create-time options.
The end result is pretty straightforward:
# zfs create -o compression=on tank/home
# zfs create -o mountpoint=/export -o atime=off tank/export
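To make the before-and-after concrete, here is a minimal dry-run sketch. The helper `zfs_create_cmd` and the dataset `tank/scratch` are hypothetical names made up for illustration; the helper only prints the command line it would run, so it is safe to try on a machine without a pool.

```shell
# Hypothetical helper: build a 'zfs create' command line from a dataset name
# followed by property=value pairs. It prints the command rather than
# running it.
zfs_create_cmd() {
    dataset=$1
    shift
    cmd="zfs create"
    for prop in "$@"; do
        cmd="$cmd -o $prop"
    done
    printf '%s %s\n' "$cmd" "$dataset"
}

# Before: two commands, with a window in which the inherited mountpoint exists.
#   zfs create tank/scratch
#   zfs set mountpoint=none tank/scratch
# After: a single command, so the dataset never has the inherited mountpoint.
zfs_create_cmd tank/scratch mountpoint=none
zfs_create_cmd tank/export mountpoint=/export atime=off
```

Dropping the `printf` in favor of actually executing the assembled command would be the obvious next step, but that part is left out here on purpose.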
‘canmount’ property
The ‘canmount’ property allows you to create a ZFS dataset that serves
solely as a mechanism for inheriting properties. When we first created the
hierarchical dataset model, we had the notion of ‘containers’ –
filesystems with no associated data. Only these datasets could contain
other datasets, and you had to make the decision at create-time.
This turned out to be a bad idea for a number of reasons. It
complicated the CLI, forced the user to make a create-time decision that
could not be changed, and led to confusion when files were accidentally
created on the underlying filesystem. So we made every filesystem able
to have child filesystems, and all seemed well.
However, there is power in having a dataset that exists in the hierarchy
but has no associated filesystem data (or effectively none, because it is
prevented from being mounted). One can do this today by setting the ‘mountpoint’
property to ‘none’. However, this property is inherited by child
datasets, and the administrator cannot leverage the power of inherited
mountpoints. In particular, some users have expressed a desire to have
two sets of directories, belonging to different ZFS parents (or even to
UFS filesystems), share the same inherited directory. With the new
‘canmount’ property, this becomes trivial:
# zfs create -o mountpoint=/export -o canmount=off tank/accounting
# zfs create -o mountpoint=/export -o canmount=off tank/engineering
# zfs create tank/accounting/bob
# zfs create tank/engineering/anne
Now, both anne and bob have directories under ‘/export/’, except that
they are inheriting ZFS properties from different datasets in the
hierarchy. The administrator may decide to turn compression on for one
group of people or another, or set a quota to limit the amount of space
consumed by the group. Or simply have a way to view the total amount of
space consumed by each group without resorting to scripted du(1).
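The reason both users land under the same directory is the inheritance rule: a child's mountpoint is the parent's mountpoint followed by the child's name relative to the parent. The sketch below models that rule in plain shell. The function `inherited_mountpoint` is a hypothetical illustration, not part of the zfs CLI, and it does not touch any pool.

```shell
# Hypothetical model of mountpoint inheritance: the parent's mountpoint,
# then the child's dataset name relative to the parent.
inherited_mountpoint() {
    parent_mountpoint=$1   # e.g. /export
    parent=$2              # e.g. tank/accounting
    child=$3               # e.g. tank/accounting/bob
    # Strip the "parent/" prefix from the child's full dataset name.
    printf '%s/%s\n' "$parent_mountpoint" "${child#"$parent"/}"
}

inherited_mountpoint /export tank/accounting tank/accounting/bob     # /export/bob
inherited_mountpoint /export tank/engineering tank/engineering/anne  # /export/anne
```

On a real pool, something like `zfs get -r mountpoint,canmount tank` should show the same picture, with the two parents set to canmount=off and only the children mounted.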
User Defined Properties
The last major RFE in this wad added the ability to set arbitrary
properties on ZFS datasets. This provides a way for administrators to
annotate their own filesystems, as well as for ISVs to layer intelligent
software without having to modify the ZFS code to introduce a new
property.
A user-defined property name is one which contains a colon (:). This
provides a unique namespace which is guaranteed to not overlap with
native ZFS properties. The convention is to use the colon to separate a
module and property name, where ‘module’ should be a reverse DNS name.
For example, a theoretical Sun backup product might do:
# zfs set com.sun.sunbackup:frequency=1hr tank/home
The property value is an arbitrary string, and no additional validation
is done on it. These values are always inherited. A local administrator
might do:
# zfs set localhost:backedup=9/19/06 tank/home
# zfs list -o name,localhost:backedup
NAME       LOCALHOST:BACKEDUP
tank       -
tank/home  9/19/06
tank/ws    9/10/06
The hope is that this will serve as a basis for some innovative products
and home grown solutions which interact with ZFS datasets in a
well-defined manner.
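As a small illustration of the naming rule, a script can tell user-defined properties from everything else just by looking for the colon. The function `is_user_property` below is a hypothetical helper that checks only the rule stated in this post (the name contains a colon), nothing more.

```shell
# Hypothetical check: succeed if the name looks like a user-defined property,
# i.e. it contains a colon separating module and property name.
is_user_property() {
    case $1 in
        *:*) return 0 ;;
        *)   return 1 ;;
    esac
}

for name in compression localhost:backedup com.sun.sunbackup:frequency; do
    if is_user_property "$name"; then
        echo "$name: user-defined"
    else
        echo "$name: not user-defined"
    fi
done
```

A home grown backup script could then read its own annotation back with something like `zfs get -H -o value localhost:backedup tank/home` and decide what to do from there.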
4 Responses
This ZFS thing sounds pretty interesting. Unfortunately, I am a total n00b with ZFS. It seems the best, most useful, detailed information about it comes from user blogs. I think I need ZFS for a project I am going to be working on. I will be setting up a new Solaris 10 X4500 server with some large number of SATA II disks. I can’t figure out if ZFS helps me meet one criterion. This will be a new home dir server for our masses. I need to be able to do a simple disk -> disk backup of everything. Two raidz pools? Or do I? Is there enough fault tolerance for multiple disk failures? I see some potential issues with faulted spares and faulted pool members. I also need access to be controlled by our MS Active Directory. Is ZFS compatible or simply transparent? Hmmm..
You can also use RAIDZ2, which is double parity protection, so 2 disks can fail in a RAID group. However, RAIDZ2 is not available in S10 yet, only in Solaris Express. You can also create many smaller RAIDZ groups in one bigger ZFS pool or use RAID-10.
When it comes to disk->disk backup, see ‘zfs send’ and ‘zfs recv’ in the manual for zfs.
For more documentation see http://opensolaris.org/os/community/zfs/
Who gives a hoot about ZFS new features if it can’t boot on a Mac Book Pro ??
One thing that I cannot see is whether ZFS has a capability similar to Microsoft’s VSS, where snapshots are application aware: applications are notified when a snap is desired and tell the infrastructure that they are able to be snapped in a consistent state. Database and application aware snapshots are more critical than being crash consistent, which is what most snapshots support.