When ZFS was first developed, the engineering team had the notion that pooled storage would make filesystems cheap and plentiful, and we’d move away from the days of /export1, /export2, ad infinitum. From the ZFS perspective, they are cheap. It’s very easy to create dozens or hundreds of filesystems, each of which functions as an administrative control point for various properties. However, we quickly found that other parts of the system start to break down once you get past 1,000 or 10,000 filesystems. Mounting and sharing filesystems takes longer, browsing datasets takes longer, and managing automount maps (for those without NFSv4 mirror mounts) quickly spirals out of control.
For most users this isn’t a problem – a few hundred filesystems is more than enough to manage disparate projects and groups on a single machine. There was one class of users, however, for whom a few hundred filesystems weren’t enough. These were university and other home directory environments with 20,000 or more users, each of whom needed a quota to guarantee that they couldn’t run amok on the system. The traditional ZFS solution, creating a filesystem for each user and assigning a quota, didn’t scale. After thinking about it for a while, Matt developed a fairly simple architecture to provide this functionality without introducing pathological complexity into the bowels of ZFS. In build 114 of Solaris Nevada, he pushed the following:
PSARC 2009/204 ZFS user/group quotas & space accounting
This provides full support for user and group quotas on ZFS, as well as the ability to track usage on a per-user or per-group basis within a dataset.
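At the raw ZFS level (outside the appliance UI), the feature surfaces as new per-user and per-group properties. Here is a minimal sketch using the zfs(1M) command; the pool, dataset, user, and group names are hypothetical:

```
# Set a per-user and a per-group quota on an existing filesystem.
zfs set userquota@alice=10G tank/home
zfs set groupquota@staff=100G tank/home

# Read a quota back, along with the space that user currently references.
zfs get userquota@alice,userused@alice tank/home
```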
This was later integrated into the 2009.Q3 software release, with an additional UI layer. From the ‘general’ tab of a share, you can quickly query usage and set quotas for individual users or groups. The CLI allows for automated batch operations. Requesting a single user or group is significantly faster than requesting all the current usage, but you can also get a list of the current usage for a project or share. With integrated identity management, users and groups can be specified either by UNIX username or Windows name.
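Underneath, this maps onto the zfs(1M) property syntax, where you can ask about a single user or group directly or enumerate everything at once. Again a rough sketch with hypothetical names; with identity mapping in place, the name can also be given as a Windows SID name or numeric SID:

```
# Fast path: ask about one user or group directly.
zfs get userused@alice tank/home
zfs get groupused@staff tank/home

# Full listing: enumerate usage for every user or group in the dataset;
# this is the slower operation when there are tens of thousands of users.
zfs userspace tank/home
zfs groupspace tank/home

# Windows identities can be used as well, e.g. a SID name or numeric SID
# (hypothetical values shown).
zfs set userquota@joe.smith@mydomain=5G tank/home
zfs set userquota@S-1-5-21-123-456-789-1001=5G tank/home
```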
There are some significant differences between user and group quotas and traditional ZFS quotas. The following is an excerpt from the on-line documentation on the subject:
- User and group quotas can only be applied to filesystems.
- User and group quotas are implemented using delayed enforcement. This means that users will be able to exceed their quota for a short period of time before data is written to disk. Once the data has been pushed to disk, the user will receive an error on new writes, just as with the filesystem-level quota case (see the sketch after this list).
- User and group quotas are always enforced against referenced data. This means that snapshots do not affect any quotas, and a clone of a snapshot will consume the same amount of effective quota, even though the underlying blocks are shared.
- User and group reservations are not supported.
- User and group quotas, unlike data quotas, are stored with the regular filesystem data. This means that if the filesystem is out of space, you will not be able to make changes to user and group quotas. You must first make additional space available before modifying user and group quotas.
- User and group quotas are sent as part of any remote replication. It is up to the administrator to ensure that the name service environments are identical on the source and destination.
- NDMP backup and restore of an entire share will include any user or group quotas. Restores into an existing share will not affect any current quotas. (There is currently a bug preventing this from working in the initial release, which will be fixed in a subsequent minor release.)
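To make the delayed-enforcement point concrete, here is a rough sketch at the raw ZFS level. The filesystem, user, and paths are hypothetical, and the exact point at which writes start failing depends on when the pending data is synced to disk:

```
# Give user alice a deliberately tiny quota (hypothetical names and paths).
zfs set userquota@alice=10M tank/home

# Writing as alice: the first writes may succeed even though they push
# usage past 10M, because the quota is checked as data reaches disk.
su alice -c 'dd if=/dev/zero of=/tank/home/alice/file1 bs=1M count=20'

# After that data has been synced, new writes return EDQUOT,
# just as they would with a filesystem-level quota.
su alice -c 'dd if=/dev/zero of=/tank/home/alice/file2 bs=1M count=1'
```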
This feature will hopefully allow the Sun Storage 7000 series to function in environments where it was previously impractical to do so. Of course, the real people to thank are Matt and the ZFS team – it was a very small amount of work to provide an interface on top of the underlying ZFS infrastructure.