Eric Schrock's Blog

Fishworks Storage Configuration

November 12, 2008

Since our initial product was going to be a NAS appliance, we knew early on that
storage configuration would be a critical part of the initial Fishworks experience. Thanks to the power
of ZFS storage pools, we have the ability to present a radically simplified interface,
where the storage “just works” and the administrator doesn’t need to worry about
choosing RAID stripe widths or statically provisioning volumes. The first decision
was to create a single storage pool (or really one per head in a cluster)[1],
which means that the administrator only needs to make this decision once, and
doesn’t have to worry about it every time they create a filesystem or LUN.

Within a storage pool, we didn’t want the user to be in charge of making
decisions about RAID stripe widths, hot spares, or allocation of devices. This
was primarily to avoid this complexity, but it also reflects the fact that we
(as designers of the system) know more about its characteristics than you do.
RAID stripe width affects performance in ways that are not immediately
obvious. Allowing for JBOD failure requires careful selection of stripe widths.
Allocation of devices can take into account environmental factors (balancing
HBAs, fan groups, backplane distribution) that are unknown to the user.
To make this easy for the user, we provide several profiles, each defining
parameters that are then applied to the current configuration to figure
out how the ZFS pool should be laid out.
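To make the idea of a profile concrete, here is a minimal Python sketch. It assumes a profile is nothing more than a stripe width, a parity count, and a number of hot spares. The appliance's real profile parameters are not spelled out here, so `Profile` and `layout` are hypothetical names. The sketch splits a set of disks into data, parity, spare, and reserved (unused) disks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: these parameters are illustrative assumptions."""
    name: str
    stripe_width: int  # disks per RAID stripe, parity included
    parity: int        # redundant disks per stripe (1 for a two-way mirror)
    spares: int        # hot spares set aside before any stripes are built


def layout(n_disks: int, profile: Profile) -> dict:
    """Split n_disks into data, parity, spare, and reserved (unused) disks."""
    if n_disks < profile.spares + profile.stripe_width:
        raise ValueError(f"{profile.name}: too few disks for one stripe plus spares")
    usable = n_disks - profile.spares
    stripes = usable // profile.stripe_width  # only whole stripes are built
    return {
        "stripes": stripes,
        "data": stripes * (profile.stripe_width - profile.parity),
        "parity": stripes * profile.parity,
        "spares": profile.spares,
        # disks left over after the last whole stripe stay unallocated
        "reserved": usable - stripes * profile.stripe_width,
    }


double_parity = Profile("double parity RAID", stripe_width=11, parity=2, spares=2)
print(layout(24, double_parity))
# {'stripes': 2, 'data': 18, 'parity': 4, 'spares': 2, 'reserved': 0}
```

A 25th disk would land in the reserved slice because it cannot complete another 11-disk stripe. Choosing a width that avoids this for each hardware configuration is the kind of decision the profiles take off the administrator's plate.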

Before selecting a profile, we ask the user to verify the storage that
they want to configure. On a standalone system, this is just a check
to make sure nothing is broken. If there is a broken or missing disk, we
don’t let you proceed without explicit confirmation. The reason we do
this is that once the storage pool is configured, there is no way to add those
disks to the pool without changing the RAS and performance characteristics
you specified during configuration. On a 7410 with multiple JBODs, this verification step is slightly
more complicated, since we allow adding whole or half JBODs. This step is
where you choose to allocate half or all of each
JBOD to a pool, allowing you to split storage in a cluster or reserve
unused storage for future clustering options.
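The verification step can be sketched in Python as follows. This is a minimal illustration, not the appliance's code. `Disk`, `select_disks`, and `DegradedStorageError` are hypothetical names. The sketch also assumes that "half" a JBOD means the lower-numbered half of its slots, which the text does not specify. It gathers the disks chosen for a pool and refuses to continue past a broken or missing disk unless the caller explicitly confirms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    jbod: str
    slot: int
    healthy: bool  # False models a broken disk or an empty slot


class DegradedStorageError(Exception):
    """Raised when the selection contains broken or missing disks."""


def select_disks(disks, allocation, confirm_degraded=False):
    """Return the disks to give to a pool.

    allocation maps a JBOD name to "all", "half", or "none". "half" is assumed
    here to mean the lower-numbered half of the slots.
    """
    by_jbod = defaultdict(list)
    for disk in disks:
        by_jbod[disk.jbod].append(disk)

    selected = []
    for jbod, members in sorted(by_jbod.items()):
        members.sort(key=lambda d: d.slot)
        choice = allocation.get(jbod, "none")
        if choice == "all":
            selected.extend(members)
        elif choice == "half":
            selected.extend(members[: len(members) // 2])
        elif choice != "none":
            raise ValueError(f"unknown allocation {choice!r} for {jbod}")

    bad = [d for d in selected if not d.healthy]
    if bad and not confirm_degraded:
        # These disks could not be added to the pool later without changing
        # its RAS and performance characteristics, so stop and ask first.
        raise DegradedStorageError(f"{len(bad)} disk(s) broken or missing")
    return selected


shelf = [Disk("jbod0", slot, healthy=(slot != 20)) for slot in range(24)]
print(len(select_disks(shelf, {"jbod0": "half"})))  # 12: slot 20 is not in the lower half
```

Asking for all of `jbod0` in this example raises `DegradedStorageError` because of slot 20. It succeeds only when the call passes `confirm_degraded=True`.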

Fundamentally, the choice of redundancy is a business decision. There is
a set of tradeoffs that express your tolerance of risk and relative cost. As
Jarod told us very early on in the project: “fast, cheap, or reliable – pick two.”
We took this to heart, and our profiles are displayed in a table with
qualitative ratings on performance, capacity, and availability. To further
help make a decision, we provide a human-readable description of the
layout, as well as a pie chart showing the way raw storage will be used
(data, parity, spares, or reserved). The last profile parameter is called
“NSPF,” for “no single point of failure.” If you are on a 7410 with multiple
JBODs, some profiles can be applied across JBODs such that the loss
of any one JBOD cannot cause data loss[2]. This often forces arbitrary stripe
widths (with 6 JBODs your only choice is 10+2) and can result in
less capacity, but with superior RAS characteristics.
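Here is one way to reproduce the 10+2 figure. It is a Python sketch that rests on our own inference from the text, not the appliance's documented allocator. For a stripe with `parity` redundant disks to survive the loss of an entire JBOD, no single JBOD may hold more than `parity` of that stripe's disks. Spreading the stripe evenly across every JBOD, with the maximum allowed on each, gives a width of JBODs × parity. With 6 JBODs and double parity that is 12 disks, or 10+2. The function names are hypothetical.

```python
from collections import Counter


def survives_any_jbod_loss(stripe_jbods, parity):
    """True if losing any one JBOD removes at most `parity` disks of the stripe.

    stripe_jbods names, for each disk in a single RAID stripe, the JBOD holding it.
    """
    return max(Counter(stripe_jbods).values()) <= parity


def widest_nspf_stripe(n_jbods, parity):
    """Take `parity` disks from every JBOD; return the stripe as (data, parity)."""
    if n_jbods < 2 or parity < 1:
        raise ValueError("NSPF needs at least two JBODs and at least one parity disk")
    width = n_jbods * parity
    return width - parity, parity


print(widest_nspf_stripe(6, 2))  # (10, 2)
stripe = [f"jbod{i}" for i in range(6) for _ in range(2)]
print(survives_any_jbod_loss(stripe, parity=2))              # True
print(survives_any_jbod_loss(stripe + ["jbod0"], parity=2))  # False: jbod0 holds 3 disks
```

Placing a third disk from any one JBOD into the stripe breaks the property. That is why NSPF ties the stripe width to the number of JBODs rather than to whatever width would otherwise perform or pack best.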

This configuration takes just two quick steps, and for the common case
(where all the hardware is working and the user wants double parity RAID),
it just requires clicking the “DONE” button twice. We also support adding
more storage (on the 7410), as well as unconfiguring and importing
storage. I’ll leave a complete description of the storage configuration
screen for a future entry.

[1] A common question we get is “why allow only one storage pool?” The actual
implementation clearly allows it (as in a failed-over active-active cluster), so it’s
purely an issue of complexity. There is never a reason to create multiple
pools that share the same redundancy profile – this provides no additional value
at the cost of significant complexity. We do acknowledge that mirroring
and RAID-Z provide different performance characteristics, but we hope that with the
ability to enable or disable readzilla and (eventually) logzilla usage on a per-share basis,
this will be less of an issue. In the future, you may see support for multiple pools, but
only in a limited fashion (i.e., enforcing different redundancy profiles).

[2] It’s worth noting that all supported configurations of the 7410 have
multiple paths to all JBODs across multiple HBAs. So even without NSPF, we
have the ability to survive HBA, cable, and JBOD controller failure.

3 Responses

  1. Eric,
    Did you guys disable Nagle’s algorithm on the VMware USS simulator? iSCSI performance isn’t that great with the standard target as is, but should be better. COMSTAR iSCSI performs much better on OpenSolaris… I hear you are putting this in.
    If you’re not the one to address this, please forward it on.

  2. Does the system support in-place drive capacity upgrades? Playing around with the VMware image and making the disks bigger seems to work, but all the space gets allocated to parity… So far the system looks great regardless =p

  3. Rad –
    No, we have not made any significant performance changes to iSCSI. We too are looking forward to COMSTAR integration.
    Erik –
    Yes, in-place upgrades should work. You must upgrade all the drives in a stripe or mirror for the increase in capacity to be reflected.
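The reply above about in-place upgrades follows from how ZFS sizes a RAID-Z or mirror vdev. Every member contributes only as much space as the smallest disk in that vdev. The Python sketch below shows why replacing only some of the drives leaves capacity unchanged. The function name is hypothetical, and the arithmetic ignores ZFS metadata and padding overhead.

```python
def vdev_capacity(disk_sizes, parity):
    """Approximate usable size of one RAID-Z or mirror vdev.

    Every member counts as if it were the smallest one, so a partial upgrade
    adds nothing. Use parity=1..3 for RAID-Z1..Z3 and parity=n-1 for an n-way
    mirror. ZFS metadata and allocation overhead are ignored.
    """
    if not 0 <= parity < len(disk_sizes):
        raise ValueError("parity must leave at least one data disk")
    return (len(disk_sizes) - parity) * min(disk_sizes)


before = [1000] * 11             # an 11-disk RAID-Z2 stripe of 1000 GB drives
partial = [2000] * 10 + [1000]   # ten drives upgraded, one still old
after = [2000] * 11              # every drive upgraded
print(vdev_capacity(before, 2), vdev_capacity(partial, 2), vdev_capacity(after, 2))
# 9000 9000 18000
```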
