We needed to build a network storage system for a few hundred gigabytes of data. Our goals were pretty basic at first: reliability, and serving the data to Windows clients. So began a search through fileserver systems - we looked primarily at OpenFiler and FreeNAS. Both were good enough, and each had a sizeable community behind it.
OpenFiler looked a little more desirable because of its Linux base (FreeNAS is built on FreeBSD). However, I could not wrap my head around its snapshotting system, and the backup methodology was a bit too weird - incremental backup was simply too difficult (Amanda?).
Eventually, my search led me to ZFS - a filesystem built to simplify enterprise-scale data management. It handles everything - striping, mirroring, backups, incremental backups, etc. - at the filesystem level. It is free and open source, developed by the wizards at Sun. Moreover, companies like NetApp (whose enterprise systems cost hundreds of thousands of dollars) have been embroiled in litigation with Sun over intellectual property rights on ZFS - I don't want to go into who was first, just that both are technically great.
What’s the catch? ZFS is released under an open-source license (the CDDL) which makes it incompatible with the Linux kernel’s GPLv2.
Enter the ZFS-FUSE project - a way around the GPLv2 problem by running the ZFS filesystem in a user-space driver (via FUSE) rather than in the kernel. Both FreeNAS (based on FreeBSD) and Nexenta (a distribution built on OpenSolaris) include ZFS - but I really preferred using a Linux-based distribution.
ZFS-FUSE can be installed through the Ubuntu package manager, but that is a very old release (0.5.0). Instead, I checked it out from the repository and built it myself - well, actually I redid the build system and contributed that patchset back to zfs-fuse.
How ZFS-FUSE works
- Create a pool using sudo zpool create tank /dev/sdb1. I like to think of a ZFS pool as a one-to-one relationship between usable free space and its underlying hardware. At creation time, you can make zpools of various types - RAIDZ, mirrors, etc. ZFS-FUSE will create a directory called /tank in your root.
- Going forward, you can seamlessly grow the pool using sudo zpool add tank /dev/sdb2. No need for third-party software to resize disks.
- When you attach a blank disk to your computer, the raw disk (without any partitions) is addressable using names like /dev/sda, /dev/sdb, etc., while partitions (once created) are addressable as /dev/sda1, /dev/sdb1, etc. It is advisable to create one partition first (using gparted): if you don't, the ZFS volumes are not portable between Linux and Solaris, because ZFS implicitly creates a partition on Solaris while on Linux it does not.
- Once a pool is created, you never need to bother about the underlying hardware again.
- Create a filesystem using zfs create tank/filesystem1. I like to think of a filesystem as a logical unit of management (as opposed to a pool, which maps to hardware) - within the same pool, I can have a read-only filesystem, an encrypted filesystem, and so on.
- To make a filesystem read-only, do sudo zfs set readonly=yes tank/filesystem1
- Delete a filesystem with sudo zfs destroy tank/filesystem1 (or a whole pool with sudo zpool destroy tank). You may have to use the -r flag to destroy the filesystem along with all its snapshots.
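The pool and filesystem steps above can be collected into one script. This is only a sketch of the workflow, not my exact setup: the device names (/dev/sdb1, /dev/sdb2) are placeholders, and the script defaults to dry-run (it prints the commands instead of executing them) so you can review everything before touching real disks.

```shell
#!/bin/sh
# Sketch of the zpool/zfs lifecycle described above.
# Dry-run by default; set DRY_RUN=0 to actually execute the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "+ $*"          # just show the command
    else
        "$@"                 # actually execute it
    fi
}

run sudo zpool create tank /dev/sdb1         # new pool backed by one partition
run sudo zpool add tank /dev/sdb2            # grow the pool with a second partition
run sudo zfs create tank/filesystem1         # a filesystem inside the pool
run sudo zfs set readonly=yes tank/filesystem1
run sudo zfs destroy -r tank/filesystem1     # -r also removes its snapshots
```

Set DRY_RUN=0 only once the printed commands match the disks you actually intend to use.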
In ZFS, snapshots are instantaneous: sudo zfs snapshot tank/fileserver1@snapshot1
ZFS is a copy-on-write filesystem - if you overwrite a file that exists in a snapshot, ZFS does not actually overwrite the data in place; it writes new copies of the changed data (and it is smart enough not to copy the entire file, only the blocks that changed).
In Solaris, you can browse your snapshots through a special .zfs directory, where each snapshot appears as a separate directory. In ZFS-FUSE on Linux, this facility is not available (yet). However, you can create a clone (which is also instantaneous and occupies no extra space) with sudo zfs clone tank/fileserver1@snapshot1 tank/restore, and you will instantly get a new directory /tank/restore containing the contents of snapshot1.
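The snapshot-and-clone recovery flow looks like this end to end. Again a sketch: the snapshot and clone names are illustrative, and the script is dry-run by default (set DRY_RUN=0 to execute).

```shell
#!/bin/sh
# Sketch of the snapshot-and-clone flow described above.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" -eq 1 ]; then echo "+ $*"; else "$@"; fi
}

run sudo zfs snapshot tank/fileserver1@snapshot1   # instantaneous, copy-on-write
run sudo zfs clone tank/fileserver1@snapshot1 tank/restore
# /tank/restore now shows the contents as of snapshot1.
# When you are done recovering files, drop the clone:
run sudo zfs destroy tank/restore
```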
To back up, you first create a snapshot - this lets your users keep working while guaranteeing that you are backing up a known, consistent state.
- Full backup - sudo zfs send tank/fileserver1@snapshot2 > /media/externaldisk/backup.snapshot2
- Incremental backup - sudo zfs send -i tank/fileserver1@sourcesnapshot tank/fileserver1@newsnapshot > /media/externaldisk/backup.incremental_wrt_newsnapshot
- Full restore - sudo zfs receive tank/newfileserver < /media/externaldisk/backup.snapshot2
- Incremental restore - sudo zfs receive tank/newfileserver < /media/externaldisk/backup.incremental_wrt_newsnapshot
(You may have to use the -f flag to force, since even doing an ls on the target seems to make ZFS think the restore has been modified.)
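A backup rotation along these lines (full on Monday, incremental against the previous day otherwise) can be sketched as below. The snapshot naming, destination path, and GNU date usage are my assumptions, not a prescription; the script prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: daily snapshot, full zfs send on Mondays, incremental otherwise.
DRY_RUN=${DRY_RUN:-1}
run() {
    # Commands are passed as one string so redirects/pipes can be shown too.
    if [ "$DRY_RUN" -eq 1 ]; then echo "+ $1"; else eval "$1"; fi
}

FS=tank/fileserver1                      # assumed filesystem name
DEST=/media/externaldisk                 # assumed backup destination
TODAY=$(date +%Y%m%d)
YESTERDAY=$(date -d yesterday +%Y%m%d)   # GNU date syntax; adjust elsewhere

run "sudo zfs snapshot $FS@$TODAY"
if [ "$(date +%u)" = 1 ]; then
    # Monday: full stream of today's snapshot
    run "sudo zfs send $FS@$TODAY > $DEST/backup.$TODAY.full"
else
    # Other days: incremental stream relative to yesterday's snapshot
    run "sudo zfs send -i $FS@$YESTERDAY $FS@$TODAY > $DEST/backup.$TODAY.incr"
fi
```

Restores then use zfs receive exactly as in the list above, replaying the full stream first and the incrementals in order.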
After you have added a new hard disk and created a partition on it, you can mirror your pool with sudo zpool attach tank /dev/existingpartition /dev/newpartition
One thing that bit me: a 320 GB hard disk does not actually have exactly 320 GB - the addressable disk space varies by model and manufacturer. A mirror can only be attached if its size is greater than or equal to the existing pool. The new disk I got had a few hundred MB less than the original, so zpool refused with an error that the disk was not big enough. The workaround: create a new pool on the newer disk, copy everything over, destroy the previous pool, and attach the old disk as the mirror.
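That rebuild-and-mirror workaround can be sketched as follows. Device names and the temporary pool name are placeholders, and the use of zfs send -R for the copy is my assumption (the data could equally be copied with cp or rsync); dry-run by default.

```shell
#!/bin/sh
# Sketch: rebuild the pool on the slightly smaller new disk, copy the data
# across, then attach the old disk as the mirror.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" -eq 1 ]; then echo "+ $1"; else eval "$1"; fi
}

run "sudo zpool create tank2 /dev/sdc1"            # pool on the new, smaller disk
run "sudo zfs snapshot -r tank@migrate"            # consistent point-in-time copy
run "sudo zfs send -R tank@migrate | sudo zfs receive -F tank2"
run "sudo zpool destroy tank"                      # old pool goes away
run "sudo zpool attach tank2 /dev/sdc1 /dev/sdb1"  # old disk becomes the mirror
run "sudo zpool export tank2"
run "sudo zpool import tank2 tank"                 # rename the pool back to tank
```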
My current el-cheapo setup
Two 320 GB SATA hard disks in a mirror, plus an 80 GB IDE hard disk for the OS. I snapshot every day for a week and clean up the following week. We do a full backup every Monday and an incremental backup every other weekday. Your paranoia level may differ. If anybody screws up and wants to recover an older version of the fileserver, I simply clone a previous snapshot (as detailed above) and make it available.
I share this pool using Samba. You may need to adjust the startup sequence (Linux-distribution-dependent) so that Samba starts only after zfs-fuse is already up.
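On SysV-init distributions this means reordering the rc symlinks; on a systemd-based distribution, one way is a drop-in override for the Samba unit. The unit names smbd.service and zfs-fuse.service here are assumptions - check what your distribution actually calls them:

```ini
; /etc/systemd/system/smbd.service.d/after-zfs-fuse.conf
; Assumed unit names; verify with `systemctl list-units` on your system.
[Unit]
After=zfs-fuse.service
Requires=zfs-fuse.service
```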