ZFS should become the default for Linux distributions
I recently switched my work laptop over to Fedora Workstation. This is a big deal because the company I work at was, and still largely is, a Microsoft shop. Two years ago, this wouldn't have been a nice experience, but as more and more core IT services move into the cloud and as more and more applications become web apps, the choice of OS matters less and less.
I had experimented with OmniOS (a distribution of illumos) before, primarily because of its support for OpenZFS and Zones. I still have to sink more time into Zones, but I was positively impressed with ZFS. Compared to the tools that both Windows and Linux provide, ZFS is a breeze to use. Even, and I would say especially, via the command line.
Windows Storage Spaces: not confidence inspiring
On Windows, you seem to have the choice between old school partitions and Storage Spaces. I'm using a Windows Storage Spaces pool on my gaming PC, and I'm not super happy with it. Yes, you can combine drives of different sizes, and you can instruct it to re-balance data across the drives. That's something that ZFS is not very good at. The problem is that when things go wrong, as they inevitably do with hard drives, I found Storage Spaces to be very opaque and uncooperative. I had a 4-disk pool with 3-way redundancy. That means that I should be able to lose one drive and not only keep all my data, but also still have a functional pool. But when one of the drives failed, the entire pool refused to mount. No amount of me removing the faulty drive from the pool would let me mount it again. I didn't even get an error message; the pool would just immediately go offline again.
Linux LVM: manual labour
Linux LVM is fine, I guess. My interactions with it were typically limited to me expanding a partition whenever it would inevitably fill up with Docker images. It promises a lot of the same flexibility as Windows Storage Spaces (thin provisioning), but, like Storage Spaces and unlike ZFS, it does not integrate with the layer above it: the file system.
Yeah, yeah, UNIX philosophy, "do one thing and do it well," I know. The problem is that when you divide the task (storage) into fiefdoms (RAID, LVM, file system), the whole becomes less than the sum of its parts. A task that, from the user's perspective, should be simple (this directory needs more space) becomes a multistep puzzle. The user is being asked to play integration layer between the architectural layers on every. single. interaction. with the storage system.
ZFS: storage for a civilized age
I'm specifically talking about OpenZFS, which is the open source fork of the version of ZFS that was originally invented at Sun Microsystems and is still in use at Oracle. I'm mentioning this nugget of history because the two implementations, Oracle ZFS and OpenZFS, are slowly drifting apart, and things you read in Oracle's documentation might not apply to OpenZFS.
Outside of Oracle, OpenZFS is the only ZFS. FreeBSD uses OpenZFS. Even the open source Solaris fork, illumos, uses OpenZFS. This is interesting because ZFS was originally developed for Solaris. If you are feeling particularly adventurous, you can even run OpenZFS on Windows.
My own ZFS journey started with reading documentation. The man pages for ZFS and all the utilities around ZFS are very good. I opted for a more structured approach with the excellent book FreeBSD Mastery: ZFS by Michael W. Lucas and Allan Jude. Don't be scared by the "FreeBSD" in the title. You can skip the few parts that are FreeBSD-specific. As OpenZFS is shared between FreeBSD and Linux, everything that book has to say about ZFS is transferable to Linux. What I found particularly useful is that the book provides example scenarios of how ZFS' various features can be put to use.
ZFS uses the same layers of abstraction (virtual RAID devices, pools, filesystems) as LVM and Storage Spaces, but they are allowed to be aware of each other. That means that the file system doesn't rely on the crutch of a "partition" abstraction that needs to be sized in advance (even if thin-provisioned). As a user, I can still restrict how much space a file system can use (via quotas) but the (sensible) default is that your file systems just get to use the space that is available in the pool.
One thing I particularly like about ZFS administration is that it's built on a simple vocabulary. You have a hierarchy of datasets (filesystems) that have properties (key-value pairs). Properties can be things like the quota, which compression algorithm to use or where the filesystem should get mounted. Most properties are inherited from parent datasets. That means that, for instance, a compression setting on a parent dataset will also apply to all child datasets. A quota is not inherited as such, but it has the same practical effect: a quota on a parent dataset caps the space used by that dataset and all of its children combined. It just works the way you would expect it to work.
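To make the inheritance rule concrete, here is a toy model in Python. It is not how ZFS is implemented and it does not talk to ZFS at all; the dataset names and the `effective_property` helper are made up for illustration. It only shows the lookup rule: use the locally set value if there is one, otherwise walk up to the parent.

```python
from typing import Dict, Optional, Tuple

# Locally set properties per dataset (hypothetical pool layout).
LOCAL_PROPS: Dict[str, Dict[str, str]] = {
    "tank": {"compression": "lz4"},
    "tank/home/alice": {"compression": "off"},
}


def effective_property(
    local_props: Dict[str, Dict[str, str]], dataset: str, prop: str
) -> Tuple[Optional[str], str]:
    """Return (value, source) by walking from the dataset up to the pool root."""
    name = dataset
    while True:
        if prop in local_props.get(name, {}):
            return local_props[name][prop], name
        if "/" not in name:
            # Reached the pool root without finding a local value.
            return None, "default"
        name = name.rsplit("/", 1)[0]  # step up to the parent dataset


for ds in ("tank/home", "tank/home/alice"):
    value, source = effective_property(LOCAL_PROPS, ds, "compression")
    print(f"{ds}: compression={value} (from {source})")
```

In this sketch, tank/home picks up lz4 from tank, while tank/home/alice overrides it locally.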
Moreover, I don't need to learn a new command for every task ("use mount for mounting, resize2fs for resizing, etc."). Instead, I learn one concept (reading and writing properties) and apply it to all storage management tasks. For instance, to change where a dataset is mounted, I set the mountpoint property. To change the quota, I set the quota property. And when I change the mountpoint property, I don't need to fiddle with a separate tool or configuration file to actually make that change in the mount path happen. The change gets applied immediately.
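As a rough sketch of what "one concept for every task" looks like, the following Python snippet builds `zfs set` command lines for a few different tasks. The dataset name tank/home and the values are placeholders, and the snippet only prints the commands rather than running them, so it is safe to try on a machine without ZFS.

```python
import shlex
from typing import List


def zfs_set(dataset: str, prop: str, value: str) -> List[str]:
    """Build the argv for `zfs set <property>=<value> <dataset>`."""
    return ["zfs", "set", f"{prop}={value}", dataset]


# Three different storage tasks, one and the same command shape.
commands = [
    zfs_set("tank/home", "mountpoint", "/home"),  # move the mount path
    zfs_set("tank/home", "quota", "200G"),        # cap the space used
    zfs_set("tank/home", "compression", "lz4"),   # pick a compression algorithm
]

for argv in commands:
    print(shlex.join(argv))
```

To actually apply a change, the argv list could be handed to `subprocess.run(argv, check=True)` with sufficient privileges.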
Another cool feature is that you can add your own custom properties to datasets. These could be simple notes for your future self, or they could be used by scripts. For instance, you could use a custom property to mark the datasets that you want your automated backup script to skip. Or provide the frequency with which you want each dataset backed up.
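Here is a minimal sketch of the backup idea, assuming a custom property that I have arbitrarily called com.example:backup (user property names need a colon in them) and the convention that the value "skip" excludes a dataset. Both the property name and the convention are my own invention. The parsing is kept separate from the `zfs get` call so that it can be tried with sample text.

```python
import subprocess
from typing import Dict, List

BACKUP_PROP = "com.example:backup"  # hypothetical custom property name


def parse_zfs_get(output: str) -> Dict[str, str]:
    """Parse tab-separated `zfs get -H -o name,value` output into a dict."""
    props = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t", 1)
        props[name] = value
    return props


def datasets_to_back_up(props: Dict[str, str]) -> List[str]:
    """Keep every dataset except those explicitly marked 'skip'."""
    return [name for name, value in props.items() if value != "skip"]


def read_backup_property() -> Dict[str, str]:
    """Query ZFS for the property on all filesystems (requires ZFS)."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-o", "name,value", "-t", "filesystem", BACKUP_PROP],
        check=True, capture_output=True, text=True,
    )
    return parse_zfs_get(result.stdout)


# Sample output for illustration; "-" is what zfs prints for an unset property.
sample = "tank\t-\ntank/home\tdaily\ntank/scratch\tskip\n"
print(datasets_to_back_up(parse_zfs_get(sample)))
```

The same property could carry the backup frequency ("daily", "weekly") for the datasets that are not skipped.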
Why isn't ZFS the default?
So with all these cool features and the excellent usability of ZFS, why is it not the default for Linux distributions today? Why are Linux distributions for general purpose computer systems not installed onto and booted from ZFS by default? Well, there are some challenges.
ZFS cannot ever be merged into the Linux kernel because it's licensed under the CDDL, which is not compatible with the GPL. Ok, that's a bummer, but just because it cannot be part of the Linux kernel source tree doesn't mean that Linux distributions cannot ship with ZFS.
Most popular Linux distributions already package OpenZFS in their package repositories. The Ubuntu installer already offers ZFS as an "advanced" "experimental" installation option, for instance. The Ubuntu installer also highlights one more challenge. GRUB, the bootloader used by Ubuntu, only supports a small subset of ZFS features. To work around this, the Ubuntu installer creates two ZFS pools: a small boot pool, which is configured to not enable advanced ZFS features, and a larger pool for the main file system. Not a deal-breaker, but rather inelegant.
A more sophisticated setup is the excellent ZFSBootMenu. It's a small bootloader that is entirely built around ZFS. It automatically enumerates all ZFS pools connected to the system and lets you choose which to boot. If things go wrong, it can alternatively boot an earlier snapshot of the system. My favourite feature is that it reads the kernel command line from a custom property of the root file system.
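As a sketch of that last feature: assuming the property is named org.zfsbootmenu:commandline, as the ZFSBootMenu documentation describes it, setting the kernel command line is just another property write. The dataset name zroot/ROOT/fedora and the kernel arguments are placeholders; check the documentation for your ZFSBootMenu version before relying on this.

```python
import shlex
from typing import List

# Property name as I understand it from the ZFSBootMenu docs (assumption).
CMDLINE_PROP = "org.zfsbootmenu:commandline"


def set_kernel_cmdline(root_dataset: str, cmdline: str) -> List[str]:
    """Build the argv that stores a kernel command line on a root dataset."""
    return ["zfs", "set", f"{CMDLINE_PROP}={cmdline}", root_dataset]


argv = set_kernel_cmdline("zroot/ROOT/fedora", "quiet loglevel=4")
print(shlex.join(argv))  # printed only, not executed
```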
There are deployments where ZFS doesn't make sense. ZFS is rather RAM hungry. The rule of thumb is that it wants 1 GB of RAM for each TB of storage. With today's RAM prices, that's not a problem for workstations or servers, but it rules out ZFS for embedded, IoT, edge or budget cloud servers.
But outside of those smaller scale scenarios, I don't see any fundamental reason why ZFS could not become the default for Linux distributions. It's always been the default for Solaris and illumos. It's de facto the default for FreeBSD. I don't have the answer. Is it the licensing? Is it the limited support in the GRUB bootloader? Is it the belief that a combination of Btrfs and volume managers will eventually be good enough?
Fedora on ZFS
Personally, I'm very happy¹ with my Fedora Workstation on ZFS setup. It was fun and educational to set up. It also gives me peace of mind because I can easily make incremental backups of the entire system. I can heartily recommend the guide for installing Fedora Workstation 40 on ZFS with ZFSBootMenu. It includes instructions for various scenarios (with/without encryption, SATA/NVMe, etc.) and provides important background information for first-time users of ZFS.
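For the curious, incremental backups boil down to taking a snapshot and sending only the difference to the previous one. The sketch below builds the commands for that; the dataset tank/home, the snapshot names and the target backup/home are placeholders, and nothing is executed. It is a simplified single-dataset illustration, not the exact procedure I use.

```python
import shlex
from typing import List


def snapshot_cmd(dataset: str, snap: str) -> List[str]:
    """Build the argv for `zfs snapshot <dataset>@<snap>`."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def incremental_send_pipeline(dataset: str, prev: str, new: str, target: str) -> str:
    """Build a shell pipeline that sends the delta between two snapshots."""
    send = ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{new}"]
    recv = ["zfs", "receive", "-u", target]  # -u: do not mount on the receiving side
    return f"{shlex.join(send)} | {shlex.join(recv)}"


print(shlex.join(snapshot_cmd("tank/home", "tuesday")))
print(incremental_send_pipeline("tank/home", "monday", "tuesday", "backup/home"))
```

The first backup has to be a full `zfs send` without `-i`; every run after that only transfers the blocks that changed between the two snapshots.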
¹The laptop has issues with power management and with my docking station. But that's not related to the file system.