
Understanding how much space a ZFS dataset is using is not as straightforward as it first seems, but let's dive in and decipher how to calculate used space on ZFS.
The first thing to understand is how a copy-on-write filesystem with snapshots works. Let's use an example with timestamps to illustrate this:
| Time | Action | Used Space | 
|---|---|---|
| 10:00 | the dataset is created and is empty | 0MB | 
| 10:01 | a 100MB file, file1.txt, is added to the dataset | 100MB | 
| 10:02 | a snapshot, snap1, is created | 100MB | 
| 10:03 | 10MB of the contents of file1.txt is changed | 110MB |
| 10:04 | a snapshot, snap2, is created | 110MB | 
| 10:05 | another 100MB file, file2.txt, is added to the dataset | 210MB | 
| 10:06 | snap1 is destroyed | 200MB |
| 10:07 | file1.txt is deleted | 200MB |
| 10:08 | snap2 is destroyed | 100MB |
This probably seems straightforward until 10:03, when you edit file1.txt and change 10MB of its contents. At this point, snap1 holds the old 10MB of data and the live dataset holds the new 10MB, so used space grows to 110MB. When snap2 is created, it references the new 10MB. Both snap1 and snap2 also reference the remaining 90MB of the file that is unchanged; this data is stored once and shared between the snapshots and the live dataset. At 10:06, when snap1 is destroyed, the data that was referenced only by snap1 (the original 10MB of file1.txt) is freed. At 10:07, when file1.txt is deleted, used space stays the same because the file's data is still referenced by snap2. Only at 10:08, when snap2 is destroyed, is the last reference to file1.txt gone, so its 100MB is freed and just file2.txt remains.
A useful way to think about this is to treat snapshots like hardlinks in a filesystem - as long as at least one hardlink exists to a file, it continues to exist.
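To make the accounting in the table concrete, here is a minimal sketch of it as a toy shell model. It is not how ZFS tracks blocks internally: the chunk names (file1_same, file1_old, file1_new, file2) and the chunk_size and used helpers are made up for illustration, and the only rule modelled is that a chunk takes up space for as long as the live dataset or at least one snapshot still references it.

```shell
#!/bin/sh
# Toy model of the timeline above. Each holder (the live dataset or a
# snapshot) lists the chunks it references. Names and sizes are illustrative.

chunk_size() {
  case $1 in
    file1_same) echo 90 ;;             # the 90MB of file1.txt that never changes
    file1_old | file1_new) echo 10 ;;  # the 10MB region before and after the edit
    file2) echo 100 ;;
  esac
}

used() {
  # Count every chunk once, no matter how many holders reference it.
  total=0
  for chunk in $(printf '%s\n' $live $snap1 $snap2 | sort -u); do
    total=$((total + $(chunk_size "$chunk")))
  done
  echo "${total}MB"
}

live="" snap1="" snap2=""
echo "10:00 $(used)"                                     # 0MB
live="file1_same file1_old"; echo "10:01 $(used)"        # 100MB
snap1=$live;                 echo "10:02 $(used)"        # 100MB
live="file1_same file1_new"; echo "10:03 $(used)"        # 110MB
snap2=$live;                 echo "10:04 $(used)"        # 110MB
live="$live file2";          echo "10:05 $(used)"        # 210MB
snap1="";                    echo "10:06 $(used)"        # 200MB
live="file2";                echo "10:07 $(used)"        # 200MB
snap2="";                    echo "10:08 $(used)"        # 100MB
```

Emptying a holder only frees the chunks that no other holder still lists, which is why destroying snap1 releases 10MB while destroying snap2 releases 100MB.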
Once we have a more complex set of files, and a large number of snapshots, it can be hard to determine which files are referenced by each snapshot. Fortunately, ZFS properties for each dataset make this easier to decipher. You can use zfs get all name/of/dataset to see all of them, but let's examine a few of them below.
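Since zfs get all prints a long list, it can help to ask only for the properties that relate to space. The wrapper below is a small sketch of that idea; space_props is a hypothetical helper name, and the property list is simply the set discussed in this article.

```shell
# Print a handful of space-related properties for one dataset.
# -H drops the header line and -o limits the columns to property and value.
space_props() {
  zfs get -H -o property,value \
    used,usedbysnapshots,usedbydataset,logicalused,compression,compressratio "$1"
}

# Usage on a system with ZFS installed (not run here):
#   space_props name/of/dataset
```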
First, let's create a zpool, enable compression, create a dataset, and cd into it:
    root@demo:~# dd if=/dev/zero of=/root/pool.img bs=1M count=2000
    2000+0 records in
    2000+0 records out
    2097152000 bytes (2.1 GB, 2.0 GiB) copied, 27.8838 s, 75.2 MB/s
    root@demo:~# zpool create tank /root/pool.img
    root@demo:~# zfs set compression=on tank
    root@demo:~# zfs create tank/example
    root@demo:~# zfs list -r tank
    NAME           USED  AVAIL     REFER  MOUNTPOINT
    tank           146K  1.75G       24K  /tank
    tank/example    24K  1.75G       24K  /tank/example
    root@demo:~# zpool list tank
    NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    tank  1.88G   198K  1.87G        -         -     0%     0%  1.00x    ONLINE  -
    root@demo:~# cd /tank/example
Useful Attributes: compression and compressratio
If you're following common best practices, you will have enabled compression on your zpool. This means that the actual amount of space used by a file is probably smaller than the file size itself, since the data is stored compressed. You can see whether compression is enabled, and if so which algorithm is used, by checking the compression property. Moreover, you can see how well the algorithm is doing at compressing your specific data using compressratio. Keep this in mind when looking at other ZFS properties below - unless otherwise noted, they are telling you the compressed size of the data.
If data can't be compressed, then it will appear to be the size you expect it to be and compressratio will be equal to 1.00x. For example, a file with completely random content cannot be compressed:
    root@demo:/tank/example# dd if=/dev/urandom of=file1.txt bs=1M count=100
    100+0 records in
    100+0 records out
    104857600 bytes (105 MB, 100 MiB) copied, 2.21311 s, 47.4 MB/s
    root@demo:/tank/example# ls -lh
    total 78M
    -rw-r--r-- 1 root root 100M Jul 18 03:00 file1.txt
    root@demo:/tank/example# zfs get logicalused tank/example
    NAME          PROPERTY     VALUE  SOURCE
    tank/example  logicalused  100M   -
    root@demo:/tank/example# zfs list -r tank
    NAME           USED  AVAIL     REFER  MOUNTPOINT
    tank           100M  1.65G       24K  /tank
    tank/example   100M  1.65G      100M  /tank/example
    root@demo:/tank/example# zfs get compressratio tank/example
    NAME          PROPERTY       VALUE  SOURCE
    tank/example  compressratio  1.00x  -
Compare that to a newly-created file that can be compressed:
    root@demo:/tank/example# ls -lh file2.txt
    -rw-r--r-- 1 root root 66M Jul 18 03:11 file2.txt
    root@demo:/tank/example# zfs get compressratio tank/example
    NAME          PROPERTY       VALUE  SOURCE
    tank/example  compressratio  1.13x  -
    root@demo:/tank/example# zfs list -r tank/example
    NAME           USED  AVAIL     REFER  MOUNTPOINT
    tank/example   147M  1.61G      147M  /tank/example
In this case, a 66MB file only takes up 47MB of space due to the compression: the dataset's used space grew from 100M to 147M when the file was added.
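The 1.13x figure can be roughly reproduced by hand from the numbers above. The sketch below assumes compressratio is approximately the logical size divided by the on-disk size; ZFS computes the property internally, so treat this as a sanity check rather than its exact formula. compress_ratio is a hypothetical helper name.

```shell
# Ratio of logical (uncompressed) size to on-disk size, both in the same units.
compress_ratio() {
  awk -v logical="$1" -v physical="$2" 'BEGIN { printf "%.2fx\n", logical / physical }'
}

# 100MB of random data plus the 66MB compressible file, stored in 147MB.
compress_ratio 166 147   # prints 1.13x

# Incompressible data on its own: logical and on-disk size match.
compress_ratio 100 100   # prints 1.00x
```

Because the ratio covers the whole dataset, the incompressible file1.txt pulls it down; the compressible file on its own would come out at roughly 66/47.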
Useful Attributes: used, usedbysnapshots, and usedbydataset
The used attribute represents the total amount of used space for a dataset. This is useful for getting a quick total, but it doesn't distinguish between data held only by snapshots and live data present in the dataset right now. This distinction is crucial when thinking about free space. Say you have a 2GB zpool and want to store a 100MB tarball in it that is regenerated every hour. If you snapshot this dataset hourly and each new tarball shares no data with the previous one, every snapshot pins another 100MB and the pool fills up in roughly 20 hours. Let's try this out and catch it before we run out of free space:
    root@demo:/tank/example# zfs list -t snapshot -r tank/example
    NAME                 USED  AVAIL     REFER  MOUNTPOINT
    tank/example@snap1   100M      -      100M  -
    tank/example@snap2   100M      -      100M  -
    tank/example@snap3   100M      -      100M  -
    tank/example@snap4   100M      -      100M  -
    tank/example@snap5   100M      -      100M  -
At this point, we've taken 5 snapshots and each contains 100MB of unique data. Moreover, there's a new 100MB file in the dataset that was created after snap5, so there's a total of 600MB used:
    root@demo:/tank/example# zfs list tank/example
    NAME           USED  AVAIL     REFER  MOUNTPOINT
    tank/example   600M  1.16G      100M  /tank/example
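The 600M total, and the earlier estimate of about 20 hours to fill the pool, both come from simple arithmetic. As a rough sketch (space_after and hours_until_full are hypothetical helper names, and the model assumes every snapshot pins one full, unique copy of the file):

```shell
# Space held (in MB) after a number of snapshots, when each snapshot pins one
# unique copy of the file and one more copy is live in the dataset.
space_after() {
  snapshots=$1 file_mb=$2
  echo $(( (snapshots + 1) * file_mb ))
}

# Whole hourly copies that fit in the pool before it is full.
hours_until_full() {
  pool_mb=$1 file_mb=$2
  echo $(( pool_mb / file_mb ))
}

space_after 5 100          # prints 600
hours_until_full 2000 100  # prints 20
```

The usable space reported as AVAIL (1.75G on the empty pool above) is lower than the raw 2GB, so in practice the limit arrives a little sooner than this estimate suggests.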
To make this a lot clearer, let's look at the dataset using -o space, which shows usedbysnapshots and usedbydataset alongside used:
    root@demo:/tank/example# zfs list -o space tank/example
    NAME          AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
    tank/example  1.16G   600M      500M    100M             0B         0B
Excellent! We can now see that 500MB of the 600MB used is all from files that no longer exist in the live dataset but only exist in snapshots. The actual live data is only 100MB. This allows you to adjust your snapshot policy to avoid filling up your zpool.
Conversely, you could have a dataset which has data that changes infrequently. In this case, usedbysnapshots will be small and usedbydataset will be large.
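To spot which of these two situations a dataset is in, it can help to turn usedbysnapshots and usedbydataset into percentages of used. The following is a minimal sketch; snapshot_share is a hypothetical helper name, and it expects one line per dataset with the name followed by used, usedbysnapshots, and usedbydataset in bytes.

```shell
# Reads lines of "name used usedbysnapshots usedbydataset" (bytes, separated by
# whitespace) and prints how each dataset's used space is split.
snapshot_share() {
  awk '$2 > 0 {
    printf "%s: %.0f%% snapshots, %.0f%% live data\n", $1, 100 * $3 / $2, 100 * $4 / $2
  }'
}

# The 600M / 500M / 100M figures from the example, written out in bytes.
printf 'tank/example\t629145600\t524288000\t104857600\n' | snapshot_share
# prints: tank/example: 83% snapshots, 17% live data

# On a live system the input could come from zfs list (not run here):
#   zfs list -Hp -o name,used,usedbysnapshots,usedbydataset -r tank | snapshot_share
```

A high snapshot percentage points at a retention policy that keeps a lot of churned data around; a low one means most of the space is live data.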
Useful Attribute: logicalused
If you use zfs send and zfs receive (or a tool like syncoid) to sync a copy of your dataset to another zpool, you might be surprised to find that used doesn't match on the two sides. This may be caused by different settings (e.g. compression, recordsize, etc.) between the datasets, or perhaps by a significant amount of fragmentation on the source zpool. In any event, it can be useful to compare the datasets using the logicalused property, which ignores factors like compression.
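One way to do that comparison is to read logicalused from both sides as exact byte counts and look at the relative difference. The sketch below is only illustrative: percent_diff is a hypothetical helper, backup/example is a made-up name for the replica, and the byte values are invented sample numbers.

```shell
# Percentage difference between two byte counts, relative to the first.
percent_diff() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    d = b - a
    if (d < 0) d = -d
    printf "%.1f%%\n", 100 * d / a
  }'
}

# Sample values: 166 MiB on the source and 165 MiB on the replica.
percent_diff 174063616 173015040   # prints 0.6%

# On a live system the inputs could be read like this (not run here):
#   src=$(zfs get -Hp -o value logicalused tank/example)
#   dst=$(zfs get -Hp -o value logicalused backup/example)
#   percent_diff "$src" "$dst"
```

A small difference suggests the two copies hold the same data even if used differs; a large one is a hint that a snapshot or some data is missing on one side.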
Deleting snapshots to free up space
Since multiple snapshots can point at the same data, it can be hard to determine how much space would be freed up by deleting a particular snapshot. Fortunately, you can simulate what would happen (and how much space would be freed) by running zfs destroy -nv:
    root@demo:/tank/example# zfs destroy -nv tank/example@snap1
    would destroy tank/example@snap1
    would reclaim 100M
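With many snapshots, it can be handy to run this dry run in a loop and keep only the reclaim figure. The following is a small sketch that parses the human-readable output shown above; would_reclaim is a hypothetical helper name.

```shell
# Pull the reclaim figure out of `zfs destroy -nv` output read from stdin.
would_reclaim() {
  awk '/^would reclaim/ { print $3 }'
}

# The dry-run output from above, replayed as sample input.
printf 'would destroy tank/example@snap1\nwould reclaim 100M\n' | would_reclaim   # prints 100M

# On a live system, a dry run over every snapshot might look like this
# (not run here):
#   for snap in $(zfs list -H -t snapshot -o name -r tank/example); do
#     printf '%s\t%s\n' "$snap" "$(zfs destroy -nv "$snap" | would_reclaim)"
#   done
```

Keep in mind that these per-snapshot figures don't necessarily add up: data shared by two snapshots is only freed once both are gone, so destroying several snapshots together can reclaim more than the sum of the individual dry runs.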
Conclusion
While used space on ZFS isn't as simple as looking up a single value, ZFS properties for each dataset provide an illuminating view into where space is being used and how to free up space.
