Filesystem selection for SSDs

Solid-State System


The major Linux filesystems already offer pretty good support for solid-state drives, but there is still some scope for improvement.

By Marcel Hilzinger

claranatoli, 123RF

Linus Torvalds has a clear-cut opinion on "rules" for partitioning or writing to solid-state drives (SSDs): "If the flash vendor talks about 'limits' in the wear leveling, and how you have to write certain ways, just start running away. Don't walk. Run away as fast as you can." This is how the godfather of Linux put it in 2008 when talking about his own approach to (bad) SSDs at the Real World Technologies forum [1].

SSDs attempt to spread write access across the whole disk as far as possible (wear leveling), in contrast to conventional hard disks and filesystems, which always write the same data at the same place. First- or second-generation disks only survived around 100,000 write cycles because of a fairly poor design and insufficient capacity. For a permanently stressed 8GB disk, this would mean a theoretical shortest life expectancy of 115 days. The Linux community soon spread the word to avoid journaling filesystems on SSDs.

Never Say Die

Intel put an end to these worries when it introduced its X25 series (Figure 1) and gave Linus one to test. If you buy an SSD today, you do not need to do without a state-of-the-art filesystem - on the contrary.

Figure 1: State-of-the-art SSDs, such as Intel's 34nm technology Intel X25-M SATA model, can survive decades of normal use thanks to sophisticated wear-leveling algorithms.

The current crop of SSDs will last more or less a lifetime. Even with the most intensive write operations imaginable, the calculated life expectancy of a 64GB SSD is around 51 years, assuming the current average of 2 million cycles and a write speed of 80MBps [2]. The rule to avoid journaling filesystems on SSDs is a thing of the past; it now applies only to first-generation Eee PCs and really cheap solid-state disks.
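The two life-expectancy figures quoted so far follow from simple arithmetic: the time needed to rewrite every cell as often as it can endure at a constant write speed. The following is a minimal sketch of that calculation, not the method used in [2]. It assumes ideal wear leveling and decimal units (1GB = 1000MB), and it assumes that the 115-day figure for the 8GB disk is based on the same 80MBps write speed as the 64GB example, which the text does not state. The function names are made up for illustration.

```shell
#!/bin/sh
# Worst-case SSD lifetime under continuous writes with ideal wear leveling.
# Assumptions: 1GB = 1000MB, constant write speed, function names are illustrative.

# ssd_lifetime_days CAPACITY_GB WRITE_CYCLES SPEED_MBPS
# Prints the lifetime in whole days (rounded down).
ssd_lifetime_days() {
    LC_ALL=C awk -v gb="$1" -v cycles="$2" -v mbps="$3" 'BEGIN {
        seconds = gb * 1000 * cycles / mbps   # time to write every cell "cycles" times
        printf "%d\n", int(seconds / 86400)
    }'
}

# ssd_lifetime_years CAPACITY_GB WRITE_CYCLES SPEED_MBPS
# Prints the lifetime in years (365 days each) with one decimal place.
ssd_lifetime_years() {
    LC_ALL=C awk -v gb="$1" -v cycles="$2" -v mbps="$3" 'BEGIN {
        seconds = gb * 1000 * cycles / mbps
        printf "%.1f\n", seconds / 86400 / 365
    }'
}

echo "8GB, 100,000 cycles, 80MBps:    $(ssd_lifetime_days 8 100000 80) days"
echo "64GB, 2,000,000 cycles, 80MBps: $(ssd_lifetime_years 64 2000000 80) years"
```

Under these assumptions, the script reports 115 days for the early 8GB disk and 50.7 years for the 64GB disk, which matches the rounded figures in the text.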

Writing technology has improved vastly since 2008; today's disks decide for themselves when and where to write data. Filesystem optimizations thus often run the risk of counteracting the SSD's own write method (wear leveling) or are hardly used because of a lack of support by disk manufacturers - as is the case with ext4's ATA TRIM support (discussed below).

Ext4 filesystem developer Theodore Ts'o investigated write access on the ext2/3/4 filesystems and concluded that journaling only causes 10 percent more write accesses on average [3]. Add to this the new capabilities offered by ext4 and other recent filesystems that avoid writing data to disk unless absolutely necessary (delayed allocation). Thus, you can safely say that ext4 is the best and most mature filesystem for solid-state disks today (Table 1).

Incidentally, it is worthwhile, as far as performance is concerned, to mount all of your partitions with the noatime and nodiratime mount options. This avoids unnecessary write access when browsing the filesystem tree. Some distributions use relatime as a default: This tells the kernel only to update the access time for files that have a more recent mtime or ctime - that is, files that really have changed. You can measure the performance gains here on any hardware.
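To find out which access-time policy your partitions currently use, you can inspect the mount options the kernel reports. The following sketch only reads /proc/mounts and changes nothing; the atime_policy helper is a hypothetical name, and the script assumes that a filesystem listing neither noatime nor relatime uses classic atime updates.

```shell
#!/bin/sh
# Report the access-time policy of mounted block devices (read-only check).

# atime_policy OPTIONS: classify a comma-separated mount option string.
atime_policy() {
    case ",$1," in
        *,noatime,*)  echo noatime ;;
        *,relatime,*) echo relatime ;;
        *)            echo atime ;;   # neither option listed
    esac
}

if [ -r /proc/mounts ]; then
    while read -r dev mnt fstype opts rest; do
        case "$dev" in
            /dev/*) printf '%-24s %s\n' "$mnt" "$(atime_policy "$opts")" ;;
        esac
    done < /proc/mounts
fi

# To switch a mounted partition without rebooting (as root):
#   mount -o remount,noatime,nodiratime /
# To make the change permanent, add the options to the matching /etc/fstab entry.
```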

Safe Side

If you are worried about the life expectancy of an older SSD, you can use ext4 without journaling. To do so, you need to create a new filesystem with the following command:

# mke2fs -t ext4 -O ^has_journal /dev/sdXX

Make sure you replace /dev/sdXX with the name of the device file. Ext4 without journaling combines the speed of ext2 with the extended capabilities of current filesystems. Without a journal, you will need a filesystem check if your machine crashes, but this doesn't typically take too long: First-generation SSDs have a maximum capacity of 8 or 16GB and very high read speeds.
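After creating the filesystem, you might want to confirm that it really has no journal. One way is to look at the feature list that tune2fs -l prints. The sketch below wraps that check in a small script; journal_enabled is a hypothetical helper name, and the script does nothing unless you pass it a device file as its first argument.

```shell
#!/bin/sh
# Check whether an ext2/3/4 filesystem carries the has_journal feature.

# journal_enabled FEATURES: succeed if the space-separated list contains has_journal.
journal_enabled() {
    case " $1 " in
        *" has_journal "*) return 0 ;;
        *)                 return 1 ;;
    esac
}

# With a device argument (run as root), read the feature list via tune2fs.
dev=${1:-}
if [ -n "$dev" ]; then
    features=$(tune2fs -l "$dev" | sed -n 's/^Filesystem features:[[:space:]]*//p')
    if journal_enabled "$features"; then
        echo "$dev: journal present"
    else
        echo "$dev: no journal"
    fi
fi
```

Saved under a name of your choice, for example check-journal.sh, the script would be called as sh check-journal.sh /dev/sdXX.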

Optimization or Bust

Today's optimization attempts by filesystem developers aren't targeted mainly at extending the life of an SSD, but at making the disk faster or preventing it from slowing down with use. Such slowdowns are likely to occur on most solid-state disks once all of the disk's blocks have been occupied at least once.

The ATA specification provides a TRIM command to refresh the disk; this tells the disk which unused blocks it can reset and reuse. Although Ted Ts'o added a matching TRIM function to the ext4 filesystem some while back [4], for lack of kernel support, it currently does no more than notify the kernel block layer.

The latest Linux distributions thus lack TRIM support, in contrast to Windows 7, although it will be introduced with the future 2.6.33 kernel release. Users who are unfazed by the idea of experimenting can use version 2.6.33-rc4 of the kernel, which already implements TRIM support. SSD trimming only works if the disk has a matching firmware version, and this is currently only true of a couple of Intel and OCZ models.
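Two conditions thus have to be met: a kernel of version 2.6.33 or later and a disk whose firmware supports TRIM. The following sketch checks the first condition by comparing the running kernel version against 2.6.33. The version_ge helper is a hypothetical name; it compares only the numeric major.minor.patch part, so a release candidate such as 2.6.33-rc4 counts as 2.6.33. The firmware check requires root privileges and real hardware, so it appears only as a comment.

```shell
#!/bin/sh
# Check whether the running kernel is at least 2.6.33.

# version_ge A B: succeed if version A >= version B.
# Anything after the first "-" in A (e.g. "-rc4", "-generic") is ignored.
version_ge() {
    a=${1%%-*}
    b=$2
    awk -v a="$a" -v b="$b" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

kernel=$(uname -r)
if version_ge "$kernel" 2.6.33; then
    echo "Kernel $kernel: 2.6.33 or later"
else
    echo "Kernel $kernel: older than 2.6.33, no TRIM support"
fi

# To see whether the disk firmware reports TRIM support (as root):
#   hdparm -I /dev/sdX | grep -i 'TRIM supported'
```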

Besides ext4 maintainer Ts'o, the developers of the Btrfs filesystem are also working on a special SSD mode [5]. The ssd mount option was introduced some time back. It forces the filesystem to write to unused space wherever possible. As of kernel 2.6.31, Btrfs enables this option automatically when it detects an SSD.

Besides this quasi-standard option, Btrfs also offers the ssd_spread and discard mount options, both passed with -o. According to the documentation [6], ssd_spread works faster on cheaper SSD hardware because it tries to find larger contiguous areas of free space to write to. To get Btrfs to release unused blocks for trimming, you can use discard. This process can have a negative effect on performance with many disks and is thus disabled by default.
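The kernel exposes whether it considers a disk to be rotating media in the sysfs file queue/rotational (0 for an SSD, 1 for a conventional hard disk). The sketch below lists this flag for all block devices; it assumes that this is the flag the automatic SSD detection relies on, and is_ssd is a hypothetical helper name. The mount commands at the end are shown as comments because they need root privileges and a real Btrfs partition.

```shell
#!/bin/sh
# List block devices and whether the kernel treats them as non-rotational.

# is_ssd QUEUE_DIR: succeed if QUEUE_DIR/rotational contains 0.
is_ssd() {
    [ "$(cat "$1/rotational" 2>/dev/null)" = "0" ]
}

for q in /sys/block/*/queue; do
    [ -r "$q/rotational" ] || continue
    disk=${q#/sys/block/}
    disk=${disk%/queue}
    if is_ssd "$q"; then
        echo "$disk: non-rotational (SSD)"
    else
        echo "$disk: rotational"
    fi
done

# Setting the Btrfs options explicitly (as root):
#   mount -t btrfs -o ssd /dev/sdXX /mnt
#   mount -t btrfs -o ssd_spread /dev/sdXX /mnt
#   mount -t btrfs -o ssd,discard /dev/sdXX /mnt
```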

Brute Force

It might take a few months for the kernel and filesystems to introduce across-the-board support for SSDs, which is what prompted hdparm developer Mark Lord to add a wiper.sh [7] script to the current version of the hard disk tool. The script searches the filesystem for free blocks and reports them to the SSD firmware. This tells the disk that it can use the blocks for wear leveling or a general memory cleanup. This feature was introduced with version 9.27 of hdparm (October 2009 release), and most distributions include packages for it. Having said this, wiper.sh only works with more expensive SSD models, such as the OCZ Vertex series and the Intel X25; our lab device refused to cooperate (Figure 2).

Figure 2: The wiper script will not work with SSDs that do not have matching TRIM support.

You can use the wiper script with ext4 and XFS partitions mounted normally, whereas ext2/3 and ReiserFS partitions must be mounted read-only. The readme file nevertheless points out that it is not a good idea to use wiper.sh with mounted disks. Additionally, the feature is still classified as experimental - make sure you create a backup of all your data on a second disk before you start. DiskTRIM (Figure 3) gives Ubuntu users a ready-to-run Debian package with a graphical interface for wiper.sh [8].

Figure 3: Despite the GUI, DiskTRIM is not a tool designed for newcomers.

Because of some Btrfs quirks, the wiper script is not suitable for the Btrfs filesystem. The Btrfs developers rely instead on the mount options described earlier to optimize write access to SSDs. Besides optimizing the existing filesystem, some initial work has been done on completely new filesystems optimized for flash memory - for example, NILFS2 [9] or LogFS [10]. However, their performance is not in the same league as ext4 or Btrfs.
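The filesystem rules for wiper.sh described above can be summarized in a small lookup function. This is only a sketch of what the text says, not logic taken from the script itself; wiper_mode is a hypothetical name. The wiper.sh calls at the end are comments: without --commit the script performs a dry run, which is the safer way to start.

```shell
#!/bin/sh
# Summarize in which mount state wiper.sh can handle a filesystem,
# following the rules given in the article text.

# wiper_mode FSTYPE: print online, read-only, unsupported, or unknown.
wiper_mode() {
    case "$1" in
        ext4|xfs)           echo online ;;       # may stay mounted read-write
        ext2|ext3|reiserfs) echo read-only ;;    # must be mounted read-only
        btrfs)              echo unsupported ;;  # use the Btrfs mount options instead
        *)                  echo unknown ;;
    esac
}

for fs in ext4 xfs ext2 ext3 reiserfs btrfs; do
    printf '%-9s %s\n' "$fs" "$(wiper_mode "$fs")"
done

# Typical calls (as root, after a backup):
#   wiper.sh /dev/sdXX            # dry run, nothing is trimmed
#   wiper.sh --commit /dev/sdXX   # actually send the TRIM requests
```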

If you buy a state-of-the-art SSD from a brand name manufacturer like Kingston/Intel, OCZ, or Samsung, you don't need to worry about disk lifetime. Street prices for 64GB drives are around US$ 135 for most manufacturers. In a multiple-day stress test, a 32GB 2.5-inch Warp disk by Patriot Memory (street price around US$ 125) showed no sign of wear, although this can be expected sooner or later in the case of highly I/O-intensive applications.

Right now, the only way to avoid stressing SSDs on Linux is with the use of the wiper.sh script extension for hdparm. Kernel and filesystem developers are all working hard on integrating tools to resolve issues with SSD support on Linux in the very near future.

INFO
[1] Linus on SSDs: http://www.realworldtech.com/forums/index.cfm?action=detail&id=93409&threadid=92678&roomid=2
[2] SSD myths: http://www.storagesearch.com/ssdmyths-endurance.html
[3] SSDs and journaling: http://thunk.org/tytso/blog/?p=328
[4] TRIM support for ext4: http://www.linux-mag.com/id/7272
[5] Btrfs discard function: http://btrfs.wiki.kernel.org/index.php/Changelog#v2.6.32_.28December_2009.29
[6] Btrfs SSD optimizations: http://btrfs.wiki.kernel.org/index.php/FAQ#Is_Btrfs_optimized_for_SSD.3F
[7] hdparm: http://sourceforge.net/projects/hdparm/
[8] DiskTRIM: https://sourceforge.net/projects/disktrim/
[9] NILFS2: http://www.nilfs.org/en/
[10] LogFS: http://logfs.org/logfs/