Wednesday, December 28, 2011

Can a guy get a little TRIM?


I don't want to try to convince you to use an SSD. I'll just assume you already have one. I'm not going to tell you about all the myriad ways you can improve your performance. I'll just tell you about one, TRIM, and why almost everyone should be using it.

Writing to flash is very different from writing to a traditional HDD. You must first erase the flash cells before writing to them. You'll probably remember this from the days of flashing your BIOS. You must erase an entire block at a time (usually 256k), but can write to individual pages (usually 4k) within the block. The flash cells also have a limited lifetime of erase cycles. If certain parts of a filesystem were written to more often than others, you would burn out those flash cells faster and then the whole drive would be unusable.

The way SSDs get around this is a process called wear leveling. Instead of reading/erasing/writing the same block, the drive writes to free pages in the same or other blocks. This is known as dynamic wear leveling. The drive keeps a mapping of Logical Block Addresses (LBAs) to physical pages in flash and swaps the used page for an unused one. There are also extra "hidden" blocks that the OS is unaware of that the drive can use for this purpose. Drives will also move infrequently written data so that the blocks holding it can get more use. This is known as static wear leveling.

When a page is moved, the old location is marked stale. Then a process called garbage collection moves the good pages to a new block and erases the old block so that there will be free blocks when new data needs to be written. This may also be done in the background during idle time.

All of this leads to something called write amplification. This is basically when you want to write some small quantity of data, say 4k, to the drive, but the drive ends up writing much more because it needs to do wear leveling and garbage collection. Obviously this reduces performance, causes more erase/write cycles, and lowers the drive's longevity.
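If the mapping and the amplification ratio feel abstract, here is a minimal Python sketch of both. The names PageMap and write_amplification are made up for illustration and are not how any real firmware is written; the sizes are just the typical figures quoted above.

```python
PAGE_SIZE = 4 * 1024                        # typical page size (assumption)
BLOCK_SIZE = 256 * 1024                     # typical erase block size (assumption)
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE   # 64 pages per block


class PageMap:
    """Hypothetical LBA -> physical page map, as used for dynamic wear leveling."""

    def __init__(self, total_pages):
        self.lba_to_page = {}
        self.free_pages = list(range(total_pages))
        self.stale_pages = set()

    def write(self, lba):
        # Never overwrite in place: take a free page and remap the LBA to it.
        new_page = self.free_pages.pop(0)
        old_page = self.lba_to_page.get(lba)
        if old_page is not None:
            # The previous copy is now stale and waits for garbage collection.
            self.stale_pages.add(old_page)
        self.lba_to_page[lba] = new_page
        return new_page


def write_amplification(host_bytes, flash_bytes):
    """Bytes the flash actually wrote divided by bytes the host asked to write."""
    return flash_bytes / host_bytes


if __name__ == "__main__":
    pm = PageMap(total_pages=2 * PAGES_PER_BLOCK)
    pm.write(7)
    pm.write(7)   # same LBA again: new physical page, old one goes stale
    print(pm.lba_to_page, pm.stale_pages)
    # Worst case: rewriting a whole block just to change one page in it.
    print(write_amplification(PAGE_SIZE, BLOCK_SIZE))
```

The last line is the naive read/erase/write case: changing one 4k page by rewriting its whole 256k block means the flash writes 64 times what the host asked for.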

So the problem here is that the drive only finds out a page is stale when the OS writes to that same logical address again, at which point the drive does its wear leveling and places the new data somewhere else. Normally when the OS deletes data on a disk it just updates its own filesystem metadata to say the blocks are no longer in use, and the drive is never told. What TRIM does is tell the SSD that the page is stale much earlier, so it can make better choices when wear leveling and it won't copy data around during garbage collection that the OS doesn't care about. This will lower write amplification and increase performance and drive longevity. This works especially well for cases where data is read and overwritten frequently, like databases.
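To see why telling the drive early matters, here is a toy simulation in Python. Everything in it is an assumption for illustration: ToySSD, its greedy garbage collector, and the workload in run() are invented and far simpler than real firmware. It fills a small drive, "deletes" half the data, and then keeps overwriting the rest, once without TRIM and once with it.

```python
import random

PAGES_PER_BLOCK = 64  # 256k erase block / 4k page (typical sizes, assumed)


class ToySSD:
    """Toy model: append-only writes, greedy garbage collection, optional TRIM.

    Assumes the host uses fewer than (num_blocks - 1) * PAGES_PER_BLOCK LBAs,
    so garbage collection can always find a block with a stale page.
    """

    def __init__(self, num_blocks):
        self.blocks = [[] for _ in range(num_blocks)]  # slot = LBA, or None if stale
        self.where = {}                                # LBA -> (block, slot)
        self.free = list(range(1, num_blocks))         # erased blocks
        self.active = 0                                # block currently being filled
        self.host_writes = 0
        self.flash_writes = 0

    def _program(self, lba):
        old = self.where.get(lba)
        if old is not None:
            self.blocks[old[0]][old[1]] = None         # old copy becomes stale
        block = self.blocks[self.active]
        block.append(lba)
        self.where[lba] = (self.active, len(block) - 1)
        self.flash_writes += 1

    def _make_room(self):
        if len(self.blocks[self.active]) < PAGES_PER_BLOCK:
            return
        self.active = self.free.pop()
        if self.free:
            return
        # Last erased block just got used: reclaim the block with fewest live pages.
        victim = min(
            (i for i in range(len(self.blocks)) if i != self.active),
            key=lambda i: sum(slot is not None for slot in self.blocks[i]),
        )
        for lba in [slot for slot in self.blocks[victim] if slot is not None]:
            self._program(lba)                         # copy live data forward
        self.blocks[victim] = []                       # erase the whole block
        self.free.append(victim)

    def write(self, lba):
        self.host_writes += 1
        self._make_room()
        self._program(lba)

    def trim(self, lba):
        loc = self.where.pop(lba, None)
        if loc is not None:
            self.blocks[loc[0]][loc[1]] = None         # drive now knows it is stale


def run(use_trim, seed=0):
    rng = random.Random(seed)
    ssd = ToySSD(num_blocks=32)            # 2048 physical pages
    for lba in range(1792):                # fill everything the OS can see
        ssd.write(lba)
    for lba in range(896, 1792):           # the OS deletes half of the data...
        if use_trim:
            ssd.trim(lba)                  # ...and only tells the drive if TRIM is on
    for _ in range(10000):                 # keep overwriting the surviving half
        ssd.write(rng.randrange(896))
    return ssd.flash_writes / ssd.host_writes


if __name__ == "__main__":
    print("write amplification without TRIM:", run(use_trim=False))
    print("write amplification with TRIM:   ", run(use_trim=True))
```

In this toy model the run without TRIM keeps copying the deleted data around during garbage collection, so it ends up with the higher ratio. Real drives are much smarter than this, so treat the numbers as a cartoon and not a benchmark.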

At the OS level TRIM is usually referred to by a more generic term, 'discard'. This is because TRIM is specific to the ATA command set, but discard support can be used for other types of devices. The way to enable this in Linux is by simply adding 'discard' to the mount options for the filesystem. As of RHEL/CentOS 6.1 only Ext4 and LVM have support for discard, but in the kernel.org mainline kernel most filesystems have it. Support in other operating systems varies. If you decide to try this out, I suggest you benchmark before and after and also make sure you have the latest firmware for your SSD.
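Before benchmarking, it can help to check that discard is actually in play. The Python sketch below makes two assumptions: that your kernel exposes /sys/block/<device>/queue/discard_max_bytes (older kernels may not), and that mounted filesystems list 'discard' among their options in /proc/mounts. The function names and the device name sda are just examples.

```python
from pathlib import Path


def device_supports_discard(device, sysfs_root="/sys/block"):
    """True if the kernel reports a nonzero discard_max_bytes for e.g. 'sda'."""
    path = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(path.read_text().strip()) > 0
    except (OSError, ValueError):
        # File missing or unreadable: treat as "no discard support reported".
        return False


def mounts_with_discard(mounts_text):
    """Return mount points whose options include 'discard'.

    mounts_text is the content of /proc/mounts (device, mount point,
    filesystem type, comma-separated options, then two numbers).
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and "discard" in fields[3].split(","):
            result.append(fields[1])
    return result


if __name__ == "__main__":
    print("sda supports discard:", device_supports_discard("sda"))
    try:
        print("mounted with discard:", mounts_with_discard(Path("/proc/mounts").read_text()))
    except OSError:
        print("could not read /proc/mounts on this system")
```

If the device reports no discard support, adding the mount option won't buy you anything, so this is worth a look before you spend time on before/after numbers.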

Disclaimer: TRIM is not a magical wand you can wave. It will not fix all your problems nor will it always make the drive perform better. Some newer drives that use compression won't see any benefit and some firmware implementations may do "the wrong thing" with TRIM information, or perhaps nothing at all. Also all your layers need to be TRIM aware. That includes LVM and RAID.

Thursday, December 15, 2011

What's in a name?

I'm starting this blog to share some of my professional experiences with the world. I hope some people find it insightful, but if not I needed a break from real work anyway. I'll keep this post short, but I wanted to advise you to take anything I write with a grain of salt, because it's all low sodium.