tmpfs considered harmful

To clarify, this is about using tmpfs for /tmp being harmful. Specialized use of tmpfs in other situations is fine.

Fedora (beginning with Fedora 18) has changed /tmp so that it is now a separate tmpfs mount. I’ve asked FESCo to consider reverting this.

tmpfs is usually a bad idea.

Computer storage is hierarchical for performance reasons: CPU registers are ultra-fast but very few in number. Cache is faster than main memory. Disk drives are slow, but huge and persistent.

These are unavoidable facts, caused by the design of electronics. Every step down this hierarchy introduces complexity. One of the hardest parts of writing a compiler is register allocation: the underlying problem is NP-complete, so complex algorithms have to be written, and only the most expert programmers and theorists tackle this. (For JIT allocation it’s still very much an open research area.) Countless thousands of hours have been spent by kernel developers to optimize cache performance. When we have to decide what to keep in memory and what to spill to disk, complex paging algorithms are designed and tuned ad nauseam.

tmpfs introduces an entirely artificial and unnecessary step that all programmers and users have to worry about. It’s limited to at most the available memory plus swap space (and, by default, to half of RAM). Swap is usually stored on a separate partition that cannot expand, so you can run out of swap (and hence /tmp) while still having plenty of free disk space, and your machine will crash.
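The two limits in the paragraph above can be worked out for a given Linux machine with a few lines of Python. This is only a sketch under stated assumptions: it takes the kernel's default of half of physical RAM (no explicit size= mount option), treats RAM plus swap as the absolute upper bound, and reads SwapTotal from /proc/meminfo. The function names are made up for illustration.

```python
import os
from typing import Tuple

def tmpfs_limits(ram_bytes: int, swap_bytes: int) -> Tuple[int, int]:
    """Return (default_size, ceiling) in bytes for a tmpfs /tmp.

    default_size is the kernel default of half of physical RAM, used when
    no size= mount option is given. ceiling is RAM plus swap: tmpfs pages
    can only live in memory or in swap, and that space is shared with
    every other process on the machine.
    """
    return ram_bytes // 2, ram_bytes + swap_bytes

def swap_total(meminfo_path: str = "/proc/meminfo") -> int:
    """Read SwapTotal (reported in kB) from /proc/meminfo; 0 if unavailable."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("SwapTotal:"):
                    return int(line.split()[1]) * 1024
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return 0

if __name__ == "__main__":
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    default, ceiling = tmpfs_limits(ram, swap_total())
    gib = 1024 ** 3
    print(f"default /tmp size: {default / gib:.1f} GiB; "
          f"hard ceiling (RAM + swap): {ceiling / gib:.1f} GiB")
```

On a machine with 8 GiB of RAM and 2 GiB of swap, this gives a 4 GiB /tmp by default, and nothing can push it past 10 GiB no matter how much free disk space remains.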

Everyone must now be careful never to store a file in /tmp that might grow large, nor too many files at the same time. Every last little utility must be checked, including ones that aren’t part of your distro. Every user must be “re-educated” not to use /tmp for temporary files.

Everyone must worry about whether their files need to survive a reboot. (For example the default mutt configuration stores newly drafted emails in /tmp, so you’d better hope your machine doesn’t suffer a power failure while you’re writing. Or that you don’t have to compose an email larger than half available memory.)
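To make that decision concrete, here is a minimal Python sketch of what a careful program has to do. It assumes the Filesystem Hierarchy Standard convention that /var/tmp is preserved across reboots while nothing in /tmp may be assumed to persist. `scratch_dir` and `scratch_file` are hypothetical helpers, not part of mutt or any library.

```python
import os
import tempfile

def scratch_dir(must_survive_reboot: bool) -> str:
    """Choose where a temporary file should live.

    Per the FHS, /var/tmp must be preserved across reboots while /tmp
    need not be, so data that should be recoverable after a crash
    (an unsent email draft, say) belongs in /var/tmp.
    """
    if must_survive_reboot:
        return "/var/tmp"
    return tempfile.gettempdir()  # honours $TMPDIR, usually /tmp

def scratch_file(must_survive_reboot: bool, suffix: str = "") -> str:
    """Create an empty file (mode 0600) in the chosen directory and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=scratch_dir(must_survive_reboot))
    os.close(fd)
    return path
```

Every program that writes temporary data now has to pick the right value for `must_survive_reboot`, which is exactly the burden described above.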

All of this is quite unnecessary. We already have a memory-backed storage system that transparently spills to the filesystem. It’s called … the filesystem.

The case for performance has not been made either, but even if we assume that tmpfs is vastly better than the filesystem, it’s better to fix the filesystem to make it faster than to spend any more time on tmpfs. Fixes to the filesystem benefit everyone.

It’s good that Debian reverted this change, and Fedora should too.


47 responses to “tmpfs considered harmful”

  1. roaima

    Well thought out arguments. Good on you.
    Chris

  2. Ugh. I’m not a fan of this either. Leave it up to the users to make this adjustment during install time or afterwards if they really want it.

  3. Jim Williams

    “Everyone must worry about whether their files need to survive a reboot.”

    This is not new. For many, many years the semantics of /tmp and /var/tmp have been that the contents of the former cannot be expected to survive a reboot, while the contents of the latter are required to. Programs that hope to recover data after a crash *must* store that data in /var/tmp, not /tmp. /var/tmp is also preferred for larger temporary files. /var is expected to vary in size quite a bit, thus the name. /tmp is for small things that are only needed for a short time.

    Having said that, I do have reservations about using tmpfs for /tmp. By default on one system I was using, /tmp was a tmpfs that was mounted noexec, which caused the Nvidia driver installation program to fail, since it unpacks the install script into /tmp. That’s probably Nvidia’s fault, but I’m not aware of any formal standard that specifies if /tmp and/or /var/tmp should or should not be mounted noexec. I should look that up.

    • vidarh2

      Consider this: if /tmp is regularly cleaned, whether by the system or by user action, then anyone can drop files in there to replace files that were recently wiped, regardless of their relative user permissions, given that everyone has write access there.

      If various applications start putting executables in /tmp and later executing them, you’re significantly increasing the odds of someone accidentally creating a privilege escalation hole (by creating an executable there and later executing what might by then be an entirely different executable).

      For a case like the Nvidia driver, the time gap might be small enough that exploiting it is hard and unlikely, but it’s still a bad idea to execute anything out of a directory that is world-writable.

      • Geoffrey Thomas

        That is easy to work around: create a directory in /tmp mode 700 and chdir into it (or open it and use openat, or whatever). Then you’re not redoing the directory lookup.

        In any case, the discussion at hand is about cleaning /tmp at reboot (which was already being done), not about cleaning it while the system is running, which is its own challenge.
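Geoffrey's workaround can be sketched in Python: `tempfile.mkdtemp()` creates a directory that only its creator can read, write or search, and passing `dir_fd=` to `os.open` is how Python calls `openat` on platforms that support it. The helper name and the file name in the usage are illustrative assumptions.

```python
import os
import tempfile

def write_private(name: str, data: bytes) -> str:
    """Write `data` to `name` inside a fresh private directory; return the directory.

    mkdtemp() creates the directory with mode 0700, so other (non-root)
    users cannot add, remove or swap entries inside it. Holding an fd for
    the directory and passing dir_fd= (openat() on Linux) means `name` is
    resolved relative to that directory instead of walking the shared
    /tmp path again.
    """
    private_dir = tempfile.mkdtemp()
    dir_fd = os.open(private_dir, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # O_CREAT | O_EXCL fails if anything, even a symlink, already has this name.
        fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600, dir_fd=dir_fd)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
    finally:
        os.close(dir_fd)
    return private_dir
```

An installer unpacked this way (for example `write_private("install.sh", payload)`) and run from the private directory is not exposed to the file-swapping window vidarh2 describes, at least as far as other unprivileged users are concerned.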

  4. bitsmith

    I’m confused: is this about /tmp, swap area, or /dev/shm?

  5. I’m not sure the size argument makes sense. I don’t think /tmp was ever intended to store large files, and on most of the machines I manage you will fill it with a couple of gigabytes at most, often much less. On most current desktop machines tmpfs will allow much more than this.
    I’m quite convinced the choice to store /tmp on tmpfs should be a local decision, though, best left to the sysadmin and not to the distribution. Maybe adding a tmpfs option to the filesystem dialog in Anaconda would be a better solution?

  6. Tim

    I disagree. With increases in RAM capacity and the ever-pressing need for security (combined with the proliferation of SSDs), I think this is a good idea.

    For the programs that use /tmp to store files that they expect to persist through a reboot, that program is broken (Mutt in this case). Programs should only expect to store files in /tmp for the life of their process.

    I think your arguments are dated. Perhaps using half the RAM for tmpfs is too much though. That is a harder determination to set.

    • rich

      Used virtualization much? RAM is the single pressing issue with virtual machines.

      • vidarh2

        I disagree. I manage clusters with hundreds of VMs. Getting servers with plenty (hundreds of GB) of RAM is cheap. The problem we have with consolidating our processing further is consistently *disk IO*. I spend a lot of time tuning our setup to use *more* memory to avoid hitting disk as often, because scaling up the memory turns out to be vastly cheaper for us than scaling up disk IO capacity.

    • Level 4 Admin

      I have to agree with Tim: there are times when this is a good way of limiting disk reads and writes by floating everything into RAM. We operate 500 heavily used servers, with lots of disk space and lots of RAM but huge files constantly going in and out. By caching /tmp in RAM we lower server loads by 200-300% and reads/writes are more efficient, like so:
      # Move /tmp to RAM
      tmpfs /tmp tmpfs defaults,nosuid 0 0

      Absolute life saver. I would only recommend this when a nice chunk of RAM is available. Along with data=writeback, it gives us much better performance than without. So is tmpfs bad? Only if you know nothing about being a sysadmin.

  7. The main reason for /tmp as tmpfs is to lower usage of SSD disks. For example, when you are watching something on YouTube the stream is saved in /tmp. It is wasteful to spend those precious SSD writes.

    • Absolutely not. tmpfs (and tmpfs on /tmp) existed _long_ before modern SSDs (where ‘modern’ would be ‘native to IDE or SATA, not by way of a CF adapter’).

      tmpfs’s chief value is its ephemeral nature; data _can’t_ persist across a reboot… and that’s important!

      If you’re looking for ways to reduce writes to an SSD, I can think of several. First, look at your mount options. Reduce the frequency of journal commits. Make sure things are configured for write-back, rather than write-through.

      All that said, I’ve been pondering the creation of a dummy filesystem which starts off empty when mounted and, at mount time, doesn’t care about the state of things in its area of the disk. While the system is running, data is stored in the page cache (as it is with tmpfs), and iff the kernel wants to evict it, the data is flushed to disk. This allows for an ephemeral filesystem bounded explicitly by normal filesystem semantics. (You could even use a once-per-mount key to encrypt the data, so anything left on disk would be meaningless after a reboot.)

  8. I am surprised that nobody in the comments and in the ticket compared the failure modes when /tmp becomes full. /tmp on disk: applications (including unrelated ones if /tmp is not on its own partition) are faced with “disk full” errors, and that’s all. The admin can easily fix the problem. /tmp on tmpfs: if more than half of RAM is used by applications, then (assuming no swap for simplicity) the system becomes very sluggish due to memory pressure (up to the point that it is impossible to log in on a VT), then the OOM killer kicks in and kills applications. Of course, the first failure mode sucks less.

    As probably all of the participants of the discussion realize, there are two main use cases for temporary files: small things to pass between programs, and big things that are written to files to be reused by the same program specifically because they may or may not fit in RAM (i.e. if the programmer knew that they fit, then he’d just malloc() a buffer). Then, as was already said, /tmp on tmpfs changes the semantics of /tmp: it is no longer safe to store bigger-than-RAM files there. However, a golden rule of API design is that if you change well-established (even unintentional) semantics, you must rename the interface. So indeed, please invent a different name for a RAM-backed place to store small temporary files for IPC. Oh, wait, we already have /run/user/$USER just for that.

    • vidarh2

      The semantics of /tmp have never guaranteed that you can store bigger-than-ram files there. If you use /tmp that way, you’re begging for surprises.
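For the small IPC files mentioned in comment 8, a program does not need to hard-code /run/user: on systemd-based systems pam_systemd normally advertises the per-user runtime directory (/run/user/ followed by the numeric UID) through $XDG_RUNTIME_DIR. A minimal Python sketch, assuming that variable is the lookup mechanism and falling back to the ordinary temp directory; `runtime_dir` is a hypothetical helper.

```python
import os
import tempfile
from typing import Mapping, Optional

def runtime_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a directory for small, per-user IPC files.

    Prefers $XDG_RUNTIME_DIR (normally /run/user/<uid>, a small per-user
    tmpfs) and falls back to the ordinary temp directory when it is unset
    or missing, e.g. in a cron job or a minimal container.
    """
    env = os.environ if env is None else env
    xdg = env.get("XDG_RUNTIME_DIR")
    if xdg and os.path.isdir(xdg):
        return xdg
    return tempfile.gettempdir()
```

A socket or lock file would then go at something like `os.path.join(runtime_dir(), "myapp.sock")`, where the file name is purely illustrative, leaving /tmp for the larger scratch files discussed above.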

  9. On smaller systems even /var can be symlinked to a tmpfs /tmp so a spinning hard drive can be powered down when not in use. This is a legitimate use of /tmp on tmpfs even for laptops.

  10. My Funtoo install uses tmpfs for /run, which contains udev, lock and OpenRC stuff. It’s very small. Is this also a mistake?

    • Geoffrey Thomas

      No, /run was defined to be a tmpfs, and is also not publicly writable, unlike /tmp. The arguments about /tmp not being a tmpfs (which are not that great) definitely don’t apply to /run, which was never _not_ a tmpfs, and which users don’t write to.

  11. Michal

    I get tmpfs mounted on /tmp quite often when my disk space gets exhausted. /tmp is supposed to be a small separate partition, which is not the case on the problematic system. On a single-user system something like half a gigabyte should suffice for /tmp. /tmp typically holds a few tens of megabytes of junk.

    The failure mode of full /tmp is that nothing works so not particularly awesome. If you get OOM by using half of your ram worth of VM for /tmp you have other problems than programs writing to /tmp.

    Software storing huge data in /tmp is not going to work with either tmpfs or the recommended small separate disk partition.

    Now this tmpfs thing should be tunable. The partition tool could offer some way to allocate tmpfs for tmp, set the size, and even adjust swap requirements if deemed necessary. Or allocate a plain partition as usual.

    But if a default has to be picked, tmpfs is as good as any other. It has its advantages and disadvantages like any other.

    • > /tmp typically holds a few tens of megabytes of junk.

      It is not relevant what /tmp _typically_ holds. It is relevant that there was no _standards-mandated_ prohibition to store bigger-than-RAM temporary files there (in FHS there was only prohibition to store files that are supposed to survive a reboot), and now it is introduced.

      • And before someone invokes the “ability to store bigger-than-ram files was never guaranteed by the standards” argument: in fact, I am not sure that the ability to store even a single byte is guaranteed. So indeed, my reference to the standards was a stupid double-edged sword :(

  12. lol

    I was using “qemu -snapshot” for a VM which had a swap file. “-snapshot” created a qcow2 overlay file in /tmp. When the VM did some swapping out, the tmpfs exploded. I can’t file a bug against qemu because it’s not its fault. Shame.

    • That certainly is a bug with qemu. At the very least, it should allow you to configure where those overlay files go. And as noted elsewhere, the place for large temporary files (like that) is /var/tmp, not /tmp.

  13. OC

    This is common on Solaris – /tmp is in memory & yes I’ve seen it cause real problems for people who don’t realise that they are consuming memory by leaving their stuff in /tmp.

    As already stated on many systems /tmp is cleared out on start up whereas /var/tmp persists.

  14. H. Peter Anvin

    /tmp is not memory. /tmp is swap. It is a filesystem, which has the advantage of never having to worry about keeping the on-disk data structures consistent.

    There has been a culture in Linux to disregard the recommendations to make sure the system has enough swap, which hurts you more if you are using tmpfs. This is the real problem (and, some people claim, poor replacement algorithms, but if so, *those* should be fixed… there is absolutely no reason why “regular” filesystems should have different policies than anonymous swap, which includes tmpfs; it’s all just write-out.)

    I have seen real-life applications be sped up by as much as 30 times by using /tmp as tmpfs, so this is a very real improvement.

  15. lzap

    +1 And the SSD argument is weak. Linux will eventually write files to swap, even if the OS has enough memory.

  16. Tony L.

    One thing to remember is that a “standard”, if you will, that I’ve seen used a lot is setting the TMPDIR environment variable. Most well-behaved applications use this variable when determining where to store temporary files. Our batch applications create huge temporary files (tens of GB), so we had to move away from the standard /tmp (tmpfs or partition) a long time ago. The batch applications and our auxiliary tools (e.g. Perl) use TMPDIR and it works quite well. Our service provider balked at us having a huge /tmp area (tmpfs or otherwise) as well, so using TMPDIR to redefine where temporary files go has been a win-win for us.
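For programs written in Python, honouring TMPDIR comes for free through the standard `tempfile` module, which checks TMPDIR, then TEMP, then TMP, then platform defaults such as /tmp. The sketch below shows a job redirecting its own scratch files at runtime; the function name is hypothetical.

```python
import os
import tempfile

def use_scratch_area(path: str) -> str:
    """Redirect this process's temporary files to `path` via TMPDIR.

    tempfile caches the directory it picked in tempfile.tempdir after the
    first lookup, so the cache is cleared to make it re-read the
    environment. Child processes inherit the new TMPDIR via os.environ.
    """
    os.environ["TMPDIR"] = path
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

In practice, exporting TMPDIR in the job's environment before it starts (for example in a wrapper script) is simpler: nothing has been cached yet, and the setting also reaches non-Python tools that read it.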

  17. I suspect you didn’t really mean “NP-Complete” up there, did you? If you did, I will respond with why I raise this point.

    • rich

      Is graph colouring not NP-complete?

      • Mea culpa, you do know what you are talking about :-) (just get tired of reading blog posts where people mention NP-this-or-that/Turing-blah/etc without necessarily understanding the matter; you clearly know more than me: I did not even know/remember that graph colouring is used in compiler design)

  18. dude

    Lots of noise, very little content.

    What’s that comment about NP complexity? All algorithms you find in real-life programs are NP complete, so why do you mention that? Do you know what NP complete actually means?

    tmpfs has an enforced size limit (via a mount option) that is much smaller than the available RAM. So you will hit that limit way before you have to think about OOM or your swap space. So your fears about OOM are really pointless.

    You can optimize file systems as much as you like. The quintessential difference between tmpfs and a file system is simply that tmpfs relieves the kernel from making sure the data ever hits the disk, while a real file system must guarantee that. And that has drastic implications on performance, and on power efficiency.

    People have posted gazillions of performance measurements regarding tmpfs vs. ext4. You can of course say that the measurements are invalid. But heck, every *single time* tmpfs wins by huge margins. That should tell you at least something!

    Then, so far exactly 1 bug in rhbz has been reported that has anything to do with the tmpfs move. ONE BUG! That’s next to nothing. So far all our experience says it’s unproblematic.

    And regarding your comment about Debian, well, they are the most conservative distro around; do you really expect them to be the first to adopt something new?

    Honestly, your complaints are entirely without merit as it appears. You even have kernel folks commenting here telling you how little sense you make.

    • rich

      All algorithms in real life are NP-complete?! Not sure what universe you’re living in here. BTW Lennart you can post under your real name if you want.

  19. Tom

    You seem to be missing the point:

    Using a regular filesystem, there are two reasons files are written to disk:
    1) because you are running out of RAM; or
    2) because you need to make sure the data is persistent.

    On /tmp, we explicitly do not want the data to be persistent, so anytime case (2) happens is a waste of resources. Moreover, in order to deal with persistence, we have all sorts of things like journals, copy-on-write, etc. All of this needless logic can be omitted if the filesystem knows that you don’t care about persistence (i.e., you use tmpfs backed by swap), and in some cases this can be seen clearly in practice: http://www.toofishes.net/blog/mysql-and-tmp-tmpfs/.

    This issue is a fundamental one, and I don’t see how you could possibly hope to optimize regular filesystems to be able to compete with tmpfs on performance.

    As to your worry about running out of space: add a swap-file, or convert your /tmp partition into swapspace (in case /tmp is already on a separate partition). You should end up with exactly the same space constraints…

    Lastly, if you have a special case where /tmp should not be tmpfs (or you just don’t want it for whatever reason), then it is really trivial to override the default locally.

    • rich

      My point is that we are adding an extra storage class with lots of different properties, then placing every traditional use of /tmp into this storage class. That’s not to say there aren’t some applications that could benefit from tmpfs, and today those applications can use /dev/shm (for a long time a tmpfs) to get those benefits. But tmpfs changes /tmp greatly, and forces that change on all users and programmers out there.

      • Tom

        Unless I’m missing something, the properties of tmpfs fit very well the definition of /tmp as specified by the FHS.

        You mention space, and you mention reboot. The former is not a concern as you can just add more swap (the total amount of storage on your computer does stay the same…). And the latter should not be a concern as the FHS specifically states that you cannot assume anything written to /tmp to survive after the program writing it has terminated (and most distros implement this by clearing /tmp on boot). If you want persistence there is /var/tmp.

        Lastly, I already pointed out that this is not forced on users, as the default can easily be overridden.

      • rich

        You can’t just “add more swap”. In VMs, RAM and swap are very tight. Even on physical machines, swap and filesystem are not interchangeable. Swap does not just occupy spare filesystem space (if it did, then this argument against tmp-on-tmpfs wouldn’t be necessary). As for the rest of what you say, read my analysis: don’t load very complex storage decisions on programmers and users.

  20. Pingback: I’m joining the club of people complaining about /tmp on tmpfs | Oh Brave New World!

  21. Mark

    I’d also like to leave a comment.

    Fedora’s raison d’etre is to gather community input around its enterprise-oriented derivative, RHEL. So, it does not surprise me that Fedora’s design and defaults are chosen having the enterprises in mind as the main target. They do have plenty of RAM, since their goal is not to swap.

    Since Debian and Fedora are the same operating system, the real question is: why would one want enterprise-oriented defaults on a low-resource device? GNU/Linux being versatile does not imply that an arbitrary distro will run, or perform well, on low-end hardware. I can make a distro that refuses to run on an amount of RAM that is less than some X.

    • rich

      Rather busy now, but you’re wrong because of virtualization where you want to pack as many VMs into memory as possible.

  22. Roy Badami

    I’d always understood the historical rationale behind the original tmpfs was to try to avoid unnecessary disk activity. Filesystems normally offer guarantees about the maximum time before data is committed to physical disk (in modern Linux, the commit mount option; back in SunOS 4, the update process), and about metadata consistency in the event of unclean shutdowns (UFS did synchronous metadata writes; modern filesystems, of course, use journals).

    So you end up with extra I/O, which would not have been requested by the virtual memory layer, but which happens solely for robustness in the event of crashes. If you don’t care about persisting this data across reboots, this is unnecessary I/O.

    Another option might be to use a filesystem that is optimised for speed at the expense of *no* guarantees about unclean shutdowns, and have the init scripts mkfs it every time you boot – but I’m not sure what you’d use. You could tune the value of commit arbitrarily high and turn off write barriers, but neither ext4 nor ext3 seem to have a mount option to disable the journal. Anyone for ext2?

    So tmpfs certainly did serve a legitimate purpose. Even now, it should reduce I/O, but whether it’s worth it anymore is another question.

    roy

  23. Nothing doing, but I just ran out of tmp space, and I still have another 80 GB on my drive. Now I cannot run a program I had wanted to use. What were they thinking?! You mean that I have to reboot in order to regain that 4 GB of space? tmpfs is one of the worst ideas Fedora has had, along with systemd. I am going to be undoing that “feature”.

    • tomegund

      @Elmo: by default /tmp has maximal size set at half of your ram. You can change this at runtime: “mount -o remount,size=75% /tmp”. If you are short on RAM and want to use some of that hard drive space, you can simply add a swap partition, or a swap file if a partition is inconvenient.
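Before and after a remount like the one above, it can help to check that /tmp really is a tmpfs and how much room it has. A minimal Python sketch, assuming Linux's /proc/mounts format; `fstype_of` and `usage` are illustrative names. For a tmpfs, the size that statvfs reports is the size= limit, so it should change after the remount.

```python
import os
from typing import Optional, Tuple

def fstype_of(mountpoint: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type mounted at `mountpoint`, or None.

    Parses /proc/mounts-style lines ("device mountpoint fstype options ...").
    The last match wins, because a later mount on the same path hides
    earlier ones.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mountpoint:
            found = fields[2]
    return found

def usage(path: str) -> Tuple[int, int]:
    """Return (size, free) in bytes for the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_blocks * st.f_frsize, st.f_bavail * st.f_frsize

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            kind = fstype_of("/tmp", f.read())
    except OSError:
        kind = None  # not Linux, or /proc is not mounted
    size, free = usage("/tmp")
    gib = 1024 ** 3
    print(f"/tmp type: {kind or 'not a separate mount'}; "
          f"{free / gib:.1f} GiB free of {size / gib:.1f} GiB")
```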

  24. Pingback: New in libguestfs: Allow cache mode to be selected | Richard WM Jones

  25. Pingback: Masking systemd services in a guest | Richard WM Jones
