Figuring out why I'm going over hard-drive quota

Ram Rachum asked:

I suck at system administration, so if I’m getting something basic wrong, please let me know.

Here is something that drives me nuts. At work, we have a big NFS server that serves all the employees of our company. Everyone has a certain number of GBs that they’re allowed to write to it. I often get “quota exceeded” errors, because I run some programs that generate a lot of temporary files and then delete them, but before they can delete them they hit the quota.

After talking with our sysadmins, I learned that my quota was already increased to well beyond what I need for these tests, but it seems that I’m spending this quota in places other than my home folder. The sysadmin explained to me that every file in the NFS server which has my username as an owner, counts against my quota.

I wanted to get a list of these files so I could delete a lot of files that I don’t need anymore. But he told me that the only way is to do a search of the entire filesystem of the entire company, going through everyone’s home folders, i.e., a time-consuming process. He’s doing this search right now.

What sounds weird to me is this: When Linux gives me a “quota exceeded” error, it seems to be able to know instantly that I’m going over my quota. Not a time-consuming process. So how come I can’t get the list of files that are counted against my quota, without doing a long search?

My answer:

I can think of two things that might be causing your quota problems.

First, you should know that quotas are implemented by creating a tiny database on the filesystem, which is updated each time a file is created, modified or deleted. (Actually there are two of them, one for user quotas and one for group quotas.) When quotas were first turned on, this database was initialized by scanning every file on the filesystem and recording the usage totals per user and/or per group. Because the filesystem driver keeps these totals up to date every time there is activity, looking up a user’s current quota usage is fast. The database stores only totals, not a list of which files belong to whom, which is why producing that list requires a full search.
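To make the idea concrete, here is a toy sketch in Python of such a running tally. It is only an illustration of the principle, not how any real filesystem stores its quota data; the class name ToyQuotaTable and its methods are made up for this example.

```python
from collections import defaultdict


class ToyQuotaTable:
    """Toy per-user usage table (hypothetical; real quota files differ)."""

    def __init__(self):
        # Running totals per user id. Note that no file names are stored.
        self.bytes_used = defaultdict(int)
        self.files_used = defaultdict(int)

    def on_create(self, uid, size):
        # Called whenever a file owned by uid is created.
        self.bytes_used[uid] += size
        self.files_used[uid] += 1

    def on_delete(self, uid, size):
        # Called whenever a file owned by uid is deleted.
        self.bytes_used[uid] -= size
        self.files_used[uid] -= 1

    def over_quota(self, uid, byte_limit):
        # A single dictionary lookup: fast no matter how many files exist.
        return self.bytes_used[uid] > byte_limit


table = ToyQuotaTable()
table.on_create(uid=1000, size=4096)
table.on_create(uid=1000, size=8192)
table.on_delete(uid=1000, size=4096)
```

Checking over_quota is instant because it only reads a counter. Nothing in the table can answer "which files make up this total", and if an update is ever missed, the counter silently drifts away from reality.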

There is a catch, though. The quota database can become corrupted if the filesystem isn’t unmounted cleanly, for instance after a hard power-off. When this happens, the admin should run quotacheck to verify and rebuild the database when rebooting the system, but this might not have happened. Cosmic rays or a hard drive failure could also corrupt it.

Running quotacheck, however, requires that the filesystem be unmounted, or at minimum mounted read-only, so it’s unavailable for use while the quota database is being rebuilt. This could take a long time, so it is something that unfortunately rarely gets done. The NFS server admin should schedule downtime to check the filesystem quotas, and should consider changing procedures so that quotacheck is always run when rebooting after a crash.

Second, based on your description, it’s possible that you’ve hit the inode quota. In addition to restricting the amount of disk space, quotas can also restrict the number of files that can be created. If you create large numbers of temporary files, then this may be what is happening. You (or the NFS server admin) should also check this. Run quota -s to see what the database thinks you have used compared to your limits.
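If you want to cross-check those numbers for a directory tree you can read, something like the following Python sketch could help. It is a rough approximation under some assumptions: the function name usage_for_uid is made up here, it assumes a Unix-like system where st_blocks is counted in 512-byte units, and it only sees paths you have permission to read, so its totals will not necessarily match what the quota database reports.

```python
import os


def usage_for_uid(root, uid):
    """Return (bytes, inode_count, paths) for entries under root owned by uid."""
    total_bytes = 0
    seen = set()   # (device, inode) pairs, so hard links are counted once
    paths = []
    # onerror swallows unreadable directories instead of aborting the walk.
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if st.st_uid != uid:
                continue
            paths.append(path)
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                # Assumption: st_blocks is in 512-byte units (Unix only).
                total_bytes += st.st_blocks * 512
    return total_bytes, len(seen), paths


# Example (hypothetical path): scan a scratch area for your own files.
# used, inodes, paths = usage_for_uid("/srv/scratch", os.getuid())
```

This is exactly the slow full search the sysadmin described, just limited to one subtree. A large gap between its inode count and your file limit, or between its byte total and what quota -s reports for the same area, is a hint about where to look next.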

View the full question and any other answers on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.