LINUX: Unlock page on afs_linux_read_cache errors
When afs_linux_read_cache is called with a non-NULL task, it is
responsible for unlocking 'page' (unless it's unlocked in a background
task), even if we encounter an error. Currently we almost always do
unlock the given page for a non-NULL task, but if we manage to hit one
of the codepaths that 'goto out', we skip over the unlock_page() call
near the end of the function, and the page never gets unlocked.
As a result, the page stays locked forever. That generally means any
future access to the same file will block forever, and when we try to
flush the relevant vcache, we will block waiting for the page lock
while holding GLOCK. (This can happen via the background daemon via
e.g. afs_ShakeLooseVCaches -> osi_TryEvictVCache -> afs_FlushVCache ->
osi_VM_FlushVCache -> vmtruncate -> ... -> truncate_inode_pages_range
-> __lock_page on Linux 2.6.32-754.2.1.el6.) This quickly brings the
whole client to a halt until the machine can be forcibly rebooted.
To solve this, just move the 'out:' label to before the page unlock.
Add a few locking-related comments around the relevant code to help
explain some relevant details.
The relevant code has changed and been refactored over the years, but
this problem has probably existed ever since this code was originally
converted to using the readpage() of the underlying cache fs, in
commit
88a03758 (Use readpage, not read for fastpath access).
Reviewed-on: https://gerrit.openafs.org/13672
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit
eed79e2d28dcab889d01869e57dec14fd30d421c)
Change-Id: I6391897473e701bd81eb334935317dc5009612da
Reviewed-on: https://gerrit.openafs.org/13765
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>