Benjamin Kaduk [Sun, 11 Dec 2016 22:38:18 +0000 (17:38 -0500)]
Update openafs-client directives for 1.8.0
klog and knfs are gone, so don't install them and don't
update alternatives to point to them. We have to remove
the klog.afs alternative from the 1.8 preinst since the
1.6 prerm will not do so for upgrades.
Benjamin Kaduk [Tue, 6 Dec 2016 22:07:40 +0000 (17:07 -0500)]
Update libafsdep files for in-kernel fortuna
Commit 0d67b00ff9db added heimdal's rand-fortuna PRNG to the kernel
module on all architectures, even though it is only needed on the small
subset that does not provide a cryptographically strong random number
generator to kernel module consumers. This was done to ensure that
the build infrastructure for it gets regularly exercised by developers.
However, not all build infrastructure was exercised at the time of
that submission; in particular, the make_libafs_tree.pl script was
not tested. This led to a situation where the libafs tree generated
by that script omitted several files that were now referenced by
the kernel build due to the fortuna import.
To remedy the situation, list the additional files that are needed,
so that they will be copied into the build area for this class of
kernel module builds.
Since the libafs-tree functionality is used to build the Debian
kernel-module source packages, this fix is needed in order to have
a tree that can be built into Debian packages without patching.
This is a little silly, because if rxi_FlushWrite has anything to do,
it just acquires/drops call->lock again.
This seems like a very minor performance penalty, but in the right
situation it can become more noticeable. Specifically, when an Rx call
on the server ends successfully, rx_EndCall will rxi_FlushWrite (to
send out the last Rx packet to the client) before marking the call as
finished. If the client receives the last Rx packet and starts a new
Rx call on the same channel before the server locks the call again,
the client can receive a BUSY packet (because it looks like the
previous call on the server hasn't finished yet). Nothing breaks, but
this means the client waits 3 seconds to retry.
This can probably happen, with varying frequency, in almost any
situation, but I can see it consistently happen with 'vos
move' when running 'vos' on the same machine as the source fileserver.
It is most noticeable when moving a large number of small volumes
(since you must wait an extra 3+ seconds per volume, where nothing is
happening).
To avoid this, create a new variant of rxi_FlushWrite, called
rxi_FlushWriteLocked. This just assumes the call lock is already held
by the caller, and avoids one extra lock/unlock pair. This is not the
only place where we unlock/lock the call during the rx_EndCall
situation described above, but it seems to be easiest to solve, and
it's enough (for me) to avoid the 3-second delay in the 'vos move'
scenario. Ideally, Rx should be able to atomically 'end' a call while
sending out this last packet, but for now, this commit is easy to do.
Note that rxi_FlushWrite previously didn't do much of note before
locking the call. It did call rxi_FreePackets without the call lock,
but calling that with the call lock is also fine; other callers do
that.
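For illustration, a minimal sketch of the locked/unlocked split described
above (the names and the pthread mutex are stand-ins, not the real Rx
types or signatures):

    #include <pthread.h>

    struct call {
        pthread_mutex_t lock;
        /* ... */
    };

    /* Variant that assumes the caller already holds call->lock; used on
     * paths (such as ending a call) where the lock is held anyway, saving
     * one drop/retake of the lock. */
    static void flush_write_locked(struct call *call)
    {
        /* push out the final packet for this call ... */
    }

    /* Unlocked entry point for callers that do not hold the lock. */
    static void flush_write(struct call *call)
    {
        pthread_mutex_lock(&call->lock);
        flush_write_locked(call);
        pthread_mutex_unlock(&call->lock);
    }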
Andrew Deason [Mon, 9 Mar 2015 23:01:29 +0000 (18:01 -0500)]
LINUX: Don't compile syscall code with keyrings
osi_syscall_init() is not currently called if we have kernel keyrings
support, since we don't need to set up or alter any syscalls if we
have kernel keyrings (we track PAGs by keyrings, and we use ioctls
instead of the AFS syscall now).
Since we don't call it, this commit makes us also not compile the
relevant syscall-related code. This allows new platforms to be added
without needing to deal with any platform-specific code for handling
32-bit compat processes and such, since usually we don't need to deal
with intercepting syscalls.
To do this, we just define osi_syscall_init and osi_syscall_cleanup as
noops if we have keyrings support. This allows us to reduce the #ifdef
clutter in the actual callers.
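A hedged sketch of the pattern (the keyring guard macro shown here is
illustrative; the real OpenAFS declarations differ in detail):

    #ifdef LINUX_KEYRING_SUPPORT
    /* PAGs are tracked via keyrings and the AFS syscall path is unused,
     * so syscall setup/teardown become no-ops. */
    # define osi_syscall_init()     (0)
    # define osi_syscall_cleanup()  do { } while (0)
    #else
    extern int  osi_syscall_init(void);
    extern void osi_syscall_cleanup(void);
    #endif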
Note that the 'afspag' module does currently call osi_syscall_init
unconditionally, but this seems like an oversight. With this change,
the afspag module will no longer alter syscalls when we have linux
keyrings support.
Change-Id: I219b92d89303975765743712587ff897b55a2631
Reviewed-on: https://gerrit.openafs.org/11936
Reviewed-by: Chas Williams <3chas3@gmail.com>
Reviewed-by: Perry Ruiter <pruiter@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Marcio Barbosa [Mon, 28 Nov 2016 14:42:44 +0000 (09:42 -0500)]
afs: release the packets used by rx on shutdown
When the OpenAFS client is unmounted on DARWIN, the blocks of packets
allocated by RX should be released. Historically, the memory used by
those packets was never properly released: ‘rx_mallocedP’ is a global
pointer that stores the first address of the last allocated block of
packets, so when ‘rxi_FreeAllPackets’ is called, only the last block is
released.
However, 230dcebcd61064cc9aab6d20d34ff866a5c575ea moved the global
pointer in question to the end of the last block. As a result, when the
OpenAFS client is unmounted on DARWIN, the ‘rxi_FreeAllPackets’
function releases the wrong block of memory. This problem was exposed
on OS X 10.12 Sierra where the system crashes when the OpenAFS client
is unmounted.
To fix this problem, store the address of every single block of packets
in a queue and release them one by one when the OpenAFS client is unmounted.
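For illustration, a minimal sketch of that bookkeeping (names are
stand-ins, not the actual Rx identifiers or allocation paths):

    #include <stdlib.h>

    /* Track every allocated block of packets so all of them can be freed
     * at shutdown, instead of only the most recently allocated one. */
    struct packet_block {
        struct packet_block *next;
        void *mem;                 /* the malloc'ed block of packets */
    };

    static struct packet_block *all_blocks;   /* head of the block list */

    static void *alloc_packet_block(size_t size)
    {
        struct packet_block *b = malloc(sizeof(*b));
        if (b == NULL)
            return NULL;
        b->mem = malloc(size);
        if (b->mem == NULL) {
            free(b);
            return NULL;
        }
        b->next = all_blocks;      /* remember this block for shutdown */
        all_blocks = b;
        return b->mem;
    }

    static void free_all_packet_blocks(void)
    {
        while (all_blocks != NULL) {
            struct packet_block *b = all_blocks;
            all_blocks = b->next;
            free(b->mem);
            free(b);
        }
    }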
Mark Vitale [Mon, 7 Nov 2016 19:16:50 +0000 (14:16 -0500)]
dir: do not leak contents of deleted directory entries
Deleting an AFS directory entry (afs_dir_Delete) merely removes the
entry logically by updating the allocation map and hash table. However,
the entry itself remains on disk - that is, both the cache manager's
cache partition and the fileserver's vice partitions.
This constitutes a leak of directory entry information, including the
object's name and MKfid (vnode and uniqueid). This leaked information
is also visible on the wire during FetchData requests and volume
operations.
Modify afs_dir_Delete to clear the contents of deleted directory
entries.
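A simplified sketch of the change, with 'DirEntry' as a stand-in for the
real on-disk AFS directory entry layout:

    #include <string.h>

    struct DirEntry {
        unsigned int vnode;
        unsigned int uniqueid;
        char name[256];
    };

    static void clear_deleted_entry(struct DirEntry *ep)
    {
        /* The entry has already been unhooked from the hash chain and its
         * slots marked free in the allocation map; now wipe the stale name
         * and fid so they no longer linger on disk or appear on the wire. */
        memset(ep, 0, sizeof(*ep));
    }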
Patchset notes:
This commit only prevents leaks for newly deleted entries. Another
commit in this patchset prevents leaks of partial object names upon
reuse of pre-existing deleted entries. A third commit in this
patchset prevents yet another kind of directory entry leak, when
internal buffers are reused to create or enlarge existing directories.
All three patches are required to prevent new leaks. Two additional
salvager patches are also included to assist administrators in the
cleanup of pre-existing leaks.
[kaduk@mit.edu: style nit for sizeof() argument]
Reviewed-on: https://gerrit.openafs.org/12460
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f591f6fae3d8b8d44140ca64e53bad840aeeeba0)
Change-Id: I41f76649f4bed609793b944db32c5ae62aa07458
Reviewed-on: https://gerrit.openafs.org/12465
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Mon, 7 Nov 2016 05:29:22 +0000 (23:29 -0600)]
afs: do not leak stale data in buffers
Similar to the previous commit, zero out the buffer when fetching
a new slot, to avoid the possibility of leaving stale data in
a reused buffer.
We are not supposed to write such stale data back to a fileserver,
but this is an extra precaution in case of bugs elsewhere -- memset
is not as expensive as it was in the 1980s.
Reviewed-on: https://gerrit.openafs.org/12459
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit a26c5054ee501ec65db3104f6a6a0fef634d9ea7)
Change-Id: Id60559ed84581e2f6a50cd4313f64780b8a0bafd
Reviewed-on: https://gerrit.openafs.org/12464
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Mark Vitale [Fri, 13 May 2016 04:01:31 +0000 (00:01 -0400)]
dir: fileserver leaks names of files and directories
Summary:
Due to incomplete initialization or clearing of reused memory,
fileserver directory objects are likely to contain "dead" directory
entry information. These extraneous entries are not active - that is,
they are logically invisible to the fileserver and client. However,
they are physically visible on the fileserver vice partition, on the
wire in FetchData replies, and on the client cache partition. This
constitutes a leak of directory information.
Characterization:
There are three different kinds of "dead" residual directory entry
leaks, each with a different cause:
1. There may be partial name data after the null terminator in a live
directory entry. This happens when a previously used directory entry
becomes free, then is reused for a directory entry with a shorter name.
This may be addressed in a future commit.
2. "Dead" directory entries are left uncleared after an object is
deleted or renamed. This may be addressed in a future commit.
3. Residual directory entries may be inadvertently picked up when a new
directory is created or an existing directory is extended by a 2 KiB
page. This is the most severe problem and is addressed by this commit.
This third kind of leak is the most severe because the leaked
directory information may be from _any_ other directory residing on the
fileserver, even if the current user is not authorized to see that
directory.
Root cause:
The fileserver's directory/buffer package shares a pool of directory
page buffers among all fileserver threads for both directory reads and
directory writes. When the fileserver creates a new directory or
extends an existing one, it uses any available unlocked buffer in the
pool. This buffer is likely to contain another directory page recently
read or written by the fileserver. Unfortunately the fileserver only
initializes the page header fields (and the first two "dot" and "dotdot"
entries in the case of a new directory). Any residual entries in the
rest of the directory page are now logically "dead", but still
physically present in the directory. They can easily be seen on the
vice partition, on the wire in a FetchData reply, and on the cache
partition.
Note:
The directory/buffer package used by the fileserver is also used by the
salvager and the volserver. Therefore, salvager activity may also leak
directory information to a certain extent. The volserver vos split
command may also contribute to leaks. Any volserver operation that
creates volumes (create, move, copy, restore, release) may also have
insignificant leaks. These less significant leaks are addressed by this
commit as well.
Exploits:
Any AFS user authorized to read directories may passively exploit this
leak by capturing wire traffic or examining their local cache as they
perform authorized reads on existing directories. Any leaked data will
be for other directories the fileserver had in the buffer pool at the
time the authorized directories were created or extended.
Any AFS user authorized to write a new directory may actively exploit
this leak by creating a new directory, flushing cache, then re-reading
the newly created directory. Any leaked data will be for other
directories the fileserver had in the buffer pool within the last few
seconds. In this way an authorized user may sample current fileserver
directory buffer contents for as long as they desire, without being
detected.
Directories already containing leaked data may themselves be leaked,
leading to multiple layers of leaked data propagating with every new or
extended directory.
The names of files and directories are the most obvious source of
information in this leak, but the FID vnode and uniqueid are leaked as
well. Careful examination of the sequences of leaked vnode numbers and
uniqueids may allow an attacker to:
- Discern each layer of old directories by observing breaks in
consecutive runs of vnode and/or uniqueid numbers.
- Infer which objects may reside on the same volume.
- Discover the order in which objects were created (vnode) or modified
(uniqueid).
- Know whether an object is a file (even vnode) or a directory (odd
vnode).
Prevent new leaks by always clearing a pool buffer before using it to
create or extend a directory.
Existing leaks on the fileserver vice partitions may be addressed in a
future commit.
Reviewed-on: https://gerrit.openafs.org/12458
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 70065cb1831dbcfd698c8fee216e33511a314904)
Change-Id: Ifa9d9266368ed3775898b7628ca980edcb230356
Reviewed-on: https://gerrit.openafs.org/12463
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Sun, 6 Nov 2016 21:06:02 +0000 (15:06 -0600)]
bos: allow salvage -salvagedirs with -all
Allow the -salvagedirs option on bos salvage when invoked with the -all
option to salvage the whole server. The -salvagedirs -all options will
rebuild every directory on the server.
Reviewed-on: https://gerrit.openafs.org/12457
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 1637c4d7c1ce407390f65509a3a1c764a0c06aa6)
[not actually cherry picked, but is the equivalent functionality]
Change-Id: I3978a5c4a704e0a0f2aab1cfad75573c16496a4d
Reviewed-on: https://gerrit.openafs.org/12462
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Sun, 6 Nov 2016 20:31:22 +0000 (14:31 -0600)]
dafs: honor salvageserver -salvagedirs
Do not ignore the -salvagedirs option when given to the salvageserver.
When the salvageserver is running with this option, all directories will
be rebuilt by salvages spawned by the dafs salvageserver, including all
demand attach salvages and salvages of individual volumes initiated by
bos salvage.
This does not affect the whole partition salvages initiated by bos
salvage -all.
Reviewed-on: https://gerrit.openafs.org/12456
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 9e66234951cca3ca77e94ab431f739e85017a23a)
Change-Id: I121299a5524cb46a519aead7818b0a7bd2fd4f69
Reviewed-on: https://gerrit.openafs.org/12461
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Mark Vitale [Mon, 7 Nov 2016 19:16:50 +0000 (14:16 -0500)]
dir: do not leak contents of deleted directory entries
Deleting an AFS directory entry (afs_dir_Delete) merely removes the
entry logically by updating the allocation map and hash table. However,
the entry itself remains on disk - that is, both the cache manager's
cache partition and the fileserver's vice partitions.
This constitutes a leak of directory entry information, including the
object's name and MKfid (vnode and uniqueid). This leaked information
is also visible on the wire during FetchData requests and volume
operations.
Modify afs_dir_Delete to clear the contents of deleted directory
entries.
Patchset notes:
This commit only prevents leaks for newly deleted entries. Another
commit in this patchset prevents leaks of partial object names upon
reuse of pre-existing deleted entries. A third commit in this
patchset prevents yet another kind of directory entry leak, when
internal buffers are reused to create or enlarge existing directories.
All three patches are required to prevent new leaks. Two additional
salvager patches are also included to assist administrators in the
cleanup of pre-existing leaks.
[kaduk@mit.edu: style nit for sizeof() argument]
Change-Id: Iabaafeed09a2eb648107b7068eb3dbf767aa2fe9
Reviewed-on: https://gerrit.openafs.org/12460
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Mon, 7 Nov 2016 05:29:22 +0000 (23:29 -0600)]
afs: do not leak stale data in buffers
Similar to the previous commit, zero out the buffer when fetching
a new slot, to avoid the possibility of leaving stale data in
a reused buffer.
We are not supposed to write such stale data back to a fileserver,
but this is an extra precaution in case of bugs elsewhere -- memset
is not as expensive as it was in the 1980s.
Change-Id: I344e772e9ec3d909e8b578933dd9c6c66f0a8cf6
Reviewed-on: https://gerrit.openafs.org/12459
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Mark Vitale [Fri, 13 May 2016 04:01:31 +0000 (00:01 -0400)]
dir: fileserver leaks names of files and directories
Summary:
Due to incomplete initialization or clearing of reused memory,
fileserver directory objects are likely to contain "dead" directory
entry information. These extraneous entries are not active - that is,
they are logically invisible to the fileserver and client. However,
they are physically visible on the fileserver vice partition, on the
wire in FetchData replies, and on the client cache partition. This
constitutes a leak of directory information.
Characterization:
There are three different kinds of "dead" residual directory entry
leaks, each with a different cause:
1. There may be partial name data after the null terminator in a live
directory entry. This happens when a previously used directory entry
becomes free, then is reused for a directory entry with a shorter name.
This may be addressed in a future commit.
2. "Dead" directory entries are left uncleared after an object is
deleted or renamed. This may be addressed in a future commit.
3. Residual directory entries may be inadvertently picked up when a new
directory is created or an existing directory is extended by a 2 KiB
page. This is the most severe problem and is addressed by this commit.
This third kind of leak is the most severe because the leaked
directory information may be from _any_ other directory residing on the
fileserver, even if the current user is not authorized to see that
directory.
Root cause:
The fileserver's directory/buffer package shares a pool of directory
page buffers among all fileserver threads for both directory reads and
directory writes. When the fileserver creates a new directory or
extends an existing one, it uses any available unlocked buffer in the
pool. This buffer is likely to contain another directory page recently
read or written by the fileserver. Unfortunately the fileserver only
initializes the page header fields (and the first two "dot" and "dotdot"
entries in the case of a new directory). Any residual entries in the
rest of the directory page are now logically "dead", but still
physically present in the directory. They can easily be seen on the
vice partition, on the wire in a FetchData reply, and on the cache
partition.
Note:
The directory/buffer package used by the fileserver is also used by the
salvager and the volserver. Therefore, salvager activity may also leak
directory information to a certain extent. The volserver vos split
command may also contribute to leaks. Any volserver operation that
creates volumes (create, move, copy, restore, release) may also have
insignificant leaks. These less significant leaks are addressed by this
commit as well.
Exploits:
Any AFS user authorized to read directories may passively exploit this
leak by capturing wire traffic or examining their local cache as they
perform authorized reads on existing directories. Any leaked data will
be for other directories the fileserver had in the buffer pool at the
time the authorized directories were created or extended.
Any AFS user authorized to write a new directory may actively exploit
this leak by creating a new directory, flushing cache, then re-reading
the newly created directory. Any leaked data will be for other
directories the fileserver had in the buffer pool within the last few
seconds. In this way an authorized user may sample current fileserver
directory buffer contents for as long as they desire, without being
detected.
Directories already containing leaked data may themselves be leaked,
leading to multiple layers of leaked data propagating with every new or
extended directory.
The names of files and directories are the most obvious source of
information in this leak, but the FID vnode and uniqueid are leaked as
well. Careful examination of the sequences of leaked vnode numbers and
uniqueids may allow an attacker to:
- Discern each layer of old directories by observing breaks in
consecutive runs of vnode and/or uniqueid numbers.
- Infer which objects may reside on the same volume.
- Discover the order in which objects were created (vnode) or modified
(uniqueid).
- Know whether an object is a file (even vnode) or a directory (odd
vnode).
Prevent new leaks by always clearing a pool buffer before using it to
create or extend a directory.
Existing leaks on the fileserver vice partitions may be addressed in a
future commit.
Change-Id: Ia980ada6a2b1b2fd473ffc71e9fd38255393b352
Reviewed-on: https://gerrit.openafs.org/12458
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Sun, 6 Nov 2016 21:06:02 +0000 (15:06 -0600)]
bos: re-add -salvagedirs for use with -all
The MR-AFS support code had a -salvagedirs option that was passed
through to the salvager (when running, and when -all was used); it
was removed in commit a9301cd2dc1a875337f04751e38bba6f1da7ed32
along with the rest of the MR-AFS commands and options.
However, it is useful in its own right, so add it back and allow
the use of -salvagedirs -all to rebuild every directory on the server.
Change-Id: Ifc9c0e4046bf049fe04106aec5cad57d335475e3
Reviewed-on: https://gerrit.openafs.org/12457
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Sun, 6 Nov 2016 20:31:22 +0000 (14:31 -0600)]
dafs: honor salvageserver -salvagedirs
Do not ignore the -salvagedirs option when given to the salvageserver.
When the salvageserver is running with this option, all directories will
be rebuilt by salvages spawned by the dafs salvageserver, including all
demand attach salvages and salvages of individual volumes initiated by
bos salvage.
This does not affect the whole partition salvages initiated by bos
salvage -all.
Change-Id: I4dd515ffa8f962c61e922217bee20bbd88bcd534
Reviewed-on: https://gerrit.openafs.org/12456
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Anders Kaseorg [Sat, 5 Nov 2016 00:17:32 +0000 (20:17 -0400)]
Remove NULL checks for AFS_NONNULL parameters
Recent GCC warns about opr_Assert(p != NULL), where p is an
__attribute__((__nonnull__)) parameter, just like clang did before those
clang warnings were silenced by 11852, 11853.
Now, we could go and add more autoconf tests and pragmas to silence the
GCC versions of these warnings. However, I maintain that silencing the
warnings is the wrong approach. The asserts in question have no
purpose. They do not add any safety, because GCC and clang are
optimizing them away at compile time (without proof!—they take the
declaration at its word that NULL will never be passed). Just remove
them.
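An illustration of the compiler behavior being described (not OpenAFS
code):

    #include <assert.h>
    #include <string.h>

    __attribute__((__nonnull__(1)))
    static size_t checked_strlen(const char *s)
    {
        /* GCC 6+ warns here (-Wnonnull-compare), and because the parameter
         * is declared nonnull, the check may simply be optimized away. */
        assert(s != NULL);
        return strlen(s);
    }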
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
In file included from casestrcpy.c:17:0:
casestrcpy.c: In function ‘opr_lcstring’:
casestrcpy.c:26:31: error: nonnull argument ‘d’ compared to NULL [-Werror=nonnull-compare]
opr_Assert(s != NULL && d != NULL);
^
/…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^~
casestrcpy.c:26:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(s != NULL && d != NULL);
^~~~~~~~~~
casestrcpy.c:26:18: error: nonnull argument ‘s’ compared to NULL [-Werror=nonnull-compare]
opr_Assert(s != NULL && d != NULL);
^
/…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^~
casestrcpy.c:26:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(s != NULL && d != NULL);
^~~~~~~~~~
casestrcpy.c: In function ‘opr_ucstring’:
casestrcpy.c:46:31: error: nonnull argument ‘d’ compared to NULL [-Werror=nonnull-compare]
opr_Assert(s != NULL && d != NULL);
^
/…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^~
casestrcpy.c:46:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(s != NULL && d != NULL);
^~~~~~~~~~
casestrcpy.c:46:18: error: nonnull argument ‘s’ compared to NULL [-Werror=nonnull-compare]
opr_Assert(s != NULL && d != NULL);
^
/…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^~
casestrcpy.c:46:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(s != NULL && d != NULL);
^~~~~~~~~~
casestrcpy.c: In function ‘opr_strcompose’:
/…/openafs/include/afs/opr.h:28:12: error: nonnull argument ‘buf’ compared to NULL [-Werror=nonnull-compare]
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^
/…/openafs/include/afs/opr.h:37:25: note: in expansion of macro ‘__opr_Assert’
# define opr_Assert(ex) __opr_Assert(ex)
^~~~~~~~~~~~
casestrcpy.c:98:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(buf != NULL);
^~~~~~~~~~
kalocalcell.c: In function ‘ka_CellToRealm’:
/…/openafs/include/afs/opr.h:28:12: error: nonnull argument ‘realm’ compared to NULL [-Werror=nonnull-compare]
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^
/…/openafs/include/afs/opr.h:37:25: note: in expansion of macro ‘__opr_Assert’
# define opr_Assert(ex) __opr_Assert(ex)
^~~~~~~~~~~~
kalocalcell.c:117:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(realm != NULL);
^~~~~~~~~~
Dave Botsch [Thu, 17 Nov 2016 18:22:17 +0000 (13:22 -0500)]
Mac OS Sierra deprecates syscall()
The syscall() function has been deprecated in macOS 10.12 (Sierra). After
discussions with developers, it appears that syscall() isn't really
needed anymore, so we can just do away with it.
Dave Botsch [Thu, 3 Nov 2016 16:22:21 +0000 (12:22 -0400)]
Define OSATOMIC_USE_INLINED to get usable atomics on DARWIN
In Mac OS 10.12, legacy interfaces for atomic operations have been
deprecated. Defining OSATOMIC_USE_INLINED gets us inline implementations
of the OSAtomic interfaces in terms of the <stdatomic.h> primitives.
This is a transition convenience.
Also indent preprocessor directives within the main DARWIN block to
improve readability.
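A hedged example of the macro in use (macOS-only and purely
illustrative; not the actual OpenAFS call sites):

    /* Must be defined before the header is included so the legacy OSAtomic
     * calls are provided as inline wrappers over <stdatomic.h>. */
    #define OSATOMIC_USE_INLINED 1
    #include <libkern/OSAtomic.h>
    #include <stdint.h>

    static volatile int32_t counter;

    static void bump(void)
    {
        OSAtomicIncrement32(&counter);
    }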
Michael Meffie [Sat, 5 Nov 2016 16:42:19 +0000 (12:42 -0400)]
SOLARIS: convert from ancient _depends_on to ELF dependencies
The ancient way of declaring module dependencies with _depends_on has
been deprecated since SunOS 2.6 (circa 1996). The presence of the old
_depends_on symbol triggers a warning message on the console starting
with Solaris 12, and the kernel runtime loader (krtld) feature of using
the _depends_on symbol to load dependencies may be removed in a future
version of Solaris.
Convert the kernel module from the ancient _depends_on method to modern
ELF dependencies. Remove the old _depends_on symbol and specify the -dy
and -N <name> linker options to set the ELF dependencies at link time,
as recommended in the Solaris device driver developer guidelines [1].
This commit does not change the declared dependencies, which may be
vestiges of ancient afs versions.
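Roughly, the old and new styles look like this (module names here are
illustrative, not the actual dependency list):

    /* Old, deprecated mechanism: a magic string consumed by krtld. */
    char _depends_on[] = "drv/ip drv/udp";

    /* New mechanism: no symbol at all; the same dependencies are recorded
     * as ELF dependencies at link time, e.g.
     *     ld -r -dy -N drv/ip -N drv/udp -o afs ...
     */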
Mark Vitale [Thu, 4 Aug 2016 22:42:27 +0000 (18:42 -0400)]
LINUX: do not use d_invalidate to evict dentries
When working within the AFS filespace, commands which access large
numbers of OpenAFS files (e.g., git operations and builds) may result in
active files (e.g., the current working directory) being evicted from the
dentry cache. One symptom of this is the following message upon return
to the shell prompt:
"fatal: unable to get current working directory: No such file or
directory"
Starting with Linux 3.18, d_invalidate returns void because it always
succeeds. Commit a42f01d5ebb13da575b3123800ee6990743155ab adapted
OpenAFS to cope with the new return type, but not with the changed
semantics of d_invalidate. Because d_invalidate can no longer fail with
-EBUSY when invoked on an in-use dentry, OpenAFS must no longer trust it
to preserve in-use dentries.
Modify the dentry eviction code to use a method (d_prune_aliases) that
does not evict in-use dentries.
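A hedged sketch of the substitution (the real code lives in the OpenAFS
Linux osi layer and carries more context):

    #include <linux/dcache.h>
    #include <linux/fs.h>

    static void try_evict_dentries(struct inode *ip)
    {
        /* d_invalidate() now always succeeds, so it can drop dentries that
         * are still in use (e.g. a process's cwd); d_prune_aliases() only
         * unhashes unused aliases, which is the behavior we want here. */
        d_prune_aliases(ip);
    }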
Change-Id: I1826ae2a89ef4cf6b631da532521bb17bb8da513
Reviewed-on: https://gerrit.openafs.org/12363
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Mark Vitale [Wed, 18 May 2016 04:36:12 +0000 (00:36 -0400)]
salvager: fix error message for invalid volumeid
If the specified volumeid is invalid (e.g. volume name was specified
instead of volume number), the error is reported via Log(). However,
commit 24fed351fd13b38bfaf9f278c914a47782dbf670 moved the log opening
logic from before this check to after it, effectively making this Log()
call a no-op.
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
rxperf.c: In function ‘rxperf_server’:
rxperf.c:930:4: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (ptr && *ptr != '\0')
^~
rxperf.c:932:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
break;
^~~~~
rxperf.c: In function ‘rxperf_client’:
rxperf.c:1102:4: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (ptr && *ptr != '\0')
^~
rxperf.c:1104:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
break;
^~~~~
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
curseswindows.c: In function ‘gator_cursesgwin_drawchar’:
curseswindows.c:574:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (params->highlight)
^~
curseswindows.c:576:9: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
if (code)
^~
curseswindows.c:579:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (params->highlight)
^~
curseswindows.c:581:9: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
if (code)
^~
curseswindows.c: In function ‘gator_cursesgwin_drawstring’:
curseswindows.c:628:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (params->highlight)
^~
curseswindows.c:630:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
if (code)
^~
curseswindows.c:633:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (params->highlight)
^~
curseswindows.c:635:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
if (code)
^~
Anders Kaseorg [Sat, 5 Nov 2016 00:44:00 +0000 (20:44 -0400)]
src/afsd/afsd.c: Fix misleading indentation
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
afsd.c: In function ‘afsd_run’:
afsd.c:2176:6: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (enable_rxbind)
^~
afsd.c:2178:3: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
afsd_syscall(AFSOP_ADVISEADDR, code, addrbuf, maskbuf, mtubuf);
^~~~~~~~~~~~
afsd.c:2487:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (afsd_debug)
^~
afsd.c:2490:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
afsd_syscall(AFSOP_GO, 0);
^~~~~~~~~~~~
Anders Kaseorg [Sat, 5 Nov 2016 00:39:34 +0000 (20:39 -0400)]
src/ubik/uinit.c: Fix misleading indentation
Fixes this warning (error with --enable-checking) from GCC 6.2:
uinit.c: In function ‘internal_client_init’:
uinit.c:96:2: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (code)
^~
uinit.c:98:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
return code;
^~~~~~
Anders Kaseorg [Sat, 5 Nov 2016 00:38:08 +0000 (20:38 -0400)]
src/rx/rx_packet.c: Fix misleading indentation
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
rx_packet.c: In function ‘rxi_ReceiveDebugPacket’:
rx_packet.c:2009:9: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (rx_stats_active)
^~
rx_packet.c:2011:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
s = (afs_int32 *) & rx_stats;
^
rx_packet.c:2017:9: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (rx_stats_active)
^~
rx_packet.c:2019:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
rxi_SendDebugPacket(ap, asocket, ahost, aport, istack);
^~~~~~~~~~~~~~~~~~~
Anders Kaseorg [Sat, 5 Nov 2016 00:36:51 +0000 (20:36 -0400)]
src/rxgen/rpc_parse.c: Fix misleading indentation
Fixes this warning (error with --enable-checking) from GCC 6.2:
rpc_parse.c: In function ‘analyze_ProcParams’:
rpc_parse.c:861:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (tokp->kind != TOK_RPAREN)
^~
rpc_parse.c:863:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
*tailp = decls;
^
Anders Kaseorg [Sat, 5 Nov 2016 00:18:52 +0000 (20:18 -0400)]
regen.sh: Use libtoolize -i, and .gitignore generated build-tools
Recent libtoolize actually deletes build-tools/missing, which Git was
treating as a change to the working copy. Besides, we should let
libtoolize copy in its more recent version of config.guess, config.sub,
and install-sh.
Mark Vitale [Thu, 4 Aug 2016 22:18:15 +0000 (18:18 -0400)]
LINUX: split dentry eviction from osi_TryEvictVCache
To make osi_TryEvictVCache clearer, and to prepare for a future change
in dentry eviction, split the dentry eviction logic into its own routine
osi_TryEvictDentries.
No functional difference should be incurred by this commit.
Change-Id: I5b255fd541d09159d70f8d7521ca8f2ae7fe5c2b
Reviewed-on: https://gerrit.openafs.org/12362
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Joe Gorse <jhgorse@gmail.com>
Mark Vitale [Thu, 20 Oct 2016 04:49:37 +0000 (00:49 -0400)]
Linux 4.9: inode_change_ok() becomes setattr_prepare()
Linux commit 31051c85b5e2 "fs: Give dentry to inode_change_ok() instead
of inode" renames and modifies inode_change_ok(inode, attrs) to
setattr_prepare(dentry, attrs).
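A sketch of the kind of compatibility shim this implies, assuming an
illustrative configure-detected macro (the actual OpenAFS test and
macro names may differ):

    #include <linux/fs.h>

    static int check_setattr(struct dentry *dp, struct iattr *attrs)
    {
    #if defined(HAVE_LINUX_SETATTR_PREPARE)   /* Linux >= 4.9 */
        return setattr_prepare(dp, attrs);
    #else
        return inode_change_ok(dp->d_inode, attrs);
    #endif
    }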
Mark Vitale [Fri, 16 Sep 2016 23:01:19 +0000 (19:01 -0400)]
Linux 4.9: inode_operation rename now takes flags
In Linux 3.15 commit 520c8b16505236fc82daa352e6c5e73cd9870cff,
inode_operation rename2() was added. It takes the same arguments as
rename(), with an added flags argument supporting the following values:
RENAME_NOREPLACE: if "new" name exists, fail with -EEXIST. Without
this flag, the default behavior is to replace the "new" existing file.
RENAME_EXCHANGE: exchange source and target; both must exist.
OpenAFS never implemented a .rename2() routine because it was optional
when it was introduced in Linux v3.15.
In Linux 4.9-rc1 the following commits remove the last in-tree uses of
.rename() and convert .rename2() to .rename():
  aadfa8019e81 vfs: add note about i_op->rename changes to porting
  2773bf00aeb9 fs: rename "rename2" i_op to "rename"
  18fc84dafaac vfs: remove unused i_op->rename
  1cd66c93ba8c fs: make remaining filesystems use .rename2
  e0e0be8a8355 libfs: support RENAME_NOREPLACE in simple_rename()
  f03b8ad8d386 fs: support RENAME_NOREPLACE for local filesystems
With these changes, it is now mandatory for OpenAFS afs_linux_rename()
to accept a 5th flag argument.
Add an autoconfig test to determine the signature of .rename(). Use this
information to implement afs_linux_rename() with the appropriate number
of arguments. Implement "toleration support" for the flags option by
treating a zero flag as a normal rename; if any flags are specified,
return -EINVAL to indicate the OpenAFS filesystem does not yet support
any flags.
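A hedged sketch of the toleration behavior (argument list simplified;
the real afs_linux_rename() differs):

    #include <linux/errno.h>
    #include <linux/fs.h>

    static int example_rename(struct inode *olddir, struct dentry *olddp,
                              struct inode *newdir, struct dentry *newdp,
                              unsigned int flags)
    {
        if (flags != 0)
            return -EINVAL;  /* RENAME_NOREPLACE/RENAME_EXCHANGE unsupported */

        /* ... fall through to the ordinary rename path ... */
        return 0;
    }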
Mark Vitale [Wed, 14 Sep 2016 22:01:22 +0000 (18:01 -0400)]
Linux 4.9: deal with demise of GROUP_AT
Linux commit 81243eacfa40 "cred: simpler, 1D supplementary groups"
refactors the group_info struct, removing some members (which OpenAFS
references only through the GROUP_AT macro) and adding a gid member.
The GROUP_AT macro is also removed from the tree.
Add an autoconfigure test for the new group_info member gid and define a
replacement GROUP_AT macro to do the right thing under the new regime.
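A sketch of the replacement macro under an illustrative configure
guard (the actual OpenAFS guard name may differ):

    #if defined(HAVE_GROUP_INFO_GID) && !defined(GROUP_AT)
    /* group_info is now a flat array with a 'gid' member, so indexing is
     * direct instead of going through the old two-level blocks layout. */
    # define GROUP_AT(gi, i) ((gi)->gid[(i)])
    #endif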
Anders Kaseorg [Sun, 9 Oct 2016 10:39:12 +0000 (06:39 -0400)]
tests/util/ktime-t.c: Specify EST offset in TZ
This fixes test failures observed on new Debian build servers that no
longer install tzdata by default. As the tests expect, EST is defined
as UTC−05:00 with no daylight saving time.
Reviewed-on: https://gerrit.openafs.org/12414
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit e17cd5df703b8a924591f92c76636dd9e0d9eaf9)
Anders Kaseorg [Sun, 9 Oct 2016 10:39:12 +0000 (06:39 -0400)]
tests/util/ktime-t.c: Specify EST offset in TZ
This fixes test failures observed on new Debian build servers that no
longer install tzdata by default. As the tests expect, EST is defined
as UTC−05:00 with no daylight saving time.
Andrew Deason [Mon, 24 Sep 2012 18:03:34 +0000 (13:03 -0500)]
LINUX: Define printf/uprintf as variadic macros
Instead of defining the bare name 'printf' itself as an object-like
macro, make printf (and uprintf) function-like variadic macros. This
avoids rewriting 'printf' to 'printk' in places like
'__attribute__((format(printf,X,Y)))'.
Note that this is Linux-specific; compilers on other platforms may not
support variadic macros.
This avoids many warnings in the Linux kernel module build if we
include Linux headers after AFS headers.
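A hedged sketch of the macros (guard name illustrative; the real
definitions live in the OpenAFS Linux param headers):

    #if defined(AFS_LINUX_ENV) && !defined(printf)
    /* Function-like macros: only calls are rewritten to printk, so the bare
     * identifier in __attribute__((format(printf,...))) is left untouched. */
    # define printf(fmt, ...)   printk(fmt, ##__VA_ARGS__)
    # define uprintf(fmt, ...)  printk(fmt, ##__VA_ARGS__)
    #endif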
Reviewed-on: http://gerrit.openafs.org/8150
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 179096d9b2c461f02236bbf670b46597ff2d4c3c)
Change-Id: I5c1c80cb5bd6996b0329969e16f9359fa1dcbc91
Reviewed-on: https://gerrit.openafs.org/12365
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Mon, 22 Aug 2016 23:53:34 +0000 (19:53 -0400)]
tests: avoid passing NULL strings to vprintf
Some libc implementations will crash when NULL string arguments are given to
*printf. Avoid passing NULL string arguments in the make check tests that did
so, and pass the string "(null)" instead.
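The guard amounts to something like the following (helper name is
illustrative, not the one used in the tests):

    #include <stdio.h>

    /* Substitute a printable placeholder for NULL before handing the string
     * to *printf, since some libcs crash on a NULL %s argument. */
    static const char *safe_str(const char *s)
    {
        return s != NULL ? s : "(null)";
    }

    /* e.g.: printf("volume name: %s\n", safe_str(maybe_null_name)); */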
Reviewed-on: https://gerrit.openafs.org/12377
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 2fe3a28c6ec0ff9d19ddec5500b3a5e69b483210)
Change-Id: Id8f1635444b5b49e3250addf36b64fccafd59941
Reviewed-on: https://gerrit.openafs.org/12396
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Mon, 19 Sep 2016 01:29:34 +0000 (21:29 -0400)]
ubik: Return an error from ContactQuorum when inquorate
Currently, when we need to contact all other servers in the ubik
quorum (to create a write transaction, and send db changes, etc), we
call the ContactQuorum_* family of functions. To contact each server,
those functions follow an algorithm like the following pseudocode:
    {
        int rcode = 0;
        int code;
        int okcalls = 0;

        for (ts = ubik_servers; ts; ts = ts->next) {
            if (ts->up) {
                code = contact_server(ts);
                if (code) {
                    rcode = code;
                } else {
                    okcalls++;
                }
            }
        }
        /* ...then return 0 if okcalls reached a majority, otherwise rcode */
    }
This means that if we successfully contact a majority of ubik sites,
we return success, even if some sites returned an error. If most sites
fail, then we return an error (we arbitrarily pick the last error we
got).
This means that in most situations, a successful write transaction is
guaranteed to have been transmitted to a majority of ubik sites, so
the written data cannot be lost (at least one of the sites that got
the new data will be in a future elected quorum).
However, if a site is already known to be down (ts->up is 0), then we
skip trying to contact that site, but we also don't set any errors.
This means that if a majority of sites are already known to be down
(ts->up is 0), then we can indicate success for a write transaction,
even though the relevant data has not been written to a majority of
sites. In that situation, it is possible to lose data.
Most of the time this is not possible, since a majority of sites must
be 'up' for the sync site to be elected and to allow write
transactions at all. There are a few ways, though, in which we can get
into a situation where most other sites are 'down', but we still let a
write transaction go through.
An example scenario:
Say we have sites A, B, and C. All 3 sites come up at the same time,
and A is the lowest IP so it starts an election (after around BIGTIME
seconds). Right after A is elected the sync site, sites B and C will
have 'lastYesState' set to 0, since site A hasn't yet sent out a
beacon as the sync site.
A client can then start a write to the ubik database on site A, which
site A will allow since it's the sync site (and presumably all the
relevant recovery flags are set). Site A will try to contact sites B
and C for a DISK_Begin call, but lastYesState is set to 0 on those
sites. This will cause DISK_Begin to return UNOQUORUM
(urecovery_AllBetter will return 0, because uvote_HaveSyncAndVersion
will return 0, because lastYesState is not set).
So site A will get a UNOQUORUM error from sites B and C, and so site A
will set 'ts->up' to 0 for sites B and C, and will return UNOQUORUM to
the client. The client may then try to retry the call (because
UNOQUORUM is not treated as a 'global' error in ubikclient.c's
ubik_Call_New), or another client write request could come in. Now
that 'ts->up' is unset for both sites B and C, we skip trying to
contact any remote sites, and the ContactQuorum functions will return
success. So the ubik write will go through successfully, but the new
data will only be on site A.
At this point, if site A crashes, then sites B and C will elect a
quorum, and will not have the modifications that were written to site
A (so the data written to site A is lost). If site A stays up, then it
will go through database recovery, sending the entire database file to
sites B and C.
In addition, it's very possible in this scenario for a client to write
to the database, and then try to read back data and confusingly get a
different result. For example, if someone issues the following two
commands while triggering the above scenario:
$ pts createuser testuser
$ pts examine testuser
If the second command contacts site B or C, then it will always fail,
saying that the user doesn't exist (even though the first command
succeeded). This is because sites B and C don't have the new data
written to site A, at least temporarily. While this confusing behavior
is not completely avoidable in ubik (this can always happen
'sometimes' due to network errors and such), with the scenario
described here, it happens 100% of the time.
The general scenario described above can also happen if sites B and C
are suddenly legitimately unreachable from site A, instead of throwing
the UNOQUORUM error. All of the steps are pretty much the same, but
there is a bit of a delay while we wait for the DISK_Begin call to
fail.
To fix this, do not let 0 be returned if a quorum has not been
reached. In some sense, UNOQUORUM could *always* be returned in
that case, but it is more in keeping with historical behavior to
return a "real" error if there is one available.
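A simplified sketch of the resulting decision (constant and names are
illustrative stand-ins, not the actual ubik code):

    #define UNOQUORUM_STANDIN 1   /* stand-in for ubik's UNOQUORUM error */

    static int contact_quorum_result(int okcalls, int nservers, int rcode)
    {
        if (okcalls + 1 >= nservers / 2 + 1)   /* count ourselves */
            return 0;
        /* Inquorate: never report success; prefer a "real" error if any. */
        return rcode != 0 ? rcode : UNOQUORUM_STANDIN;
    }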
It is somewhat questionable whether we should even be propagating
errors received from calls like DISK_Begin/DISK_Commit to the ubik
client (e.g. if we get a -1 from trying to contact a remote site, we
return -1 to the client, so the client may think it couldn't reach the
site at all). But this commit does not change any of that logic, and
should only change behavior when a majority of sites have 'ts->up'
unset. A later commit might effect the change to always return
UNOQUORUM and ignore the actual error values from the DISK_ calls,
but that is not needed to fix the immediate issue.
An important note:
Before this commit, there was a window of about 15 seconds after a
sync site is elected where a write to the ubik db would appear to be
successful, but would only modify the ubik db on the sync site.
(Details described above.) With this commit, writes during that
15-second window will instead fail, because we cannot guarantee that
we won't lose that data. If someone relies on 'udebug' data from the
sync site to let them know when writes will go through successfully,
this commit could appear to cause new errors.
[kaduk@mit.edu: transfer long commit message describing the issue
from an alternative fix, and tidy up accordingly]
Reviewed-on: https://gerrit.openafs.org/12289
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit fac0b742960899123dca6016f6ffc6ccc944f217)
Change-Id: Ic9b4ceada6c743dde49aba82217bb3a9f440bb69
Reviewed-on: https://gerrit.openafs.org/12389
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@dson.org>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Mon, 22 Aug 2016 23:53:34 +0000 (19:53 -0400)]
tests: avoid passing NULL strings to vprintf
Some libc implementations will crash when NULL string arguments are given to
*printf. Avoid passing NULL string arguments in the make check tests that did
so, and pass the string "(null)" instead.
Michael Meffie [Sat, 6 Aug 2016 14:41:24 +0000 (10:41 -0400)]
afsd: fix afsd -help crash
afsd crashes after the usage is displayed with the -help option.
$ afsd -help
Usage: ./afsd [-blocks <1024 byte blocks in cache>] [-files <files in cache>]
...
Segmentation fault (core dumped)
The backtrace shows the crash occurs when calling afsconf_Open() with an
invalid pointer argument, even though afsconf_Open() is not even needed
when -help is given.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:32
#1 0x00007ffff726fc36 in *__GI___strdup (s=0x0) at strdup.c:42
#2 0x0000000000408383 in afsconf_Open (adir=0x0) at cellconfig.c:444
#3 0x00000000004054d5 in afsd_run () at afsd.c:1926
#4 0x0000000000407dc5 in main (argc=2, argv=0x7fffffffe348) at afsd_kernel.c:577
afsconf_Open() is called with an uninitialized pointer because commit
d72df5a18e0bb8bbcbf23df3e8591072f0cdb770 changed the libcmd
cmd_Dispatch() to return 0 after displaying the command usage when the
-help option is specified. (That fix was needed for scripts which use
the -help option to inspect command options.)
The afsd_kernel main function then incorrectly calls afsd_run(), even
though mainproc(), which sets up the afsd option variables, was never
called. afsconf_Open() is the first function called in afsd_run().
Commit f77c078a291025d593f3170c57b6be5f257fc3e5 split afsd into afsd.c
and afsd_kernel.c to support libuafs (and fuse). This split the parsing
of the command line arguments and the running of the afsd command into
two functions. The mainproc(), which originally did both, was split
into two functions; one (still called mainproc) to check the option
values given and setup/auto-tune values, and another (called afsd_run)
to do the actual running of the afsd command. The afsd_parse() function
was introduced as a wrapper around cmd_Dispatch() which "dispatches"
mainproc.
With this fix, take the opportunity to rename mainproc() to the now more
accurately named CheckOptions() and change afsd_parse() to parse the
command line options with cmd_Parse(), instead of abusing
cmd_Dispatch().
Change the main function to avoid running afsd_run() when afsd_parse()
returns the CMD_HELP code, which indicates the -help option was given.
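A hedged sketch of the resulting main() flow (simplified; afsd_parse(),
afsd_run() and CMD_HELP are the functions and constant named above,
assumed to be declared by afsd's and libcmd's headers):

    int
    main(int argc, char **argv)
    {
        int code = afsd_parse(argc, argv);

        if (code == CMD_HELP)
            return 0;       /* -help: usage was printed; nothing to run */
        if (code != 0)
            return 1;       /* bad options */
        return afsd_run();
    }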
afsd.fuse splits the command line arguments into afsd recognized options
and fuse options (everything else), so only afsd recognized arguments
are passed to afsd_parse(), via uafs_ParseArgs(). The -help argument is
processed as part of that splitting of arguments, so afsd.fuse never
passes -help as an argument to afsd_parse(). This means we do not need
to check for CMD_HELP as a return value from uafs_ParseArgs(). But
since this is all a bit confusing, at least check the return value in
uafs_ParseArgs().
Andrew Deason [Wed, 8 Jan 2014 00:24:54 +0000 (18:24 -0600)]
SOLARIS: Support VSW_STATS
Specify the VSW_STATS flag to the vfsdef_t structure we give to
Solaris. This turns on statistics that can be retrieved via fsstat(1M)
and allows the fsinfo::: DTrace provider to work with AFS files.
We don't need to actually maintain these statistics; Solaris does that
for us. This flag just signifies that our vfs_t structure is capable
of storing the information. Since we get our vfs_t from Solaris (via
domount(), which gives us a vfs_t when it calls our afs_mount function)
and do not allocate a vfs_t ourselves, we are safe and this is fine to
do.
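Roughly, the registration looks like this; the field order follows the
common Solaris vfsdef_t pattern (version, name, init routine, flags,
option table) and the init routine name is an assumption, so treat it
as illustrative rather than the actual OpenAFS declaration:

    #include <sys/vfs.h>

    static int afs_vfsinit(int fstype, char *name);   /* assumed init routine */

    static vfsdef_t afs_vfsdef = {
        VFSDEF_VERSION,
        "afs",          /* filesystem type name */
        afs_vfsinit,    /* init routine */
        VSW_STATS,      /* let Solaris keep vfs statistics for fsstat/DTrace */
        NULL            /* mount option table */
    };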
Reviewed-on: http://gerrit.openafs.org/10679
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
(cherry picked from commit b0f433986ce344bf153cce1f6372de20750e052b)
Change-Id: I2403703f9caeb190563360d8571ee0be46890f4d
Reviewed-on: https://gerrit.openafs.org/12371
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Thu, 20 Aug 2015 17:55:02 +0000 (13:55 -0400)]
Make setting of CFLAGS_NOSTRICT make sense
Previously, we would set -fno-strict-aliasing only when
--enable-checking was given to configure but not
--enable-checking=all. The intent seems to have been to
only warn about strict aliasing violations when --enable-checking=all
is in use, but that there was no need to disable the strict-aliasing
diagnostics when -Werror was not enabled.
Unfortunately, -fno-strict-aliasing affects not only the diagnostics
emitted by the compiler, but also the code generation! So we were
leaving the normal (no --enable-checking) case with the compiler
assuming C's strict aliasing rules. The OpenAFS codebase has
historically not been strict-aliasing safe (for example,
commit 15e8678661ec49f5eac3954defad84c06b3e0164 refers to a
runtime crash using a certain compiler version, which is diagnosed
as the compiler using the C strict aliasing rules to make
optimizations that exposed the invalid program code).
To avoid further surprises due to new compiler optimizations
that utilize the C strict aliasing rules, always disable
strict aliasing except when --enable-checking=all is used.
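For reference, the sort of construct that the strict aliasing rules make
undefined, and that -fno-strict-aliasing papers over (a generic example,
not taken from the OpenAFS tree):

    #include <stdio.h>

    int main(void)
    {
        float f = 1.0f;
        /* Reads f through an incompatible pointer type: undefined behavior
         * under strict aliasing, and an optimizer applying those rules may
         * reorder or drop the access at -O2. */
        unsigned int *p = (unsigned int *)&f;
        printf("bits: %u\n", *p);
        return 0;
    }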
Reviewed-on: https://gerrit.openafs.org/11988
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 687b4d8af07dbcf187dea685e75b420884727efd)
Change-Id: I03b64465a29243f2b4fdaa12e962f078c45ae344
Reviewed-on: https://gerrit.openafs.org/12308
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Sun, 1 May 2016 16:24:30 +0000 (11:24 -0500)]
ubik: Don't RECFOUNDDB if can't contact most sites
Currently, the ubik recovery code will always set UBIK_RECFOUNDDB
during recovery, after asking all other sites for their dbversions.
This happens regardless of how many sites we were actually able to
successfully contact, even if we couldn't contact any of them.
This can cause problems when we are unable to contact a majority of
sites with DISK_GetVersion. If we haven't contacted a majority
of sites, we cannot say with confidence that we know what the best db
version available is (which is what UBIK_RECFOUNDDB represents; that
we've found which database is the one we should be using). This can
also result in UBIK_RECHAVEDB in a similar situation, indicating that
we have the best db version locally, even though we never actually
asked anyone else what their db version was.
For example, say site A is the sync site going through recovery, and
DISK_GetVersion fails for the only other sites B and C. Site A will
then set UBIK_RECFOUNDDB, and will claim that site A has the best db
version available (UBIK_RECHAVEDB). This allows site A to process ubik
write transactions (causing the db to be labelled with a new epoch),
or possibly to send the db to the other sites via DISK_SendFile, if
they quickly become available during recovery. Ubik write transactions
can succeed in this situation, because our ContactQuorum_* calls will
succeed if we never try to contact a remote site ('rcode' defaults to
0).
This situation should be rather rare, because normally a majority of
sites must be reachable by site A for site A to be voted the sync site
in the first place. However, it is possible for site A to lose
connectivity to all other sites immediately after sync site election.
It is also possible for site A to proceed far enough in the recovery
process to set UBIK_RECHAVEDB before it loses its sync site status.
As a result of all of this, if a site with an old database comes
online and there are network connectivity problems between the other
sites and a ubik write request comes in, it's possible for the "old"
database to overwrite the "new" database. This makes it look as if the
database has "rolled back" to an earlier version.
This should be possible with any ubik database, though how to actually
trigger this bug can change due to different ubik servers setting
different network timeouts. It is probably the most likely with the
VLDB, because the VLDB is typically the most frequently written
database.
If a VLDB reverts to an earlier version, it can result in existing
volumes to appear to not exist in the VLDB, and can result in new
volumes re-using volume IDs from existing volumes. This can result in
rather confusing errors.
To fix this, ensure that we have contacted a majority of sites with
DISK_GetVersion before indicating that we have located the best db
version. If we've contacted a majority of sites, then we are
guaranteed (under ubik assumptions) that we've found the best version,
since previous writes to the database should be guaranteed to hit a
majority of sites (otherwise they wouldn't be successful).
If we cannot reach a majority of sites, we just don't set
UBIK_RECFOUNDDB, and the recovery process restarts. Presumably on the
next iteration we'll be able to contact them, or we'll lose sync site
status if we can't reach the other sites for long enough.
Reviewed-on: https://gerrit.openafs.org/12281
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d3dbdade7e8eaf6da37dd6f1f53d9f1384626071)
Change-Id: I4f4e7255efd3e16e3acfec8f90bf2019cab1fb63
Reviewed-on: https://gerrit.openafs.org/12339
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Tue, 2 Aug 2016 20:52:42 +0000 (16:52 -0400)]
revert: "LINUX: Fix oops during negative dentry caching"
Commit fd23587a5dbc9a15e2b2e83160b947f045c92af1 was done to fix an oops
when parent_vcache_dv() was called without the GLOCK held. Since the
lockless code paths have been removed, and parent_vcache_dv() is always
called with the GLOCK held, revert the extra 'locked' flag argument and
the calls that obtain and release the GLOCK within parent_vcache_dv().
This commit made it possible to execute afs_linux_dentry_revalidate
without taking the GLOCK under some circumstances. However, it
achieved this by examining structure members outside of the GLOCK that
were previously only examined under the GLOCK (such as vcp->f.states
and vcp->f.m.DataVersion).
While that does of course improve performance, it is not known to be
completely safe. Revert this commit so that we may implement a fast
path through afs_linux_dentry_revalidate using more trusted lockless
techniques (atomics, RCU, etc.).
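For illustration, the locking pattern this revert restores looks
roughly like the following (a simplified, hypothetical sketch using a
plain pthread mutex as a stand-in for the GLOCK; the names do not match
the real afs_linux_dentry_revalidate code): shared vcache fields are
only examined while the global lock is held.

    #include <pthread.h>

    /* Simplified stand-ins for the global lock and a cached vnode. */
    static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

    struct vnode_cache {
        unsigned int states;            /* e.g. "status is valid" bits */
        unsigned long data_version;
    };

    /* Is the cached entry still valid for the data version we recorded? */
    static int
    revalidate(struct vnode_cache *vcp, unsigned long cached_dv)
    {
        int valid;

        pthread_mutex_lock(&glock);     /* take the lock first... */
        valid = (vcp->states != 0) && (vcp->data_version == cached_dv);
        pthread_mutex_unlock(&glock);   /* ...never peek at the fields outside it */

        return valid;
    }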
Andrew Deason [Thu, 26 Jun 2014 22:47:46 +0000 (15:47 -0700)]
afs: Create afs_SetDataVersion
Several different places in the codebase change avc->f.m.DataVersion
for a particular vcache, when we've noticed that the DV for the vcache
has changed. Consolidate all of these occurrences into a single
afs_SetDataVersion function, to make it easier to change what happens
when we notice a change in DV number.
This should incur no behavior change; it is just simple code
reorganization.
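A minimal sketch of the consolidation being described (hypothetical
names and a simplified structure, not the actual OpenAFS function):
every site that used to assign the DV directly now goes through one
helper, so any future "DV changed" logic only has to be added in one
place.

    /* Simplified stand-in for the real vcache structure. */
    struct vcache_lite {
        long data_version;              /* stand-in for avc->f.m.DataVersion */
    };

    /* Single choke point for recording that a file's data version moved. */
    static void
    set_data_version(struct vcache_lite *avc, long new_dv)
    {
        if (avc->data_version != new_dv) {
            /* Any future "the DV changed" bookkeeping goes here, once. */
            avc->data_version = new_dv;
        }
    }

    /* Callers that previously wrote avc->data_version = dv directly
     * now call set_data_version(avc, dv) instead. */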
Change-Id: I5dbf2678d3c4b5a2fbef6ef045a0b5bfa8a49242
Reviewed-on: https://gerrit.openafs.org/11791
Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com>
Reviewed-by: Daria Phoebe Brashear <shadow@your-file-system.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Thomas Keiser <tkeiser@gmail.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Andrew Deason [Mon, 23 May 2016 02:54:30 +0000 (21:54 -0500)]
ubik: Return an error from ContactQuorum when inquorate
Currently, when we need to contact all other servers in the ubik
quorum (to create a write transaction, send db changes, etc.), we
call the ContactQuorum_* family of functions. To contact each server,
those functions follow an algorithm like the following pseudocode:
{
    int rcode = 0;
    int code;
    int okcalls = 0;
    for (ts = ubik_servers; ts; ts = ts->next) {
        if (ts->up) {
            code = contact_server(ts);
            if (code) {
                rcode = code;
            } else {
                okcalls++;
            }
        }
    }
    /* then: if okcalls (plus ourselves) amounts to a majority of the
     * sites, return 0; otherwise return rcode */
}
This means that if we successfully contact a majority of ubik sites,
we return success, even if some sites returned an error. If most sites
fail, then we return an error (we arbitrarily pick the last error we
got).
This means that in most situations, a successful write transaction is
guaranteed to have been transmitted to a majority of ubik sites, so
the written data cannot be lost (at least one of the sites that got
the new data will be in a future elected quorum).
However, if a site is already known to be down (ts->up is 0), then we
skip trying to contact that site, but we also don't set any errors.
This means that if a majority of sites are already known to be down
(ts->up is 0), then we can indicate success for a write transaction,
even though the relevant data has not been written to a majority of
sites. In that situation, it is possible to lose data.
Most of the time this is not possible, since a majority of sites must
be 'up' for the sync site to be elected and to allow write
transactions at all. There are a few ways, though, in which we can get
into a situation where most other sites are 'down', but we still let a
write transaction go through.
An example scenario:
Say we have sites A, B, and C. All 3 sites come up at the same time,
and A has the lowest IP address, so it starts an election (after around
BIGTIME seconds). Right after A is elected the sync site, sites B and C will
have 'lastYesState' set to 0, since site A hasn't yet sent out a
beacon as the sync site.
A client can then start a write to the ubik database on site A, which
site A will allow since it's the sync site (and presumably all the
relevant recovery flags are set). Site A will try to contact sites B
and C for a DISK_Begin call, but lastYesState is set to 0 on those
sites. This will cause DISK_Begin to return UNOQUORUM
(urecovery_AllBetter will return 0, because uvote_HaveSyncAndVersion
will return 0, because lastYesState is not set).
So site A will get a UNOQUORUM error from sites B and C, and so site A
will set 'ts->up' to 0 for sites B and C, and will return UNOQUORUM to
the client. The client may then try to retry the call (because
UNOQUORUM is not treated as a 'global' error in ubikclient.c's
ubik_Call_New), or another client write request could come in. Now
that 'ts->up' is unset for both sites B and C, we skip trying to
contact any remote sites, and the ContactQuorum functions will return
success. So the ubik write will go through successfully, but the new
data will only be on site A.
At this point, if site A crashes, then sites B and C will elect a
quorum, and will not have the modifications that were written to site
A (so the data written to site A is lost). If site A stays up, then it
will go through database recovery, sending the entire database file to
sites B and C.
In addition, it's very possible in this scenario for a client to write
to the database, and then try to read back data and confusingly get a
different result. For example, if someone issues the following two
commands while triggering the above scenario:
$ pts createuser testuser
$ pts examine testuser
If the second command contacts site B or C, then it will always fail,
saying that the user doesn't exist (even though the first command
succeeded). This is because sites B and C don't have the new data
written to site A, at least temporarily. While this confusing behavior
is not completely avoidable in ubik (this can always happen
'sometimes' due to network errors and such), with the scenario
described here, it happens 100% of the time.
The general scenario described above can also happen if sites B and C
are suddenly legitimately unreachable from site A, instead of throwing
the UNOQUORUM error. All of the steps are pretty much the same, but
there is a bit of a delay while we wait for the DISK_Begin call to
fail.
To fix this, do not let 0 be returned if a quorum has not been
reached. In some sense, UNOQUORUM could *always* be returned in
that case, but it is more in keeping with historical behavior to
return a "real" error if there is one available.
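A sketch of the post-loop check after this fix (simplified and
illustrative; the names and the error value are stand-ins, not the
literal change): when fewer than a majority of calls succeeded, return
the saved error if there is one, and UNOQUORUM otherwise, rather than 0.

    /* Illustrative stand-in for the real ubik error code. */
    #define UNOQUORUM 5376

    /* okcalls: successful remote calls; nservers: total sites including us;
     * rcode: last error seen from a remote call (0 if none). */
    static int
    quorum_result(int okcalls, int nservers, int rcode)
    {
        if (okcalls + 1 > nservers / 2)
            return 0;               /* a majority has the new data */
        /* Inquorate: never report success. */
        return rcode ? rcode : UNOQUORUM;
    }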
It is somewhat questionable whether we should even be propagating
errors received from calls like DISK_Begin/DISK_Commit to the ubik
client (e.g. if we get a -1 from trying to contact a remote site, we
return -1 to the client, so the client may think it couldn't reach the
site at all). But this commit does not change any of that logic, and
should only change behavior when a majority of sites have 'ts->up'
unset. A later commit might effect the change to always return
UNOQUORUM and ignore the actual error values from the DISK_ calls,
but that is not needed to fix the immediate issue.
An important note:
Before this commit, there was a window of about 15 seconds after a
sync site is elected where a write to the ubik db would appear to be
successful, but would only modify the ubik db on the sync site.
(Details described above.) With this commit, writes during that
15-second window will instead fail, because we cannot guarantee that
we won't lose that data. If someone relies on 'udebug' data from the
sync site to let them know when writes will go through successfully,
this commit could appear to cause new errors.
[kaduk@mit.edu: transfer long commit message describing the issue
from an alternative fix, and tidy up accordingly]
Change-Id: If6842d7122ed4d137f298f0f8b7f20350b1e9de6
Reviewed-on: https://gerrit.openafs.org/12289
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
There are some variations here and there, but all locations usually
involve at least some code like that. But they all do the same general
thing: invalidate a vcache so we hit the net the next time we need
that vcache.
In order to make it easier to modify what happens when we invalidate a
vcache, and just to improve the code, take all of these instances and
put the functionality in a single function, called afs_StaleVCache,
which marks the vcache as 'stale'.
To cover a few situations that need special handling, the new function
also accepts flags. These are primarily needed to handle variations in
the circumstances under which we hit this code path; for instance, we
may already have afs_xcbhash locked, or we may be invalidating the
entire osidnlc (if we're invalidating vcaches in bulk, for example).
This should result in the same general behavior in all cases. The only
slight difference in a few cases is that we hold locks across a few
more operations than we used to; for example, we may clear an osidnlc
entry while holding the vcache lock. These differences are minor and
shouldn't result in any actual change in behavior.
So, this commit should just be code reorganization and should incur no
behavior change. However, this reorganization is complex, and should
not be considered a simple risk-free refactoring.
[kaduk@mit.edu: implement Tom Keiser's suggestion of a third argument
to afs_StaleVCacheFlags, add AFS_STALEVC_CLEARCB and
AFS_STALEVC_SKIP_DNLC_FOR_INIT_FLUSHED]
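For illustration only, the general shape of such a flag-driven
invalidation helper might look like the following (a simplified,
hypothetical sketch; the flag names are loosely modeled on the
AFS_STALEVC_* ones mentioned above, and the body is not the real
implementation): callers describe their circumstances with flag bits
and a single function does the invalidation.

    /* Simplified stand-in for the real vcache structure. */
    struct vcache_sketch {
        int stale;                      /* forces a fetch on next access */
    };

    /* Hypothetical flag bits, loosely modeled on the AFS_STALEVC_* ones. */
    #define STALEVC_CBLOCKED   0x01     /* caller already holds afs_xcbhash */
    #define STALEVC_SKIPDNLC   0x02     /* caller flushes the whole dnlc    */

    static void
    stale_vcache(struct vcache_sketch *avc, int flags)
    {
        if (!(flags & STALEVC_CBLOCKED)) {
            /* take the callback-hash lock here */
        }
        avc->stale = 1;                 /* hit the net next time we need it */
        if (!(flags & STALEVC_CBLOCKED)) {
            /* drop the callback-hash lock here */
        }
        if (!(flags & STALEVC_SKIPDNLC)) {
            /* remove just this entry from the name-lookup cache */
        }
    }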
Change-Id: I2b2f606c56d5b22826eeb98471187165260c7b91
Reviewed-on: https://gerrit.openafs.org/11790
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Wed, 17 Aug 2016 14:57:48 +0000 (10:57 -0400)]
CODING: one-line if statements should not have braces
Update the style guide with a declaration of the prevailing and
preferred brace style for one-line if statements and loops. Provide an
example and counter-example.
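For instance, the example and counter-example could look roughly like
this (illustrative only; the exact wording and code in CODING may
differ):

    /* Preferred: no braces around a one-line body. */
    static int
    preferred(int code)
    {
        if (code)
            return code;
        return 0;
    }

    /* Counter-example: braces add noise around a one-line body. */
    static int
    discouraged(int code)
    {
        if (code) {
            return code;
        }
        return 0;
    }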
Change-Id: Iafeea977203b76c0e67385779fb4ed57f3c6699a
Reviewed-on: https://gerrit.openafs.org/12370
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>