"Briefly, 'host' structures are allocated without clearing all of the
contents to '0'. Only part of the structure is cleared, according to the
HOST_TO_ZERO macro. Unfortunately I put the new tmay_ fields right below
the 'index' field for some reason, so this means they aren't zeroed and
can contain garbage. This means we can easily segfault in the fileserver
when we try to access the pointers in there.
"We access uninitialized memory for every 'host' that is allocated. So
the chance of us corrupting memory is the chance that a particular
pointer-sized area of memory from 'malloc' is not already NULL.
"That seems pretty likely, but it's not so frequent as to have the
fileserver effectively "constantly" crashing at the site that noticed.
So it has not been a fire drill, but it has been noticeable (we heard
about it I think yesterday, and got details today when it happened
again). The noticing incident was a segfault, but an abort or sigbus are
probably also likely.
"Of course, the chances of noticing go way up with more clients. I expect
the chances dramatically increase if you have more than 512 client hosts
hit the box, since the first block of 512 are allocated before we really
do anything. For the next 512, it seems much more likely that 'malloc'
will give us back non-zeroed data. But this is just theory.
"With the incident I know about, the crash happened semi-quickly after
the server started (a few minutes). But it seems likely to occur after
the server has been up for a long time, if/when you cross the next line
of 512 hosts.
"I am also concerned that this can easily be corrupting memory without
being noticed via a crash (or it takes a while to crash), since we are
potentially free'ing invalid pointers, or stomping over someone else's
memory, etc etc."
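
The failure mode described above can be sketched in a few lines of C. This is only an illustration: `struct demo_host`, `DEMO_HOST_TO_ZERO`, and both allocator functions are hypothetical stand-ins, and the sketch assumes the zeroed region is a leading prefix of the structure that ends at 'index'.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fileserver's host structure. */
struct demo_host {
    int refcount;       /* inside the zeroed prefix */
    int index;          /* assumed end of the zeroed prefix; set explicitly */
    void *cached_data;  /* placed below 'index', so the prefix memset skips it */
};

/* Assumption: only the bytes before 'index' are cleared on allocation. */
#define DEMO_HOST_TO_ZERO offsetof(struct demo_host, index)

/* Buggy pattern: cached_data keeps whatever bytes malloc handed back. */
struct demo_host *demo_alloc_partial(int index)
{
    struct demo_host *h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    memset(h, 0, DEMO_HOST_TO_ZERO);
    h->index = index;
    /* h->cached_data is indeterminate here and must not be read. */
    return h;
}

/* Fixed pattern: fields outside the zeroed prefix are initialized by hand. */
struct demo_host *demo_alloc_full(int index)
{
    struct demo_host *h = demo_alloc_partial(index);
    if (h != NULL)
        h->cached_data = NULL;
    return h;
}
```

Any code that later tests or frees `cached_data` is only safe with the second allocator; with the first, behavior depends on what the heap happened to contain.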
Benjamin Kaduk [Fri, 24 Jan 2014 17:00:20 +0000 (12:00 -0500)]
FBSD: catch up to 1997 and include if_var.h with if.h
The commit message for upstream's r257244 change includes:
- Make the prophecy from 1997 happen and remove if_var.h inclusion
from if.h.
Despite the clear public posting, we were caught unawares. We made
it down to the cellar despite the missing stairs, but "Beware of
the Leopard" caused us to turn back, apparently.
Since if.h is included in many places and if_var.h is not present
on all OSes, pull the if.h inclusion into the common kernel headers
for afs/ and rx/ , and add in if_var.h (as well as the sys/socket.h
prerequisite).
Michael Meffie [Sat, 15 Feb 2014 17:03:43 +0000 (12:03 -0500)]
viced: fix get-statistics64 buffer overflow
Range check the statsVersion argument of the GetStatistics64 RPC to
avoid a buffer overflow in the fileserver, or a huge memory allocation,
by a rogue client.
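
A minimal sketch of this kind of range check, assuming the client-supplied value sizes a buffer that the server then fills at fixed indices. The function name and both limits are hypothetical and are not taken from the fileserver source.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limits: the sketch writes slots 0..3, and refuses huge requests. */
#define DEMO_MIN_ENTRIES 4U
#define DEMO_MAX_ENTRIES 64U

/*
 * Allocate and fill a statistics array of 'requested' entries.
 * Returns 0 on success, -1 if the request is out of range or allocation fails.
 */
int demo_fill_stats(uint32_t requested, uint64_t **out)
{
    uint64_t *buf;
    uint32_t i;

    *out = NULL;
    /* Too small would overflow the buffer below; too large would over-allocate. */
    if (requested < DEMO_MIN_ENTRIES || requested > DEMO_MAX_ENTRIES)
        return -1;

    buf = calloc(requested, sizeof(*buf));
    if (buf == NULL)
        return -1;

    for (i = 0; i < DEMO_MIN_ENTRIES; i++)
        buf[i] = (uint64_t)i + 1;

    *out = buf;
    return 0;
}
```

Rejecting the value before the allocation covers both failure modes named in the commit message with a single test.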
Andrew Deason [Fri, 21 Feb 2014 21:30:49 +0000 (15:30 -0600)]
rx: Avoid rxi_Delay on RXS_CheckResponse failure
Currently we rxi_Delay whenever RXS_CheckResponse fails for any
reason. This can result in disastrous performance degradations if a
client keeps sending "bad" responses, since rxi_Delay'ing here will
delay the Rx listener thread. This means we cannot receive any packets
for about a second, which can easily cause us to drop a lot of
incoming packets.
Instead, send the abort after 1 second by scheduling an event. This
will retain existing behavior from the point of view of the client
(it will get the abort after 1 second), but avoids hanging the Rx
listener thread.
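
The difference between sleeping and scheduling can be sketched as follows. This is not the Rx event API; `struct demo_delayed_abort` and both functions are hypothetical, and the sketch assumes a nonzero error code and an event loop that passes in the current time.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical record of an abort that should be sent later. */
struct demo_delayed_abort {
    bool pending;
    time_t due;    /* earliest time at which the abort may be sent */
    int error;     /* nonzero abort code to send */
};

/* Record the abort and return at once, instead of sleeping in the caller. */
void demo_schedule_abort(struct demo_delayed_abort *ev, time_t now,
                         int delay_secs, int error)
{
    ev->pending = true;
    ev->due = now + delay_secs;
    ev->error = error;
}

/*
 * Called from the event loop. Returns the abort code to send if one is due,
 * or 0 if nothing is pending or the delay has not yet elapsed.
 */
int demo_run_due(struct demo_delayed_abort *ev, time_t now)
{
    if (!ev->pending || now < ev->due)
        return 0;
    ev->pending = false;
    return ev->error;
}
```

The thread that calls `demo_schedule_abort` stays free to receive packets during the delay, which is the point of the change.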
Andrew Deason [Fri, 21 Feb 2014 21:26:35 +0000 (15:26 -0600)]
rx: Split out rxi_SendConnectionAbortLater
Take the functionality in rxi_SendConnectionAbort that schedules a
delayed abort, and split it out into a new function,
rxi_SendConnectionAbortLater. This allows callers an easy interface to
send such a delayed abort with their own delay.
This commit should incur no change in behavior; it is just code
reorganization.
Client host too busy while handling request from host %s:%d viceid %d fid %lu.%lu.%lu, failing request
Cannot get CPS for client while handling request [...], failing request
Cannot reconnect to ptserver while handling request [...], failing request
While the new messages are more informative, and (in my opinion)
better describe what is happening in those situations, they do look
very different from the old messages. This can break scripts that try
to parse these logs, but in general it is also not clear to
administrators that these messages still refer to the same events.
So instead, put these messages back the way they were. Still include
the extra information, of course, but revert the language to look more
like the old messages. Now we log:
CallPreamble: Couldn't get client while handling request from host %s:%d viceid %d fid %lu.%lu.%lu, failing request
CallPreamble: Couldn't get CPS while handling request [...], failing request
CallPreamble: couldn't reconnect to ptserver while handling request [...], failing request
Thanks to Ben Kaduk for bringing this up.
Reviewed-on: http://gerrit.openafs.org/10857 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: D Brashear <shadow@your-file-system.com> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit 0e9bb718ce231ffd73fe11810d8dc1d3902e4b2d)
Change-Id: I35c8369a7efba0c08c000a24e14385209082cfe0
Reviewed-on: http://gerrit.openafs.org/10953 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Fri, 18 Oct 2013 00:22:48 +0000 (20:22 -0400)]
viced: Improve client error log messages
Commit 6c41b1f740e16b5b9adfe9026630595be6f0699e improved a few log
messages to include the client ip and port of the request triggering
that log message. Include the viceid and fid (if applicable), too, so
an administrator may more easily identify the cause.
This creates the function LogClientError, so we can use a common
function for logging very similar information. This also modifies
h_FindClient_r to give the viceid to the caller, even in the case of
error. In addition, this modifies CallPreamble to accept a fid and
modifies all callers to accommodate.
Stephan Wiesand [Wed, 12 Mar 2014 09:47:17 +0000 (10:47 +0100)]
doc: bos setrestricted -mode 0 does make sense
Commit 070230ab76e1df338db3f2a7971111ca976a0c1a added documentation of
the mode parameter to bos setrestricted, claiming that the value 0 is
useless, and commit eee0bf5871944d919951cc8b7b4908ee909c3b62 added
documentation of the restrictmode entry in BosConfig, claiming that it
can only be set back to 0 with an editor. Both claims are wrong, since
bos setrestricted -mode 0 will do exactly that (if it succeeds, which
it only can if the server is running in unrestricted mode, which can
be achieved by sending it the FPE signal). Fix the man pages
accordingly.
Reviewed-on: http://gerrit.openafs.org/10885 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit da549eea21941681c075796512a27a830259c835)
Change-Id: Iea8f252829ba6192176da0ceba464cbc41aad53c
Reviewed-on: http://gerrit.openafs.org/10955 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Stephan Wiesand [Fri, 7 Mar 2014 10:03:36 +0000 (11:03 +0100)]
doc: improve man pages related to bos restricted mode
Mention the restrictmode entry and the commands for setting and
querying it in the BosConfig man page, and add/fix cross references
between the BosConfig, bos, bos_getrestricted and bos_setrestricted
ones.
Reviewed-on: http://gerrit.openafs.org/10874 Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Tested-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit eee0bf5871944d919951cc8b7b4908ee909c3b62)
Change-Id: I25d2f23d75a9074ae478f86296bb13c1b2dda95b
Reviewed-on: http://gerrit.openafs.org/10883 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Fri, 7 Feb 2014 14:55:31 +0000 (06:55 -0800)]
fs: display cell not available on ESRCH
The cache manager pioctls abuse ESRCH to represent errors due to
unavailable cell information. Give a more sensible error message to
the user when a pioctl returns an ESRCH error, instead of "no such
process", which is the conventional meaning of ESRCH.
The new error message is consistent with the Windows implementation
of fs.
For example, on a host with a misconfigured ThisCell and/or CellServDB:
$ fs wscell
fs: No such process
becomes:
$ fs wscell
fs: Cell name not recognized.
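
The mapping can be sketched as a small wrapper around strerror(). The function name is hypothetical; only the message text is taken from the example above.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: translate a pioctl error code into a message,
 * special-casing ESRCH, which the pioctls use for unavailable cell info.
 */
const char *demo_pioctl_strerror(int code)
{
    if (code == ESRCH)
        return "Cell name not recognized.";
    return strerror(code);
}
```

All other codes still go through strerror(), so only the overloaded ESRCH case changes what the user sees.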
Reviewed-on: http://gerrit.openafs.org/10824 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit 8beba712d95b637225f215534a759961ff4d80a9)
Change-Id: I0cf6f6e0939a1075332049361153bc8a0b0958ce
Reviewed-on: http://gerrit.openafs.org/10949 Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Tue, 11 Mar 2014 16:40:33 +0000 (12:40 -0400)]
libafs: reset global icl set pointers on shutdown
Avoid panicking when an icl tracing function is called after
shutdown_icl.
There is a window during shutdown in which pioctls can be requested
after the shutdown_icl is issued. Reset the global icl set pointers
so tracing is disabled after the shutdown_icl, instead of using
pointers to freed memory.
Removed the unneeded afs_icl_FindSet calls and use the global
pointers which were set during the initialization.
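
A minimal sketch of the pattern, using hypothetical names rather than the real icl structures: the global pointer is reset when the set is freed, and the tracing entry point treats a NULL pointer as "tracing disabled".

```c
#include <stdlib.h>

/* Hypothetical stand-in for a global trace set. */
struct demo_trace_set {
    int events_logged;
};

static struct demo_trace_set *demo_trace_setp = NULL;

/* Allocate the global set once during initialization. Returns 0 on success. */
int demo_trace_init(void)
{
    demo_trace_setp = calloc(1, sizeof(*demo_trace_setp));
    return demo_trace_setp != NULL ? 0 : -1;
}

/* Free the set and reset the global so later callers see tracing as disabled. */
void demo_trace_shutdown(void)
{
    free(demo_trace_setp);
    demo_trace_setp = NULL;
}

/* Returns 1 if the event was recorded, 0 if tracing is disabled. */
int demo_trace_event(void)
{
    if (demo_trace_setp == NULL)
        return 0;
    demo_trace_setp->events_logged++;
    return 1;
}
```

A call that arrives after shutdown then becomes a no-op instead of a use of freed memory.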
Reviewed-on: http://gerrit.openafs.org/10884 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit 64dd6dd018eb7413636ed6416bd244bb81893d9e)
Change-Id: I65671ee60e3cdf11d9921585dcd67df7cc22c88f
Reviewed-on: http://gerrit.openafs.org/10932 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Marc Dionne [Wed, 19 Mar 2014 15:15:13 +0000 (11:15 -0400)]
Linux: Do drop dentry if lookup returns ENOENT
Commit 997f7fce437787a45ae0584beaae43affbd37cce switched to using
d_invalidate instead of d_drop to prevent unhashing dentries
which are only temporarily invalid and may still be referenced
by someone having a current working directory pointing to it.
This could result in getting ENOENT from getcwd() after some
transient problems, even when the directory is there and
accessible.
The change had the side effect of potentially leaving something
visible when it has actually been removed, for instance a mountpoint
removed by "fs rm".
If afs_lookup returns ENOENT, we want to forcibly drop (unhash)
the dentry, even if it has current users.
Reviewed-on: http://gerrit.openafs.org/10928 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit 389473032cf0b200c2c39fd5ace108bdc05c9d97)
Change-Id: Ifeda5a38a01bc136d3ecef01227ecd354da7cc3e
Reviewed-on: http://gerrit.openafs.org/10948 Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Mon, 12 Aug 2013 22:37:29 +0000 (17:37 -0500)]
viced: Avoid endless BCB loop
Without this commit, when we break callbacks for a fid, we loop over
all callbacks for the fid, break a few of them, and then start over.
We do this repeatedly until we run out of callbacks. If a client sees
a callback break, and then establishes a new callback promise while
the fileserver is still breaking callbacks, the fileserver can break
the same callback for the same host again and again. This can continue
forever, if the client establishes its new callback promises quickly
enough.
So to avoid this, when we start breaking callbacks, flag all of the
callback structures that we want to look at. Then when we repeatedly
loop through all of the callbacks for the fid, only look at the
flagged callback structures.
This adds a 'flags' field to struct CallBack, and defines a single
flag, CBFLAG_BREAKING.
This is an alternative fix to the issue also fixed in 843d705c. This
implementation avoids allocating extra memory under locks, and has the
slight benefit of not breaking callbacks that were elsewhere deleted
during the BCB. This comes at the cost of a single extra traversal
through our callback list, and the cost of claiming one of the bits in
the CallBack structure.
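
The flag-then-loop approach can be sketched like this. The structure, flag value, and batch parameter are hypothetical; the real code works on the fileserver's callback lists rather than a plain array.

```c
#include <stddef.h>

#define DEMO_CBFLAG_BREAKING 0x1u

/* Hypothetical stand-in for a callback entry on one fid. */
struct demo_callback {
    unsigned int flags;
    int breaks;   /* how many times this callback has been broken */
};

/*
 * Break every callback that exists when the call starts, 'batch' at a time.
 * Returns the number of callbacks broken.
 */
size_t demo_break_callbacks(struct demo_callback *cbs, size_t n, size_t batch)
{
    size_t i, done, total = 0;

    if (batch == 0)
        batch = 1;

    /* One extra traversal: mark everything we intend to break. */
    for (i = 0; i < n; i++)
        cbs[i].flags |= DEMO_CBFLAG_BREAKING;

    /* Repeated passes only look at entries that still carry the mark. */
    do {
        done = 0;
        for (i = 0; i < n && done < batch; i++) {
            if ((cbs[i].flags & DEMO_CBFLAG_BREAKING) == 0)
                continue;
            cbs[i].flags &= ~DEMO_CBFLAG_BREAKING;
            cbs[i].breaks++;
            done++;
        }
        total += done;
    } while (done > 0);

    return total;
}
```

A callback established after the marking pass carries no flag, so the loop skips it and is bounded by the number of callbacks present at the start.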
Reviewed-on: http://gerrit.openafs.org/10172 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
(cherry picked from commit 47124f337b43f8731bfbe3bd71e42d046a4d1075)
Change-Id: I522e0cecd0a9a10bf9eafaae669f4f0005ced893
Reviewed-on: http://gerrit.openafs.org/10755 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Fri, 14 Mar 2014 15:13:15 +0000 (11:13 -0400)]
libafs: DARWIN: update for Xcode 5.1
(1) remove -mlong-branch from amd64 build
Random internet postings suggest that it has triggered a warning
since at least Xcode 3.2, and the gcc manual page suggests that
it is only applicable on ppc, anyway.
(2) remove -mpreferred-stack-boundary=4 from the amd64 build
The evidence here shows up less readily in an internet search,
but it seems that Apple's compilers will force the stack alignment
to 16 bytes regardless of what is passed here. One poster had
trouble with -mpreferred-stack-boundary being unused in Xcode 4.4.1.
This change only fixes warnings reported as errors by buildbot; it
does not attempt to fully synchronize with the flags that Xcode 5.1
uses for kernel module builds.
Reviewed-on: http://gerrit.openafs.org/10896 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit cb18fbde6536942e4bc87bd5943a13cc14fbe332)
Change-Id: Ic66d9028e4f15bd7a9d7c80db84087879560f4d2
Reviewed-on: http://gerrit.openafs.org/10926 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Thu, 13 Mar 2014 20:37:10 +0000 (16:37 -0400)]
Do not use garbage-collection for DARWIN ObjC apps
Xcode 5.1 does not support this feature.
Reviewed-on: http://gerrit.openafs.org/10890 Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit 52a9d1929518feab17b81b0a9db7ba45f69d5071)
Change-Id: Ia383e1f9c4ee4ae19ed81cfedb1541399d7546b3
Reviewed-on: http://gerrit.openafs.org/10925 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Thu, 13 Mar 2014 19:30:42 +0000 (15:30 -0400)]
Remove static const char copyright[]
We do not have copyright strings in our other executables for the other
copyright statements applicable to them, so these are rather exceptional.
They also cause build failures with OS X Xcode 5.1 and --enable-checking .
Andrew Deason [Thu, 4 Apr 2013 22:35:01 +0000 (17:35 -0500)]
viced: Avoid issuing redundant TMAY requests
Currently, if a new Rx connection comes in from a host we already have
a host struct for, we make a TellMeAboutYourself (TMAY) call to the
given host, to verify the UUID (and caps, interface info, etc) is what
we expect it to be. That is, if it's still the "same" host that we
know about. This is necessary because we otherwise have no way of
telling if the Rx connection is from the same host, or from a new host
that just happens to have the same IP address (e.g. in the case that
hosts are moving around and changing IPs). We do this while the host
is locked, so we only issue these TMAY calls one at a time.
If a large number of Rx connections come in from the same host at
around the same time, this can result in a lot of TMAY requests being
issued against the host, even for hosts that never change IPs and
never do anything strange. In these situations, issuing so many TMAYs
is useless. If we have several calls waiting to lock the host to issue
a TMAY, some of the extra TMAY calls are provably useless. So instead
of calling TMAY repeatedly, remember what the last successful TMAY
result was, and reuse it for the "provably useless" calls.
Note that this 'cache' stores the actual raw results of
TellMeAboutYourself. We could save some memory by storing just how we
interpret that data later on in h_GetHost_r, but this way results in
much simpler h_GetHost_r logic, since we can use the same code paths
as for a "real" TMAY call.
Michael Meffie [Sat, 15 Mar 2014 15:31:27 +0000 (11:31 -0400)]
doc: fix typo in volinfo man page
Reviewed-on: http://gerrit.openafs.org/10904 Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com> Tested-by: Ken Dreyer <ktdreyer@ktdreyer.com>
(cherry picked from commit 3a0c348d6ebc375f11d2bab70de9a00f5905fe94)
Change-Id: I8a39bdc1cb4bff509d54ef7c76d4b8735505c0e1
Reviewed-on: http://gerrit.openafs.org/10931 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Fri, 31 Jan 2014 22:46:12 +0000 (16:46 -0600)]
afs: Throttle byte-range locks warnings per-file
Currently, the warning messages about byte-range locks are throttled
only according to what the last PID of the locking process was. So, if
that same process performs a bunch of byte-range locks a bunch of
times, we log this warning message at most once every 2 minutes.
However, if we have even just one other process also performing
byte-range locks, the throttling can become pretty useless as
lastWarnPid ping-pongs back and forth between the two different PIDs.
This can happen if multiple unrelated byte-range-lock-using pieces of
software just happen to be running on the same machine, or if a piece
of software uses byte-range locks after forking into separate
processes.
To avoid flooding the log in situations like this, keep track of the
last warn time in the relevant vcache, so we don't get frequent
warnings for byte-range lock requests on the same file.
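
Per-file throttling can be sketched as below. The structure and function are hypothetical, and the 120-second interval is an assumption borrowed from the "once every 2 minutes" figure above; the commit message does not state the per-file interval.

```c
#include <stdbool.h>
#include <time.h>

/* Assumed interval, in seconds. */
#define DEMO_WARN_INTERVAL 120

/* Hypothetical stand-in for the per-file cache entry. */
struct demo_vcache {
    time_t last_warn_time;   /* 0 means "never warned" */
};

/* Returns true if a warning should be logged for this file now. */
bool demo_should_warn(struct demo_vcache *vc, time_t now)
{
    if (vc->last_warn_time != 0 &&
        now - vc->last_warn_time < DEMO_WARN_INTERVAL)
        return false;
    vc->last_warn_time = now;
    return true;
}
```

Because the timestamp lives with the file rather than in a single global, two processes locking the same file cannot reset each other's throttle.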
Reviewed-on: http://gerrit.openafs.org/10796 Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 9f90b12e14e5511cb1c11cbc4d85cfa291be861f)
Change-Id: Ia5426e97fa431e6b9eeb1c82e03c74c959249e9a
Reviewed-on: http://gerrit.openafs.org/10839 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Fri, 31 Jan 2014 22:40:35 +0000 (16:40 -0600)]
afs: Include FID in DoLockWarning
Provide the FID that is being locked when we warn about byte-range
locks, so the user can find what file the process is trying to lock.
Reviewed-on: http://gerrit.openafs.org/10795 Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit 4f253106dc5d1a5280b0a5be393df0e87e00a661)
Change-Id: I369e9505583c1b6b820b5bc54b8e4908ab8bf3e5
Reviewed-on: http://gerrit.openafs.org/10838 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Fri, 31 Jan 2014 22:36:44 +0000 (16:36 -0600)]
afs: Refactor DoLockWarning
Change DoLockWarning around a little bit, so subsequent changes are
easier to follow. Move lastWarnTime/lastWarnPid so they are only
usable within this function.
This commit should incur no functional change.
Reviewed-on: http://gerrit.openafs.org/10794 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit c73883e7846fa0421cfac29830c27c9b6aacf5ed)
Change-Id: Ie419aa5110f9c72f99514c8159c10582747601db
Reviewed-on: http://gerrit.openafs.org/10837 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Tue, 12 Oct 2010 22:46:36 +0000 (17:46 -0500)]
viced: Add options for interrupting clients
Add the -offline-timeout and -offline-shutdown-timeout options to the
fileserver, to implement interrupting clients that are accessing volumes
we are trying to take offline. Document the new options.
Currently this is only implemented for read operations. Implementing
this for write operations and callback breaks will require more work.
This also removes the VGetVolumeTimed interface from the volume
package, since the fileserver was the only user and with this change
the fileserver now uses the VGetVolumeWithCall interface.
Andrew Deason [Fri, 29 Oct 2010 16:29:37 +0000 (11:29 -0500)]
vol: Interrupt RX calls accessing offlining vols
When we are waiting for a volume to go offline, only wait a certain
amount of time for it to go offline before we interrupt all RX calls
associated with that volume. This amount of time is configurable in
the new offline_timeout and offline_shutdown_timeout volume package
option fields.
just try to give up callbacks at shutdown. at this point if
you're running 1.4.5 or older, you're sad anyway.
Reviewed-on: http://gerrit.openafs.org/3404 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@dementia.org>
(cherry picked from commit cee2c677d7de66a510d05978e3b41dcd5d8aca78)
Change-Id: I56e6b9e0e5f2921126a468854a1735b257e05219
Reviewed-on: http://gerrit.openafs.org/6272 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Andrew Deason <adeason@sinenomine.net> Tested-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Marc Dionne [Fri, 5 Jul 2013 16:50:36 +0000 (12:50 -0400)]
bos: Do encryption if requested
Commit d008089a79 didn't replace the processing of the aencrypt
flag passed to the GetConn() function, causing all bos connections
to be un-encrypted. This causes "addkey" to fail with an error
from the server, and "listkeys" to silently ignore the -showkey
option to display keys.
Set the AFSCONF_SECOPTS_ALWAYSENCRYPT flag, and don't set
AFSCONF_SECOPTS_FALLBACK_NULL since fallback is not acceptable if
the caller requested encryption.
Simon Wilkinson [Fri, 8 Mar 2013 16:15:51 +0000 (16:15 +0000)]
bos: Remove theoretical overflow in DateOf
DateOf copies the results of ctime into a static buffer. Typically
ctime will return a 26 byte string, but if you pass it a year larger
than 9999 (which we shouldn't), you can get a 32 (or more) byte string.
Get rid of this unlikely event by using strlcpy for the copy. We already
truncate at 24 bytes when we remove the \n, so this shouldn't cause any
further problems.
Really, this whole thing should be rewritten to use strftime.
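
As a rough idea of what a strftime-based version might look like (a sketch, not the bos code): the function name is hypothetical, and it uses gmtime() so the result does not depend on the local time zone, whereas ctime() formats local time.

```c
#include <stddef.h>
#include <time.h>

/*
 * Format 'when' in a ctime-like layout without the trailing newline.
 * Returns the number of characters written, or 0 if the result does not
 * fit in 'buf' (in which case 'buf' is set to the empty string).
 */
size_t demo_format_date(time_t when, char *buf, size_t buflen)
{
    struct tm *tm;
    size_t n;

    if (buflen == 0)
        return 0;
    buf[0] = '\0';

    tm = gmtime(&when);
    if (tm == NULL)
        return 0;

    /* strftime never writes more than buflen bytes, whatever the year. */
    n = strftime(buf, buflen, "%a %b %e %H:%M:%S %Y", tm);
    if (n == 0)
        buf[0] = '\0';
    return n;
}
```

With a 32-byte buffer, `demo_format_date(0, buf, sizeof(buf))` yields "Thu Jan  1 00:00:00 1970" in the default C locale; an oversized year produces an empty string instead of an overflow.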
Simon Wilkinson [Fri, 8 Mar 2013 13:01:28 +0000 (13:01 +0000)]
bos: Don't overflow cellname buffer
Don't overflow the fixed sized cellname buffer when copying the
information in from the command line - instead, just use a
dynamically allocated buffer.
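
A sketch of the dynamic-buffer approach, with a hypothetical function name: the copy is sized from the argument itself, so no input length can overrun it.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a heap-allocated copy of a command-line cell name, or NULL on
 * allocation failure or NULL input. The caller frees the result.
 */
char *demo_copy_cellname(const char *arg)
{
    size_t len;
    char *copy;

    if (arg == NULL)
        return NULL;

    len = strlen(arg);
    copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, arg, len + 1);   /* includes the terminating NUL */
    return copy;
}
```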
Michael Meffie [Thu, 6 Mar 2014 16:42:52 +0000 (11:42 -0500)]
doc: fix typo on ka-forwarder man page
Reviewed-on: http://gerrit.openafs.org/10873 Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Tested-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 189a17146e789f2cf716ed3a477ed6f54776df12)
Change-Id: Ic4e2f4cc2947946a5e41bb71152ef6d5683048f4
Reviewed-on: http://gerrit.openafs.org/10875 Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Mon, 23 Dec 2013 17:10:36 +0000 (12:10 -0500)]
vol: reset nextVnodeUnique when uniquifier rolls over
The on disk uniquifier counter is set to 200 more than the current
nextVnodeUnique counter when the volume information is updated to disk. When
the nextVnodeUnique is near UINT32_MAX, then the uniquifier counter rolls
over. This can happen during a volume header update due to
VBumpVolumeUsage_r().
With this change, the nextVnodeUnique counter is reset to 2 and the
uniquifier is reset to 202 when a roll over occurs. (uniquifier of 1 is
reserved for the root vnode.)
With this change, the number of possible uniquifier numbers is limited to
200 less than UINT32_MAX.
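
The reset rule can be sketched with plain unsigned arithmetic. The structure and function below are hypothetical and only mirror the numbers given in this message (a margin of 200, a restart at 2, and a resulting on-disk value of 202).

```c
#include <stdint.h>

#define DEMO_UNIQUE_MARGIN 200u

/* Hypothetical stand-in for the two per-volume counters. */
struct demo_volume {
    uint32_t next_vnode_unique;   /* in-memory counter */
    uint32_t disk_uniquifier;     /* value written to the volume header */
};

/* Compute the on-disk uniquifier, resetting both counters on rollover. */
void demo_update_uniquifier(struct demo_volume *v)
{
    /* Adding the margin would wrap past UINT32_MAX: start over at 2. */
    if (v->next_vnode_unique > UINT32_MAX - DEMO_UNIQUE_MARGIN)
        v->next_vnode_unique = 2;   /* 1 is reserved for the root vnode */

    v->disk_uniquifier = v->next_vnode_unique + DEMO_UNIQUE_MARGIN;
}
```

Comparing against `UINT32_MAX - DEMO_UNIQUE_MARGIN` before adding avoids relying on the wrapped sum to detect the rollover.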
The following shows a series of vnode creation/deletions to illustrate
the uniquifier rollover before this commit:
Michael Meffie [Mon, 23 Dec 2013 16:42:19 +0000 (11:42 -0500)]
vol: fix nextVnodeUnique roll over
Fixes for the per volume nextVnodeUnique counter roll over. Uniquifier number 1
is reserved for the root vnode, so reset the unique count to 2 when the
nextVnodeUnique counter rolls over.
Update the disk backed V_uniquifier count when the in-memory nextVnodeUnique
counter rolls over during the creation of a new vnode. If the nextVnodeUnique
rolls over when V_uniquifier is UINT32_MAX, then the V_uniquifier is not updated
and remains at UINT32_MAX until the next VUpdateVolume_r() call for the volume.
This bug is usually masked by the VBumpVolumeUsage(), which on every 128 volume
accesses, bumps the V_uniquifier to be 200 more than the current
nextVnodeUnique counter. This causes the V_uniquifier to roll over before
reaching UINT32_MAX. (The number of accesses before updating the headers is set
in the usage_threshold volume package option, which is currently set to 128 by
default.)
The following shows the unique counters for a series of vnode
creation/deletions before this commit. The nextVnodeUnique rolls over to 1,
and the uniquifier is not reset. The `usage_threshold' was set to a value
greater than 200 to avoid the VBumpVolumeUsage() calls during this test run.
Andrew Deason [Wed, 18 Sep 2013 21:56:23 +0000 (16:56 -0500)]
vol: Nuke parent vol special inodes
When we "nuke" a volume, we delete all inodes we can find that are for
the given volume id. This currently means that if we nuke an RW volume
id, we delete all of the inodes for file data for the entire volume
group (since they're all stored in the VG id), but we do not delete
the special inodes for any non-RW volumes in that volume group. Those
special inodes left behind are not very useful, since we just deleted
all of the actual file data.
Currently this means that on namei, it's impossible to nuke the
special inodes for non-RW volumes, since the namei nuke will only look
in the subdir for the given volume id. If you give it the RW volume
id, it won't delete the special inodes as mentioned above; if you give
it the RO volume id, it will only look in the RO subdir, and won't
find the RO special inodes in the RW subdir.
If a volume group is damaged in such a way that the salvager cannot
fix it (due to a bug), this means that it is impossible to get rid of
that volume group completely from the partition on namei without
manually running "rm -rf" on the relevant AFSIDat directory. Normally
we have a failsafe of running 'vos zap -force', but that doesn't work
for non-RW special inodes, as mentioned above.
So, in order to allow this 'vos zap -force' failsafe to work in
hopefully all situations, also delete the special inodes for the
parent volume. Use similar logic as exists in the salvager's
OnlyOneVolume function.
Andrew Deason [Thu, 3 Oct 2013 17:51:41 +0000 (12:51 -0500)]
salvager: Handle multiple/inconsistent linktables
The ListAFSSubDirs code in namei_ops.c currently detects
incorrectly-named linktable files, and whines about them and says the
salvager will handle them. However, the salvager doesn't really handle
them, since we just use the first linktable we find (FindLinkHandle)
without checking any of the information about it.
So, check for these. Fix FindLinkHandle to only consider a linktable
the "real" linktable to use if it actually matches the volume group id
we're salvaging. Also delete any inconsistent linktables via the new
function CheckDupLinktable later on.
Note that inconsistently-named linktables are known to have been
created in the past due to a bug in the salvager (fixed by ae227049),
and possibly due to other unknown issues.
Reviewed-on: http://gerrit.openafs.org/10322 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit 602e8eb2000be02ef2a6627633b7ba80ea847762)
Change-Id: I472e250bbe5dcb4de44111ac705c9a319abf2b44
Reviewed-on: http://gerrit.openafs.org/10811 Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Fri, 30 Aug 2013 19:21:16 +0000 (14:21 -0500)]
namei: Ignore misplaced files
The namei salvaging/ListViceInodes code currently ignores files where
we cannot derive an inode number from a given filename. However, if a
file is a valid inode filename, but is in the wrong directory, we
still record it. This can cause the salvager to abort, since it
assumes inode e.g. 12345 is present, but when it tries to open 12345,
namei translates the inode to a nonexistent path, and we bail out.
It is unknown how a namei directory structure can reach this state,
but try to handle it. To be on the safe side, just ignore the files,
and log a message about them. That way, if the files are required for
reconstructing the volume or contain important data, they are still
available if needed. And if they contain incorrect or old data, we
don't screw up the volume by trying to use them.
Thanks to Sabah S. Salih for reporting a related issue.
Reviewed-on: http://gerrit.openafs.org/10214 Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 1096582bde6156bb469f2e397cbc40d13a8f2822)
Change-Id: I9252877fbfe01328ac4a8692ebe28a86913b9713
Reviewed-on: http://gerrit.openafs.org/10810 Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Andrew Deason [Thu, 3 Oct 2013 17:38:08 +0000 (12:38 -0500)]
salvager: Ignore linktable-only RW volumes
In general, the salvager will try to salvage any volume if we find an
inode for that volume. However, for namei, we'll always have at least
one inode for the RW volume, even if we only have e.g. an RO volume at
a particular site, since the linktable special inode is always marked
as for the RW volume id. So, if we salvage a volume group that only
has an RO, normally we would also try to salvage the corresponding RW,
even if it doesn't exist. We would then recreate the "missing"
metadata files, so after salvaging, the RW appears to exist as a
normal volume.
The salvager currently tries to avoid this by skipping salvaging the
RW if we find more than one volume in the volume group, and if the RW
only has one special inode, and that one special inode is the
linktable. This solves the problem most of the time, but misses a few
corner cases:
- If we found more than one linktable, we'll try to salvage the RW
anyway. This shouldn't happen, but certain cases of corruption can
cause incorrectly-named linktables, resulting in multiple
linktables.
- If we only find one volume (the RW), we'll still salvage the RW,
even if the only inode for it is a single linktable. This can
happen due to botched salvages in the past, or interrupted deletes
and such. It's just cruft.
In any situation like those, we cause an RW volume to be created where
there previously was none. This can be a problem, since the RW volume
is unknown to the administrator, and does not appear in the VLDB. Such
"phantom" volumes can be very confusing and can cause problems in the
future. For example, if that same RW volume is moved to the server
with the "phantom" RW volume, we now have two of the same RW volume on
the same server on different partitions, which is a big problem.
So, to avoid these corner cases, check all of the special inodes to
see if all of them are linktables. Also perform this check if we don't
have any non-special inodes (even if we only see 1 volume), to catch
the "cruft" case above.
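The revised check can be sketched roughly as follows. This is a minimal illustration and not the salvager's actual code: `struct inode_sketch`, its fields, and `rw_is_linktable_only` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one inode found for the RW volume. */
struct inode_sketch {
    bool is_special;    /* special (metadata) inode vs. regular data inode */
    bool is_linktable;  /* only meaningful when is_special is true */
};

/*
 * Return true when the RW volume should be skipped: it has at least one
 * inode, no non-special inodes, and every special inode is a linktable.
 */
bool
rw_is_linktable_only(const struct inode_sketch *inodes, size_t count)
{
    size_t i;

    assert(inodes != NULL || count == 0);
    if (count == 0) {
        return false;           /* nothing was found, so nothing to skip */
    }
    for (i = 0; i < count; i++) {
        if (!inodes[i].is_special || !inodes[i].is_linktable) {
            return false;       /* real data or other metadata exists */
        }
    }
    return true;
}
```

Because the loop looks at every special inode, both corner cases above (multiple linktables, or a lone RW with a single linktable) end up being skipped.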
Andrew Deason [Tue, 1 Oct 2013 22:31:44 +0000 (17:31 -0500)]
namei: Set inconsistent linktable linkCount to 0
Currently, if we detect an inconsistent linktable filename (where the
filename indicates it's for a different volume than the directory path
indicates), we don't set the linkCount for the inode info. This means
that our caller will get random garbage for the linkCount.
In many cases this value is ignored, but for the salvager, if this is
the only linktable file we find, we treat it as the linktable we
should be using. Thus, if linkCount contains undefined data, we might
try to INC or DEC the linktable a bunch of times, depending on what
random stack garbage the linkCount is filled with.
The salvager shouldn't be INC/DEC'ing these linktables according to
their linkCount anyway, but in the meantime, at least ensure that
this doesn't contain stack garbage, so we ensure that we won't try to
INC or DEC this thousands or millions of times.
Andrew Deason [Sat, 23 Feb 2013 04:46:12 +0000 (22:46 -0600)]
viced: Improve CallPreamble error messages
These messages are not very useful right now. At least try to say what
host we sent an error to, so we know which host may be experiencing
some troubles as a result.
Reviewed-on: http://gerrit.openafs.org/9381 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 6c41b1f740e16b5b9adfe9026630595be6f0699e)
Change-Id: I4e9cf5e0d038c572895b4a31bfdff481ea0b3286
Reviewed-on: http://gerrit.openafs.org/10756 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Andrew Deason [Tue, 28 Jan 2014 00:03:59 +0000 (18:03 -0600)]
afs: Translate VNOSERVICE to ETIMEDOUT
Some fileservers will kill calls that are taking too long with the
VNOSERVICE abort code. Our logic for retrying calls is already aware
of this usage, but if we cannot retry the call, we still just return
VNOSERVICE as an error code to our caller.
Don't return this raw, since it has the same value as ENOBUFS, which can
cause a confusing error message from logs or applications ("No buffer
space available"). Return ETIMEDOUT instead.
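A minimal sketch of the translation, assuming VNOSERVICE has the value 105 (which matches ENOBUFS on Linux, as described above). The helper name `translate_final_code` is hypothetical and does not appear in the client code.

```c
#include <assert.h>
#include <errno.h>

/* AFS abort code; assumed to be 105 here, the same value as Linux ENOBUFS. */
#define VNOSERVICE 105

/*
 * Hypothetical helper: called once a call can no longer be retried, to
 * pick the code that is handed back to the caller.
 */
int
translate_final_code(int code)
{
    assert(code >= 0);
    if (code == VNOSERVICE) {
        return ETIMEDOUT;   /* "timed out" instead of "No buffer space" */
    }
    return code;
}
```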
Reviewed-on: http://gerrit.openafs.org/10766 Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Tested-by: Andrew Deason <adeason@sinenomine.net>
(cherry picked from commit 335a70653adb59795f262663af3972de016c068d)
Change-Id: Ia0b4dbfb61353c08917898c3cb9128625023f311
Reviewed-on: http://gerrit.openafs.org/10814 Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Thu, 26 Dec 2013 21:42:46 +0000 (16:42 -0500)]
afs: Treat vc_error as a CheckCode-translated code
The vcache field vc_error is generally treated as an error code that
has been translated through afs_CheckCode, but this is inconsistent in
a few places. Fix this in a few ways:
- Adjust afs_nfsrdwr so we do not call afs_CheckCode on vc_error,
translating the error code twice.
- Change afs_close to store vc_error in code_checkcode, and have the
logging code check for specific values in code_checkcode as well.
Log unknown values of code and code_checkcode, so we can
distinguish between e.g. a 'code' value of VBUSY, and a
'code_checkcode' value of ETIMEDOUT.
Reviewed-on: http://gerrit.openafs.org/10634 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Tested-by: Derrick Brashear <shadow@your-file-system.com>
(cherry picked from commit 34e4a4fed356fbda9fc8ace1d01a080bd09238b0)
Change-Id: Icceee0c82b0704e0d445f96946b493b4be424506
Reviewed-on: http://gerrit.openafs.org/10813 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Fri, 20 Dec 2013 18:16:37 +0000 (12:16 -0600)]
afs: Return raw code from background daemons
Currently, a background daemon processing a 'store' request will
return any error code in the 'code' field in the brequest structure,
for processing by anyone that's waiting for the response. Since any
waiter will not have access to the treq for the request, they won't be
able to call afs_CheckCode on that return code, so the background
daemon calls afs_CheckCode before returning its error code.
Currently, afs_close uses the 'code' value from the background daemon
as if it were not passed through afs_CheckCode. That is, if all
background daemons are busy, we get our 'code' directly from
afs_StoreOnLastReference, and if we use a background daemon, our
'code' is tb->code. But these values are two different things: the
return value from afs_StoreOnLastReference is a raw error code, and
the code from the background daemon (tb->code) has been translated
through afs_CheckCode.
This can be confusing, in particular for the scenario where a
StoreData fails because of network errors or because of a VBUSY error.
If we get a network error when the request went through a background
daemon, afs_CheckCode will translate this to ETIMEDOUT, which is
commonly value 110, the same as VBUSY. So, an ETIMEDOUT error from the
background daemon is difficult to distinguish from a VBUSY error from
a direct afs_StoreOnLastReference call. Either case can result in a
message to the kernel like the following:
afs: failed to store file (110)
To resolve this, have the background daemon store both the 'raw' error
code, and the error code that has been translated through
afs_CheckCode. afs_close can then use the raw error code when
reporting messages like normal, but can still use the translated error
code to return to the caller, if it has a translated error. With this
change, now afs_close will always log "network problems" for a network
error, regardless of whether the error came in via a background daemon or a
direct afs_StoreOnLastReference call.
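The idea of carrying both codes can be sketched as below. Everything here is illustrative: `struct store_result_sketch`, `check_code_sketch`, and `classify_failure` are hypothetical stand-ins, and the rule "negative codes are network errors" is an assumption made only to keep the example small.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define VBUSY 110  /* AFS abort code; per the text, commonly equal to ETIMEDOUT */

/* Hypothetical stand-in for the result fields of a background request. */
struct store_result_sketch {
    int code_raw;        /* error exactly as the store operation returned it */
    int code_checkcode;  /* same error after translation for the caller */
};

enum failure_kind { FAILURE_NONE, FAILURE_NETWORK, FAILURE_VBUSY, FAILURE_OTHER };

/* Simplified stand-in for afs_CheckCode: assume negative means network error. */
int
check_code_sketch(int raw)
{
    return (raw < 0) ? ETIMEDOUT : raw;
}

/* Store both forms of the code so a waiter can use whichever it needs. */
void
record_store_result(struct store_result_sketch *res, int raw)
{
    assert(res != NULL);
    res->code_raw = raw;
    res->code_checkcode = check_code_sketch(raw);
}

/* Classify for logging from the raw code, so VBUSY and a timeout stay distinct. */
enum failure_kind
classify_failure(const struct store_result_sketch *res)
{
    assert(res != NULL);
    if (res->code_raw == 0) {
        return FAILURE_NONE;
    }
    if (res->code_raw < 0) {
        return FAILURE_NETWORK;
    }
    if (res->code_raw == VBUSY) {
        return FAILURE_VBUSY;
    }
    return FAILURE_OTHER;
}
```

Logging from `code_raw` while returning `code_checkcode` keeps the two 110-valued cases apart, whichever path produced the error.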
In Irix's afs_delmap, we just remove the old usage of tb->code, since
the result was not used for anything.
Reviewed-on: http://gerrit.openafs.org/10633 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 7f58e4ac454f9c06fb2d51ff0a17b8656c454efe)
Change-Id: Id5935d41b0d20000f06b39c48649cd7d0dd2fd81
Reviewed-on: http://gerrit.openafs.org/10812 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Mon, 30 Sep 2013 22:53:36 +0000 (17:53 -0500)]
salvager: Fix in-memory invalid linktable counts
When we have a nonexistent or invalid linktable, we manually set all
of the linkcounts to 1, since we're recreating the link table from
scratch. However, we also have a linkCount count in our in-memory
allInodes array, which could be populated by garbage if we had a
garbage linktable. So make sure to set our in-memory linkCount to 1
for each inode, so we don't use garbage linkcount data.
Simon Wilkinson [Fri, 30 Mar 2012 18:41:17 +0000 (19:41 +0100)]
afs: Handle reading past the end of a file
... except that this change doesn't actually handle this, it just
stops clang from throwing an error about the bogus code that's already
in there. This needs to be fixed properly ...
This change differs slightly from the one on master because on master,
afs_MemRead and afs_UFSRead were consolidated into afs_read(). On the
1.6 branch, we must patch the two functions separately.
Change-Id: I7d8d104c89355c0a3294372340af0e02ab170b59
Reviewed-on: http://gerrit.openafs.org/10744 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Tue, 10 Dec 2013 23:02:34 +0000 (17:02 -0600)]
cellconfig: Do not use 'long' for dbserver IPs
A few places in this file assume that our dbserver IP addresses are
"long"s. A long int can be 8 bytes on some platforms, but we know
these IP addresses are all 4-byte integers. In the rare instances
where we have the maximum number of dbservers, this can overwrite a
bit of extra memory. This can also result in a misaligned access on
platforms such as SPARC v9, since the elements of he->h_addr_list are
not guaranteed to be 8-byte aligned.
So instead, treat these as 4-byte integers. For copying out of
he->h_addr_list, also use a memcpy anyway to be safe, since we are not
guaranteed alignment.
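A minimal sketch of the copy, assuming a NULL-terminated array laid out like he->h_addr_list. The function name `copy_ipv4_addrs` and its parameters are hypothetical; only the use of 4-byte integers and memcpy comes from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy up to max IPv4 addresses out of a NULL-terminated, h_addr_list-style
 * array. Returns the number of addresses copied.
 */
size_t
copy_ipv4_addrs(char *const *addr_list, uint32_t *out, size_t max)
{
    size_t n = 0;

    assert(addr_list != NULL && out != NULL);
    while (n < max && addr_list[n] != NULL) {
        /* memcpy makes no alignment assumption about addr_list[n]. */
        memcpy(&out[n], addr_list[n], sizeof(out[n]));
        n++;
    }
    return n;
}
```

Writing exactly sizeof(uint32_t) bytes per entry means the last slot of a full array is never overrun, which is the overflow a wider "long" store could cause.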
Andrew Deason [Thu, 6 Feb 2014 20:27:12 +0000 (14:27 -0600)]
ihandle: Make _ONCLOSE the sync behavior default
The _DELAYED behavior has had serious problems in the past, so change
the default to be _ONCLOSE instead.
This is a 1.6-only change. On master, the _DELAYED option does not
exist at all, and the _ONCLOSE behavior was made the default when this
option was introduced in master, in commit eb5190eb4a7cd95166866a89e0a8f3a69bbc6e8f.
Change-Id: I01a50e1d829c141c38fbbbaba2c6d2d5a371b130
Reviewed-on: http://gerrit.openafs.org/10809 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Wed, 5 Feb 2014 23:32:16 +0000 (18:32 -0500)]
afs_fetchstore: re-avoid uninitialized variable
As noted in the gerrit comments for change 10742, commit baf6af8a8f2207ce39b746d59ca4bc661c002883 does not handle the case
where the second rx_Read() call fails, and the 'length' variable
can still be used uninitialized.
Instead of using an err label and jumping to it on the case of
errors, initialize length to zero and take care to neither
set nor access *alength if an error has occurred. This is
more consistent with the style of the surrounding code while still
avoiding the use of an uninitialized variable.
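The pattern described above looks roughly like this. It is a sketch under stated assumptions: `read_length_fn`, the two stand-in readers, and `fetch_init_sketch` are hypothetical and only model "a read that may fail"; they are not the Rx API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: stores a length and returns 0, or returns nonzero. */
typedef int (*read_length_fn)(uint32_t *length);

/* Two stand-in readers, present only to exercise both paths. */
int
read_length_ok(uint32_t *length)
{
    *length = 4096;
    return 0;
}

int
read_length_fail(uint32_t *length)
{
    (void)length;           /* a failed read leaves the value untouched */
    return -1;
}

int
fetch_init_sketch(read_length_fn read_length, uint32_t *alength)
{
    uint32_t length = 0;    /* initialized, so no path can read garbage */
    int code;

    assert(read_length != NULL && alength != NULL);
    code = read_length(&length);
    if (code == 0) {
        *alength = length;  /* only touch the output on success */
    }
    return code;
}
```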
Benjamin Kaduk [Fri, 10 Jan 2014 03:42:26 +0000 (22:42 -0500)]
afs_fetchstore: avoid use of uninitialized variable
rxfs_fetchInit() attempts to do a 64-bit RPC first, but falls back
to the 32-bit StartRXAFS_FetchData() if the server appears to not
support the 64-bit RPCs.
We correctly did not read a length from the call if the FetchData
RPC(s) failed, but proceeded to assign from the 'length' local
variable into the 'alength' output variable unconditionally later on.
Instead of blindly continuing on, jump to the error-handling part of
the routine when we cannot read a length from the call. This has the
side effect of skipping an afs_Trace3() point in the error case.
Reviewed-on: http://gerrit.openafs.org/10694 Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit baf6af8a8f2207ce39b746d59ca4bc661c002883)
Change-Id: Icf14d5e8a6abf8a8a014ab7d48b767e3dcc7a6a9
Reviewed-on: http://gerrit.openafs.org/10742 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Tue, 17 Dec 2013 23:30:26 +0000 (17:30 -0600)]
LINUX: Use sock_create_kern where available
Currently, we use sock_create to create our Rx socket. This means that
accesses to that socket (sendmsg, recvmsg) are subject to SELinux
restrictions. For all recvmsg accesses and some sendmsg accesses, this
doesn't matter, since the access will be performed by one of our
kernel threads (running as kernel_t or something similar, which is
unrestricted), such as the rx listener, a background daemon, or the rx
event thread.
However, sometimes we do run in the context of a normal user process.
For some RPCs like FetchStatus, we tend to run the RPC in the
accessing user thread, which can result in us sendmsg()ing the data
packets with the initial arguments in the user thread. We can also
send delayed ACKs via rx_EndCall, and possibly a variety of other
scenarios.
In any of these situations when we are sendmsg()ing from a user
thread, SELinux can prevent us from sending to the socket, if the
calling user thread context is not able to write to an afs_t
udp_socket. This will result in packets not being sent immediately,
but the packets will be resent later, so access will work, but appear
very slow. This can easily happen for processes that are specifically
constrained by SELinux; for example, webservers are often constrained,
even if most of the rest of the system is not. This can be noticed by
seeing the 'resends' and 'sendFailed' counters rising in 'rxdebug
-rxstat', as well as noticing SELinux access failures if 'dontaudit'
rules are ignored.
To avoid this, use sock_create_kern to create the Rx socket, to
indicate that this is a socket for use by kernel code, and not
accessible by a user. This should cause us to bypass any LSM
restrictions (SELinux, AppArmor, etc). Add a configure check for this,
since this function has not always existed, according to
<https://lists.openafs.org/pipermail/openafs-devel/2004-June/010651.html>
Reviewed-on: http://gerrit.openafs.org/10594 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
(cherry picked from commit e988aa45d765c935fef4bcd35585d6a3594cc497)
Change-Id: Ie04a8ac166dabf9fb8368d47d5624d1f319174bd
Reviewed-on: http://gerrit.openafs.org/10598 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Simon Wilkinson [Fri, 30 Mar 2012 18:34:53 +0000 (19:34 +0100)]
viced: Remove pointless braces
Doing if ((a==b)) is unnecessary. It's also potentially dangerous, as
that's the syntax required to do assignment within an if statement.
clang now issues warnings (errors in -Werror mode) when it encounters
these.
Remove pointless braces from viced to make clang happy.
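For illustration, the two forms look like this in a small, self-contained sketch. The function names are hypothetical and are not taken from viced.

```c
#include <assert.h>
#include <stddef.h>

/* Plain comparison: one pair of parentheses is all that is needed. */
int
ids_match(int a, int b)
{
    if (a == b) {
        return 1;
    }
    return 0;
}

/*
 * Intentional assignment used as a condition: the doubled parentheses
 * tell the compiler (and the reader) that '=' is not a typo for '=='.
 */
int
store_and_test(int *dst, int value)
{
    assert(dst != NULL);
    if ((*dst = value)) {
        return 1;
    }
    return 0;
}
```

Keeping the doubled form for deliberate assignments only is what lets the compiler warning stay meaningful.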
Marc Dionne [Thu, 30 Jan 2014 18:50:37 +0000 (13:50 -0500)]
Linux: When revalidating, don't drop in-use dentries
The Linux client can get into a state where the current working
directory is seen as "deleted" by some tools, while it is still
there and accessible to "ls" and other tools. This has been
reported by several users and sites.
One scenario that has been observed while debugging:
- A process does a chdir() into a directory
- This stores a pointer to the dir's dentry in the task structure
- The server hosting the volume goes offline temporarily
- The dentry for the directory is passed to afs_linux_dentry_revalidate
- afs_linux_dentry_revalidate calls afs_lookup which returns an
error (110 - ETIMEDOUT)
- It then considers the dentry not valid, and calls d_drop()
- d_drop unhashes the dentry unconditionally
- Server comes back up, but dentry is still unhashed
- getcwd() fetches the task structure pointer to the current dir
dentry. If unhashed, it returns ENOENT, and the vfs layer is
not involved at all.
At that point, many things won't work and there is no obvious way
for the user to get the directory rehashed.
Rather than calling d_drop directly, call d_invalidate, as
it will only drop (unhash) the dentry if we're the only one holding
a reference. Since d_invalidate will also call shrink_dcache_parent,
also remove that call from our code so it doesn't get called twice.
Reviewed-on: http://gerrit.openafs.org/10774 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
(cherry picked from commit 997f7fce437787a45ae0584beaae43affbd37cce)
Change-Id: I1e2b46fd076e96a7acbf3443f118fac8355d3e8c
Reviewed-on: http://gerrit.openafs.org/10804 Tested-by: Anders Kaseorg <andersk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Fri, 10 Jan 2014 04:34:30 +0000 (23:34 -0500)]
Disable some explicit sbrk() usage
Mac OS X 10.9 now considers this function deprecated and warns on
its use, causing the buildslave configuration to error out.
On master, we added a library routine to get a process's size to opr;
opr is not present on the 1.6 branch so another route is needed here.
Since use of the OS X malloc implementation appears to have no
effect on the result of sbrk(0), there is no loss of functionality
by replacing the function call with a (different) constant value.
There may still be some value in sbrk(0) on other systems, so
only disable sbrk() for OS X, on the stable branch.
This change is specific to the 1.6 branch.
Change-Id: Ie5f96e923b78be22a9ce83d0a35a7675d517b073
Reviewed-on: http://gerrit.openafs.org/10746 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Wed, 22 Jan 2014 05:00:00 +0000 (06:00 +0100)]
cmd: Avoid unsafe use of strncat
The NName function was using strncat(a, b, sizeof(a)), which doesn't
work as you would expect if 'a' already contains data, giving a potential
buffer overflow.
This was fixed on master in commit 9a007a9df43645b63a8b642029b4931928f9268b
by using strlcat from libroken, but we do not use libroken on the 1.6
branch. Instead, modify the strncat invocation to use a safer maximum
length to copy.
This is a 1.6-specific change.
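A minimal sketch of a bounded strncat call. The third argument of strncat limits how many characters are appended, not the total size of the destination, so the space already used has to be subtracted. The wrapper name `append_bounded` is hypothetical and is not the code in NName.

```c
#include <assert.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst, whose total capacity
 * (including the terminator) is dst_size. Excess input is truncated.
 */
void
append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used;

    assert(dst != NULL && src != NULL && dst_size > 0);
    used = strlen(dst);
    assert(used < dst_size);
    /* Room left for new characters, keeping one byte for the NUL. */
    strncat(dst, src, dst_size - used - 1);
}
```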
Change-Id: Ifa41e603a1c98682550afadd063def4b9706d9e2
Reviewed-on: http://gerrit.openafs.org/10731 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Tue, 21 Jan 2014 19:59:59 +0000 (14:59 -0500)]
Search srcdir and objdir paths for rxkad includes
The addition of rxkad-k5 support in 1.6.5 introduced dependencies
on rxkad to the auth and afsauthent libraries. However, the rxkad
headers used are both source files and generated files, so we must
add both the source and build tree rxkad directories to the include
search path.
This is a 1.6-only change, since on master we are using libtool
and do not need to reach into other parts of the source tree
to rebuild certain files into these libraries.
Change-Id: I819095a3e0ac259bba43205d0462659cbd2c6f03
Reviewed-on: http://gerrit.openafs.org/10736 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: Stephan Wiesand <stephan.wiesand@desy.de>
Simon Wilkinson [Fri, 22 Feb 2013 10:30:56 +0000 (10:30 +0000)]
afsmonitor: Allow CBSTATS collection to work
The switch which selects the collection number was missing a
'break', so selecting the CBSTATS collection would always fall
through to the default, error, case.
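The bug class can be sketched as below. The collection numbers and names are hypothetical, since the real values are not given here; only the missing 'break' before the default case reflects the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical collection numbers; the real values are not given in the text. */
enum collection_sketch {
    COLLECTION_PERF_SKETCH = 1,
    COLLECTION_CBSTATS_SKETCH = 2
};

/* Returns 0 and sets *name for a known collection, -1 otherwise. */
int
collection_name(int collection, const char **name)
{
    assert(name != NULL);
    switch (collection) {
    case COLLECTION_PERF_SKETCH:
        *name = "perf";
        break;
    case COLLECTION_CBSTATS_SKETCH:
        *name = "cbstats";
        break;      /* without this, control falls into the default case */
    default:
        *name = NULL;
        return -1;
    }
    return 0;
}
```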
Simon Wilkinson [Fri, 30 Mar 2012 18:14:38 +0000 (19:14 +0100)]
libadmin: read returns an ssize_t, not a size_t
size_t is unsigned, and therefore can never be less than 0. Using it as
a return code from read() means that we never catch read errors. read()
is defined as returning ssize_t, so just use this to capture its return
code.
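A short sketch of why the type matters. The wrapper `read_block` is a hypothetical name; read() and ssize_t are standard POSIX.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes from fd. Returns 0 and stores the byte count in
 * *nread on success, or returns -1 if read() reported an error.
 */
int
read_block(int fd, void *buf, size_t len, size_t *nread)
{
    ssize_t rc;     /* signed, so the -1 error return is representable */

    assert(buf != NULL && nread != NULL);
    rc = read(fd, buf, len);
    if (rc < 0) {
        return -1;  /* with a size_t variable this test could never be true */
    }
    *nread = (size_t)rc;
    return 0;
}
```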
it's claimed these are not initialized before use.
squelch compiler errors. has to be in parent as otherwise
we will zero them in our loop where we potentially want the
parent group id, which is not on "this" line as we add members.
Benjamin Kaduk [Fri, 10 Jan 2014 04:54:45 +0000 (23:54 -0500)]
Disable deprecated warnings for krb5 routines
In OS X 10.9 Mavericks, Apple has marked all of the krb5 routines
as deprecated (in favor of the GSS framework). We must disable
these warnings in order to allow the buildslave to have a successful
build.
Luckily, Apple has left in rope for us to programmatically disable
the deprecated attribute with a preprocessor macro. Defining this
macro should be safe everywhere, so do so unconditionally.
This commit touches a few more files than the version on master does,
since the 1.6 branch is using the krb5 library for its rxkad-k5
implementation; the files in auth/ and rxkad/ are specific to 1.6.
Simon Wilkinson [Fri, 30 Mar 2012 18:37:36 +0000 (19:37 +0100)]
rx: Handle negative returns on packet reads
rxi_RecvMsg returns an int, because it can return a negative value upon
error. Don't store its return value as an unsigned int, because this may
hide the potential errors.
Modify the error handling loop so that errors get to where they are
intended.
Simon Wilkinson [Fri, 30 Mar 2012 18:12:37 +0000 (19:12 +0100)]
Unix CM: Purge needless brackets
Doing if ((a==b)) is unnecessary. It's also potentially dangerous, as
that's the syntax required to do assignment within an if statement.
clang now issues warnings (errors in -Werror mode) when it encounters
these.
Remove pointless braces from the Unix CM to make clang happy.
Simon Wilkinson [Fri, 30 Mar 2012 18:30:18 +0000 (19:30 +0100)]
vol: Remove unneeded braces
Doing if ((a==b)) is unnecessary. It's also potentially dangerous, as
that's the syntax required to do assignment within an if statement.
clang now issues warnings (errors in -Werror mode) when it encounters
these.
Remove pointless braces from vol to make clang happy.
Simon Wilkinson [Fri, 30 Mar 2012 18:24:23 +0000 (19:24 +0100)]
ptserver: Remove redundant braces
Doing if ((a==b)) is unnecessary. It's also potentially dangerous, as
that's the syntax required to do assignment within an if statement.
clang now issues warnings (errors in -Werror mode) when it encounters
these.
Remove pointless braces from ptserver to make clang happy.
Simon Wilkinson [Fri, 30 Mar 2012 18:39:51 +0000 (19:39 +0100)]
rx: Remove needless braces
Doing if ((a==b)) is unnecessary. It's also potentially dangerous, as
that's the syntax required to do assignment within an if statement.
clang now issues warnings (errors in -Werror mode) when it encounters
these.
Remove pointless braces from rx to make clang happy.
Reviewed-on: http://gerrit.openafs.org/7088 Tested-by: Simon Wilkinson <simonxwilkinson@gmail.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 5e107724f3661254cfdb693ae2d4d1c5238eba7d)
Change-Id: I99a04d9a2c547e34a3daca6f9e6714f6c7b76b9c
Reviewed-on: http://gerrit.openafs.org/10732 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Arne Wiebalck [Fri, 13 Dec 2013 10:46:04 +0000 (11:46 +0100)]
make openafs uninstallable even if /afs is missing
The preuninstall scriptlet of the openafs RPM removes /afs. If, for
whatever reason, that directory does not exist, the scriptlet will
fail and hence break the deinstallation of the openafs package. The
proposed patch makes the scriptlet evaluate to true even if /afs
has been removed by some other means and allows the package to be
uninstalled.
Andrew Deason [Thu, 26 Dec 2013 17:56:37 +0000 (12:56 -0500)]
Fedora: Handle new kernel variant paths
With Fedora 20, Fedora now separates the variant from the rest of the
kernel version with a plus (+) instead of a period (.). This results
in directories called e.g. 3.12.5-302.fc20.i686+PAE, where right now
we look for 3.12.5-302.fc20.i686.PAE.
Use this new directory scheme for Fedora 20 builds, so we can build
against non-default kernel variants on Fedora 20 and beyond.
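The naming rule can be illustrated in C, although the real logic lives in the packaging scripts and is not shown here. The function `kernel_variant_dir` and its parameters are hypothetical, and treating an empty variant as "no suffix" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the kernel directory name into buf. Fedora 20 and later join the
 * variant with '+', earlier releases with '.'. Returns 0 on success, or
 * -1 if the result did not fit in buf.
 */
int
kernel_variant_dir(char *buf, size_t size, const char *version,
                   const char *variant, int fedora_release)
{
    char sep = (fedora_release >= 20) ? '+' : '.';
    int n;

    assert(buf != NULL && version != NULL && variant != NULL);
    if (strlen(variant) == 0) {
        /* Assumption: the default kernel carries no variant suffix. */
        n = snprintf(buf, size, "%s", version);
    } else {
        n = snprintf(buf, size, "%s%c%s", version, sep, variant);
    }
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```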
Reviewed-on: http://gerrit.openafs.org/10620 Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com>
(cherry picked from commit 837ec9dd41c4b1e10ad9d32a52b0f34dd665026a)
Change-Id: I513ab231a9d7b61ec7790eb99a27da698a355f17
Reviewed-on: http://gerrit.openafs.org/10622 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Mon, 23 Dec 2013 18:32:28 +0000 (13:32 -0500)]
RedHat: Munge future kernel versions
We currently look for "fc1?" (that is, fc10 through fc19) when trying
to munge the kernel version in some ways. This broke on Fedora 20,
since 20 obviously does not match "fc1?". Similarly, we look
specifically for "el6" for RHEL6 versioning quirks, but these will
break on RHEL7 and beyond.
Change the version checks so that this will work all the way through
Fedora 99 and RHEL 9. That won't work forever, but it will keep us
working for a few versions if the versioning quirks do not change.
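To make the glob behavior concrete, here is a small C illustration using POSIX fnmatch(). The real checks live in the packaging scripts and are not shown here; the replacement pattern "fc[1-9][0-9]" is an assumption chosen to cover fc10 through fc99, and both function names are hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Old check, per the text: "fc1?" covers only fc10 through fc19. */
int
matches_old_fedora_glob(const char *dist)
{
    assert(dist != NULL);
    return fnmatch("fc1?", dist, 0) == 0;
}

/* Assumed replacement pattern: any two-digit release, fc10 through fc99. */
int
matches_new_fedora_glob(const char *dist)
{
    assert(dist != NULL);
    return fnmatch("fc[1-9][0-9]", dist, 0) == 0;
}
```

Since fnmatch() matches the whole string, a three-digit tag such as "fc100" is rejected by both patterns, which is the "won't work forever" limit noted above.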
Reviewed-on: http://gerrit.openafs.org/10618 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com>
(cherry picked from commit cddc732ec5fd40c94126e5f0b7103136592a2efe)
Change-Id: I439cd3101ea360b775c638cd67961fc0e4ffcaf6
Reviewed-on: http://gerrit.openafs.org/10619 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Tue, 17 Dec 2013 23:27:53 +0000 (17:27 -0600)]
rx: Remove obsolete comment
This comment refers to the fact that we used to be just checking for
SELinux to see if we should pass that extra argument. Ever since
commit cb1b41b159b98881f66319d7f65d941ba9fab911, we do have a better
test for this.
Reviewed-on: http://gerrit.openafs.org/10593 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
(cherry picked from commit 2ed7023b26acb3277e42eac803a0702b95167e6e)
Change-Id: I5a8ebcda7fcb85931638ab0bec807b1da8ebed3f
Reviewed-on: http://gerrit.openafs.org/10597 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
The fileserver-side "NAT ping" behavior has yet to be proven to be helpful in
situations with NATs. If the behavior is not helpful, this generates
potentially a significant amount of extra useless traffic. So until it can be
shown to what degree this is helpful, keep this behavior out of the fileserver.