Andrew Deason [Fri, 31 Jan 2014 22:46:12 +0000 (16:46 -0600)]
afs: Throttle byte-range locks warnings per-file
Currently, the warning messages about byte-range locks are throttled
only according to what the last PID of the locking process was. So, if
that same process requests byte-range locks many times, we log this
warning message at most once every 2 minutes.
However, if we have even just one other process also performing
byte-range locks, the throttling can become pretty useless as
lastWarnPid ping-pongs back and forth between the two different PIDs.
This can happen if multiple unrelated byte-range-lock-using pieces of
software just happen to be running on the same machine, or if a piece
of software uses byte-range locks after forking into separate
processes.
To avoid flooding the log in situations like this, keep track of the
last warn time in the relevant vcache, so we don't get frequent
warnings for byte-range lock requests on the same file.
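A minimal sketch of per-file throttling, assuming a hypothetical `struct vcache_sketch` with a single `last_lock_warn` field (not the real vcache layout) and the 2-minute interval mentioned above. Because the timestamp lives in the per-file object, two processes locking the same file share one rate limit:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a vcache; only the field needed here. */
struct vcache_sketch {
    time_t last_lock_warn;   /* 0 means "never warned for this file" */
};

#define LOCK_WARN_INTERVAL (2 * 60)  /* seconds */

/* Returns 1 if a warning was logged, 0 if it was suppressed. */
static int do_lock_warning(struct vcache_sketch *avc, time_t now, int pid)
{
    /* Throttle on the file, not on the last PID, so alternating
     * processes cannot defeat the rate limit. */
    if (avc->last_lock_warn != 0 &&
        now - avc->last_lock_warn < LOCK_WARN_INTERVAL)
        return 0;
    avc->last_lock_warn = now;
    fprintf(stderr, "afs: byte-range lock requested by pid %d\n", pid);
    return 1;
}
```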
Andrew Deason [Fri, 30 Aug 2013 19:21:16 +0000 (14:21 -0500)]
namei: Ignore misplaced files
The namei salvaging/ListViceInodes code currently ignores files where
we cannot derive an inode number from a given filename. However, if a
file is a valid inode filename, but is in the wrong directory, we
still record it. This can cause the salvager to abort, since it
assumes that an inode (e.g. 12345) is present, but when it tries to open
it, namei translates the inode to a nonexistent path, and we bail out.
It is unknown how a namei directory structure can reach this state,
but try to handle it. To be on the safe side, just ignore the files,
and log a message about them. That way, if the files are required for
reconstructing the volume or contain important data, they are still
available if needed. And if they contain incorrect or old data, we
don't screw up the volume by trying to use them.
Thanks to Sabah S. Salih for reporting a related issue.
Andrew Deason [Thu, 3 Oct 2013 17:51:41 +0000 (12:51 -0500)]
salvager: Handle multiple/inconsistent linktables
The ListAFSSubDirs code in namei_ops.c currently detects
incorrectly-named linktable files, and whines about them and says the
salvager will handle them. However, the salvager doesn't really handle
them, since we just use the first linktable we find (FindLinkHandle)
without checking any of the information about it.
So, check for these. Fix FindLinkHandle to only consider a linktable
the "real" linktable to use if it actually matches the volume group id
we're salvaging. Also delete any inconsistent linktables via the new
function CheckDupLinktable later on.
Note that inconsistently-named linktables are known to have been
created in the past due to a bug in the salvager (fixed by ae227049),
and possibly due to other unknown issues.
Change-Id: Iac461e1254e1f73406a2bc74eaa5a5f53d697304
Reviewed-on: http://gerrit.openafs.org/10322
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Andrew Deason [Fri, 31 Jan 2014 22:36:44 +0000 (16:36 -0600)]
afs: Refactor DoLockWarning
Change DoLockWarning around a little bit, so subsequent changes are
easier to follow. Move lastWarnTime/lastWarnPid so they are only
usable within this function.
Marc Dionne [Thu, 30 Jan 2014 18:50:37 +0000 (13:50 -0500)]
Linux: When revalidating, don't drop in-use dentries
The Linux client can get into a state where the current working
directory is seen as "deleted" by some tools, while it is still
there and accessible to "ls" and other tools. This has been
reported by several users and sites.
One scenario that has been observed while debugging:
- A process does a chdir() into a directory
- This stores a pointer to the dir's dentry in the task structure
- The server hosting the volume goes offline temporarily
- The dentry for the directory is passed to afs_linux_dentry_revalidate
- afs_linux_dentry_revalidate calls afs_lookup which returns an
error (110 - ETIMEDOUT)
- It then considers the dentry not valid, and calls d_drop()
- d_drop unhashes the dentry unconditionally
- Server comes back up, but dentry is still unhashed
- getcwd() fetches the task structure pointer to the current dir
dentry. If unhashed, it returns ENOENT, and the vfs layer is
not involved at all.
At that point, many things won't work and there is no obvious way
for the user to get the directory rehashed.
Instead of calling d_drop directly, call d_invalidate, as it will
only drop (unhash) the dentry if we're the only one holding a
reference. Since d_invalidate also calls shrink_dcache_parent, remove
that call from our code so it doesn't get called twice.
Arne Wiebalck [Fri, 10 Jan 2014 16:29:11 +0000 (17:29 +0100)]
Log shutdown progress
Shutting down fileservers with thousands of volumes can take a while and
it is helpful for operations to actually see that there is progress when
detaching volumes. This patch adds a log message to the fileserver log
every time 100 volumes have been detached.
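A minimal sketch of the progress-logging pattern described above. `detach_volume` and the log text are hypothetical placeholders, not the fileserver's actual functions or message:

```c
#include <stdio.h>

#define DETACH_LOG_INTERVAL 100  /* log once per 100 detached volumes */

/* Hypothetical per-volume detach; always succeeds in this sketch. */
static void detach_volume(int volid) { (void)volid; }

/* Detach every volume, logging progress; returns the number of
 * progress lines written. */
static int shutdown_volumes(const int *vols, int nvols)
{
    int detached = 0, logged = 0;

    for (int i = 0; i < nvols; i++) {
        detach_volume(vols[i]);
        detached++;
        if (detached % DETACH_LOG_INTERVAL == 0) {
            printf("shutdown: %d volumes detached so far\n", detached);
            logged++;
        }
    }
    return logged;
}
```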
Andrew Deason [Thu, 30 Jan 2014 20:43:57 +0000 (14:43 -0600)]
afs: Pay attention to fetchOps->destroy error code
The ->destroy function in our fetchops could change our error code, or
even raise a new error. Don't ignore it. This currently doesn't do
much, since fetchDestroy won't change the error code if it's
given an error, but this can change in the future.
Jeffrey Altman [Mon, 27 Jan 2014 05:30:20 +0000 (00:30 -0500)]
Windows: cm_GetAddrsU wrapper for VL_GetAddrsU
cm_GetAddrsU() is a wrapper for the VL_GetAddrsU() RPC. The initial
version is a bare bones replacement for the VL_GetAddrsU() call from
cm_UpdateVolumeLocation(). Future changes will add caching.
Jeffrey Altman [Mon, 27 Jan 2014 05:14:36 +0000 (00:14 -0500)]
Windows: replace cm_allServersp list with osi_queue
Replace the cm_allServersp list with an osi_queue. This simplifies
the Add/Remove functionality which will be required in case of VLDB
server uniquifier changes.
Andrew Deason [Tue, 28 Jan 2014 00:03:59 +0000 (18:03 -0600)]
afs: Translate VNOSERVICE to ETIMEDOUT
Some fileservers will kill calls that are taking too long with the
VNOSERVICE abort code. Our logic for retrying calls is already aware
of this usage, but if we cannot retry the call, we still just return
VNOSERVICE as an error code to our caller.
Don't return this raw, since it has the same value as ENOBUFS, which can
cause a confusing error message from logs or applications ("No buffer
space available"). Return ETIMEDOUT instead.
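A sketch of the final translation step. The numeric value 105 for VNOSERVICE is an assumption for this example (it matches ENOBUFS on Linux, which is the collision described above); `translate_final_error` is a hypothetical name:

```c
#include <errno.h>

/* Assumed value of the fileserver abort code; collides with ENOBUFS
 * on Linux. */
#ifndef VNOSERVICE
#define VNOSERVICE 105
#endif

/* Map an error that can no longer be retried to the code callers see. */
static int translate_final_error(int code)
{
    if (code == VNOSERVICE)
        return ETIMEDOUT;   /* the server gave up on a slow call */
    return code;
}
```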
Andrew Deason [Thu, 26 Dec 2013 22:17:44 +0000 (17:17 -0500)]
afs: Fix afs_CheckCode identifier collision
The last argument to afs_CheckCode should be unique so the call site
can be identified if fstrace is turned on. BStore and BPartialStore
were both using 43, so change BPartialStore to 430 to avoid the
collision.
Change-Id: I81a43ee41623fad10d0e70a7d9c8e6029aba30eb
Reviewed-on: http://gerrit.openafs.org/10635
Reviewed-by: Perry Ruiter <pruiter@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Andrew Deason [Thu, 26 Dec 2013 21:42:46 +0000 (16:42 -0500)]
afs: Treat vc_error as a CheckCode-translated code
The vcache field vc_error is generally treated as an error code that
has been translated through afs_CheckCode, but this is inconsistent in
a few places. Fix this in a few ways:
- Adjust afs_nfsrdwr so we do not call afs_CheckCode on vc_error,
translating the error code twice.
- Change afs_close to store vc_error in code_checkcode, and have the
logging code check for specific values in code_checkcode as well.
Log unknown values of code and code_checkcode, so we can
distinguish between e.g. a 'code' value of VBUSY, and a
'code_checkcode' value of ETIMEDOUT.
Michael Meffie [Sun, 19 Jan 2014 03:01:59 +0000 (22:01 -0500)]
libadmin: makefile rule for afs_AdminError.h
Add a makefile rule to export the libadmin afs_AdminErrors.h header
file, instead of exporting afs_AdminErrors.h as a side effect of
generating the afs_AdminBosErrors error table.
Add the missing afs_AdminErrors.h dependency to the afs_utilAdmin.o
dependency list.
Michael Meffie [Fri, 22 Nov 2013 17:23:17 +0000 (12:23 -0500)]
config: parallel-safe param.h makefile rule
Generate the param.h.new temporary file in a parallel-safe
way. The rule to generate the three copies of param.h can
run at the same time under a parallel make, clobbering
the param.h.new temporary file. Instead of creating this file
inline, create a common rule to generate the temporary file
once.
Michael Meffie [Fri, 22 Nov 2013 16:50:11 +0000 (11:50 -0500)]
libafscp: makefile install rule update
Change the makefile install rules to install the header
file from the libafscp directory, and not the top level
include directory to make the install rules consistent
with the rest of the tree.
Michael Meffie [Wed, 1 Aug 2012 21:26:33 +0000 (17:26 -0400)]
comerr: compile_et -emit option for parallel make
Add the -emit option to the compile_et command to support parallel make.
The -emit option allows make to generate the header and the source files
independently, instead of building two files at the same time. This
avoids the issue where one command creates two separate files, which is
difficult to handle correctly for parallel makes.
Change-Id: Ib44a8e358643cf19b4834b3bd4d5b88db6cd0ccf
Reviewed-on: http://gerrit.openafs.org/7921
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Fri, 24 Jan 2014 17:00:20 +0000 (12:00 -0500)]
FBSD: catch up to 1997 and include if_var.h with if.h
The commit message for upstream's r257244 change includes:
- Make the prophecy from 1997 happen and remove if_var.h inclusion
from if.h.
Despite the clear public posting, we were caught unawares. We made
it down to the cellar despite the missing stairs, but "Beware of
the Leopard" caused us to turn back, apparently.
Since if.h is included in many places and if_var.h is not present
on all OSes, pull the if.h inclusion into the common kernel headers
for afs/ and rx/, and add in if_var.h (as well as the sys/socket.h
prerequisite).
Andrew Deason [Fri, 20 Dec 2013 18:16:37 +0000 (12:16 -0600)]
afs: Return raw code from background daemons
Currently, a background daemon processing a 'store' request will
return any error code in the 'code' field in the brequest structure,
for processing by anyone that's waiting for the response. Since any
waiter will not have access to the treq for the request, they won't be
able to call afs_CheckCode on that return code, so the background
daemon calls afs_CheckCode before returning its error code.
Currently, afs_close uses the 'code' value from the background daemon
as if it were not passed through afs_CheckCode. That is, if all
background daemons are busy, we get our 'code' directly from
afs_StoreOnLastReference, and if we use a background daemon, our
'code' is tb->code. But these values are two different things: the
return value from afs_StoreOnLastReference is a raw error code, and
the code from the background daemon (tb->code) has been translated
through afs_CheckCode.
This can be confusing, in particular for the scenario where a
StoreData fails because of network errors or because of a VBUSY error.
If we get a network error when the request went through a background
daemon, afs_CheckCode will translate this to ETIMEDOUT, which is
commonly value 110, the same as VBUSY. So, an ETIMEDOUT error from the
background daemon is difficult to distinguish from a VBUSY error from
a direct afs_StoreOnLastReference call. Either case can result in a
message to the kernel like the following:
afs: failed to store file (110)
To resolve this, have the background daemon store both the 'raw' error
code, and the error code that has been translated through
afs_CheckCode. afs_close can then use the raw error code when
reporting messages like normal, but can still use the translated error
code to return to the caller, if it has a translated error. With this
change, now afs_close will always log "network problems" for a network
error, regardless of whether the error came in via a background daemon or a
direct afs_StoreOnLastReference call.
In Irix's afs_delmap, we just remove the old usage of tb->code, since
the result was not used for anything.
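A sketch of keeping both codes in the request. The struct, the toy translation rule (negative codes treated as Rx/network errors), and the VBUSY value of 110 are assumptions for illustration, not the real brequest or afs_CheckCode. On Linux, ETIMEDOUT is also 110, so only the raw code distinguishes the two cases:

```c
#include <errno.h>

#define VBUSY 110  /* assumed value of the "volume busy" code */

enum close_reason { REASON_NETWORK, REASON_BUSY, REASON_OTHER };

/* Hypothetical subset of a background request. */
struct brequest_sketch {
    int code_raw;        /* as returned by the store routine */
    int code_checkcode;  /* after CheckCode-style translation */
};

/* Toy translation: negative codes are treated as network errors. */
static int check_code_sketch(int raw)
{
    if (raw < 0)
        return ETIMEDOUT;
    return raw;
}

static void bkg_store_done(struct brequest_sketch *tb, int raw)
{
    tb->code_raw = raw;
    tb->code_checkcode = check_code_sketch(raw);
}

/* Logging classifies on the raw code, so a network error and a
 * VBUSY error are never confused. */
static enum close_reason close_log_reason(const struct brequest_sketch *tb)
{
    if (tb->code_raw < 0)
        return REASON_NETWORK;
    if (tb->code_raw == VBUSY)
        return REASON_BUSY;
    return REASON_OTHER;
}
```

The caller still returns `code_checkcode` to the application, and uses `code_raw` only to choose the log message.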
Benjamin Kaduk [Fri, 10 Jan 2014 04:34:30 +0000 (23:34 -0500)]
Remove some explicit sbrk() usage
Mac OS X 10.9 now considers this function deprecated and warns on
its use, causing the buildslave configuration to error out.
Use the library routine to get a process's size instead of inlining
the call to sbrk (which is unlikely to have worked as intended for
quite some time -- most malloc implementations in use do not use
sbrk to get their storage).
The fileserver-side "NAT ping" behavior has yet to be proven to be
helpful in situations with NATs. If the behavior is not helpful, this
potentially generates a significant amount of extra useless traffic.
So until it can be shown to what degree this is helpful, keep this
behavior out of the fileserver.
Benjamin Kaduk [Fri, 10 Jan 2014 05:00:52 +0000 (00:00 -0500)]
Use an explicit symbol for uninitialized vnode types
Avoid trying to get clever with stuffing -1 into an unsigned bitfield,
which causes the value to change and generates a warning from clang.
Just use vNull, which is intended to be used for uninitialized/empty
vnodes.
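A sketch of the problem and the fix. The enum values and the 3-bit `type` field are assumptions for illustration. Storing -1 in an unsigned 3-bit field is reduced modulo 8 and reads back as 7, which is why clang warns. Using a named "empty" value avoids the issue:

```c
/* Assumed vnode type values; vNull marks an uninitialized vnode. */
typedef enum { vNull = 0, vFile = 1, vDirectory = 2, vSymlink = 3 } VnodeType;

struct vnode_disk_sketch {
    unsigned int type:3;   /* unsigned bitfield, as in the warning */
};

static void mark_uninitialized(struct vnode_disk_sketch *v)
{
    /* Not v->type = -1: that value cannot be represented here and
     * would be stored as 7. */
    v->type = vNull;
}

static int is_uninitialized(const struct vnode_disk_sketch *v)
{
    return v->type == vNull;
}
```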
Benjamin Kaduk [Fri, 10 Jan 2014 04:54:45 +0000 (23:54 -0500)]
Disable deprecated warnings for krb5 routines
In OS X 10.9 Mavericks, Apple has marked all of the krb5 routines
as deprecated (in favor of the GSS framework). We must disable
these warnings in order to allow the buildslave to have a successful
build.
Luckily, Apple has left in rope for us to programmatically disable
the deprecated attribute with a preprocessor macro. Defining this
macro should be safe everywhere, so do so unconditionally.
Benjamin Kaduk [Fri, 10 Jan 2014 04:38:36 +0000 (23:38 -0500)]
viced/callback.c: Ignore dump write errors even harder
Not only do we need to check the return value of write(2), but
we also need to do so in a way that does not leave an empty body
in the if statement, in order to appease the clang-500.2.79 found
on OS X 10.9 with Xcode 5.0.2.
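One way to satisfy both warnings is sketched below. It is hypothetical, not necessarily what the commit did. The result of write(2) is stored and checked, and the `if` has a real body (counting the failure) instead of an empty statement:

```c
#include <string.h>
#include <unistd.h>

/* Write one dump record. Errors are deliberately not fatal, but the
 * write(2) result is still consumed. */
static void write_dump_record(int fd, const char *rec, int *nerrors)
{
    size_t len = strlen(rec);
    ssize_t r = write(fd, rec, len);

    if (r < 0 || (size_t)r != len)
        (*nerrors)++;
}
```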
Benjamin Kaduk [Fri, 10 Jan 2014 03:42:26 +0000 (22:42 -0500)]
afs_fetchstore: avoid use of uninitialized variable
rxfs_fetchInit() attempts to do a 64-bit RPC first, but falls back
to the 32-bit StartRXAFS_FetchData() if the server appears to not
support the 64-bit RPCs.
We correctly did not read a length from the call if the FetchData
RPC(s) failed, but proceeded to assign from the 'length' local
variable into the 'alength' output variable unconditionally later on.
Instead of blindly continuing on, jump to the error-handling part of
the routine when we cannot read a length from the call. This has the
side effect of skipping an afs_Trace3() point in the error case.
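The shape of the fix, as a sketch with hypothetical names. The reader callbacks stand in for the RPC-specific length reads. On failure the code jumps past the assignment, so the uninitialized `length` is never copied out:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical reader: returns 0 and fills *len on success. */
typedef int (*read_len_fn)(void *call, int64_t *len);

static int read_ok(void *call, int64_t *len) { (void)call; *len = 4096; return 0; }
static int read_fail(void *call, int64_t *len) { (void)call; (void)len; return EIO; }

static int fetch_init_sketch(void *call, read_len_fn read_len, int64_t *alength)
{
    int64_t length;   /* only valid if read_len succeeded */
    int code;

    code = read_len(call, &length);
    if (code != 0)
        goto done;    /* skip the output assignment on error */

    *alength = length;
done:
    return code;
}
```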
Benjamin Kaduk [Thu, 9 Jan 2014 22:29:04 +0000 (17:29 -0500)]
rfc3961: Use enctypes, not keytypes
We previously defined the enctype symbols to be aliases for keytype
symbols. The numerical values matched what we wanted (since these
values are specified in an IANA registry), but the C type is not
required to be the same for enctypes and keytypes.
Some of our buildslave configurations notice the type mismatch and
complain, so fix the types by using the enctype enum for enctype symbols
instead of keytypes.
Andrew Deason [Thu, 9 Jan 2014 18:44:44 +0000 (12:44 -0600)]
opr: Silence rbtree warning
On OS X, gcc can complain that 'child' is uninitialized whenever this
'else if' condition is false. We already handled the case where both
node->right and node->left are non-NULL earlier in this function, so
this should never occur. So, to get rid of the warning, just always
take the path in the 'else if', and assert that the right child is
NULL.
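A sketch of the restructured branch, using a hypothetical node type (the real rbtree code differs in detail). Once the two-children case is excluded, the final branch can be a plain `else` with an assertion, so `child` is assigned on every path:

```c
#include <assert.h>
#include <stddef.h>

struct rbnode_sketch { struct rbnode_sketch *left, *right; };

/* Return the only child (or NULL) of a node with at most one child. */
static struct rbnode_sketch *sole_child(struct rbnode_sketch *node)
{
    struct rbnode_sketch *child;

    if (node->left == NULL) {
        child = node->right;
    } else {
        /* Formerly an 'else if'; two children were handled earlier. */
        assert(node->right == NULL);
        child = node->left;
    }
    return child;
}
```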
Benjamin Kaduk [Mon, 13 Jan 2014 21:08:14 +0000 (16:08 -0500)]
De-duplicate a couple afs_CheckCode uniquifiers
These uniquifiers are supposed to be globally unique, to identify the
call site within the tree. For whatever reason, a couple of them
were duplicated at different call sites; provide new (unique) values
to disambiguate between them.
There remain a couple of uniquifiers which are used in multiple
places, but those are in different architectures' implementations
of afs/ARCH/foo.c, and thus will be globally unique for any particular
build.
Michael Meffie [Mon, 13 Jan 2014 20:28:17 +0000 (15:28 -0500)]
xstat: use ephemeral ports for xstat_fs_test and scout
Instead of trying to bind to port 7101, and then retrying if
the port is already in use, let the OS find an available
port for scout and xstat_fs_test.
This fixes a bug where scout and xstat_fs_test do not call
rx_Finalize() before retrying rx_Init() with a different port
number, causing the program to crash later when more than
one copy of xstat_fs_test and/or scout are running at the same
time.
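At the socket level, "let the OS find a port" means binding to port 0 and asking afterwards which port was chosen. The following is a POSIX sketch of that idea. It does not show how scout or xstat_fs_test pass the port to Rx:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to an OS-chosen port. Returns the fd and stores
 * the port (host byte order) in *port, or returns -1 on failure. */
static int bind_ephemeral_udp(unsigned short *port)
{
    struct sockaddr_in sin;
    socklen_t slen = sizeof(sin);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(0);   /* 0 = any available port */

    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        getsockname(fd, (struct sockaddr *)&sin, &slen) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(sin.sin_port);
    return fd;
}
```

Because no fixed port is requested, two copies of the program never collide, and no retry loop is needed.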
Marc Dionne [Tue, 3 Dec 2013 19:10:00 +0000 (14:10 -0500)]
Linux 3.13: Check return value from bdi_init
The use of the bdi_init function now gets a warning because the
return value is unused and the function is now defined with
the warn_unused_result attribute.
Andrew Deason [Thu, 4 Apr 2013 22:35:01 +0000 (17:35 -0500)]
viced: Avoid issuing redundant TMAY requests
Currently, if a new Rx connection comes in from a host we already have
a host struct for, we make a TellMeAboutYourself (TMAY) call to the
given host, to verify the UUID (and caps, interface info, etc) is what
we expect it to be. That is, if it's still the "same" host that we
know about. This is necessary because we otherwise have no way of
telling if the Rx connection is from the same host, or from a new host
that just happens to have the same IP address (e.g. in the case that
hosts are moving around and changing IPs). We do this while the host
is locked, so we only issue these TMAY calls one at a time.
If a large number of Rx connections come in from the same host at
around the same time, this can result in a lot of TMAY requests being
issued against the host, even for hosts that never change IPs and
never do anything strange. In these situations, issuing so many TMAYs
is useless. If we have several calls waiting to lock the host to issue
a TMAY, some of the extra TMAY calls are provably useless. So instead
of calling TMAY repeatedly, remember what the last successful TMAY
result was, and reuse it for the "provably useless" calls.
Note that this 'cache' stores the actual raw results of
TellMeAboutYourself. We could save some memory by storing just how we
interpret that data later on in h_GetHost_r, but this approach results in
much simpler h_GetHost_r logic, since we can use the same code paths
as for a "real" TMAY call.
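A sketch of one way to decide that a TMAY is "provably useless". The criterion is an assumption, since the message does not spell it out. Each waiter samples a generation counter before waiting for the host lock. If another TMAY completed in the meantime, its cached raw result is at least as fresh as a new call would be. All names here are hypothetical:

```c
#include <stdint.h>

/* Simplified TMAY result: just a UUID-like value. */
struct tmay_result { uint64_t uuid; };

struct host_sketch {
    uint64_t tmay_gen;            /* bumped after each successful TMAY */
    struct tmay_result last_tmay; /* raw result of that TMAY */
    int tmay_calls;               /* real RPCs issued */
};

/* Stand-in for the TellMeAboutYourself RPC. */
static struct tmay_result issue_tmay(struct host_sketch *h, uint64_t uuid)
{
    struct tmay_result r = { uuid };
    h->tmay_calls++;
    return r;
}

/* Called with the host locked; gen_at_arrival was sampled before
 * waiting for the lock. */
static struct tmay_result host_tmay(struct host_sketch *h,
                                    uint64_t gen_at_arrival,
                                    uint64_t current_uuid)
{
    if (h->tmay_gen != gen_at_arrival)
        return h->last_tmay;      /* reuse the newer cached result */
    h->last_tmay = issue_tmay(h, current_uuid);
    h->tmay_gen++;
    return h->last_tmay;
}
```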
Andrew Deason [Thu, 19 Dec 2013 20:04:56 +0000 (14:04 -0600)]
DARWIN: Convert crfree back into a macro
Commit 1d8937b860509fcaabb041bc14faf7aa3023f3c9 turned crfree on
DARWIN into an inline function to work around an error flagged by
clang. A side effect of this is that the address passed to
kauth_cred_unref will not be the actual address of the value given to
crfree; we are instead giving kauth_cred_unref the address of our
function argument in order to adhere to the semantics of a function
call.
kauth_cred_unref seems to just take a pointer to the cred pointer in
order to set the value to effectively NULL afterwards, so this is not
a huge deal. However, this does mean that our current implementation
undoes any of the safeguards intended by making kauth_cred_unref work
this way in the first place.
So, revert 1d8937b860509fcaabb041bc14faf7aa3023f3c9 and put the crfree
definition back to the way it was. Fix the caller in
afs_StoreOnLastReference to not cause an error by just assigning the
cred pointer to a temporary value. While it's not ideal that some
callers may need to do this, this is the only place where this is
necessary and it's more of an artifact of the weirdness of storing a
cred pointer in linkData, which probably should be changed anyway.
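The difference between the two forms can be shown with a hypothetical `cred_unref` that, like the behavior described for kauth_cred_unref, drops a reference and clears the pointer it was given. With the macro, the caller's variable is cleared. With the inline function, only the parameter copy is cleared, and the caller keeps a stale pointer:

```c
#include <stddef.h>

struct cred_sketch { int refs; };

/* Hypothetical stand-in for kauth_cred_unref(). */
static void cred_unref(struct cred_sketch **credp)
{
    (*credp)->refs--;
    *credp = NULL;
}

/* Macro form: &(cr) is the caller's variable. */
#define crfree_macro(cr) cred_unref(&(cr))

/* Function form: &cr is the address of the local parameter. */
static inline void crfree_func(struct cred_sketch *cr)
{
    cred_unref(&cr);
}
```

Note that the macro form requires its argument to be an assignable variable.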
Andrew Deason [Wed, 8 Jan 2014 00:24:54 +0000 (18:24 -0600)]
SOLARIS: Support VSW_STATS
Specify the VSW_STATS flag to the vfsdef_t structure we give to
Solaris. This turns on statistics that can be retrieved via fsstat(1M)
and allows the fsinfo::: DTrace provider to work with AFS files.
We don't need to actually maintain these statistics; Solaris does that
for us. This flag just signifies that our vfs_t structure is capable
of storing the information. Since we get our vfs_t from Solaris (via
domount(), it gives us a vfs_t when it calls our afs_mount function)
and do not allocate a vfs_t ourselves, we are safe and this is fine to
do.
Michael Meffie [Mon, 23 Dec 2013 17:10:36 +0000 (12:10 -0500)]
vol: reset nextVnodeUnique when uniquifier rolls over
The on disk uniquifier counter is set to 200 more than the current
nextVnodeUnique counter when the volume information is updated to disk. When
the nextVnodeUnique is near UINT32_MAX, then the uniquifier counter rolls
over. This can happen during a volume header update due to
VBumpVolumeUsage_r().
With this change, the nextVnodeUnique counter is reset to 2 and the
uniquifier is reset to 202 when a roll over occurs. (A uniquifier of 1 is
reserved for the root vnode.)
With this change, the number of possible uniquifier numbers is limited to
200 less than UINT32_MAX.
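A sketch of the reset rule described above, using a hypothetical struct in place of the volume's fields. The on-disk value is computed with 32-bit unsigned arithmetic, and a wrap is detected when the sum comes out smaller than the input:

```c
#include <stdint.h>

#define UNIQ_FIRST 2u    /* 1 is reserved for the root vnode */
#define UNIQ_SLACK 200u  /* on-disk counter runs this far ahead */

/* Hypothetical stand-ins for nextVnodeUnique and V_uniquifier. */
struct uniq_state {
    uint32_t next_vnode_unique;  /* in memory */
    uint32_t uniquifier;         /* on disk */
};

static void update_disk_uniquifier(struct uniq_state *s)
{
    uint32_t u = s->next_vnode_unique + UNIQ_SLACK;  /* wraps mod 2^32 */

    if (u < s->next_vnode_unique) {   /* rolled over */
        s->next_vnode_unique = UNIQ_FIRST;
        u = UNIQ_FIRST + UNIQ_SLACK;  /* 202 */
    }
    s->uniquifier = u;
}
```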
The following shows a series of vnode creation/deletions to illustrate
the uniquifier rollover before this commit:
Michael Meffie [Mon, 23 Dec 2013 16:42:19 +0000 (11:42 -0500)]
vol: fix nextVnodeUnique roll over
Fixes for the per volume nextVnodeUnique counter roll over. Uniquifier number 1
is reserved for the root vnode, so reset the unique count to 2 when the
nextVnodeUnique counter rolls over.
Update the disk backed V_uniquifier count when the in-memory nextVnodeUnique
counter rolls over during the creation of a new vnode. If the nextVnodeUnique
rolls over when V_uniquifier is UINT32_MAX, then the V_uniquifier is not updated
and remains at UINT32_MAX until the next VUpdateVolume_r() call for the volume.
This bug is usually masked by VBumpVolumeUsage(), which, every 128 volume
accesses, bumps the V_uniquifier to be 200 more than the current
nextVnodeUnique counter. This causes the V_uniquifier to roll over before
reaching UINT32_MAX. (The number of accesses before updating the headers is set
in the usage_threshold volume package option, which is currently set to 128 by
default.)
The following shows the unique counters for a series of vnode
creation/deletions before this commit. The nextVnodeUnique rolls over to 1,
and the uniquifier is not reset. The `usage_threshold' was set to a value
greater than 200 to avoid the VBumpVolumeUsage() calls during this test run.
Andrew Deason [Thu, 14 Nov 2013 18:53:40 +0000 (12:53 -0600)]
afs: Don't clear afs_CacheTooFull prematurely
Currently, we can clear afs_CacheTooFull here, even if
afs_CacheIsTooFull() doesn't agree that the cache is no longer 'too
full'. This could theoretically result in afs_CacheTooFull being
cleared, even though the cache is indeed 'too full', according to
afs_CacheIsTooFull(). Just break here, and let afs_CacheIsTooFull()
decide.
This reverts a small part of 488c7c97854a4bd0ec67bcfe17df93b3fd025f88.
This part doesn't seem important to the functionality in that commit,
though; the rest of that commit is still here, and avoids the extra
work if we have calculated no needed space to free.
Andrew Deason [Thu, 14 Nov 2013 18:06:56 +0000 (12:06 -0600)]
afs: Fix some dcache-related comments
- The comments preceding the afs_CacheIsTooFull macro, describing the
cache-related high and low water marks, are a little out of date.
We start freeing at 90% space usage, not 95%, and we can also take into
account how many free/used chunks we have.
- afs_WakeCacheWaitersIfDrained looks at the number of non-used (free
or discarded) blocks, not just free blocks.
Andrew Deason [Tue, 11 Dec 2012 19:19:02 +0000 (13:19 -0600)]
rx: Clarify error checks for busy channel check
Commit a84c6b0ece1fdee4f462c6ce27fa78c2e0d419f4 changed this so we
don't just discard an incoming request if the call already had an
error. But if the call already has an error, rxi_WaitforTQBusy is a
no-op, so checking if the error has "changed" is unnecessary and can
be confusing. Just bypass this whole block if the call already has an
error.
Discussed during the 5 Dec 2012 release-team meeting.
Andrew Deason [Thu, 26 Dec 2013 17:56:37 +0000 (12:56 -0500)]
Fedora: Handle new kernel variant paths
With Fedora 20, Fedora now separates the variant from the rest of the
kernel version with a plus (+) instead of a period (.). This results
in directories called e.g. 3.12.5-302.fc20.i686+PAE, where right now
we look for 3.12.5-302.fc20.i686.PAE.
Use this new directory scheme for Fedora 20 builds, so we can build
against non-default kernel variants on Fedora 20 and beyond.
Andrew Deason [Mon, 23 Dec 2013 18:32:28 +0000 (13:32 -0500)]
RedHat: Munge future kernel versions
We currently look for "fc1?" (that is, fc10 through fc19) when trying
to munge the kernel version in some ways. This broke on Fedora 20,
since 20 obviously does not match "fc1?". Similarly, we look
specifically for "el6" for RHEL6 versioning quirks, but these will
break on RHEL7 and beyond.
Change the version checks so that this will work all the way through
Fedora 99 and RHEL 9. That won't work forever, but it will keep us
working for a few versions if the versioning quirks do not change.
Benjamin Kaduk [Thu, 9 Jan 2014 17:13:27 +0000 (12:13 -0500)]
ktc: fix up initializer for local_tokens
The old initializer was incomplete (initializing only one of the four
fields in the struct), which prompted warnings from clang
(-Wmissing-field-initializers):
../../../openafs/src/auth/ktc.c:149:2: warning: missing field 'server'
initializer [-Wmissing-field-initializers]
Since the variable is at file scope, it will be initialized to all
zeros anyway, and there is no need for an explicit initializer.
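The C rule relied on here is that an object with static storage duration and no initializer is zero-initialized: arithmetic members become 0 and pointers become NULL. A sketch with a hypothetical four-field struct (not the real local_tokens type):

```c
#include <stddef.h>

/* Hypothetical four-field struct standing in for the token entry. */
struct token_sketch {
    int startTime;
    int endTime;
    char *server;
    unsigned char key[8];
};

/* File scope, no initializer: every member starts out zeroed. */
static struct token_sketch local_tokens_sketch;
```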
Jeffrey Altman [Thu, 9 Jan 2014 14:57:33 +0000 (09:57 -0500)]
Windows: Mark Irp Pending before Deferring
After CcDeferWrite() is called we no longer have access to the
current Irp. If we mark it deferred after calling CcDeferWrite(),
we might mark the wrong thing.
Jeffrey Altman [Tue, 7 Jan 2014 15:57:01 +0000 (10:57 -0500)]
Windows: cm_ConnByServer fix search for replication
Separate connection objects are maintained for use when accessing
replicated and single source volumes. If the matching connection
type cannot be found while holding the cm_connLock shared, a second
search is performed after the lock is upgraded to an exclusive lock.
This second connection search was not enforcing the replication criteria.
Jeffrey Altman [Tue, 7 Jan 2014 15:53:37 +0000 (10:53 -0500)]
Windows: cm_connLock not required for cm_GetUCell
In cm_ConnByServer() there is no need to hold the cm_connLock across
the cm_GetUCell() call. Obtain the cm_ucell_t object before the
cm_connLock is obtained.
Andrew Deason [Tue, 10 Dec 2013 23:02:34 +0000 (17:02 -0600)]
cellconfig: Do not use 'long' for dbserver IPs
A few places in this file assume that our dbserver IP addresses are
"long"s. A long int can be 8 bytes on some platforms, but we know
these IP addresses are all 4-byte integers. In the rare instances
where we have the maximum number of dbservers, this can overwrite a
bit of extra memory. This can also result in a misaligned access on
platforms such as SPARC v9, since the elements of he->h_addr_list are
not guaranteed to be 8-byte aligned.
So instead, treat these as 4-byte integers. For copying out of
he->h_addr_list, also use a memcpy anyway to be safe, since we are not
guaranteed alignment.
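A sketch of the safe copy. A fixed 4-byte type holds the address, and memcpy reads it from a possibly misaligned source such as an entry of he->h_addr_list. `copy_ipv4` is a hypothetical helper name:

```c
#include <stdint.h>
#include <string.h>

/* Copy one IPv4 address (4 bytes, network byte order) from a buffer
 * that may not be aligned for a direct uint32_t load. */
static uint32_t copy_ipv4(const char *src)
{
    uint32_t addr;   /* always 4 bytes, unlike long */

    memcpy(&addr, src, sizeof(addr));
    return addr;
}
```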
Arne Wiebalck [Fri, 13 Dec 2013 10:46:04 +0000 (11:46 +0100)]
make openafs uninstallable even if /afs is missing
The preuninstall scriptlet of the openafs RPM removes /afs. If, for
whatever reason, that directory does not exist, the scriptlet will
fail and hence break the deinstallation of the openafs package. The
proposed patch makes the scriptlet evaluate to true even if /afs
has been removed by some other means and allows the package to be
uninstalled.
Andrew Deason [Tue, 17 Dec 2013 23:30:26 +0000 (17:30 -0600)]
LINUX: Use sock_create_kern where available
Currently, we use sock_create to create our Rx socket. This means that
accesses to that socket (sendmsg, recvmsg) are subject to SELinux
restrictions. For all recvmsg accesses and some sendmsg accesses, this
doesn't matter, since the access will be performed by one of our
kernel threads (running as kernel_t or something similar, which is
unrestricted), such as the rx listener, a background daemon, or the rx
event thread.
However, sometimes we do run in the context of a normal user process.
For some RPCs like FetchStatus, we tend to run the RPC in the
accessing user thread, which can result in us sendmsg()ing the data
packets with the initial arguments in the user thread. We can also
send delayed ACKs via rx_EndCall, and possibly a variety of other
scenarios.
In any of these situations when we are sendmsg()ing from a user
thread, SELinux can prevent us from sending to the socket, if the
calling user thread context is not able to write to an afs_t
udp_socket. This will result in packets not being sent immediately,
but the packets will be resent later, so access will work, but appear
very slow. This can easily happen for processes that are specifically
constrained by SELinux; for example, webservers are often constrained,
even if most of the rest of the system is not. This can be noticed by
seeing the 'resends' and 'sendFailed' counters rising in 'rxdebug
-rxstat', as well as noticing SELinux access failures if 'dontaudit'
rules are ignored.
To avoid this, use sock_create_kern to create the Rx socket, to
indicate that this is a socket for use by kernel code, and not
accessible by a user. This should cause us to bypass any LSM
restrictions (SELinux, AppArmor, etc). Add a configure check for this,
since this function has not always existed, according to
<https://lists.openafs.org/pipermail/openafs-devel/2004-June/010651.html>
Change-Id: I77e7f87e93be4d750d398e01dc1634efd80657bc
Reviewed-on: http://gerrit.openafs.org/10594
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
Andrew Deason [Tue, 17 Dec 2013 23:27:53 +0000 (17:27 -0600)]
rx: Remove obsolete comment
This comment refers to the fact that we used to be just checking for
SELinux to see if we should pass that extra argument. Ever since
commit cb1b41b159b98881f66319d7f65d941ba9fab911, we do have a better
test for this.
Ken Hornstein [Thu, 5 Dec 2013 18:57:36 +0000 (13:57 -0500)]
Remove extra whitespace from macro invocations
On MacOS X 10.9, the compiler has switched to LLVM and as a consequence
generates an error if there is a space between a macro invocation and
the starting left parenthesis.
Based on code originally done by Matt Haught <dmhaught@ncsu.edu>.
Jeffrey Altman [Thu, 5 Dec 2013 05:41:10 +0000 (00:41 -0500)]
Windows: RXAFS_GetVolumeStatus no PRSFS_READ check
Since d2d591caf2c9b4cf2ebae708cc9b4c8b78ca5a5a the file server no
longer performs a PRSFS_READ access check for the GetVolumeStatus RPC.
The cache manager should no longer test for PRSFS_READ as a means of
avoiding RPCs that are known to fail.
Jeffrey Altman [Wed, 27 Nov 2013 17:26:44 +0000 (12:26 -0500)]
Windows: RDR capture Cc/Mm exceptions do not break
All of the Cc and Mm functions are wrapped in try/except blocks.
The purpose is to ensure that Cc and Mm do not return an error as
an exception which could result in the afs redirector failing to
release a resource. Instead of calling the AFS exception handler,
just handle the exception with EXCEPTION_EXECUTE_HANDLER. This permits
the __except block to capture the exception code.
The AFS exception handler will throw its own exception if the
AFSDebugFlags AFS_DBG_BUGCHECK_EXCEPTION bit is set. This is helpful when
debugging exceptions thrown by errors in the afs redirector code. It is
not helpful when a Cc function throws an exception. For example,
CcReadCopy() will throw STATUS_DELETE_FILE as an exception if an attempt
to read from a deleted file is initiated. This should simply fail the
read operation, not BSOD the system.
Jeffrey Altman [Tue, 26 Nov 2013 15:52:45 +0000 (10:52 -0500)]
Windows: Rationalize Freelance vs "fs flush*"
Background:
cm_scache_t objects representing the Freelance volume (cell=-1, volume=-1)
are special because they are populated from the Freelance mountpoint
and symlink tables. These tables are in turn generated from the
registry. The tables are regenerated on-demand after the execution of
cm_noteLocalMountPointChange() which increments cm_data.fakeDirVersion
which becomes the new data version value for the (-1.-1.1.1) directory
object.
The next time that cm_GetSCache() is called for a Freelance object, the
fake root directory is rebuilt by cm_InitFakeRootDir(). Since the
vnode values are not persistent with regard to directory entry names, the
FileId unique is used to distinguish the various versions.
cm_data.fakeUnique is incremented with each call to cm_InitFakeRootDir().
Each time cm_noteLocalMountPointChange() is executed, the afs redirector is
notified of the data version change, which will force the redirector to
rebuild its view of the directory the next time a path evaluation requires
evaluation of the root (\afs). In other words, on the next request.
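The on-demand rebuild described above can be modeled in a few lines of C. This is a minimal sketch, not the cache manager's code: the struct and functions are hypothetical stand-ins for cm_data.fakeDirVersion, cm_data.fakeUnique, cm_noteLocalMountPointChange(), cm_InitFakeRootDir() and the Freelance path of cm_GetSCache(), and tracking "needs rebuild" by comparing against a `builtVersion` field is an assumption made for illustration.

```c
/* Hypothetical, simplified stand-in for the Freelance fields of cm_data. */
struct sketch_freelance {
    unsigned int fakeDirVersion;  /* data version of the -1.-1.1.1 directory */
    unsigned int fakeUnique;      /* FileId uniquifier of the current build */
    unsigned int builtVersion;    /* assumed: version the tables were built for */
};

/* Models cm_noteLocalMountPointChange(): only marks the tables stale.
 * (The real function also notifies the afs redirector.) */
void sketch_note_mount_point_change(struct sketch_freelance *fl)
{
    fl->fakeDirVersion++;
}

/* Models cm_InitFakeRootDir(): rebuild the tables under a new uniquifier. */
void sketch_init_fake_root_dir(struct sketch_freelance *fl)
{
    fl->fakeUnique++;
    fl->builtVersion = fl->fakeDirVersion;
}

/* Models the check made when cm_GetSCache() is called for a Freelance
 * object: rebuild only if the tables are older than the data version. */
void sketch_get_scache(struct sketch_freelance *fl)
{
    if (fl->builtVersion != fl->fakeDirVersion)
        sketch_init_fake_root_dir(fl);
}
```

Two mount point changes followed by a lookup produce a single rebuild in this model: the data version advances twice while the uniquifier advances once, and a second lookup rebuilds nothing.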
If cm_noteLocalMountPointChange() is executed multiple times, there is the
possibility of a race between the redirector and the service. When the
race is lost, the redirector receives an invalidation event for -1.-1.1.1
while it is in the process of rebuilding the directory contents. The
redirector ends up believing it has the most recent data version when it
does not, and the service no longer has Freelance mountpoint and symlink
tables representing the requested data version. Hence, the mountpoints
and symlinks end up as CM_SCACHETYPE_INVALID.
fs flushfile and fs flushvolume both had explicit checks to prevent
flushing Freelance objects because each call to cm_FlushFile() on a
Freelance object would execute cm_noteLocalMountPointChange() triggering
the race.
The Problem:
fs flushall is not executed on a specific object (volume or file).
Therefore there was no explicit check to prevent execution against
Freelance objects. cm_FlushFile() is called for each cm_scache_t in the
cache. If there are N Freelance mountpoints and symlinks, there will
be N+1 calls to cm_noteLocalMountPointChange() in quick succession. Not
only does this risk losing the race described above but it is extremely
wasteful as the Freelance tables may be repeatedly regenerated.
This Patchset:
This patchset re-organizes the Freelance processing in the flush code
paths. cm_FlushFile() and cm_FlushVolume() can simply no longer be
successfully executed against a Freelance object. Both will return
CM_ERROR_NOACCESS.
"fs flush <file>" is not permitted against Freelance objects.
"fs flushvolume <path>" will execute cm_noteLocalMountPointChange() once if
the path is a Freelance object.
"fs flushall" continues to execute cm_FlushFile() on all cm_scache_t
objects. The calls on Freelance objects will fail. After all cm_scache_t
objects are flushed, cm_noteLocalMountPointChange() will be executed
once to force the Freelance directory to be rebuilt.
This patchset does not address the race, but it significantly reduces the
likelihood that the race will be lost.
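The reorganized "fs flushall" path amounts to flushing every object, tolerating the expected refusals for Freelance objects, and then notifying once. The sketch below models only that shape; every name in it is hypothetical, SKETCH_ERROR_NOACCESS is a placeholder rather than the real CM_ERROR_NOACCESS value, and a Freelance object is recognized solely by cell and volume both being -1, as described in the Background section.

```c
#include <stddef.h>

/* Placeholder for CM_ERROR_NOACCESS; not the real value. */
#define SKETCH_ERROR_NOACCESS 1

/* Hypothetical, reduced stand-in for cm_scache_t. */
struct sketch_scache {
    int cell;      /* -1 for Freelance objects */
    int volume;    /* -1 for Freelance objects */
    int flushed;   /* set once the object has been flushed */
};

/* Counts how often the (modeled) mount point change notification ran. */
static unsigned int sketch_notify_calls;

static void sketch_notify_freelance_change(void)
{
    sketch_notify_calls++;
}

static int sketch_is_freelance(const struct sketch_scache *scp)
{
    return scp->cell == -1 && scp->volume == -1;
}

/* Models cm_FlushFile() after the change: Freelance objects are refused. */
int sketch_flush_file(struct sketch_scache *scp)
{
    if (sketch_is_freelance(scp))
        return SKETCH_ERROR_NOACCESS;
    scp->flushed = 1;
    return 0;
}

/* Models "fs flushall": flush everything, ignore the Freelance refusals,
 * then force a single rebuild of the Freelance directory. */
void sketch_flush_all(struct sketch_scache *objs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        (void)sketch_flush_file(&objs[i]);
    sketch_notify_freelance_change();
}
```

With N Freelance entries in the cache, this model issues exactly one notification per flushall, instead of one per Freelance object plus one as in the behavior the commit message describes.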
Stephan Wiesand [Thu, 21 Nov 2013 14:01:29 +0000 (15:01 +0100)]
Linux: Fix build for older kernels w/o bool
Commit b7f4f2023b2b3e1aac46715176940fb50cc75265 broke builds against
older kernels which don't have bool defined in linux/types.h. Fix
this by using unsigned char instead of bool for the static inline
functions.
Change-Id: Icbb82446ef66edd2650f33135ed6ccd2b8a920b2
Reviewed-on: http://gerrit.openafs.org/10483
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
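The pattern behind this fix is to spell the return type of a small static inline predicate as unsigned char, which every kernel's headers provide. The helper below is hypothetical and is not one of the functions the commit touched; it only illustrates that callers still receive 0 or 1.

```c
/* Hypothetical predicate. Using unsigned char instead of bool keeps this
 * usable with older kernels whose linux/types.h does not define bool. */
static inline unsigned char
sketch_flag_is_set(unsigned long flags, unsigned long bit)
{
    /* The comparison yields an int of 0 or 1, which fits unsigned char. */
    return (flags & bit) != 0;
}
```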
Ken Hornstein [Wed, 20 Nov 2013 18:37:52 +0000 (13:37 -0500)]
Support for changes to OS X Mavericks VNOP_SYMLINK() function.
Add support for an extra argument to afs_symlink() to return the
newly-created symlink vnode if requested (this is needed on OS X
Mavericks). On OS X Mavericks, return the newly-created symlink vnode in
the symlink vnops functions; on all other platforms, ignore it.
It turns out that OS X has technically required the newly-created symlink
vnode to be returned for a while, but code inside of symlink() would call
namei() on the symlink name if the returned vnode pointer was NULL. The
difference is that on Mavericks the Mandatory Access Control Framework has
been enabled, and that turns on some extra code which unconditionally calls
vnode_mount() on the returned vnode pointer, which ends up causing a
panic when that pointer is NULL.
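The optional out-parameter can be sketched as follows. All names and types here are hypothetical stand-ins (the real afs_symlink() signature and the OS X vnode type differ, and a single static object stands in for vnode allocation); the sketch only shows the contract described above: OS X Mavericks callers pass a pointer and must get the new vnode back, while other platforms pass NULL and ignore it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a platform vnode. */
struct sketch_vnode {
    const char *name;    /* name of the symlink */
    const char *target;  /* symlink contents */
};

/* Stands in for the freshly allocated vnode of the new symlink. */
static struct sketch_vnode sketch_created;

/* Models afs_symlink() with the extra argument: when tvpp is non-NULL the
 * newly-created symlink vnode is returned through it. */
int sketch_afs_symlink(const char *name, const char *target,
                       struct sketch_vnode **tvpp)
{
    if (name == NULL || target == NULL)
        return EINVAL;
    sketch_created.name = name;
    sketch_created.target = target;
    if (tvpp != NULL)
        *tvpp = &sketch_created;
    return 0;
}

/* Models the Mavericks VNOP_SYMLINK() glue: the MAC framework calls
 * vnode_mount() on *vpp, so success must never leave it NULL. */
int sketch_vnop_symlink(const char *name, const char *target,
                        struct sketch_vnode **vpp)
{
    *vpp = NULL;
    return sketch_afs_symlink(name, target, vpp);
}

/* Models the other platforms, which do not want the vnode back. */
int sketch_other_symlink(const char *name, const char *target)
{
    return sketch_afs_symlink(name, target, NULL);
}
```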
Jeffrey Altman [Fri, 15 Nov 2013 23:32:37 +0000 (17:32 -0600)]
Windows: cm_FindVolumeByFID
cm_GetVolumeByFID() does not query the vldb if the volume group
is not known to the cache manager. cm_FindVolumeByFID() is to
be used in cases where the volume group data must be known for the
operation to successfully complete.
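The difference between the two lookups can be sketched as a cache-only lookup versus one that falls back to the VLDB on a miss. This is a hypothetical model: the cache is a small array of volume IDs, the VLDB query is simulated by counting it and recording the ID, and the real functions operate on cm_volume_t volume groups with quite different signatures.

```c
#include <stddef.h>

#define SKETCH_MAX_VOLS 8

/* Hypothetical, simplified volume cache keyed by volume ID. */
struct sketch_volcache {
    unsigned int ids[SKETCH_MAX_VOLS];
    size_t count;
    int vldb_queries;   /* how many times the simulated VLDB was consulted */
};

static int sketch_cache_lookup(const struct sketch_volcache *c, unsigned int id)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->ids[i] == id)
            return 1;
    return 0;
}

/* Like cm_GetVolumeByFID() as described: cache only, never asks the VLDB. */
int sketch_get_volume_by_fid(const struct sketch_volcache *c, unsigned int id)
{
    return sketch_cache_lookup(c, id);
}

/* Like cm_FindVolumeByFID() as described: on a cache miss, fetch the volume
 * from the (simulated) VLDB so the caller's operation can proceed. */
int sketch_find_volume_by_fid(struct sketch_volcache *c, unsigned int id)
{
    if (sketch_cache_lookup(c, id))
        return 1;
    if (c->count == SKETCH_MAX_VOLS)
        return 0;                  /* no room; treat as not found */
    c->vldb_queries++;             /* stand-in for the VLDB RPC */
    c->ids[c->count++] = id;       /* pretend the VLDB knew the volume */
    return 1;
}
```

In this model only the "find" variant ever touches the VLDB, and only once per volume; later lookups of either kind are served from the cache.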