Derrick Brashear [Mon, 21 Nov 2011 17:06:59 +0000 (12:06 -0500)]
ihandle: don't keep reallyclosing future fds
given that we can mark something invalid for future use, ever,
once we have done so for all fds, we ih_reallyclose is done.
don't persist the setting to the detriment of new fds
Michael Meffie [Thu, 2 Aug 2012 21:24:02 +0000 (17:24 -0400)]
libafs: revert init req to use the real uid
The commit to use wrappers for creditial structure access
inadvertently changed the user id to be the effective uid instead of
the real uid, when no PAG is present, on linux. Revert this so
setuid programs continue to work.
Mark Vitale [Thu, 2 Aug 2012 22:37:05 +0000 (18:37 -0400)]
vos: convertROtoRW may create 2nd RW on a fileserver
If an RW is already present on disk on the target server (any partition),
'vos convertROtoRW' will still convert the RO, creating a second RW on the server.
Detect this and refuse to convert the RO by returning EXDEV (invalid cross-device link).
Andrew Deason [Thu, 2 Aug 2012 15:58:12 +0000 (11:58 -0400)]
rx: Process ICMP unreachable errors
When a machine receives ICMP errors, we can detect them in
AFS_RXERRQ_ENV environments. Many of these errors indicate that a
machine is not reachable, so we are guaranteed to not get a response
from them. When we get such an error for a particular peer, mark all
relevant calls with an RX_CALL_DEAD error, since we know we won't get
a response from them. This allows some calls to dead/unreachable hosts
to fail much more quickly.
Do not immediately kill new calls, since obviously the host may have
come back up since then (or the routing/firewall/etc was fixed), but
only calls that were started before the current error was received.
Note that a call doesn't actually notice until the next rxi_CheckCall,
since directly killing each of the relevant calls would be rather
slow. So, we don't notice a dead peer immediately, though we notice
much more quickly than we used to.
Reorganize the error queue processing a little bit to make this easier
to do.
Andrew Deason [Wed, 1 Aug 2012 20:31:09 +0000 (16:31 -0400)]
LINUX: Fix error queue processing
Receiving error queues in the Linux kernel is a little different from
userspace. When we encounter a cmsg that is not CMSG_OK, we need to
break out of the loop, and not just continue, since we can keep trying
to process the same cmsg over and over. In addition, on successful
return, the msg_control buffer has been modified to point to the next
available buffer space, and msg_controllen contains how many bytes are
remaining. So, we need to adjust the msg_control and msg_controllen
values to get something more familiar.
Andrew Deason [Wed, 1 Aug 2012 19:56:27 +0000 (15:56 -0400)]
LINUX: Avoid SO_ERROR for RXERRQ_ENV
SO_ERROR is for receiving errors from some nonblocking operations; it
has little relevance to our network operations. For Linux, use a
similar structure as userspace error detection, instead of SO_ERROR.
Andrew Deason [Wed, 1 Aug 2012 19:19:02 +0000 (15:19 -0400)]
rx: Create AFS_ADAPT_PMTU and AFS_RXERRQ_ENV
Currently we have the ADAPT_PMTU define, which turns on functionality
in Linux to detect PMTU-related ICMP errors for Rx. However, this is
really turning on two separate pieces of functionality: the PMTU
processing, and the processing for ICMP errors in general.
So split this out into two defines: AFS_ADAPT_PMTU, and
AFS_RXERRQ_ENV. The former is for processing PMTU discovery, and the
latter is for processing ICMP errors. Both of these are left disabled
due to issues in the error processing. Although PMTU discovery is the
only functionality which makes use of ICMP errors, this will change in
the future.
Andrew Deason [Wed, 1 Aug 2012 18:57:06 +0000 (14:57 -0400)]
rx: Remove ADAPT_MTU and MISCMTU
Ever since 5bcf626ddaf92e199c4b46c11ad276013a47db52, ADAPT_MTU has
been unconditionally defined. MISCMTU has always been unconditionally
defined, and not used anywhere. Remove both of these, assuming they
are always defined.
Michael Meffie [Wed, 1 Aug 2012 15:42:34 +0000 (11:42 -0400)]
bozo: avoid canceling the sigkill timer for hung processes
A sigkill signal is sent to fileserver processes when a timeout is
exceeded for shutting down processes for the fs/dafs bnode.
(Currently 30 minutes for the fileserver, 1 minute for the other
server processes.)
If the bnode goal is set to run before this timeout expires, the
timer is incorrectly stopped, and a wedged process is never killed.
Fix this by not canceling the timer when a fs/dafs process has been
signaled to shutdown, regardless of the current goal.
Andrew Deason [Fri, 30 Mar 2012 19:56:52 +0000 (14:56 -0500)]
libafscp: Add afscp_LocalAuthAs
Add the function afscp_LocalAuthAs to libafscp. This allows the caller
to generate credentials based on the KeyFile on local disk, in order
to appear as an arbitrary user.
Andrew Deason [Tue, 31 Jul 2012 18:40:41 +0000 (14:40 -0400)]
LINUX: Always hold afs_xuser for unixuser read
We were failing to hold the afs_xuser lock when we entered our
unixuser traversal for the first time (when the given position is 0).
This means we can release the lock without acquiring it, causing all
kinds of weird behavior.
Just always grab afs_xuser on entry. We could possibly do some tricks
to avoid grabbing this lock until after we've printed the column
headers, but it does not seem worth it.
Andrew Deason [Fri, 6 Apr 2012 19:56:07 +0000 (14:56 -0500)]
LINUX: Do not lookup immediately recursive mtpts
On Linux, having a mountpoint in a volume root that points to the same
volume can cause serious problems. By 'immediately recursive', I mean
a situation like the following:
fs mkm mtpt vol
fs mkm mtpt/mtpt vol
If there are multiple dentry aliases for the directory (which is
possible if the directory is a mountpoint), an 'rmdir' on the
recursive mountpoint can cause the client to deadlock. Since the
'rmdir' code path in Linux locks the parent directory inode to perform
the rmdir, and locks the child directory inode after performing a
couple of sanity checks. For an immediately recursive mountpoint,
these two inodes are the same, and so we will deadlock.
Andrew Deason [Fri, 6 Jul 2012 21:37:39 +0000 (16:37 -0500)]
Linux: Make dir dentry aliases act like symlinks
Currently, we try to invalidate other dentries that exist for a
particular dir inode when we look up a dentry. This is so we try to
avoid duplicate dentries for a directory, which Linux does not like
(you cannot have hardlinks to a dir).
If we cannot invalidate the other aliases (because they are being
used), right now we just return the alias. This can make it very easy
to panic the client, due to the sanity checks Linux performs when dong
things like 'rmdir'. If we do something like this:
For the 'rmdir', we will lookup 'mtpt2'. Since 'mtpt' and 'mtpt2'
are mountpoints for the same volume, their dentries point to the same
directory inode. So when we lookup 'mtpt2', we will try to invalidate
the other dentry, but we cannot do that since it is the cwd. So we
return the alias dentry (for 'mtpt'). The Linux VFS layer then does a
sanity check for the rmdir operation, checking that the child dentry's
parent inode is the same as the inode we're performing the rmdir for.
Since the dentry we returned was for 'mtpt', whose parent is 'dir1',
and the actual dir we're performing the rmdir for is 'dir2', this
sanity check fails and we BUG.
To avoid this, make the dentry alias act like a symlink when we
encounter an uninvalidateable dentry alias. That is, we allow multiple
dentry aliases for a directory, however, when the dentry aliases are
actually used, we redirect to a common dentry (via d_automount where
possible, and follow_link elsewhere).
This means that such mountpoints will behave similarly to symlinks, in
that we 'point' to a specific mountpoint dentry. This means that if we
have multiple different ways to get to the same volume, and all are
accessed at the same time, all but one of those mountpoints will
behave like symlinks, pointing to the same mountpoint. So, the '..'
entries for each path will all point to the parent dir of one
mountpoint, meaning that the '..' entry will be "wrong", but for most
cases it will still be correct.
In order to try to make the 'target', pointed-to directory consistent,
we add a new field to struct vcache: target_link. This points to the
dentry we should redirect to, whenever that vcache is referenced. To
avoid (possibly not-feasibly-solvable) problems with refcounting, this
pointer is not actually a reference to the target dentry, but just
serves as a pointer to compare to.
afs_server: delete code that has been ifdef'ed out for years
The comments in afs_SetServerPrefs() said "clean up, delete this".
The oldest one is a decade old. Removing these #ifdefs will make
following the rest of the spaghetti #ifdefs a bit easier.
Garrett Wollman [Tue, 9 Aug 2011 04:28:27 +0000 (00:28 -0400)]
libafs: afs_CacheFetchProc can't be called without a dcache pointer
An inspection of the only call site suggests that afs_CacheFetchProc()
can't be called with a null dcache pointer, and code further down
in this function dereferences adc unconditionally (assuming
rxfs_fetchInit() doesn't crash first) so remove the conditional
here.
Probably more of these parameters can and should be included in the
AFS_NONNULL.
OpenAFS does not have separate distributions for the United States
and the rest of the world. Nor are there any restrictions on the
capabilities of the Update Server.
volser: restructure GetNextVol and clients to remove duplicate code
There are several odd-looking but stylized loops involving GetNextVol()
which can be radically simplified if only GetNextVol() would return
a meaningful value. Move all of the code that skips non-volume-header
files in the directory into GetNextVol and have it return a truth value
(instead of always returning zero) that indicates whether it saw
something that looks like a volume header. Then all the odd while
loops and strcmps just collapse into while(GetNextVol(...)).
GetNextVol() had external scope, but there are no callers in the
tree that use it outside of volprocs.c, and it's not part of a
public library interface, so make it static.
While here, don't strcmp() past the end of a filename that begins with
'V' but is too short to be a valid volume name.
afscp: avoid null dereference in _GetSecurityObject error case
Handle the possible error return from krb5_get_host_realm in the
same way as the other error cases (using an anonymous security
object); otherwise "realm" would be left null.
Andrew Deason [Thu, 26 Jul 2012 21:40:03 +0000 (16:40 -0500)]
LINUX: Hold GLOCK for proc traversal
The functions that traverse unixuser structures for display via /proc
(uu_start et al) call various libafs functions hold and release locks,
etc. To do any of that, we need GLOCK. Amongst other issues, we can
panic if we try to acquire a contested lock without GLOCK, since we
assert glock is held when we sleep for the lock or try to wake other
waiters. The same goes for the legacy CellServDB proc file.
rx: protect against ACKs with serial as prevPacket
patchset 4e71409fe1305cde4b9b341247ba658d8d24f4d0 introduced a
check in rxi_ReceiveAckPacket for out of order ack packets which
relied upon the value of the previousPacket field. Unfortunately,
some versions of RX store the previous packet's serial number in
the field instead of previous packet's sequence number. Modify
the check to only discard out of order ACKs if the previousPacket
sequence number is within the valid window.
patchset 1f0cf8b2b4bb6e36d8d82323a15ced72d91db0ec tested for
an empty queue but what is really required is a test for end of
queue after the queue_Scan(). If the queue_Scan() completes
at the end of the queue, in other words, pointing at the list
head, then return NULL because no match was found.
Andrew Deason [Wed, 9 May 2012 23:45:51 +0000 (18:45 -0500)]
vos: Minimize release impact for new RO sites
Currently, if a new RO site is added with 'vos addsite', the only way
to populate the new site with data is a 'vos release' (excepting hacks
using 'vos restore' and 'vos addsite -live', etc). Due to safeguards
in 'vos' ensuring that RO sites always all contain the same data when
marked as up-to-date in the VLDB, such a release always incurs some
amount of data to be transmitted to all sites, as well as remote sites
being brought offline briefly, even when the RW data has not changed
in very long time.
To alleviate this situation, make 'vos release' detect if new,
unpopulated RO sites have been added, and if the RW volume has not
changed since the release of any existing RO sites. If both of these
conditions are true, do not update any of the existing sites, but only
transmit volume data to the sites that did not already contain RO
volumes.
tabular_output: don't leak table struct on error exit
The caller is almost certainly going to exit when we return, but
all the same, don't leak the table description structure in the
error exit. Makes the static analyzer happier.
afsdump_extract: clarify logic to avoid freeing local buffer
Sometimes vnodepath is set to a local buffer. Sometimes it is set
to malloc'ed storage. Simplify the logic for freeing vnodepath
by checking explicitly for this condition rather than the state
of other variables. As a bonus, avoids a false (?) positive from
the static analyzer.
Michael Meffie [Fri, 22 Jun 2012 03:44:31 +0000 (23:44 -0400)]
vlserver: always use the hostaddress table in GetAddrsU
Use the hostaddress (IpMappedAddr) table when looking up hosts by IP
address and when listing addresses by index, instead of accessing
the multi-homed extensions directly.
The existing vos client calls the old GetAddrs rpc to first retrieve
a count of the number of addresses expected. This count is the
number of addresses in the hostaddress table. If there are
unreferenced entries in the mh extension blocks, then vos can return
an incorrect or incomplete list of addresses.
To be consistent with the rest of the host address processing, use
the hostaddress table in GetAddrsU to lookup hosts by index or by IP
address.
The hostaddress table is already used when looking up addresses by
UUID.
afs_conn: make release_conns_vector() actually work
release_conns_vector must never have been called before with
a non-null parameter, because it could not possibly work.
The first line of the loop is a null pointer dereference, and
if that were fixed, there's also a modify-after-free bug as well.
It's not clear how what the old version was trying to do; this
version makes a stab at doing something sensible but might be
less than required. (Note that this would be much simpler if
converted to queue(3) macros or a similar standard linked-list
data structure.)
kauth: ka_CellToRealm's "realm" parameter cannot be null
Annotate ka_CellToRealm with AFS_NONNULL to indicate that its
"realm" parameter cannot be null; it does not make sense to call
this routine without this parameter. (The static analyzer inlines
the call to ka_ExpandCell and concludes that "realm" might be null;
the annotation will prevent that and avoid a false positive.)
Andrew Deason [Wed, 25 Jul 2012 20:48:34 +0000 (15:48 -0500)]
crypto: Use our strcasecmp in kernel
A few pieces of heimdal we use in the kernel call strcasecmp
(hcrypto/evp.c, krb5/crypto.c). The strcasecmp function does not exist
in all kernels (specifically, it does not exist in at least Linux 2.4,
2.6.9, and probably not on Solaris pre-10). Since we have our own copy
of strcasecmp (called afs_strcasecmp), just use that for now.
Ideally we would have some kind of configure test for detecting the
presence of the function in the kernel, and use the roken
implementation when we don't. We currently have the framework for
neither of those in place at the moment, though, so just get by with
this for now.
Change-Id: Ia96b17596da6cb168c80c92486fa049c05205da4
Reviewed-on: http://gerrit.openafs.org/7881 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@dementix.org>
ptuser: avoid implementation-defined behavior in CreateIdList()
CreateIdList() is an internal subroutine of pr_IDListExpandedMembers(),
used to flatten a hash table of protection IDs into an array that can
be passed to pr_IdToName(). If for some reason the hash table had no
entries, it would call malloc(0) and, depending on how the
the implementation defines this, either return a PRNOMEM error (wrong!)
or else allocate a minimum-sized buffer which pr_IdListExpandedMembers
would then promptly leak. Compromise between the two behaviors by
not allocating any memory in this case but returning success, and in
the caller check for an empty list and avoid the pointless RPC to
translate no IDs into no names. pr_IDListExpandedMembers() will return
success, as it previously did in the non-PRNOMEM case.
kauth: don't call lcstring with a null source argument
This code was probably never executed, but now that lcstring() has
an AFS_NONNULL annotation, the static analyzer indicates the
potential null-pointer-dereference.
Andrew Deason [Wed, 25 Jul 2012 15:45:16 +0000 (10:45 -0500)]
rx: Raise minimum Linux atomics version to 2.6
Linux 2.4 does not have atomic_dec_return. If we switch to a
dec_and_test-like API, then we could use the Linux 2.4 atomics. But
for now, just raise the minimum to 2.6, and for 2.4 and below just use
the generic atomics implementation so we can build.
afs_bypasscache: parameters of afs_ReadNoCache can't be null
The first two parameters of afs_ReadNoCache() are unconditionally
indirected through, and all existing callers appear to guarantee
that these parameters are in fact non-null, so annotate the function
declaration to so indicate, and remove the one test that checks
whether avc (the first parameter) is null. I suspect that acred
cannot be null either, but this code does not appear to depend on
that, so it's not included in the non-null annotation.
It is important that down servers be detected as soon as possible.
When it is not possible to perform a blocking probe, perform a
probe in a backgrond thread.
ptuser: pr_SNameToId/SIdToName: if RPC response empty, force error
If the prserver returns an empty response to ubik_PR_NameToID
or ubik_PR_IDToName, but doesn't otherwise give an error,
force a PRINTERNAL error return so that the client knows that the
the return parameter was not updated. Existing callers seem to
expect this, as pr_SNameToId is often called without initializing
the variable which receives the result and checking only for the
error code.
Simon Wilkinson [Mon, 16 Jul 2012 19:09:04 +0000 (20:09 +0100)]
auth: Fix GetTokenEx with NULL cellName
If GetTokenEx is called with a NULL cellName, it means use the
local cell. To do this with the legacy interface, a 0 length string
must be used for the cell instance of the ktc_principal passed to
GetToken. Fix this so that we do so, rather than attempting to
strcpy(..., NULL) which never ends well.
When queue_Scan is executed on an empty queue the queue element
variable, in this case 'rpc_stat' is the queue head, _RXQ(q),
and not NULL. Callers of rxi_FindRpcStat() expect NULL on failure
to find or create an rx_interface_stat object. Correct the behavior
by testing for an empty queue and return NULL immediately if the
queue is empty and the caller is not requesting creation.
Simon Wilkinson [Wed, 7 Sep 2011 17:31:32 +0000 (18:31 +0100)]
afsd: Tidy up system calls
Tidy up the way that we do system calls from afsd, by making
afsd_syscall a va_arg function, using a structure to pass system call
information around, and simplifying the #ifdef ladder that converts our
platform independent system calls into something platform specific.
This fixes all of the warnings in afsd which required the -Wno-error
option, the only warnings remaining are related to daemon being
deprecated on Darwin.
There's no need for type-punning here; usr_getspecific() is a macro
that just assigns to the variable whose address we provide, so the
cast was just unnecessary (and erroneous) obfuscation. This is the
only caller of usr_getspecific(), so if it needs to be more complex
in the future, it should probably just be open-coded here.
strcompose: NULL must always be cast when passed to a variadic function
The C standard allows NULL to be defined as a bare "0", which will
be passed to variadic functions as an int. If the function expects
a pointer type, demons fly out of your nose. strcompose() is such
a function, so make sure that all of its callers cast NULL appropriately.
(None of them did.) This may be an opportune time to change all of
the callers to spell it opr_strcompose() as well, and avoid using a
reserved identifier, but this change does not do so.
opr: constify various string functions and mark them AFS_NONNULL()
All of these string functions require at least one non-null argument.
Mark them as AFS_NONNULL() so that the compiler and static checker can
find erroneous uses. The "source" arguments of lcstring and ucstring
can be const, so do so. (This doesn't affect anything in the tree
right now.) While here, note a few unfixed issues with these interfaces.)
xdr: fix two old FIXMEs related to signed/unsigned arithmetic
It's implementation-defined whether the C '>>' operator, when
applied to a signed integer, is sign-extending or zero-filling.
If you want unsigned arithmetic, you have to ask for it explicitly.
One assumes the reason for the shift is to avoid overflow if the
returned size/count is later converted to a signed int, in which
case maybe it would be better to use INT_MAX here. This is the
minimal change necessary for correctness.
If there are extents in the list with a non-zero ActiveCount,
those extents will be skipped and the list 'le' will never
become empty. Add an additional condition to ensure that the
loop is only executed once for each extent in the list.
Replace AFSExFreePool() with AFSExFreePoolWithTag() which is
a wrapper around both ExFreePool() and ExFreePoolWithTag().
If a 'Tag' value, is provided, ExFreePoolWithTag() is used.
Otherwise, ExFreePool().
Specify allocation tag values wherever possible. Path name buffer
tags are not specified because they are allocated using multiple
tags. The same is true for network provider string buffers.
This is being done in order to debug a memory corruption issue.
Warning: this is a change to the AFSRedir->AFSRedirLib interface
and therefore both drivers must be updated with a reboot and
not simply restarting the service.
rx: rxi_ReceiveDataPacket do not set rprev on drop
In KERNEL builds if there are no available packet buffers the
new packet is dropped on the floor. In that case, the call's
rprev field should not be updated because the packet was never
"received" for delivery to the application.
Remove a dead comment from the same block of code.
Windows: avoid memory overrun during extent release
While tearing down extents, if an extent is found to be in use
it will be skipped. Must use 'ulReleaseCount' as the index
into the released extent array.
Originally, the first store to "code" was dead here. Refactor the
error exits to follow the non-error exit path, which has the effect
of making the store to "code" live again (and also makes it less
likely that any new cleanup code will be unintentionally omitted).
In the ubik_ClientInit recovery case, handle the possibility that
aproc() returned zero and return UINTERNAL rather than letting the
caller think that this operation succeeded.
the whole of the api used for icon handling when you steal it
from a resource fork is deprecated in new macos. fine. we'll just make
an app bundle by cheating, move andy into a standalone icns file,
install him into the "bundle" and open it the macos way.