Ben Kaduk [Wed, 29 May 2013 23:18:22 +0000 (19:18 -0400)]
FBSD: plug refcount leak in pioctl
When gop_lookupname_user returns a non-NULL vnode, the vnode came
from afs_GetVCache (by way of afs_lookup) which takes a reference
on the vnode entry. There's no need to take another spurious
reference here. The existing code already knows that there's a
reference in place, as there is an AFS_RELE down where FBSD80_ENV
unlocks the vnode if it's locked (that code is also suspicious).
Prior to this patch, things like 'fs flush /path/to/file' would
leak a reference on that cache entry, preventing clean shutdown.
Marc Dionne [Sat, 22 Sep 2012 19:29:52 +0000 (15:29 -0400)]
procmgmt: Introduce spawnprocve_sig
Introduce spawnprocve_sig, a variant of spawnprocve that allows
a caller to spawn a process with a specific signal mask.
This is useful when we want to set a mask that is different
from the current one. It needs to be done after the fork()
so that the current thread is not affected.
Andrew Deason [Thu, 12 Sep 2013 20:58:34 +0000 (15:58 -0500)]
ihandle: Make sure we don't ih_attachfd invalid FD
Right now, if you give ih_attachfd_r an invalid fd, and fdLruHead is
NULL, we'll return an FdHandle_t* for an invalid fd. Nowhere in the
code is this possible right now, but the implementation of
ih_attachfd_r and ih_attachfd doesn't make this very clear.
Ideally the "close some fds and retry" behavior in ih_attachfd_r will
be split out, so this code could be easier to follow, and we could
implement open() EMFILE retrying for icreate operations. But for now,
just make the current behavior clearer, so future modifications do not
introduce such mistakes.
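As a rough illustration of making the invalid-fd case explicit, the sketch below rejects a bad descriptor before anything else can wrap it in a handle. The struct and function names are hypothetical stand-ins, not the real FdHandle_t or ih_attachfd_r.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an fd handle; not the real FdHandle_t. */
struct fd_handle_example {
    int fd;
};

/* Returns a new handle for fd, or NULL if fd is invalid or allocation fails. */
static struct fd_handle_example *attach_fd_example(int fd)
{
    struct fd_handle_example *h;

    /* Reject the invalid descriptor up front, before any later path
     * (such as a close-and-retry loop) can hand back a handle for it. */
    if (fd < 0)
        return NULL;

    h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    h->fd = fd;
    return h;
}
```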
Michael Meffie [Fri, 1 Feb 2013 15:57:07 +0000 (10:57 -0500)]
vos: more details in vos release -verbose output
When running vos release with the -verbose flag, print the reasons for a
complete release, and the reasons for doing a full dump of the volume. When
doing a full dump, have the verbose output print 'entire volume' instead of
'full release', to avoid confusion with a complete release.
Ben Kaduk [Fri, 22 Mar 2013 17:51:02 +0000 (13:51 -0400)]
Catch up to FreeBSD VM object read/write locks
Upstream r248084 changed the vm_object mutex to be a rwlock,
allowing for future optimizations. This is a KPI change, so
introduce conditionals to be compatible with both versions of the KPI.
Michael Meffie [Tue, 10 Sep 2013 02:25:50 +0000 (22:25 -0400)]
build: compile_et rules for parallel make
Change all makefile rules which run compile_et in order to support parallel
make. The compile_et generates two outputs, so special care must be
taken in rules which run compile_et.
All the rules for compile_et have been changed to the form:
therefore a parallel make will serialize the builds of foo.c and foo.h,
and should detect that the second is no longer needed once the first is
over. This form works since foo.et is not a phony target, and does not
depend on a phony target.
Previously, the rules for compile_et were of one of two forms:
a) foo.c foo.h: foo.et
compile_et foo.et -h foo
or
b) foo.h: foo.c
foo.c: foo.et
compile_et foo.et -h foo
Form a) is problematic for parallel makes, since it is equivalent to:
foo.c: foo.et
compile_et foo.et -h foo
foo.h: foo.et
compile_et foo.et -h foo
In a parallel make, compile_et will be run concurrently, clobbering
each other's output files.
Form b) is better, but is problematic when foo.h is removed, since foo.h
will not be updated.
Thanks to Russ Allbery for pointing out the automake documentation which
describes issues with commands that produce multiple outputs, and
portable solutions.
It is possible for cm_MergeStatus() to be called while the
cm_buf_t.mx is already held. If it is, a panic occurs. Test for
refcount == 0 before acquiring the lock in addition to afterwards.
If the refcount is not zero, then we do not need to acquire the
lock in any case.
Windows: AFSCreate avoid race leading to NULL dereference
If a test for NULL is performed ahead of an assignment and then
use of the assigned value, there is a race which can result in
the assigned value being NULL if the value being assigned is
altered by another thread.
Perform the assignment first then test based upon that.
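The pattern can be sketched in portable C11 as follows. The names are hypothetical and this is not the AFSCreate code; it only contrasts reading a shared pointer twice with reading it once into a local and testing that local.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object type for illustration. */
struct target_example {
    int value;
};

/* Shared pointer that another thread may set to NULL at any time. */
static _Atomic(struct target_example *) shared_target;

/* Racy: the pointer is read twice, so it can change between test and use. */
static int read_value_racy(void)
{
    if (atomic_load(&shared_target) != NULL) {
        struct target_example *p = atomic_load(&shared_target); /* may be NULL now */
        return p->value;
    }
    return -1;
}

/* Safer: assign first, then test and use only the local copy. */
static int read_value_checked(void)
{
    struct target_example *p = atomic_load(&shared_target);

    if (p == NULL)
        return -1;
    return p->value;
}
```

This only closes the window between the test and the use; keeping the pointed-to object alive while it is used is a separate concern (for example a reference count).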
Andrew Deason [Wed, 11 Sep 2013 16:22:20 +0000 (11:22 -0500)]
Probe directly for com_err.h
com_err.h can be in com_err.h, et/com_err.h, or krb5/com_err.h (for
netbsd 6.1 and possibly other netbsd). aklog currently only includes
either com_err.h or et/com_err.h, depending on autoconf probes
performed by the krb5.m4 macros.
So, also look for krb5/com_err.h. The krb5.m4 macros currently only
look for com_err.h at all if certain other libkrb5 tests return
certain results, so just look for all of them directly in some of our
openafs-specific krb5 probing logic in configure.ac.
Also remove the duplicate check for et/com_err.h in acinclude.m4 while
we're here. We only use et/com_err.h if krb5 support is enabled, so
only check for it in the second of the krb5 probes.
Andrew Deason [Wed, 25 Sep 2013 05:25:48 +0000 (00:25 -0500)]
Whine if single-DES keys are in use
If we are using single-DES keys in our KeyFile, yell at the
administrator, so they have a chance at realizing that they should
migrate to stronger crypto.
Andrew Deason [Wed, 18 Sep 2013 21:56:23 +0000 (16:56 -0500)]
vol: Nuke parent vol special inodes
When we "nuke" a volume, we delete all inodes we can find that are for
the given volume id. This currently means that if we nuke an RW volume
id, we delete all of the inodes for file data for the entire volume
group (since they're all stored in the VG id), but we do not delete
the special inodes for any non-RW volumes in that volume group. Those
special inodes left behind are not very useful, since we just deleted
all of the actual file data.
Currently this means that on namei, it's impossible to nuke the
special inodes for non-RW volumes, since the namei nuke will only look
in the subdir for the given volume id. If you give it the RW volume
id, it won't delete the special inodes as mentioned above; if you give
it the RO volume id, it will only look in the RO subdir, and won't
find the RO special inodes in the RW subdir.
If a volume group is damaged in such a way that the salvager cannot
fix it (due to a bug), this means that it is impossible to get rid of
that volume group completely from the partition on namei without
manually running "rm -rf" on the relevant AFSIDat directory. Normally
we have a failsafe of running 'vos zap -force', but that doesn't work
for non-RW special inodes, as mentioned above.
So, in order to allow this 'vos zap -force' failsafe to work in
hopefully all situations, also delete the special inodes for the
parent volume. Use similar logic as exists in the salvager's
OnlyOneVolume function.
tkt_MakeTicket5 tries to avoid returning heimdal asn1 error codes,
but uses an incorrect expression that's almost always true. Use
bitwise & instead of logical && to fix.
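A small sketch of the class of bug, with made-up constants (the base and mask below are hypothetical, not the Heimdal asn1 error table values): logical && collapses its operands to true/false, so the test succeeds for any nonzero code, while bitwise & actually extracts the table bits.

```c
#include <stdint.h>

/* Hypothetical error-table base and mask, for illustration only. */
#define EXAMPLE_TABLE_BASE 0x12345600u
#define EXAMPLE_TABLE_MASK 0xFFFFFF00u

/* Buggy: && yields 1 for every nonzero code, whatever table it is from. */
static int is_table_code_buggy(uint32_t code)
{
    return (code && EXAMPLE_TABLE_MASK);
}

/* Fixed: & keeps the high 24 bits, which are then compared to the base. */
static int is_table_code_fixed(uint32_t code)
{
    return (code & EXAMPLE_TABLE_MASK) == EXAMPLE_TABLE_BASE;
}
```

Some compilers warn about a constant operand to &&, which is one way this kind of mistake gets noticed.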
Andrew Deason [Fri, 20 Sep 2013 20:13:43 +0000 (15:13 -0500)]
rx: Always call rxi_StartListener
Commit c10f5296 made rx_Init only call rxi_StartListener in the kernel
if we have RXK_LISTENER_ENV. But this doesn't make any sense, since
rxi_StartListener only does anything if RXK_LISTENER_ENV is _not_
defined. As a result, for any non-rxk-listener non-rx-upcall platform,
we never receive rx packets in the kernel, since we never set up our
rx packet callback. The only such platform appears to be AIX, since
while other platforms (HPUX, FBSD, IRIX) have a non-rxk-listener mode,
they also implement an rxk-listener mode that we always turn on.
So, just always call rxi_StartListener, and let the ifdef guards for
the various implementations of rxi_StartListener do the right thing.
When we do a no-cache read, we should decrease the resid as we use
up the buffer; otherwise the caller has no idea how much data
actually got transferred.
When processing "fs sysname" on a client, rmtsys-related
checks are executed by default. These prevent a user with gid
2748 and 2750 (0xabc and 0xabe) from executing this command.
Add a new flag inside the cachemanager for the rmtsys
functionality. This flag is set through a new ioctl by afsd
on startup.
Michael Meffie [Sat, 7 Sep 2013 03:58:39 +0000 (23:58 -0400)]
auth: fix cellservdb update check
Fix a bug introduced by the check to avoid excessive stats of the
cellservdb. Fixes a bug where cached cell config data is served for up
to one second after a write.
Check the timeRead field which is reset after a write to indicate the
data should be read.
Marc Dionne [Tue, 3 Sep 2013 11:55:14 +0000 (07:55 -0400)]
Linux 3.11: Adapt to d_count changes
In preparation for upcoming changes in the 3.12 cycle, d_lockref
was introduced late in the 3.11 cycle. The dentry's d_lock and
d_count are moved to this new structure. A new d_lock macro makes
the change transparent for locking, but direct users of d_count
must adapt. A new d_count() helper function is provided and
should now be used.
Use the new d_count() helper function if available, and move
some of the ifdef logic into a helper compatibility function.
If a callback race has been lost cm_MergeStatus is not executed.
In that case either the activeRPC count should not be incremented
or must be decremented to indicate that the current call has been
completed.
If the CcPurge operation fails or cannot be performed, in addition
to setting the purge on close flag, set the verify data flag. This
ensures that the next attempt to access the file will retry the
purge.
If the redirector is using Direct IO servicing there are no extents
in use. Skip the AFSFlushExtents, AFSTearDownExtents, and related
calls unless extent processing is in use. This will reduce lock
contention and reduce cpu processing.
Jeffrey Altman [Mon, 26 Aug 2013 00:07:44 +0000 (20:07 -0400)]
Windows: Hold Fcb Resource across CcPurgeSection
Now that the Fcb Resource and SectionObjectResource are held in
the FastIo pathway and the Trend Micro deadlock has been addressed
by holding a reference on the FileObject it is time to fix the
lock acquisition ordering. For each CcPurgeSection call the
Fcb Resource will be held exclusive before the SectionObjectResource.
Rod Widdowson [Sun, 25 Aug 2013 20:20:28 +0000 (13:20 -0700)]
Windows: Strip out unused ModWriter Fastio code
The code is no longer used (the fcb->PagingIO resource is taken for
us by the modwriter) so we strip it out to save others from making changes
and then remembering/discovering that this code isn't being used.
Rod Widdowson [Sun, 25 Aug 2013 16:16:39 +0000 (09:16 -0700)]
Windows: Pin the Cc FileObject during section create.
This means that if we purge the data cache while the section is being
created then the MJ_CLOSE will not happen until we unpin the FO.
Thus we can drop any embarrassing locks prior to the close and
meddling antivirus products can do odd stuff in the close path.
Note that there may not be a file object, but in that case there
will be no close on the purge since any CcInitialize operations
will wait on us dropping the SOP lock exe - hence the SOP cannot
be set up.
Also note that this only applies to the data section,
but we do not purge the image section.
Refactor AFSPerformObjectInvalidate so that all of the non-DIRECT_IO
processing variables are in the Extents processing section. Remove
all references to Extents processing from the DIRECT_IO block.
Jeffrey Altman [Thu, 22 Aug 2013 21:50:39 +0000 (17:50 -0400)]
Windows: Refactor AFSVerifyEntry AFSValidateEntry
Inside a big switch statement it is hard to follow when there
are multiple 'break' exits within a 'case'. Reorganize the code
so that there is only a single exit for the FILE type. Unnecessary
blocks are removed as well.
Section Object Resource acquires and releases are lost in the
noise of all of the rest of the locks. Introduce a dedicated
subsystem just for Section Objects.
Jeffrey Altman [Wed, 21 Aug 2013 16:27:35 +0000 (12:27 -0400)]
Windows: Call AFSExceptionFilter for all exceptions
In many cases we capture the exception record and the Exception Code
as ntStatus and move on with life. This patchset changes that.
All exceptions are passed to AFSExceptionFilter so we do not miss
anything.
Andrew Deason [Wed, 21 Aug 2013 22:07:14 +0000 (17:07 -0500)]
viced: Clarify comment explaining cba sorting
The current comment here is very brief; it may not be immediately
clear to a reader why we are sorting these, and so why we need the
given CBAs in an array. Expand on it a bit.
Note that it seems like it might be possible to refactor multi_Rx to
not require all calls to be created before any packets are sent. If
multi_Rx were changed to send data as we create calls, it may be
possible to eliminate this sorting, and allow for slightly more
efficient callback traversal when breaking callbacks.
Jeffrey Altman [Sat, 17 Aug 2013 14:18:53 +0000 (10:18 -0400)]
Windows: Cap Cache Size on X86
Since we know the cache size cannot be an arbitrary size because it
must fit into contiguous process memory and because it is difficult
to compute the actual size limit, cap the size to 716800KB.
Jeffrey Altman [Fri, 16 Aug 2013 19:36:32 +0000 (15:36 -0400)]
Windows: Do not recycle deleted scache on refcnt 0
If the scache object with CM_SCACHEFLAG_DELETED set is recycled
then the deleted state is lost and the cache manager cannot prevent
unnecessary FetchStatus queries to the file server.
Jeffrey Altman [Fri, 16 Aug 2013 16:01:55 +0000 (12:01 -0400)]
Windows: Do not remove scp from hash table on deletion
If the CM_SCACHEFLAG_DELETED flag is going to have any benefit, the
cm_scache object must not be removed from the hash table in response
to a VNOVNODE error. Otherwise, a new cm_scache object is allocated,
the CM_SCACHEFLAG_DELETED is not found, and a new callback request
is issued to the file server which in response returns VNOVNODE.
Do this enough times and the abort threshold is triggered and then
the application becomes very unhappy with performance.
Ben Kaduk [Wed, 17 Jul 2013 00:39:56 +0000 (20:39 -0400)]
Check for over/underflow while allocating PTS ids
The behavior of signed integer over/underflow is undefined in C,
but even if the compiler is nice and just wraps around, we could get
ourselves into trouble later on.
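A minimal sketch of such a check, assuming 32-bit signed ids with positive ids counting up and negative ids counting down. The function names are hypothetical and this is not the ptserver code; the key point is testing against the limit before doing the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not the ptserver allocation code. */

/* Stores max_id + 1 in *out and returns 0, or returns -1 if that would overflow. */
static int next_positive_id(int32_t max_id, int32_t *out)
{
    if (max_id == INT32_MAX)
        return -1;              /* refuse instead of overflowing */
    *out = max_id + 1;
    return 0;
}

/* Stores min_id - 1 in *out and returns 0, or returns -1 if that would underflow. */
static int next_negative_id(int32_t min_id, int32_t *out)
{
    if (min_id == INT32_MIN)
        return -1;              /* refuse instead of underflowing */
    *out = min_id - 1;
    return 0;
}
```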
Ben Kaduk [Wed, 31 Jul 2013 00:17:01 +0000 (20:17 -0400)]
Do not use a non-literal format string
Now that UKERNEL's panic() is a proper varargs function (gerrit 9877),
we can use a literal format string "%s" to print the panic message.
clang warns about a non-literal format string, and in some build
environments the warning becomes fatal via -Werror.
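To illustrate why the literal "%s" matters, here is a small sketch using snprintf rather than panic(); the helper name is hypothetical. Passing the message as an argument means any '%' characters in it are copied verbatim instead of being interpreted as conversions.

```c
#include <stdio.h>

/* Hypothetical helper: formats msg into buf with a literal format string. */
static int format_panic_message(char *buf, size_t len, const char *msg)
{
    /* snprintf(buf, len, msg) would treat "%d" inside msg as a conversion
     * with no matching argument; "%s" copies the text unchanged. */
    return snprintf(buf, len, "%s", msg);
}
```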
Andrew Deason [Wed, 31 Jul 2013 20:58:41 +0000 (15:58 -0500)]
budb: Do not use garbage cellinfo
If the -servers option is given, we never initialize cellinfo or the
clones array. So, don't give the cellinfo structure or the clones
array to ubik in that case, or we may crash or do other weird things.
This issue appears to have been introduced in commit fc4ab52e.
Andrew Deason [Thu, 1 Aug 2013 19:06:52 +0000 (14:06 -0500)]
DAFS: Remove AFS_DEMAND_ATTACH_UTIL
Currently we have two DAFS-related preprocessor defines in the
codebase: AFS_DEMAND_ATTACH_FS and AFS_DEMAND_ATTACH_UTIL. DAFS_FS is
the symbol for enabling DAFS code, and turns on demand attachment and
all of the related complicated volume handling; it requires pthreads.
DAFS_UTIL is supposed to be used for utilities interacting with DAFS,
but that do not have pthreads and so cannot build the relevant threads for
e.g. the VLRU, so they don't support demand attachment and a lot of
more advanced volume handling techniques.
Having both of these exist is confusing. For example, currently in
partition.c we only initialize dp->volLockFile for DAFS_FS, even
though the structure exists if _either_ DAFS_FS or DAFS_UTIL is
defined. This means when only DAFS_UTIL is defined, volLockFile will
exist in the partition structure, but will be uninitialized!
Amongst other possible issues, this means right now that DAFS_UTIL
users (dasalvager is the only one right now) will try to use an
uninitialized volLockFile whenever they try to use a volume that needs
locking. Since the partition struct is usually initialized to all
zeroes, this means we'll try to issue a lock request for FD 0,
whatever FD 0 is. If FD 0 is not open, we'll fail with EBADF and bail
out. But if FD 0 is open to some random file, the lock will probably
succeed, and we'll proceed without actually locking the volume lock
file. While the fssync volume checkout mechanism still works, the
on-disk locking mechanism protects against race conditions the fssync
volume checkout mechanism cannot protect against, and so handling
volumes in this way is not safe.
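The FD 0 hazard can be sketched with a hypothetical struct (not the real partition structure). A zero-filled descriptor field looks like a valid descriptor, whereas an explicit -1 sentinel does not. Note that this only illustrates the failure mode; it is not the fix this commit makes.

```c
#include <string.h>

/* Hypothetical partition-like struct holding a lock file descriptor. */
struct partition_example {
    int lock_fd;
};

/* Zero-fills the struct only: lock_fd ends up as 0, a valid fd number. */
static void partition_init_zeroed(struct partition_example *p)
{
    memset(p, 0, sizeof(*p));
}

/* Zero-fills and then marks the descriptor as "not open". */
static void partition_init_checked(struct partition_example *p)
{
    memset(p, 0, sizeof(*p));
    p->lock_fd = -1;
}

/* Returns 1 if lock_fd looks like a usable descriptor, 0 otherwise. */
static int lock_fd_usable(const struct partition_example *p)
{
    return p->lock_fd >= 0;
}
```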
This is just one example; there are other issues with the partition
headerLockFile and probably many other things; most instances of
DAFS_FS really should be enabled for DAFS_UTIL as well.
So, instead of trying to account for and fix all of these problems
individually, get rid of AFS_DEMAND_ATTACH_UTIL, and just use
AFS_DEMAND_ATTACH_FS. This means that all relevant code must be
pthreaded, but since the only relevant code is for the dasalvager, we
can just make dasalvager pthreaded. Salvaging does not make use of any
threads or LWPs, so this should not have any side-effects.
Thanks to Ralf Brunckhorst for reporting the issue where we encounter
EBADF when FD 0 is not open, leading to the discovery of this.
Anders Kaseorg [Tue, 23 Jul 2013 18:37:26 +0000 (14:37 -0400)]
volume_inline.h: Down with assert, again
Commit 34767c6a0f914960c9a1efabe69dd9c312a2b400 replaced all assert
calls in this file with osi_Assert (now opr_Assert), but shortly
thereafter, commit db6ee95864a8fc5f33b7e95c19c8ff5058d37e92 added
VTimedWaitStateChange_r with two new assert calls. These are
precarious in a public header; fix them to opr_Assert like the ones in
VWaitStateChange_r.
der-protos.h was generated from Heimdal headers which in turn were
auto-generated. They included a large number of function prototypes
of the form
ret-type func(parm-list, type */* comment */);
where the combination of */* is ambiguous. Does it mean an end comment
followed by a pointer declaration or a pointer declaration followed by
a begin comment? This combination generates warnings on Windows. The
bug was fixed in Heimdal's code generator. Fixing it here by editing
the code.
Michael Laß [Sun, 14 Jul 2013 19:31:27 +0000 (21:31 +0200)]
Use -nofork when starting bosserver via systemd
Systemd does not expect the started process to fork unless
"Type=forking" is given. Use -nofork to run BOS in foreground and allow
systemd to track its state.
Change-Id: I024be12b410d6b8299edd16f309d816a3df469ed
Reviewed-on: http://gerrit.openafs.org/10087
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Michael Laß <lass@mail.uni-paderborn.de>
Tested-by: Ken Dreyer <ktdreyer@ktdreyer.com>
Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com>
Ben Kaduk [Fri, 12 Jul 2013 16:43:57 +0000 (12:43 -0400)]
Update the asetkey man page for rxkad-k5
Also add the usage for the six-argument form while here.
Update some generic text to account for the existence of rxkad-k5,
and mention that the Update Server is not the only thing which can copy
around KeyFiles. Give an example of the seven-argument form's usage for
rxkad-k5.
Derive DES/fcrypt session key from other key types
If a kerberos 5 ticket has a session key with a non-DES enctype,
use the NIST SP800-108 KDF in counter mode with HMAC_MD5 as the PRF to
construct a DES key to be used by rxkad.
To satisfy the requirements of the KDF, DES3 keys are first compressed into a
168-bit form by reversing the RFC3961 random-to-key algorithm.
Windows has three additional places to get tokens, who knew?
Krb5 ticket support for server-to-server and localauth
Create a tkt_MakeTicket5 that creates a native krb5 rxkad token with
a service key supported by the rfc3961 library (session keys must be
provided as DES)
Update GenericAuth to search for rxkad_krb5 keys and call tkt_MakeTicket5
if it finds any.
Decrypt tickets with non-des enctypes by calling out to the rfc3961 library.
This requires the security object to be given an enhanced get_key callback
that supports looking up keys by enctype.
Include a wrapper around afsconf_GetKeyByTypes so rxkad doesn't have
to know anything about libauth internals/interfaces