Michael Meffie [Fri, 24 Oct 2014 21:17:07 +0000 (17:17 -0400)]
vldb_check: fix false mh block error message
Fix a false error message about invalid mh blocks when the vldb has more
than one mh block. To add insult to injury, vldb_check complains about
the wrong address and block number.
The flags field in the mh block header is in network byte order, in all
of the blocks, not just the first one. Be sure to convert all of them
to host byte order so the VLCONTBLOCK flag check works. Fix the error
message on the secondary blocks to show the correct address and block
number.
Example bogus error messages:
    vldb_check ./vldb.DB0
    address 132120 (offset 0x20458): Multihomed Block 0: Not a multihomed block
    address 132120 (offset 0x20458): Multihomed Block 0: Not a multihomed block
    address 132120 (offset 0x20458): Multihomed Block 0: Not a multihomed block
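The byte-order point above can be illustrated with a minimal sketch. The struct, the function name, and the flag value below are assumptions made for illustration only; they are not taken from the vldb_check source.

```c
#include <arpa/inet.h>   /* ntohl, htonl */
#include <stdint.h>

/* Assumed flag value, for illustration only. */
#define EXAMPLE_VLCONTBLOCK 0x8

/* Hypothetical, trimmed-down block header: only the flags word matters here. */
struct example_mh_header {
    uint32_t flags;      /* stored on disk in network byte order */
};

/*
 * Return the index of the first block whose flags lack the continuation
 * flag, or -1 if every block looks like a multihomed block.
 */
int
example_first_bad_mh_block(const struct example_mh_header *hdrs, int nblocks)
{
    int i;

    for (i = 0; i < nblocks; i++) {
        /* Convert every block's flags to host order, not just block 0's. */
        uint32_t flags = ntohl(hdrs[i].flags);

        if ((flags & EXAMPLE_VLCONTBLOCK) == 0)
            return i;    /* report the block that actually failed */
    }
    return -1;
}
```

Returning the failing index is what lets a caller print the correct block number instead of always reporting block 0.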
Benjamin Kaduk [Mon, 6 Oct 2014 21:19:44 +0000 (17:19 -0400)]
Finish deorbiting libjuafs.a
Change I2074d5bc26e326db36b16e055431818ef1c69210 removed the separate
compilation/link of a libjuafs.a (since it was functionally identical
to the libuafs.a already being built), but retained a libjuafs.a in
TOP_LIBDIR for use by src/JAVA/libjafs/.
This commit adjusts src/JAVA/libjafs to refer to libuafs.a directly,
and removes references to libjuafs.a which are no longer relevant.
Benjamin Kaduk [Thu, 16 Oct 2014 04:04:14 +0000 (00:04 -0400)]
Make afs_usrops.h more closely resemble reality
Remove prototypes for many routines which are not implemented.
(I thought some toolchains would complain about this sort of thing?
Maybe we disable it.)
Benjamin Kaduk [Sat, 20 Sep 2014 03:04:10 +0000 (23:04 -0400)]
Build libuafs with libtool
Use the standard program for building PIC and non-PIC object files,
instead of rolling our own. This allows us to pull the build rules
into the Makefile.common, leaving just compiler flags and similar
in the MakefileProtos.
This does change the build flags being used to compile these files
somewhat -- the old CRULE1 and CRULEPIC used CC instead of CCOBJ
or MT_CC, and did not pass MT_CFLAGS, but it should be safe to
move to the standard compiler invocations. We can also eliminate
the libuafs-specific 'OPTF' variable which expands to OPTMZ almost
everywhere.
Rename our COMMON_INCLUDE to MODULE_INCLUDE so it's picked up properly
by the standard build rules; this will let us remove
${TOP_OBJDIR}/src/config and ${TOP_INCDIR} once the rest of the
build rules in this Makefile are converted to use libtool, as those
include directories are already added by COMMON_INCL in Makefile.config.
As a side effect, we get rid of the LIBUAFS make variable -- all sites
were defining it to libuafs.a anyway, so we can just hardcode it.
We can also build a shared libuafs.la "for free". Don't install
it anywhere just yet, though.
Benjamin Kaduk [Wed, 15 Oct 2014 23:49:12 +0000 (19:49 -0400)]
(Partially) unify XDR for libuafs and libafs
The libuafs build was getting xdr_vector() from both afsaux.c and
xdr_update.c, but because of the rules for creating static libraries,
this did not cause build errors.
The libafs build is sensitive to duplicate symbols, and was only
getting xdr_vector() from afsaux.c; libafs was not building xdr_update.c
or xdr_refernce.c (that is not a typo).
Remove duplicate xdr_vector() from afsaux.c, and build xdr_update.c
and xdr_refernce.c into libafs.
Benjamin Kaduk [Wed, 15 Oct 2014 21:52:22 +0000 (17:52 -0400)]
Avoid AFS_version conflicts in uafs
libuafs links in both afsd.o and AFS_component_version_number.o;
afsd.c #includes AFS_component_version_number.c, which causes
symbol conflicts when linking shared.
Don't include the version file when compiling for UKERNEL, to
avoid the conflict.
Benjamin Kaduk [Tue, 8 Apr 2014 01:54:46 +0000 (21:54 -0400)]
Do not install kauth manpages when kauth is disabled
Commit 5afe7a882b0bb90a515e505d9ffce4f644633f06 added a configure
option to disable the installation of the kauth suite, but did not
add any logic to disable the installation of the corresponding man
pages, so those man pages were always installed regardless of the
options to configure.
Add logic to doc/man-pages/Makefile.in to create .noinstall files
for man pages which should not be installed in the current configuration.
Depend on the Makefile (which will be regenerated by configure) in
this target so as to attempt to behave properly if configure is re-run
with different arguments in the same working tree.
Andrew Deason [Tue, 14 Oct 2014 22:02:55 +0000 (17:02 -0500)]
auth: Fix GetNthIdentityOrUser EOF return code
Before commit 0af17e7e, afsconf_GetNthUser always returned 1 on
failure, to indicate to the caller that they should stop traversing
over the list. After commit 0af17e7e, when reaching the end of the
list, we return EIO or -1. This causes 'bos listusers' invocations to
always fail, since 'bos' clients expect to get the code 1 when
reaching the end of the SUsers list.
To fix this, make GetNthIdentityOrUser always return -1 when searching
beyond the end of the list. For the newer interface
afsconf_GetNthIdentity, we return this as-is, to have a separate
return code specifically for EOF, but still allowing us to return any
positive error code in case of an error.
For the older interface afsconf_GetNthUser, return 1 in this
situation, to retain compatibility with the old interface (both at the
libauth level and on the wire).
Change-Id: I2db4760440d7846dc290a05fa24e24ec97a02f12
Reviewed-on: http://gerrit.openafs.org/7377
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
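The return-code convention described in this commit can be sketched roughly as follows. All function names and the list representation are hypothetical stand-ins; only the codes (-1 for EOF internally, 1 for EOF on the older interface, positive values for errors) come from the message above.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical shared lookup: 0 on success, -1 when n is past the end of
 * the list, or a positive error code on a real failure.
 */
int
example_get_nth(const char *const *users, size_t count, size_t n,
                const char **out)
{
    if (users == NULL)
        return EIO;      /* a genuine error stays positive */
    if (n >= count)
        return -1;       /* EOF gets its own distinct code */
    *out = users[n];
    return 0;
}

/* Newer-style interface: hand the code back as-is. */
int
example_get_nth_identity(const char *const *users, size_t count, size_t n,
                         const char **out)
{
    return example_get_nth(users, count, n, out);
}

/* Older-style interface: map EOF to 1 for compatibility with old callers. */
int
example_get_nth_user(const char *const *users, size_t count, size_t n,
                     const char **out)
{
    int code = example_get_nth(users, count, n, out);

    return (code == -1) ? 1 : code;
}
```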
Benjamin Kaduk [Thu, 30 Oct 2014 23:38:50 +0000 (19:38 -0400)]
Attempt to make the server install bits current
Avoid using -noauth, and mention both the rxkad.keytab (1.6)
and the KeyFileExt (as 1.8, though it's only master at present).
To support these, move forward the use of kadmin to extract
the afs/cell principal's keytab.
Move the buserver's creation to the end of the list and mark it
as optional (many sites do not run the AFS backup suite).
Deindent some programlisting blocks so they don't flow off the
page as much in the PDF version.
Drop vos syncserv and vos syncvldb from the tasks for setting
up a new server; they should not be needed, as the new db server
should pick up the existing database when it joins the quorum.
Benjamin Kaduk [Mon, 3 Nov 2014 21:46:20 +0000 (16:46 -0500)]
Drop the non-DA fileserver
The instructions are clearer when we just tell people what
to do, and we think that dafs should be right for almost
everyone. Mention that the traditional fileserver is an
option and where to read about it, but nothing more.
Benjamin Kaduk [Mon, 3 Nov 2014 17:57:08 +0000 (12:57 -0500)]
Deorbit "Getting started on IRIX systems"
IRIX is mostly gone as an upstream. The case for removing this
is less clear than the case for removing the HP-UX docs, but
it still feels like clutter in this document.
Michael Meffie [Tue, 4 Nov 2014 00:06:15 +0000 (19:06 -0500)]
avoid writing loopback addresses into CellServDB
Do not use loopback addresses for the server side CellServDB file. Use
getaddrinfo() instead of gethostbyname() to look up a list of IPv4
addresses for a given hostname, and take the first non-loopback address.
This avoids writing a loopback address into the CellServDB on systems
such as Debian, which map the address 127.0.1.1 to the hostname in the
/etc/hosts file.
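A minimal sketch of the lookup described above, assuming a plain IPv4 getaddrinfo() query and treating all of 127.0.0.0/8 as loopback. The function names are hypothetical and this is not the code from the commit.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>

/* 127.0.0.0/8 is loopback, so checking the first octet is enough. */
int
example_is_loopback(struct in_addr a)
{
    return (ntohl(a.s_addr) >> 24) == 127;
}

/*
 * Resolve 'host' and store its first non-loopback IPv4 address in *out.
 * Returns 0 on success, -1 if the lookup fails or only loopback
 * addresses are found.
 */
int
example_first_nonloopback(const char *host, struct in_addr *out)
{
    struct addrinfo hints, *res, *ai;
    int found = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one entry per socket type */

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        struct sockaddr_in *sin = (struct sockaddr_in *)ai->ai_addr;

        if (!example_is_loopback(sin->sin_addr)) {
            *out = sin->sin_addr;
            found = 0;
            break;
        }
    }
    freeaddrinfo(res);
    return found;
}
```

With a Debian-style /etc/hosts entry mapping the hostname to 127.0.1.1, a sketch like this would skip that entry and keep looking for a usable address.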
Andrew Deason [Tue, 28 Oct 2014 05:10:56 +0000 (00:10 -0500)]
LINUX: Avoid d_revalidate failure on mtpt mismatch
Currently, if afs_linux_dentry_revalidate is given an inode that
corresponds to a mtpt vcache ('vcp'), it resolves the mtpt to its root
dir if it's easy to do so (mvid and CMValid are set). Later on, we run
afs_lookup to see if looking up our dentry's name returns the same
vcache that we got; afs_lookup presumably will also resolve the mtpt
if it's easy to do so.
However, it is possible that afs_linux_dentry_revalidate and
afs_lookup will make different decisions as to whether or not they
resolve a mtpt to a dir. Specifically, if CMValid is cleared after
afs_linux_dentry_revalidate checks for it, but before afs_lookup does,
then afs_lookup will return a different vcache than
afs_linux_dentry_revalidate is expecting, even though the relevant
directory entry has not changed. That is, tvc is not equal to vcp, but
tvc could be a mtpt that resolves to vcp, or vice versa. CMValid can
be cleared by another thread at virtually any time, since this is
cleared in some situations when we're not sure if the mtpt resolution
is still valid (callbacks are broken, vldb cache entries expire, etc).
afs_linux_dentry_revalidate interprets this situation to mean that the
directory entry has changed, and so it eventually d_drop's the
associated dentry. The way that this manifests to users is that a
"fakestatted" mtpt can appear to be deleted effectively randomly, even
when nothing has changed. This can be a problem because this causes
the getcwd() syscall to return ENOENT when the working directory
involves such an affected directory.
To fix this situation, we just detect if afs_lookup returned either
'vcp' (our possibly-resolved vcache), or the original inode associated
with the dentry we are revalidating. If the returned vcache matches
either of these, then the entry is okay and we don't need to
invalidate or drop anything.
FIXES 131780
Change-Id: Ide1dd224d1ea1e29a82eb7130a010877cf4e9fc7
Reviewed-on: http://gerrit.openafs.org/11559
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
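The comparison at the heart of this fix can be sketched in isolation. The struct and function below are hypothetical simplifications; the real code operates on vcaches and Linux dentries with locking that is not shown here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vcache; only pointer identity matters here. */
struct example_vcache {
    int id;
};

/*
 * tvc:       what the fresh lookup returned
 * vcp:       our possibly-resolved vcache (mtpt resolved to its root dir)
 * dentry_vc: the original vcache attached to the dentry being revalidated
 *
 * The entry is treated as unchanged if the lookup result matches either
 * one, so a racing change to the mtpt resolution state does not make the
 * dentry look stale.
 */
int
example_entry_unchanged(const struct example_vcache *tvc,
                        const struct example_vcache *vcp,
                        const struct example_vcache *dentry_vc)
{
    return tvc == vcp || tvc == dentry_vc;
}
```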
Marc Dionne [Thu, 23 Oct 2014 15:27:55 +0000 (11:27 -0400)]
Linux 3.18: key_type no longer has a match op
Structure key_type no longer has a match op, and
overriding the default matching has to be done
differently.
Our current match op doesn't do anything special so there's
no need to try to override the defaults; just remove the
assignment of .match and the associated function.
Jeffrey Altman [Thu, 12 Jun 2014 00:53:09 +0000 (20:53 -0400)]
viced: kill CLIENT_TO_ZERO macro
Move all struct client fields that are to be zeroed upon structure
reuse to a new struct client_to_zero. Include the new structure
within struct client and call memset() on that structure.
Change-Id: I0f83f5f18b41bc0d4f8e1f7f8e04cd5508cbe4e1
Reviewed-on: http://gerrit.openafs.org/11288
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
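The pattern described here, grouping the fields to be cleared into a nested struct and zeroing that struct on reuse, can be sketched as follows. The field names are invented for illustration and do not reflect the actual layout of struct client.

```c
#include <string.h>

/* Hypothetical fields that must be cleared whenever a client is reused. */
struct example_client_to_zero {
    int vice_id;
    int session_id;
    int pr_failed;
};

struct example_client {
    struct example_client *next;       /* survives reuse */
    struct example_client_to_zero z;   /* wiped on reuse */
};

/* Reset only the embedded struct; no offset arithmetic is needed. */
void
example_client_reset(struct example_client *c)
{
    memset(&c->z, 0, sizeof(c->z));
}
```

Compared with a macro that computes a byte range from field offsets, this approach cannot silently miss a field that someone adds in the wrong place.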
Jeffrey Altman [Wed, 11 Jun 2014 23:37:34 +0000 (19:37 -0400)]
viced: move host tmay fields before index
The index field and those after it in struct host do not get zeroed
when a host is reused. The placement of the tmay fields after index
in commit 9a0a8ca4d186cf953b87d9fae1a35f66090b060c results in the
use of uninitialized memory.
This change moves the tmay fields before index which permits
the HOST_TO_ZERO() macro to compute the correct size to be memset()
to zero.
Change-Id: I1f93bebb23c99eaa7826dafa8cd7497d1b49bb53
Reviewed-on: http://gerrit.openafs.org/11286
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
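Why field order matters can be shown with a small sketch, assuming the size to clear is the byte offset of the index field. The struct layout and macro below are hypothetical and only imitate the idea behind HOST_TO_ZERO().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host layout: everything before 'index' is cleared on reuse. */
struct example_host {
    unsigned int addr;
    unsigned int tmay_flags;   /* placed before 'index' so it gets zeroed */
    int index;                 /* this field and those after it survive */
    int refcount;
};

/* Number of leading bytes to clear: the distance from the start to 'index'. */
#define EXAMPLE_HOST_TO_ZERO(h) \
    ((size_t)((char *)&(h)->index - (char *)(h)))

void
example_host_reuse(struct example_host *h)
{
    memset(h, 0, EXAMPLE_HOST_TO_ZERO(h));
}
```

A field declared after index would fall outside the cleared range and keep whatever value the previous user left behind.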
D Brashear [Tue, 14 Oct 2014 18:03:40 +0000 (14:03 -0400)]
libafs: avoid contaminating the return of lookup vnop
When we resort to checking the inlinebulk errors to see if a retry
is needed, do not overwrite the lookup return code; only decide if
a retry is needed.
The problem case was where the first vnode returned EACCES and so
all vnodes were assumed to have failed, when just one did.
Andrew Deason [Mon, 27 Oct 2014 21:39:34 +0000 (16:39 -0500)]
rx: Reset lastSendData when resetting call
Currently we use call->lastSendData to attempt to detect a stalled
call, if it's been too long since the last time the call sent any
data. However, we never initialize lastSendData to anything when
creating a new call.
This means that when rx_NewCall (or rxi_NewCall) returns, lastSendData
can be nonzero. This can happen if we reuse a DALLY call, or if we
pull a call off of rx_freeCallQueue. This can be a time very far in
the past, since the lastSendData time has not changed since the last
time the call was used; it will remain unchanged until a user of the
new call writes something to the call stream.
This can be a problem between the time when a caller creates a new
call with rx_NewCall and when the caller actually writes something to
the stream. Between those two times, if lastSendData happens to be set
to a time in the past, we may call rxi_CheckCall on that call, and
abort the call for being idle. The call will thus be aborted before it
even sent any data on the wire.
This is of particular concern for multi_Rx calls, since those can
create a large number of call structures, possibly introducing a delay
between calling rx_NewCall and writing anything to the stream (if one
of the later rx_NewCall invocations blocks waiting for an open call
channel, for instance, all of the previous allocated calls will stick
around unused for potentially a long time).
One such multi_Rx call is done by the cache manager, where it
periodically uses multi_Rx to call RXAFS_GetCapabilities to probe
fileservers for reachability. If this issue occurs during that
operation you can see a large number of servers get marked down for
code -9 (RX_CALL_IDLE), and then get marked as coming back up.
To fix this, set lastSendData to 0 when resetting a call, along with
most of the other fields in a call, to indicate that the call has
never sent any data. As long as lastSendData is 0, the call will never
get aborted with RX_CALL_IDLE, and this situation will be avoided.
This ensures that this issue cannot happen, since rxi_ResetCall is
guaranteed to be called at some point whenever we reuse a call
structure for any reason.
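The idle check and the reset can be sketched together. The struct, function names, and timeout handling are simplified assumptions; only the ideas (0 means "never sent", code -9 for an idle abort) come from the message above.

```c
#include <time.h>

#define EXAMPLE_CALL_IDLE (-9)   /* the message above cites -9 (RX_CALL_IDLE) */

/* Hypothetical, trimmed-down call structure. */
struct example_call {
    time_t lastSendData;   /* 0 means "has never sent any data" */
    int error;
};

/* Run whenever a call structure is reused. */
void
example_reset_call(struct example_call *call)
{
    call->lastSendData = 0;
    call->error = 0;
}

/* Returns -1 and sets the error if the call is considered idle. */
int
example_check_call(struct example_call *call, time_t now, time_t idle_timeout)
{
    if (call->lastSendData != 0 &&
        now - call->lastSendData > idle_timeout) {
        call->error = EXAMPLE_CALL_IDLE;
        return -1;
    }
    return 0;   /* never sent, or sent recently enough */
}
```

Without the reset, a recycled call carrying an old timestamp would fail the check before its new owner had written anything.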
Benjamin Kaduk [Mon, 6 Oct 2014 17:31:23 +0000 (13:31 -0400)]
Merge pam into the kauth configure option
Realistically, you shouldn't be using either kauth or pam. The
pam functionality provided by the module in our tree is only
useful in a kaserver-style environment, so it makes sense to merge
the two knobs.
Retain a separate enable_pam variable so that it can be overridden
on a per-architecture basis where it is known to not work. Consolidate
the two places where we did such checks, as well.
Change-Id: I6bf39ee5002f943548c51d089fe612f7e2f0501b
Reviewed-on: http://gerrit.openafs.org/11524
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Marc Dionne [Thu, 25 Sep 2014 10:52:12 +0000 (07:52 -0300)]
Linux 3.17: Deal with d_splice_alias errors
In 3.17 the logic in d_splice_alias has changed. Of interest to
us is the fact that it will now return an EIO error if it finds
an existing connected directory for the dentry, where it would
previously have added a new alias for it. As a result the end
user can get EIO errors when accessing any file in a volume
if the volume was first accessed through a different path (ex:
RO path vs RW path).
This commit just restores the old behaviour, adding the directory
alias manually in the error case, which is what older versions
of d_splice_alias used to do.
Change-Id: I5558c64760e4cad2bd3dc648067d81020afc69b6
Reviewed-on: http://gerrit.openafs.org/11492
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Perry Ruiter <pruiter@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Marc Dionne [Tue, 9 Sep 2014 13:39:55 +0000 (10:39 -0300)]
Linux 3.17: No more typedef for ctl_table
The typedef has been removed so we need to use the structure
directly.
Note that the API for register_sysctl_table has also changed
with 3.17, but it reverted back to a form that existed
before and the configure tests handle it correctly.
Change-Id: If1fd9d27f795dee4b5aa2152dd09e0540d643a69
Reviewed-on: http://gerrit.openafs.org/11455
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Perry Ruiter <pruiter@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Wed, 17 Sep 2014 16:07:02 +0000 (12:07 -0400)]
Fix disk name initialization in scout
Scout needs to initialize names in scout_disk structures to prevent
the use of uninitialized data. However, '\0' is a NUL character
constant, i.e., the integer value 0, which is interpreted as NULL
(the pointer constant) in a pointer context, such as when assigned to
a variable of type char*. Since the name field in these structs is
passed to printing routines, the safe initialization value is the
empty string constant "", not a zero value.
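The distinction can be shown with a small sketch; the struct and function names are hypothetical. In C, '\0' is an integer constant with value 0, so assigning it to a char* yields a null pointer rather than an empty string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scout disk entry. */
struct example_disk {
    const char *name;
};

/* What the buggy form amounts to: name becomes a null pointer. */
void
example_disk_init_buggy(struct example_disk *d)
{
    d->name = '\0';   /* same as d->name = NULL */
}

/* Safe form: name points at a valid, empty, NUL-terminated string. */
void
example_disk_init(struct example_disk *d)
{
    d->name = "";
}
```

Passing the buggy value to strlen() or a "%s" format would dereference a null pointer, while the empty string is always safe to print.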
Benjamin Kaduk [Wed, 17 Sep 2014 02:57:53 +0000 (22:57 -0400)]
Build fixes for recent FreeBSD -current
Let's try a new paradigm of using flag checks in the main code,
which are based off of precise version checks in the FreeBSD-specific
param.h file. It's not quite configure checks, but is much more
granular.
Benjamin Kaduk [Mon, 8 Sep 2014 17:47:33 +0000 (13:47 -0400)]
Tweak AFSDIR_PATH_MAX definition
On recent Debian, we run into runtime errors in the test suite
because _POSIX_PATH_MAX is only 256, and that buffer is too small
for a call to realpath(). Use PATH_MAX if it's available and larger
than _POSIX_PATH_MAX, in a way that should be safe even when PATH_MAX
is not defined.
Change-Id: I39127e88d92b358245ece21131219380ca4be98a
Reviewed-on: http://gerrit.openafs.org/11453
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Reviewed-by: Perry Ruiter <pruiter@sinenomine.net>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
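One way to express "use PATH_MAX if it is available and larger" in the preprocessor is sketched below. The EXAMPLE_ macro names are hypothetical, and this is not necessarily how AFSDIR_PATH_MAX itself is defined.

```c
#include <limits.h>

/* Fall back to the POSIX minimum of 256 if the macro is not exposed. */
#ifdef _POSIX_PATH_MAX
# define EXAMPLE_POSIX_PATH_MAX _POSIX_PATH_MAX
#else
# define EXAMPLE_POSIX_PATH_MAX 256
#endif

/*
 * Prefer PATH_MAX only when it is defined and actually larger. The
 * defined() guard keeps this well-formed on systems without PATH_MAX.
 */
#if defined(PATH_MAX) && PATH_MAX > EXAMPLE_POSIX_PATH_MAX
# define EXAMPLE_PATH_MAX PATH_MAX
#else
# define EXAMPLE_PATH_MAX EXAMPLE_POSIX_PATH_MAX
#endif
```

A buffer declared as char buf[EXAMPLE_PATH_MAX] is then never smaller than the POSIX minimum, and grows where the platform advertises a larger limit.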
Benjamin Kaduk [Mon, 8 Sep 2014 17:42:27 +0000 (13:42 -0400)]
Let mancheck_utils ignore version subcommands
We don't have a man page for the 'version' subcommand, which has
"always" been present but only recently was exposed to the usage.
It's okay to not have a man page for it, so tell the test infrastructure
to not complain about its absence.
Change-Id: Ife834d41797d1d1efe403b204736ac85d62724e9
Reviewed-on: http://gerrit.openafs.org/11452
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Fri, 19 Sep 2014 18:39:04 +0000 (14:39 -0400)]
Build hcrypto with libtool
Or rather, with lwptool, since we need a LWP version as well as
the various pthreaded versions.
The previous version was the initial version, 1.1, but since we're
switching to libtool, bump the version to 2.0 just to be safe.
Libtool abstracts away the extra logic that had previously been needed
to build different copies of rand-fortuna for the pthreaded and LWP
libraries.
As for roken, we must install both shared and static libraries
to $(TOP_LIBDIR) for unity of consumption, but remove the libtool
archive after installation.
Change-Id: Ibc530a1fa4baa7a38b44eb3e0719e1905a6fe269
Reviewed-on: http://gerrit.openafs.org/11482
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Tue, 23 Sep 2014 22:19:09 +0000 (18:19 -0400)]
Allow external hcrypto
Put the configure checks into a separate file in src/cf, following
the same general structure as the roken checks.
Allow explicitly requesting the internal version, or checking
what's in the default paths, or providing a specific hcrypto root
or lib/include dirs for Debian compatibility.
We must still always compile libafshcrypto_lwp.a for use by LWP
binaries, from the bundled sources, but other binaries will use
the system version.
The hcrypto headers have an unfortunately large number of dependencies,
including depending on being able to find each other by including
<hcrypto/foo.h> paths. As such we must pass both the user-supplied
directory and $dir/hcrypto to the preprocessor in order for things
to work, and we also may need to revisit the includes used in the
configure check for use on non-linux systems due to the dependencies
on system headers.
Change-Id: Idcba1418a19a7b562335524c911d69dc84268177
Reviewed-on: http://gerrit.openafs.org/11481
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Tue, 23 Sep 2014 20:58:08 +0000 (16:58 -0400)]
Link aklog against LIB_hcrypto
This was the last place where libafshcrypto.a was explicitly referenced,
preventing the use of an out-of-tree hcrypto library.
We will continue to need to build the in-tree code to produce a
libafshcrypto_lwp.a library for use in LWP applications, until we
do not have any more LWP applications, but some systems (such as
Debian) have a desire to avoid bundled libraries, so we should
facilitate the use of an external libhcrypto where possible.
Many consumers of libafshcrypto_lwp.a will be removed when the
LWP versions of various modules are removed after 1.8 is branched.
Change-Id: I23049866caae9c16ffb2ec32c5e7b058465a26ba
Reviewed-on: http://gerrit.openafs.org/11480
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Fri, 19 Sep 2014 19:01:29 +0000 (15:01 -0400)]
Build venus tests with libtool
This was the only place doing -lafshcrypto_lwp on the command line.
(There are other consumers, which list libafshcrypto_lwp.a explicitly;
we can use the presence of libafshcrypto_lwp.a to track progress towards
not needing a LWP hcrypto build, which must come from the in-tree version.)
Convert the tests to build with libtool (as pthreaded), where we can
just throw in $(LIB_hcrypto) and deal with what we get.
Change-Id: Ibc99615d2ff03b8aebf956502a2a6b1cb26f0a65
Reviewed-on: http://gerrit.openafs.org/11479
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Thu, 18 Sep 2014 17:55:15 +0000 (13:55 -0400)]
Build roken using libtool
Previously it was version 1.1; just in case I did something terrible,
bump it to 2.0, as was done for the other libtool conversions.
Install both the libtool archive and the static archive to $(TOP_LIBDIR),
so that all our internal consumers can just use -L$(TOP_LIBDIR) -lrokenafs
(well, via the LDFLAGS_roken and LIB_roken aliases) whether linking
statically or shared. Installing the libtool archive gets us the shared
library there, but we have to then remove the libtool archive, since
this is not the location we told libtool we would install to (the prefix
we configured with), and libtool would get confused trying to use this
installed, but installed-at-the-wrong-place libtool archive.
Add rk_vsyslog to the export list, for AIX.
It is tempting to eschew this installation and instead point LIB_roken
at the libtool archive file librokenafs.la directly (with empty
LDFLAGS_roken), but this is not possible until all consumers of roken
are converted to build using libtool. In practice, this will probably
not happen until LWP is evicted from the tree.
Change-Id: If6ab6c2d57c0a1b1511f9631b9aeb522d7e7392b
Reviewed-on: http://gerrit.openafs.org/11477
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Tue, 23 Sep 2014 19:33:08 +0000 (15:33 -0400)]
Build auth tests with libtool
(And pthreaded.)
This was the only place consuming librokenafs directly, which is
forbidden if we are to properly support using an external roken.
Convert to libtool and throw $(LIB_roken) on at the end.
Change-Id: I0cdea690800be1022888244b613929ce3154db1d
Reviewed-on: http://gerrit.openafs.org/11476
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Fri, 19 Sep 2014 01:35:30 +0000 (21:35 -0400)]
Fix LT_LDLIB_shlib_missing
Libtool's -symbols-file argument is taken as an exact match of symbols
that this library claims to export. It does not filter based on what
symbols are actually present in the objects comprising the library.
Instead, if there are symbols in the file which are not implemented
by the library, there is an implicit assumption that some other library
will provide those symbols, which must be linked into a consumer of
this library alongside this library.
These are not the semantics we want (at present, only for roken), wherein
a library will implement some (but probably not all) of a given list
of symbols, and we want the export list to reflect only those symbols
which are implemented. Instead, use the symbols file to build a regex that
will only match symbols listed in the file (and no other symbols), and
only export the subset which is present.
Change-Id: Id81f7a35089ae7f760fe643680f9bfb9c81521aa
Reviewed-on: http://gerrit.openafs.org/11475
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Mon, 22 Sep 2014 21:02:27 +0000 (17:02 -0400)]
Allow building with MIT krb5 and external roken
That is, an external roken which is part of a heimdal distribution,
with full headers and libraries, most notably krb5.h and libkrb5.
This adjusts the ordering of file- and module-specific compiler and
linker arguments so that the more specific arguments are able to
take precedence. For general flags arguments, such as enabling
or disabling warnings or features, the more-specific settings should
come last, so as to override the flags set by default. However,
for arguments that affect a global search list (e.g., for headers
or libraries), the more-specific arguments must come first, so
as to be at the beginning of the search list.
We presently use per-file CFLAGS for both warning-type flags and
preprocessor (i.e., include path) type flags, so add an additional
file-specific setting for CPPFLAGS, which comes at the beginning of
the compiler invocation.
At present, MODULE_CFLAGS are essentially only used for preprocessor
functionality, so treat them as CPPFLAGS and put them right after
the per-file CPPFLAGS. (It might be cleaner to rename them to
MODULE_CPPFLAGS, but that would be more churn than is needed. If
such a distinction turns out to be necessary, it can be done at a
later date.) Likewise the MODULE_LDFLAGS are generally being used
to affect the library search path, so put them early as well.
Make the necessary Makefile changes to use these new features to
allow building with MIT krb5 and external roken: put KRB5_CPPFLAGS
in per-file CPPFLAGS, and put LDFLAGS_KRB5 in MODULE_LDFLAGS for
aklog.
Change-Id: I1091223b3b75c782b39b9e189bdd47e52ebefae2
Reviewed-on: http://gerrit.openafs.org/11474
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Mon, 22 Sep 2014 19:27:44 +0000 (15:27 -0400)]
Adjust roken.m4 to allow separate lib and include
In some installations (e.g., Debian), the roken libraries and headers
will not be installed in a common root directory to which /lib and
/include may be appended to find the appropriate library and header
directories, respectively.
Take inspiration from rra-c-util's GSSAPI macros and allow the
specification of separate include and lib directories. Since there
are now three values to pass to the guts of the checking routine,
pass them in global variables instead of as parameters.
The expected usage would be to set either --with-roken, or both of
--with-roken-libdir and --with-roken-includedir, as in
    configure --with-roken-include=/usr/include/heimdal \
        --with-roken-lib=/usr/lib/x86_64-linux-gnu/heimdal
This also fixes a typo that prevented --with-roken=internal from
functioning as intended.
Change-Id: I6f651ef3f3abf37c92ea81ea1801294ca3dc00b2
Reviewed-on: http://gerrit.openafs.org/11473
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Sat, 20 Sep 2014 01:18:38 +0000 (21:18 -0400)]
Deorbit separate JUAFS build
Since 80943970b8cfcdf3fc630b25804aebaea228bd73, when the web enhancements
were enabled universally, there has no longer been a functional difference
between the UAFS and JUAFS builds. Their object files are compiled
using the same compilation rule, and the list of object files differed
only by rx_kmutex.o (which is devoid of content) and xdr_int32.o
(which is presumably an oversight).
Save the extra build time by just reinstalling libuafs.a as libjuafs.a
to preserve the existing interfaces.
Additionally, drop the LIBJUAFS make variable -- all definitions set
it to libjuafs.a. Similarly, the LIBJUAFS_FLAGS variable was unused
and can be removed.
Change-Id: I2074d5bc26e326db36b16e055431818ef1c69210
Reviewed-on: http://gerrit.openafs.org/11471
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Mon, 15 Sep 2014 01:16:56 +0000 (21:16 -0400)]
Make pam conditional on INSTALL_KAUTH
The pam module we provide is only useful in kaserver-like environments,
and as such should not be installed when the user has requested to
not have kauth.
Change-Id: I9b336593e34cedfd6e8c2210f3798575d115d2d6
Reviewed-on: http://gerrit.openafs.org/11466
Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Fri, 12 Sep 2014 22:07:51 +0000 (18:07 -0400)]
Build a usable pam_afs.so
Our use of libtool for building the pam modules resulted in shared
objects which had dependencies on liboafs_auth.so and liboafs_kauth.so,
neither of which are installed.
We still need some way to resolve those dependencies at link time, and
a dependency on libafsauthent.so seems ill-advised to insert into the
pam stack, so we are left with only the option of directly linking in
the requisite functionality. Fortunately, almost all of the requisite
convenience libraries of PIC objects already exist to meet the
requirements of libafsrpc and libafsauthent; the only exception is
from the auth module. Here, we require a new convenience library,
because the pam_afs.krb.so module includes its own version of ktc.o,
compiled with AFS_KERBEROS_ENV defined, yet the pam_afs.so module
requires a ktc.o compiled without AFS_KERBEROS_ENV defined. The
convenience library from the auth module can only include one version,
and would therefore be wrong for the other. As such, create the new
libpam_auth.la archive from the BASE_objs in src/auth, and manually
compile ktc.lo and ktc_krb.lo as needed for the pam modules.
As for libafsrpc and libafsauthent, the convenience libraries included
from other parts of the tree belong in LT_objs, not LT_deps, because
they are contributing actual content to be included in the resulting
library; they are not library dependencies of the output of this module.
Change-Id: I5292718a5494710d166043fd08ad07269ff9fdf2
Reviewed-on: http://gerrit.openafs.org/11463
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: D Brashear <shadow@your-file-system.com>
Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Mon, 8 Sep 2014 22:06:25 +0000 (18:06 -0400)]
Build and install libafsauthent.so.2
During the libtool interim, we had been building a .0 but not
installing it. Prior to the libtoolization of shlibafsauthent, we
had installed a libafsauthent.so.1.1, which is the same version currently
installed by the 1.6 branch. Since there have been backwards-incompatible
ABI changes (e.g., afsconf_BuildServerSecurityObjects) since the .1.1
version, we must bump the SONAME to .2.0.
At the time of this writing, the libtool rules for updating the
version information are found at:
http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
and
http://www.gnu.org/software/libtool/manual/html_node/Libtool-versioning.html
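As a rough illustration of the versioning rules linked above: on
Linux/ELF, libtool's -version-info current:revision:age triple maps to
a file name suffix of (current - age).age.revision, and the SONAME
carries only the first of those numbers. The helper below is a
hypothetical sketch and not part of the OpenAFS build; the struct and
function names are made up for illustration, and the mapping is
different on other platforms.

```c
#include <stdio.h>

/* Hypothetical container for a libtool -version-info triple. */
struct lt_version {
    unsigned current;
    unsigned revision;
    unsigned age;
};

/*
 * Format the Linux/ELF file name suffix for a version triple as
 * "(current - age).age.revision".
 * Returns the snprintf() result, or -1 if age exceeds current
 * (which libtool rejects as an invalid triple).
 */
int
lt_linux_suffix(const struct lt_version *v, char *buf, size_t len)
{
    if (v->age > v->current)
        return -1;
    return snprintf(buf, len, "%u.%u.%u",
                    v->current - v->age, v->age, v->revision);
}
```

Under this mapping a triple such as 2:0:0 (chosen here only as an
example) yields the suffix 2.0.0, i.e. a library file ending in
.so.2.0.0 with .so.2 as its SONAME.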
This lets us consolidate the building of the shared and static libafsrpc
(and their installation), as libtool will happily do both for us
at once.
We explicitly do not install the .la files, as our libtool use is
to be kept entirely internal.
Change-Id: I283f9bb74eb9853c268e8642ac1f01741deeae2b
Reviewed-on: http://gerrit.openafs.org/11462 Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Mon, 8 Sep 2014 22:06:25 +0000 (18:06 -0400)]
Build and install libafsrpc.so.2
During the libtool interim, we had been building a .0 but not
installing it. Prior to the libtoolization of shlibafsrpc, we
had installed a libafsrpc.so.1.4 (though the 1.6 branch was
installing libafsrpc.so.1.5, "so we don't collide with the shlibafsrpc
versions on the master branch", which seems misguided). Since there
have been backwards-incompatible ABI changes (e.g., rx_SetMaxMTU) since
the .1.4 version, we must bump the SONAME to .2.0.
At the time of this writing, the libtool rules for updating the
version information are found at:
http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
and
http://www.gnu.org/software/libtool/manual/html_node/Libtool-versioning.html
This lets us consolidate the building of the shared and static libafsrpc
(and their installation), as libtool will happily do both for us
at once.
We explicitly do not install the .la files, as our libtool use is
to be kept entirely internal.
Change-Id: I11bc3cbc80048d0192aadeb80b89d2772bcd01cd
Reviewed-on: http://gerrit.openafs.org/11461 Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Fri, 12 Sep 2014 21:21:42 +0000 (17:21 -0400)]
Normalize LT_deps/LT_objs split
As described in the commit message of 69f26ece3c4545ecc9099641f7a507796fe9dc77, LT_objs should contain
the .lo files for the given module, and LT_deps should contain the
libtool dependencies, i.e., the .la files from other parts of the
tree. However, this simple split by file suffix is not correct
when we are using convenience libraries. Really, LT_objs represents
the "new" objects being provided by the module, and LT_deps is
libraries from other modules that provide functionality on which
we depend. Since convenience libraries are just thin aggregates
of object files, they behave more like object files than libraries
upon which we depend. In particular, libafsrpc and libafsauthent
are wrapper libraries that gather together the functionality of
several modules and export them as a single library interface;
they do not have any objects of their own.
However, libafsauthent has a dependency on libafsrpc, which does
belong in LT_deps (or possibly in LT_libs).
Simon's description of LT_libs leaves a little ambiguity, as it
does not describe what should be done with non-libtool libraries
from within OpenAFS. (At present, these include libafshcrypto
and librokenafs, both of which are regularly put in LT_libs.)
I prefer to recast LT_libs as containing externally visible libraries,
not just external libraries, which rationalizes the inclusion of
roken and hcrypto there, since we currently install those libraries,
and build libraries that have shared library dependencies on them.
In the future, as we begin committing to stable shared library
interfaces for libraries produced by libtool, I would like to
have those .la files be moved to LT_libs, since they would then
be external library dependencies of the given module.
Change-Id: Ie50010da84df99cec048c3e39ffeb9d5897fc08c
Reviewed-on: http://gerrit.openafs.org/11460 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: D Brashear <shadow@your-file-system.com>
Benjamin Kaduk [Fri, 12 Sep 2014 19:41:23 +0000 (15:41 -0400)]
Normalize names of libtool convenience libraries
Part of why libtool was introduced into the tree was to reduce the
number of times each source file is compiled. PIC code is needed
for shared objects, and non-PIC code for static libraries, so in most
cases a C file must be compiled twice, but not more than that.
Libtool automatically manages which version of an object is passed to
the linker when libtool is used to link .la files. At several places
in the tree (libafsrpc, libafsauthent, libuafs_pic.so, and pam_afs.so)
we use libtool to link a .la library and pass other .la libraries in
as linker inputs. In normal situations, libtool would produce an
output shared library that registered a shared library dependency on
the (shared version of the) input library. However, in our usage,
these input .la libraries are used only for our convenience, and are
not intended to be installed, so libtool would produce an output
library that was not usable.
Libtool refers to our usage of libraries not intended to be installed
as "convenience libraries"; for us, they are essentially just
static archives that contain PIC objects (as opposed to normal
static archives which contain non-PIC objects).
Prior to this commit, we had named our convenience libraries things
like libafsauthent_auth.la or libafsrpc_comerr.la, since they were
mostly only used for either libafsrpc or libafsauthent. However,
future commits will need to use some of these convenience libraries
in other shared objects (such as pam_afs.so), so we normalize the
library name to indicate merely that it is a PIC version of that
module.
There are three exceptions to this rule: libafsrpc_sys.la, which
contains only a single file and not the whole of the sys module,
libafsrpc_util.la, which contains a subset of the util objects,
and libauthent_ubik.la, which contains a subset of the ubik
objects. Since these convenience libraries are in fact tailored to
the particular application, a target-specific name is appropriate.
The convenience library provided by the ptserver module is named
libprot_pic to match the existing public interface libprot.a.
We cannot link the dependencies of the convenience libraries
directly into them, because any given object may only be linked
once into a given library, and our dependency graph between
modules is decidedly not a tree, so attempting to link in the
dependencies would result in duplicate symbol errors.
Change-Id: I5f10af74b8582edd51e5f1b3f0026dbc7ef9f7ad
Reviewed-on: http://gerrit.openafs.org/11459 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: D Brashear <shadow@your-file-system.com>
Anders Kaseorg [Tue, 30 Sep 2014 17:52:31 +0000 (13:52 -0400)]
aklog: Fix segfault on aklog -path
Commit 2fac53522e7ef5b3a376e191bffdc1f6784e6995 “aklog: Fix improper
use of readlink” inadvertently changed the meaning of int link from a
boolean flag (length > 0) to just a length. This caused ‘aklog -path
(anything)’ to segfault.
Update the type of link and the condition of the while loop to account
for this change.
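The general pitfall can be sketched as follows. This is not the aklog
code; read_link_target() is a hypothetical wrapper used only for
illustration. readlink() returns a length (ssize_t) or -1 on error, and
it does not NUL-terminate the buffer, so the result must be compared
against zero rather than tested as a boolean: -1 is nonzero and would
otherwise be treated as "true".

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: read the target of the symlink at path into
 * buf and NUL-terminate it.
 * Returns the target length, or -1 if path is not a symlink, on any
 * other readlink() error, or if buflen is 0.
 */
ssize_t
read_link_target(const char *path, char *buf, size_t buflen)
{
    ssize_t len;

    if (buflen == 0)
        return -1;
    /* Leave room for the terminator that readlink() does not write. */
    len = readlink(path, buf, buflen - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';
    return len;
}
```

A loop that follows links would then be written as
"while ((len = read_link_target(path, buf, sizeof(buf))) > 0)", so that
an error return ends the loop instead of continuing it.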
FIXES 131930
Change-Id: Ia05836795425a53e858ab29866900f6d45970644
Reviewed-on: http://gerrit.openafs.org/11517 Reviewed-by: Anders Kaseorg <andersk@mit.edu> Tested-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: D Brashear <shadow@your-file-system.com>
Perry Ruiter [Thu, 29 May 2014 22:51:57 +0000 (15:51 -0700)]
afs: Verify osi_UFSOpen worked
In some builds (UKERNEL) osi_UFSOpen returns a NULL if it runs
into a problem. On the other builds osi_UFSOpen simply panics.
afs/afs_cell.c was checking for a NULL return but other callers
were not. Add checking logic to all callers.
This is a preparatory patch. A subsequent patch will have
osi_UFSOpen return NULL rather than panic for other builds too.
Change-Id: I3610a57dff59b84fe5ea8b1c862f3192157f255f
Reviewed-on: http://gerrit.openafs.org/11243 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Garrett Wollman <wollman@csail.mit.edu> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com>
When SetDispositionInfo is called to mark a file for pending
delete the link count should not be decremented. The count is
decremented only when the file is actually deleted.
pete scott [Thu, 25 Sep 2014 15:01:27 +0000 (09:01 -0600)]
Windows: Check for RO and Open Target in rename
During a file rename operation, check to see if the target file
has the DOS readonly attribute set or has a non-zero reference
count. If yes, the request must be failed. The error status
depends upon the state of the pending delete flag: either
STATUS_PENDING_DELETE or STATUS_ACCESS_DENIED.
Andrew Deason [Thu, 25 Sep 2014 17:34:18 +0000 (12:34 -0500)]
afs: Move init_hckernel_init to osi_Init
Currently we call init_hckernel_init inside afs_InitSetup, to
initialize the hcrypto mutex. However, we use the hcrypto mutex in the
AFSOP_SEED_ENTROPY syscall, which afsd calls before any syscall that
calls afs_InitSetup. This means we crash on trying to
AFSOP_SEED_ENTROPY.
To avoid this, just call init_hckernel_init inside osi_Init instead,
which is called when our kernel module itself is initialized. This
ensures that the mutex is initialized early on, regardless of what
happens with afsd and the startup syscalls.
Change-Id: Ib6cbed7abcfd8f9a61685f613a848e9f36d6050d
Reviewed-on: http://gerrit.openafs.org/11509 Tested-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
pete scott [Wed, 24 Sep 2014 17:49:38 +0000 (11:49 -0600)]
Windows: Use the allocation size from the service
The prior patchset modified the service AllocationSize return value
to count the number of 1KB units. Use the value from the service
without modification. This corrects an inconsistency in the
FileStandardInformation response.
pete scott [Wed, 24 Sep 2014 17:00:36 +0000 (11:00 -0600)]
Windows: Remove trailing slash on non-root directories
For the FileNameInformation and FilePhysicalNameInformation queries
a trailing slash is required for the \\server\share\ path but is
not required for directories below the root.
pete scott [Wed, 24 Sep 2014 16:49:06 +0000 (10:49 -0600)]
Windows: FilePhysicalNameInfo query AFS prefix
In response to the FilePhysicalNameInformation query the AFS redirector
failed to include the server name in the response. Since the constructed
name is the same as the FileNameInformation query create a helper function
AFSGetFullFileName() to populate the file name into the provided buffer
and use it to satisfy both queries.
pete scott [Wed, 24 Sep 2014 16:06:38 +0000 (10:06 -0600)]
Windows: FileInfo too small INFO_LENGTH_MISMATCH
The FileAllInformation query is initially processed by the IO Manager
and when the IO Manager is passed a buffer that is too small to hold
the File Information structure it returns STATUS_INFO_LENGTH_MISMATCH.
Previously the afs redirector returned STATUS_BUFFER_TOO_SMALL in this
case. Instead follow IO Manager's lead.
pete scott [Tue, 23 Sep 2014 19:20:45 +0000 (13:20 -0600)]
Windows: !overwrite IOMgr populated FileInfo data
I/O Manager will populate the FILE_ACCESS_INFORMATION,
FILE_MODE_INFORMATION, and FILE_ALIGNMENT_INFORMATION portions of
a FILE_ALL_INFORMATION structure prior to forwarding a FileAllInformation
FileInfo query to the file system. There is no need for the file system
to duplicate the effort.
Change-Id: Iaa7f1de95c6b7e42bdc326cc3f4bfe8596add949
Reviewed-on: http://gerrit.openafs.org/11478 Reviewed-by: Peter Scott <pscott@kerneldrivers.com> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
Windows: preserve prior vlserver list on dns failure
Do not destroy the existing vlserver list if the DNS resolver query
fails. Continue using the prior vlserver values until a DNS response
is obtained. This will result in repeated DNS queries and a delay
if there is continued failure, but it will permit VL RPCs to continue
to be issued in the face of a DNS failure or misconfiguration.
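The "keep the old list unless the lookup succeeds" behaviour can be
sketched as below. This is not the cache manager code: struct
server_list and update_server_list() are hypothetical names, and server
addresses are reduced to plain integers for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified server list: just an array of addresses. */
struct server_list {
    size_t count;
    unsigned int *addrs;
};

/*
 * Replace the contents of list with the n addresses in fresh, but only
 * if the lookup succeeded and returned at least one address.
 * On lookup failure (or allocation failure) the prior list is left
 * untouched so callers can keep using it.
 * Returns 1 if the list was updated, 0 if the prior list was kept.
 */
int
update_server_list(struct server_list *list, const unsigned int *fresh,
                   size_t n, int lookup_ok)
{
    unsigned int *copy;

    if (!lookup_ok || fresh == NULL || n == 0)
        return 0;
    copy = malloc(n * sizeof(*copy));
    if (copy == NULL)
        return 0;
    memcpy(copy, fresh, n * sizeof(*copy));
    /* Only discard the old data once the replacement is ready. */
    free(list->addrs);
    list->addrs = copy;
    list->count = n;
    return 1;
}
```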
Modify cm_PingServer and cm_CheckServersMulti to avoid probing servers
when there are no network interfaces with which to do so. Just mark
the servers down.
lana_OnlyLoopback() relies upon Netbios over TCP being enabled but
Netbios over TCP is not officially supported on Vista and above.
Replace all lana_OnlyLoopback() calls with a test on the number of
network interfaces as computed by syscfg_GetIFInfo(). That list
excludes loopback interfaces.
Add a new function cm_UpdateIFInfo() that consolidates all of the
syscfg_GetIFInfo() call functionality into a single routine. Replace
all of the existing call sites.
It is safe to call cm_UpdateIFInfo() without holding cm_syscfgLock
during afsd initialization because no other threads have been created.
If CcMdlRead or CcPrepareMdlWrite fail, check the IoStatus.Information
field to see if any MDL pages have been locked. If the Information
value is greater than zero, complete the Mdl operation to unlock the
pages.
Change-Id: Icb44e74e25b46c7976f3f418410364a90a723d91
Reviewed-on: http://gerrit.openafs.org/11442 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Peter Scott <pscott@kerneldrivers.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
Ben Kaduk [Tue, 26 Mar 2013 23:43:07 +0000 (19:43 -0400)]
Garbage-collect afs_GCUserData's argument
We no longer need the ability to force all rxnull connections to be
reaped, as the epoch is set globally. Change the prototype and
callers accordingly.
Ben Kaduk [Tue, 26 Mar 2013 17:41:40 +0000 (13:41 -0400)]
Move epoch and cid generation into the rx core
Now that we have hcrypto available everywhere, we can get real randomness
in the rx core (both userspace and kernel), and thus can initialize the
RX epoch to a real random value when first initializing a host; there is
no need to rely on rxkad to produce randomness for this purpose.
Initialize a connection ID counter at the same time, and use that in
rx_NewConnection, also supplanting rxkad-specific logic.
The rxkad-specific logic is removed, and in particular there is no longer
a need to export rxkad_EpochWasSet to the rest of the world.
The code in afs_Daemon() to check whether the rxepoch was set can be
removed, as if the epoch is not set, rx initialization fails.
Add libafshcrypto to LIBS in the handful of places it hadn't crept already,
and likewise afshcrypto.lib in the NTMakefiles.
Ben Kaduk [Wed, 27 Mar 2013 21:02:55 +0000 (17:02 -0400)]
Export heimdal's rand-fortuna PRNG to the kernel
Some systems (e.g., AIX, SGI, DFBSD, HPUX) do not supply a useful
implementation of osi_readRandom(), in some cases because the kernel
does not expose a random-number interface to kernel modules. We want
real random numbers on all systems, because we want to use them for
setting the RX epoch and connection ID in the kernel.
Build hcrypto's rand-fortuna PRNG into the rand-kernel interface we expose,
and implement RAND_bytes using rand-fortuna when osi_readRandom()
is not useful.
Add stub routines to config.h as needed, and add a heim_threads.h
with the necessary locking for rand-fortuna. The rand-fortuna algorithm
requires some measure of time's passage, so provide a stub gettimeofday()
with single-second resolution. We use a single (global) mutex for the
hcrypto kernel code, so that we can statically declare an initializer to
be the address of that mutex. Otherwise the locking is taken essentially
wholesale from rx_kmutex.
rand-fortuna requires the sha256 code for its hashing, and also
requires a stub rand-fortuna to satisfy linker symbol visibility.
Since the rand-fortuna code does not have any actual sources of entropy
available to it during its initialization routines, we must explicitly
seed the in-kernel rand-fortuna using entropy passed in from userland.
(Userland will always have at least /dev/random available, so the
userland hcrypto should always have usable entropy.) Be sure to do so
early in the afsd startup sequence, before any daemons are started, so
that entropy is available to the core rx code for generating the epoch
and cid -- the rand-fortuna code will (erroneously) always claim that
it has startup entropy even though in this case it may not actually
have any entropy. The rand-fortuna code does not consider itself
fully seeded until it has 128 bytes of entropy, so be sure to pass
more than that in from userspace.
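A minimal userland sketch of gathering such a seed is shown below. It
is not the afsd code: gather_seed() and SEED_BYTES are hypothetical,
the 256-byte size is an assumption chosen only because it exceeds the
128 bytes mentioned above, and the step that hands the buffer to the
kernel is omitted.

```c
#include <stdio.h>

/* Assumed seed size; any value comfortably above 128 would do. */
#define SEED_BYTES 256

/*
 * Hypothetical helper: fill buf with len bytes read from the given
 * random device (e.g. "/dev/random" or "/dev/urandom").
 * Returns 0 on success, -1 if the device cannot be opened or the read
 * comes up short.
 */
int
gather_seed(const char *device, unsigned char *buf, size_t len)
{
    FILE *f;
    size_t got;

    f = fopen(device, "rb");
    if (f == NULL)
        return -1;
    got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```

A caller would fill a SEED_BYTES buffer this way and treat a -1 return
as a startup error rather than continuing with an unseeded generator.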
It is preferable to always build this code into the kernel, even on
systems where it is not going to be used, to help prevent bitrot. This
also avoids the possibility of a new system being supported that would
attempt to use the rand-fortuna code but fail to supply any seed entropy,
which would not necessarily be readily apparent.
Simon Wilkinson [Mon, 25 Aug 2014 15:25:43 +0000 (16:25 +0100)]
ubik: Don't leak UBIK_VERSION_LOCK if udisk_LogEnd fails
If the call to udisk_LogEnd() fails (probably due to an I/O error)
don't leak the UBIK_VERSION_LOCK.
This is the possible cause of a vlserver deadlock, which had
approximately 4800 threads blocked. Analysis of backtrace of all
of these threads showed that all blocked threads were waiting in
ubik.c:555 (blocked on DBHOLD) with the exception of:
One in beacon.c:388 (blocked on UBIK_VERSION_LOCK)
One in recovery.c:503 (blocked on DBHOLD)
One in ubik.c:125 (blocked on DBHOLD)
One in ubik.c:585 (blocked on UBIK_VERSION_LOCK)
The last of these is the critical one, because it already holds
the lock that DBHOLD waits on - so despite the vast majority of
threads being blocked in DBHOLD, it's actually UBIK_VERSION_LOCK
that we're waiting on.
There is no sign of a thread which is still active which currently
holds UBIK_VERSION_LOCK.
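The shape of the fix can be sketched with a plain pthread mutex. This
is not the ubik code: version_lock, log_end_stub() and
end_transaction() are hypothetical stand-ins. The point is only that
every exit from the locked region, including the error path, goes
through the unlock.

```c
#include <pthread.h>

/* Hypothetical stand-in for the lock being protected. */
static pthread_mutex_t version_lock = PTHREAD_MUTEX_INITIALIZER;
static int committed;

/* Stand-in for an I/O step that may fail; nonzero means error. */
static int
log_end_stub(int simulate_error)
{
    return simulate_error ? -1 : 0;
}

/*
 * Finish a transaction under version_lock. An early "return code;"
 * after the failed I/O step would leave the lock held forever; jumping
 * to a single exit label releases it on every path.
 */
int
end_transaction(int simulate_error)
{
    int code;

    pthread_mutex_lock(&version_lock);
    code = log_end_stub(simulate_error);
    if (code != 0)
        goto done;
    committed++;                /* remaining work done under the lock */
 done:
    pthread_mutex_unlock(&version_lock);
    return code;
}

/* Returns 1 if version_lock is currently free, 0 if it is held. */
int
version_lock_is_free(void)
{
    if (pthread_mutex_trylock(&version_lock) != 0)
        return 0;
    pthread_mutex_unlock(&version_lock);
    return 1;
}
```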
Simon Wilkinson [Mon, 25 Aug 2014 15:15:26 +0000 (16:15 +0100)]
ubik: Don't leak UBIK_VERSION_LOCK if setlabel fails
If a call to the setlabel() physical IO function fails, don't
leak the UBIK_VERSION_LOCK.
This is the possible cause of a vlserver deadlock, which had
approximately 4800 threads blocked. Analysis of backtrace of all
of these threads showed that all blocked threads were waiting in
ubik.c:555 (blocked on DBHOLD) with the exception of:
One in beacon.c:388 (blocked on UBIK_VERSION_LOCK)
One in recovery.c:503 (blocked on DBHOLD)
One in ubik.c:125 (blocked on DBHOLD)
One in ubik.c:585 (blocked on UBIK_VERSION_LOCK)
The last of these is the critical one, because it already holds
the lock that DBHOLD waits on - so despite the vast majority of
threads being blocked in DBHOLD, it's actually UBIK_VERSION_LOCK
that we're waiting on.
There is no sign of a thread which is still active which currently
holds UBIK_VERSION_LOCK.