Mark Vitale [Tue, 28 Feb 2017 23:02:39 +0000 (18:02 -0500)]
SOLARIS: prevent BAD TRAP panic with Studio 12.5
Starting with Solaris Studio 12.3, it is documented that Solaris kernel
modules (such as libafs) must not use any floating point, vector, or
SIMD/SSE instructions on x86 hardware. However, each new Studio
compiler release (12.4 and especially 12.5) is more likely to use these
types of instructions by default.
If the libafs kernel module includes any forbidden kernel instructions,
Solaris will panic the system with:
BAD TRAP: type=7 (#nm Device not available)
Provide a new autoconf test to specify the required compiler options
(-xvector=%none -xregs=no%float) when building the OpenAFS kernel module
for Solaris, so that no invalid x86 instructions are used.
In addition, reinstate default kernel module optimization for Solaris.
It had been disabled in commit 80592c53cbb0bce782eb39a5e64860786654be9f
to address this same issue in Studio 12.3 and 12.4. However, Studio
12.5 started using some SSE instructions even with no optimization.
This commit has been tested with OpenAFS master and Studio 12.5 at all
optimization levels (none, -xO1 through -xO5) and verified to contain no
XMM register instructions via the following command:
$ gobjdump -dlr libafs64.o | grep xmm | wc -l
Mark Vitale [Tue, 21 Feb 2017 01:16:47 +0000 (20:16 -0500)]
DAFS: do not save or restore host state if CPS in progress
If a fileserver is shut down while one or more PR_GetHostCPS calls
are in progress, this state is saved in the fsstate.dat file as
hostFlags HCPS_WAITING, HCPS_INPROGRESS. Other hosts that are
merely waiting will have HCPS_WAITING recorded.
However, it makes no sense to restore host structs in this state,
because the GetCPS calls will no longer be in progress. Once these
hosts become active, they will block server threads and quickly cause
all server threads to be exhausted as other CPS requests are blocked
behind them.
Instead, exclude these states from both save and restore.
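A minimal sketch of the exclusion rule follows. The flag names come from the message above; the flag values, the simplified host record, and both function names are assumptions made for illustration and do not match the fileserver's real definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the names come from the commit message, the
 * values here are assumptions, not the fileserver's real definitions. */
#define HCPS_INPROGRESS 0x01u
#define HCPS_WAITING    0x02u

/* Hypothetical, simplified stand-in for a host record. */
struct host_sketch {
    unsigned int hostFlags;
};

/* A host may be saved (or restored) only if no CPS call was pending. */
bool host_state_is_saveable(const struct host_sketch *h)
{
    return (h->hostFlags & (HCPS_INPROGRESS | HCPS_WAITING)) == 0;
}

/* Count the hosts that would be written to the state file. */
size_t count_saveable_hosts(const struct host_sketch *hosts, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (host_state_is_saveable(&hosts[i]))
            kept++;
    }
    return kept;
}
```

Applying the same predicate on both the save path and the restore path means a state file written by an older fileserver is also filtered when it is read back.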
Marcio Barbosa [Thu, 2 Mar 2017 21:01:48 +0000 (18:01 -0300)]
osx: build afscell only for active architecture
The InstallerPlugins framework provided by the MacOSX10.12.sdk does not
define symbols for architecture i386. As a result, the OpenAFS code
cannot be built on OS X 10.12.
To fix this problem, build the afscell Xcode project only for the active
architecture.
Michael Meffie [Thu, 11 Jun 2015 17:14:27 +0000 (13:14 -0400)]
libafs: vldb cache timeout option (-volume-ttl)
The unix cache manager caches VLDB information for read-only volumes as
long as a volume callback is held for a read-only volume. The volume
callback may be held as long as files in the read-only volume are being
accessed. The cache manager caches VLDB information for read/write
volumes as long as volume level errors (such as VMOVED) are not returned
by a fileserver while accessing files within the volume.
Add a new option to set the maximum amount of time VLDB information will
be cached, even if a callback is still held for a read-only volume, or
no volume errors have been encountered while accessing files in read/write
volumes.
This avoids situations where the vldb information is cached indefinitely
for read-only and read/write volumes. Instead, the VL servers will be
periodically probed for volume information.
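The expiry check can be pictured as below. This is only a sketch: the record layout, the function name, and the convention that a TTL of 0 means "no limit" are assumptions, not the cache manager's actual code.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical cached volume record; field names are illustrative. */
struct volume_cache_sketch {
    time_t fetched_at;   /* when the VLDB information was last fetched */
};

/*
 * Return true when the cached VLDB information should be refreshed.
 * A ttl of 0 is assumed here to mean "no limit", i.e. the old behavior.
 */
bool vldb_info_expired(const struct volume_cache_sketch *v, time_t now, time_t ttl)
{
    if (ttl <= 0)
        return false;
    return (now - v->fetched_at) >= ttl;
}
```

With such a check in the lookup path, an entry older than the TTL is refetched from the VL servers even when a callback is still held or no volume error has been seen.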
Change-Id: I5f2a57cdaf5cbe7b1bc0440ed6408226cc988fed
Reviewed-on: https://gerrit.openafs.org/11898
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Sergio Gelato [Wed, 22 Feb 2017 21:55:33 +0000 (13:55 -0800)]
LINUX: Debian/Ubuntu build regression on kernel 3.16.39
Now that kernel 4.9 has hit jessie-backports, it becomes desirable to
also backport the associated openafs patches.
Unfortunately, Linux-4.9-inode_change_ok-becomes-setattr_prepare.patch
causes a build failure against jessie's current default kernel,
3.16.39-1, due to the fact that setattr_prepare() is available (it was
cherrypicked to address CVE-2015-1350) but file_dentry() is not (it was
introduced in kernel 4.6).
This makes it difficult to have a version of openafs for jessie that
supports both kernels.
To deal with this, follow the implementation of file_dentry() in 4.6,
and simplify it to account for the lack of d_real() support in older
kernels.
Note that inode_change_ok() has been added back to 3.16.39-1 to avoid
ABI changes. That means the current openafs packages in jessie continue
to work with kernel 3.16.39-1 since they do not include
Linux-4.9-inode_change_ok-becomes-setattr_prepare.patch.
Originally reported at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=855366
FIXES RT134158
Change-Id: I157aa2ff25945c1c6e3b8e4a600557125711a681
Reviewed-on: https://gerrit.openafs.org/12523
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Mark Vitale [Wed, 7 Dec 2016 16:11:45 +0000 (11:11 -0500)]
Linux 4.10: have_submounts is gone
Linux commit f74e7b33c37e vfs: remove unused have_submounts() function
(v4.10-rc2) removes have_submounts from the tree after providing a
replacement (path_has_submounts) for its last in-tree caller, autofs.
However, it turns out that OpenAFS is better off not using the new
path_has_submounts. Instead, OpenAFS could/should have stopped using
have_submounts() much earlier, back in Linux v3.18 when d_invalidate
became void. At that time, most in-tree callers of have_submounts had
already been converted to use check_submounts_and_drop back in v3.12.
At v3.18, a series of commits modified check_submounts_and_drop to
automatically remove child submounts (instead of returning -EBUSY if a
submount was detected), then subsumed it into d_invalidate. The end
result was that VFS now implicitly handles much of the housekeeping
previously called explicitly by the various filesystem d_revalidate
routines:
- shrink_dcache_parent
- check_submounts_and_drop
- d_drop
- d_invalidate
All in-tree filesystem d_revalidate routines were updated to take
advantage of this new VFS support.
Modify afs_linux_dentry_revalidate to no longer perform any special
handling for invalid dentries when D_INVALIDATE_IS_VOID. Instead, allow
our VFS caller to properly clean up any invalid dentry when we return 0.
Change-Id: I0c4d777e6d445857c395a7b5f9a43c9024b098e9
Reviewed-on: https://gerrit.openafs.org/12506
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Joe Gorse [Thu, 16 Feb 2017 23:01:50 +0000 (18:01 -0500)]
LINUX: Bring debug symbols back to the Linux kernel module.
Starting with 4.8 Linux kernels our existing build script
generator, make_kbuild_makefile.pl, does not pass the debugging
symbols CFLAGS that were present when building for previous kernels.
This fix appends the $(KERN_DBG) variable which will only be defined
when the configuration includes the --enable-debug-kernel option.
Change-Id: I9a85dc0311a3a706239bc9e471b2d7197ebe1946
Reviewed-on: https://gerrit.openafs.org/12519
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Fri, 10 Feb 2017 15:39:09 +0000 (10:39 -0500)]
build: add --without-swig to override swig check
Add the --without-swig option to disable the automatic swig detection
and disable the optional features which depend on swig. This allows
builders to avoid swig even if present on the build system.
Also, add the --with-swig option to force the check and fail if not
detected. This allows builders to declare the swig features are
mandatory.
The default continues to be to check for swig, and if present, build the
optional features which require swig.
To disable the automatic check for swig and disable the features which
depend on swig:
./configure --without-swig # or --with-swig=no
To force the check and fail if swig is not present on the system:
./configure --with-swig # or --with-swig=yes
If --with-swig is given and swig is not detected, then configure will
fail with the message:
configure: error: swig requested but not found
The Perl 5 bindings for libuafs are the only feature which requires swig
at this time.
Andrew Deason [Fri, 10 Feb 2017 07:29:28 +0000 (01:29 -0600)]
PERLUAFS: Modernize lang-specific swig typemaps
Currently, our swig bindings for PERLUAFS define a couple of typemaps
like so:
%typemap(in, numinputs=1, perl5) (char *READBUF, int LENGTH) {
[...]
}
Embedding the target language name in the typemap arguments is a very
old way of specifying what language the typemap is for; this syntax was
removed after swig 1.1. With swig 3.0.x releases (and possibly
others), the specific combination of this deprecated syntax and some
other features we're using causes a segfault. That's clearly a bug in
swig, but we shouldn't be using the deprecated syntax anyway.
Update this to instead use preprocessor symbols to specify
language-specific typemaps (#ifdef SWIGPERL). We only actually define
these for perl right now, so make sure to throw an error if we're not
running for perl.
FIXES 134103
Change-Id: I14264a2dfada53d99413808ed5d60b79b1ee44f3
Reviewed-on: https://gerrit.openafs.org/12517
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Anders Kaseorg [Tue, 6 Dec 2016 15:48:31 +0000 (10:48 -0500)]
AFS_component_version_number.c: Respect SOURCE_DATE_EPOCH if set
To improve build reproducibility, if the SOURCE_DATE_EPOCH environment
variable is set, use it to deterministically replace the embedded build
date, and do not include the username or hostname in this case.
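The selection logic can be sketched in C as follows. This illustrates the rule described above and is not the build-script change itself; the function name and the `reproducible` out-parameter are assumptions.

```c
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/*
 * Pick the timestamp to embed in a build: SOURCE_DATE_EPOCH when it is set
 * to a valid non-negative integer, otherwise the fallback (e.g. time(NULL)).
 * Sets *reproducible so the caller can omit the username and hostname.
 */
time_t build_timestamp(time_t fallback, int *reproducible)
{
    const char *s = getenv("SOURCE_DATE_EPOCH");
    char *end = NULL;
    long long v;

    *reproducible = 0;
    if (s == NULL || *s == '\0')
        return fallback;

    errno = 0;
    v = strtoll(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0)
        return fallback;

    *reproducible = 1;
    return (time_t)v;
}
```

Falling back on a malformed value, rather than failing, is a choice made for this sketch only.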
Marcio Barbosa [Wed, 11 Jan 2017 14:05:04 +0000 (06:05 -0800)]
osx: let prefpane know where binaries can be found
Starting from OS X 10.11, the OpenAFS binaries were moved to the
following directories: /opt/openafs/bin and /opt/openafs/sbin. However,
the OpenAFS prefpane is not aware of this change. As a result, some
functions provided by the OpenAFS prefpane do not work properly.
To fix this problem, add the new paths to the proper environment
variable.
Mark Vitale [Sat, 7 Jan 2017 11:22:47 +0000 (06:22 -0500)]
LINUX: eliminate unused variable warning
Commit c3bbf0b4444db88192eea4580ac9e9ca3de0d286 added routine
osi_TryEvictDentries and included new logic for D_INVALIDATE_IS_VOID.
Unfortunately, this new code path no longer uses dentry; it also should
have been made conditional at that time.
Wrap the declaration of dentry in #ifndef D_INVALIDATE_IS_VOID to
eliminate the unused variable warning.
Andrew Deason [Sat, 8 Aug 2015 21:49:50 +0000 (16:49 -0500)]
SOLARIS: Use AFS_PAG_ONEGROUP_ENV for Solaris 11
On Solaris 11 (specifically, Solaris 11.1+), the supplemental group
list for a process is supposed to be sorted. Starting with Solaris
11.2, more authorization checks are done that assume the list is
sorted (e.g., to do a binary search), so having them out of order
can cause incorrect behavior. For example:
$ echo foo > /tmp/testfile
$ chmod 660 /tmp/testfile
$ sudo chown root:daemon /tmp/testfile
$ cat /tmp/testfile
foo
$ id -a
uid=100(adeason) gid=10(staff) groups=10(staff),12(daemon),20(games),21(ftp),50(gdm),60(xvm),90(postgres)
$ pagsh
$ cat /tmp/testfile
cat: cannot open /tmp/testfile: Permission denied
$ id -a
uid=100(adeason) gid=10(staff) groups=33536,32514,10(staff),12(daemon),20(games),21(ftp),50(gdm),60(xvm),90(postgres)
Solaris sorts the groups given to crsetgroups() on versions which
require the group ids to be sorted, but we currently manually put our
PAG groups in our own order in afs_setgroups(). This is currently
required, since various places in the code assume that PAG groups are
the first two groups in a process's group list.
To get around this, do not require the PAG gids to be the first two
gids anymore. To more easily identify PAG gids in a process's group list, use
a single gid instead of two gids to identify a PAG, like modern Linux
currently uses (under the AFS_PAG_ONEGROUP_ENV). High-numbered groups
have been possible for quite a long time on Solaris, allegedly further
back than Solaris 8. Only do this for Solaris 11, though, to reduce
the platforms we affect.
[mmeffie@sinenomine.net: Define AFS_PAG_ONEGROUP_ENV in param.h.]
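A position-independent lookup for a single PAG gid could look like the sketch below. The tag-in-top-byte encoding, the mask and tag values, and the function name are assumptions chosen for illustration; they are not taken from the OpenAFS sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding for this sketch: a PAG gid carries a fixed tag in its
 * top byte. The tag value is illustrative only. */
#define PAG_GID_MASK 0xFF000000u
#define PAG_GID_TAG  0x41000000u

/*
 * Scan the whole supplemental group list for the single PAG gid, so the
 * result does not depend on where the OS sorted it.
 */
bool find_pag_gid(const uint32_t *groups, size_t ngroups, uint32_t *pag_out)
{
    for (size_t i = 0; i < ngroups; i++) {
        if ((groups[i] & PAG_GID_MASK) == PAG_GID_TAG) {
            *pag_out = groups[i];
            return true;
        }
    }
    return false;
}
```

Because the gid is recognized by its value rather than its position, the kernel is free to keep the list sorted.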
Benjamin Kaduk [Mon, 26 Dec 2016 18:15:35 +0000 (12:15 -0600)]
afsd_kernel: remove gratuitous OS dependence
Commit 94c15f62 in 2010 gave NetBSD and only NetBSD the debug
printing of errno and the strerror() output, with no justification
in the commit message. In the interest of unifying behavior and
avoiding unnecessary OS dependence, give all platforms the errno
and strerror() behavior.
Michael Meffie [Tue, 13 Sep 2016 02:21:59 +0000 (22:21 -0400)]
afsd: print syscalls on separate lines with afsd -debug
afsd prints information to standard output for testing and debugging when the
-debug option is given. However, syscall tracing is emitted without trailing
newlines on all platforms except NetBSD, creating an unreadable wall of text.
Michael Meffie [Mon, 26 Sep 2016 15:19:13 +0000 (11:19 -0400)]
vol: convert vnode macros to inline functions
Convert the vnode macros to inline functions to fix integer overflows
for very large vnode numbers (and generally improve the code robustness
and readability).
The macro version of vnodeIndexOffset() will evaluate to an incorrect
offset for very large vnode numbers due to 32-bit integer overflow. The
vnode index file will then be corrupted when writing to the incorrect
offset.
In code paths where the vnode number is incorrectly defined as a signed
32-bit integer, this change prevents vnodeIndexOffset() from evaluating
to a negative result when a vnode number is larger than 2^31.
Thanks to Mark Vitale for reporting and providing analysis.
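The overflow can be reproduced with a small sketch. The slot arithmetic below (two vnode numbers per slot, slot size of `1 << log_size` bytes) and both function names are assumptions for illustration, not a copy of the vol package code.

```c
#include <stdint.h>

/* Illustrative layout: vnode numbers map to a slot index, and each slot
 * is (1 << log_size) bytes. The exact layout is an assumption. */

/* Macro-style arithmetic carried out in 32 bits: wraps for large vnodes. */
uint32_t index_offset_32(uint32_t vnode, unsigned log_size)
{
    uint32_t slot = ((vnode - 1) >> 1) + 1;
    return slot << log_size;          /* high bits are lost */
}

/* Inline-function style: widen before shifting so nothing is lost. */
static inline uint64_t index_offset_64(uint32_t vnode, unsigned log_size)
{
    uint64_t slot = ((uint64_t)(vnode - 1) >> 1) + 1;
    return slot << log_size;
}
```

For small vnode numbers both versions agree; they diverge only once the shifted value no longer fits in 32 bits, which is why the problem appears only with very large vnode numbers.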
$ make check
…
volser/vos..............FAILED 6
…
$ cd tests
$ ./libwrap ../lib ./runtests -o volser/vos
1..6
ok 1 - Successfully got security class
ok 2 - Successfully built ubik client structure
ok 3 - First address registration succeeds
ok 4 - Second address registration succeeds
ok 5 - vos output matches
Server exited with code 15
# wanted: 0
# seen: -1
not ok 6 - Server exited cleanly
# Looks like you failed 1 test of 6
afstest_StopServer has a check for the process terminating with signal
15 (SIGTERM), but not for the process exiting with code 15.
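The distinction between the two cases can be shown with the POSIX wait-status macros. This is a sketch only: accepting exit code 15 as a clean stop is one reading of the problem described above, not necessarily the change that was committed, and both function names are hypothetical.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Treat a stopped server as "clean" if it exited with status 0, was killed
 * by SIGTERM, or exited with code SIGTERM (15). Accepting the exit-code
 * form is this sketch's reading of the problem, not the committed fix.
 */
bool server_stopped_cleanly(int status)
{
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGTERM;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 || WEXITSTATUS(status) == SIGTERM;
    return false;
}

/* Helper for experimenting: fork a child that exits with the given code. */
int status_of_child_exit(int code)
{
    int status = -1;
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

A process that handles SIGTERM and then calls exit(15) reports WIFEXITED, not WIFSIGNALED, so a check for only the signal form misses it.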
Change-Id: I022965ea2b5440486ea1cf562551d3bbd0516104
Reviewed-on: https://gerrit.openafs.org/12489
Tested-by: Anders Kaseorg <andersk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Anders Kaseorg [Fri, 16 Dec 2016 05:29:21 +0000 (00:29 -0500)]
doc/man-pages/Makefile.in: mkdir man[158] in case we did regen.sh -q
Fixes this error:
$ git clean -xdf
$ ./regen.sh -q
$ ./configure
$ make
[…]
make[3]: Entering directory '/…/openafs/doc/man-pages'
rm -f man*/*.noinstall
if [ "no" = "no" ] ; then \
for M in man1/klog.1 man1/knfs.1 […] man8/kpwvalid.8 man1/klog.krb.1; do \
touch $M.noinstall; \
done; \
fi
touch: cannot touch 'man1/klog.1.noinstall': No such file or directory
touch: cannot touch 'man1/knfs.1.noinstall': No such file or directory
[…]
touch: cannot touch 'man8/kpwvalid.8.noinstall': No such file or directory
touch: cannot touch 'man1/klog.krb.1.noinstall': No such file or directory
Makefile:34: recipe for target 'prep-noinstall' failed
make[3]: *** [prep-noinstall] Error 1
make[3]: Leaving directory '/…/openafs/doc/man-pages'
Change-Id: I95098fb2b27f1d87fc9769497b225e9f91f72266
Reviewed-on: https://gerrit.openafs.org/12492
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Anders Kaseorg [Wed, 14 Dec 2016 20:47:21 +0000 (15:47 -0500)]
tests/opr/softsig-t: Avoid hanging due to intermediate sh -c
If the build directory happened to contain shell metacharacters, like
the ~ in /build/openafs-vb8tid/openafs-1.8.0~pre1 used by the Debian
builders, Perl was running softsig-helper via an intermediate sh -c,
which would then intercept the signals we tried to send to
softsig-helper. Use the list syntax to avoid this sh -c.
Benjamin Kaduk [Fri, 16 Dec 2016 04:12:01 +0000 (22:12 -0600)]
tests: use exec to call libwrap'd executables
No need to leave the shell process hanging around.
In particular, if we are manually running softsig-helper under
libwrap to debug test failures, the child process of the shell is
another shell, which interprets some signals that we wanted to
be passed through, like SIGTERM. On the other hand, once the
softsig-helper is exec()'d, you basically need another shell to
terminate it, which is a different problem....
Change-Id: Iff7c519886a018cb68e692746d40c427b6299457
Reviewed-on: https://gerrit.openafs.org/12490
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Anders Kaseorg <andersk@mit.edu>
Tested-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Tue, 16 Aug 2016 16:56:47 +0000 (12:56 -0400)]
tests: fix signo to signame lookup in opr/softsig tests
Fix the loop condition when scanning the signal number to name table to
convert a signal number to a name. Instead of looping sizeof(size_t)
times, loop for the number of elements in the table.
This bug was masked on 64-bit platforms, since the signal number to name
table currently has 8 elements, which is coincidentally the same as
sizeof(size_t) on 64-bit platforms. The bug becomes apparent on 32-bit
systems; only the first 4 elements of the table are checked.
Example error output before this fix:
$ cd tests
$ ./libwrap ../lib ./runtests -o opr/softsig
1..11
ok 1
ok 2
ok 3
ok 4
ok 5
not ok 6
# Failed test in ./opr/softsig-t at line 57.
# got: 'Received UNK
# '
# expected: 'Received TERM
# '
not ok 7
# Failed test in ./opr/softsig-t at line 60.
# got: 'Received UNK
# '
# expected: 'Received USR1
# '
not ok 8
# Failed test in ./opr/softsig-t at line 63.
# got: 'Received UNK
# '
# expected: 'Received USR2
# '
ok 9 - Helper exited on KILL signal.
ok 10 - Helper exited on SEGV signal.
ok 11 # skip Skipping buserror test; SIGBUS constant is not defined.
# Looks like you failed 3 tests of 11.
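The corrected loop bound described in this commit can be sketched as follows. The table contents and all identifiers are illustrative assumptions; only the idea of looping over the number of table elements instead of sizeof(size_t) comes from the message above.

```c
#include <stddef.h>
#include <signal.h>

/* Illustrative table; the real test helper's table contents may differ. */
struct sig_name {
    int signo;
    const char *name;
};

const struct sig_name sig_names[] = {
    { SIGHUP,  "HUP"  },
    { SIGINT,  "INT"  },
    { SIGQUIT, "QUIT" },
    { SIGALRM, "ALRM" },
    { SIGTERM, "TERM" },
    { SIGUSR1, "USR1" },
    { SIGUSR2, "USR2" },
    { SIGTSTP, "TSTP" },
};

/* Number of elements, not sizeof(size_t): correct on 32- and 64-bit. */
#define NUM_SIG_NAMES (sizeof(sig_names) / sizeof(sig_names[0]))

const char *signo_to_name(int signo)
{
    for (size_t i = 0; i < NUM_SIG_NAMES; i++) {
        if (sig_names[i].signo == signo)
            return sig_names[i].name;
    }
    return "UNK";
}
```

With a bound of sizeof(size_t), a 32-bit build would stop after four entries and report every later signal as UNK, matching the failures shown above.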
Neale Ferguson [Thu, 8 Dec 2016 16:47:09 +0000 (11:47 -0500)]
s390: desupport 32-bit Linux kernels on s390/s390x
Remove the obsolete and custom lwp assembler for the s390 and s390x
architectures. That assembler is no longer needed since 32-bit
mainframe Linux distributions are no longer supported and are very
unlikely to be in use.
The generic process.default.s is sufficient for modern 64-bit Linux
distributions on s390/s390x.
[mmeffie@sinenomine.net: commit message wording]
Change-Id: I654b10dfc257e7de90c9a50048982427276f4d61
Reviewed-on: https://gerrit.openafs.org/12475
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Mark Vitale [Tue, 12 Jan 2016 23:06:51 +0000 (18:06 -0500)]
afs: fs getcacheparms miscounts dcaches for large files
fs getcacheparms issued with the -excessive option tabulates in-memory
dcaches ("DCentries") by size. However, any dcache with validPos > 2^31
is miscounted in the 4k-16k bucket. This is caused by a type mismatch
between 'validPos' (afs_size_t) and 'size' (int) which leads to a
negative value for size by sign-extension. The size comparison "sieve"
fails for negative numbers; it skips the first bucket (0-4K) and dumps
them in the second one (4k-16k).
Move the declaration of 'size' closer to its use, and declare it with
the same type as 'validPos' (afs_size_t) so the comparison sieve
correctly places these dcaches in the last (>=1M) bucket.
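A sieve that takes a 64-bit size might look like the sketch below. The typedef stands in for afs_size_t, and only the first two buckets and the last one are named in the message above; the intermediate boundaries and the function name are assumptions.

```c
#include <stdint.h>

/* Stand-in for afs_size_t in this sketch (assumed to be 64-bit unsigned). */
typedef uint64_t size64_sketch;

/*
 * Map a dcache's valid length to a histogram bucket. Only the first two
 * buckets (0-4K, 4K-16K) and the last (>=1M) are named in the commit
 * message; the boundaries in between are assumptions.
 */
int dcache_size_bucket(size64_sketch size)
{
    if (size <= 4096)    return 0;   /* 0-4K   */
    if (size <= 16384)   return 1;   /* 4K-16K */
    if (size <= 65536)   return 2;
    if (size <= 262144)  return 3;
    if (size < 1048576)  return 4;
    return 5;                        /* >=1M   */
}
```

Keeping the parameter the same width as the value being classified removes the narrowing conversion, so a validPos above 2^31 reaches the last bucket.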
Change-Id: Ib0d973da92865043a4f1c068de5e9b81bcde2b9a
Reviewed-on: https://gerrit.openafs.org/12347
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Tue, 6 Dec 2016 22:07:40 +0000 (17:07 -0500)]
Update libafsdep files for in-kernel fortuna
Commit 0d67b00ff9db added heimdal's rand-fortuna PRNG to the kernel
module on all architectures, even though it is only needed on the small
subset that do not provide a cryptographically strong random number
generator to kernel module consumers. This was done to ensure that
the build infrastructure for it gets regularly exercised by developers.
However, not all build infrastructure was exercised at the time of
that submission; in particular, the make_libafs_tree.pl script was
not tested. This led to a situation where the libafs tree generated
by that script omitted several files that were now referenced by
the kernel build due to the fortuna import.
To remedy the situation, list the additional files that are needed,
so that they will be copied into the build area for this class of
kernel module builds.
Since the libafs-tree functionality is used to build the Debian
kernel-module source packages, this fix is needed in order to have
a tree that can be built into debian packages without patching.
This is a little silly, because if rxi_FlushWrite has anything to do,
it just acquires/drops call->lock again.
This seems like a very minor performance penalty, but in the right
situation it can become more noticeable. Specifically, when an Rx call
on the server ends successfully, rx_EndCall will rxi_FlushWrite (to
send out the last Rx packet to the client) before marking the call as
finished. If the client receives the last Rx packet and starts a new
Rx call on the same channel before the server locks the call again,
the client can receive a BUSY packet (because it looks like the
previous call on the server hasn't finished yet). Nothing breaks, but
this means the client waits 3 seconds to retry.
This situation can probably happen with various rates of success in
almost any situation, but I can see it consistently happen with 'vos
move' when running 'vos' on the same machine as the source fileserver.
It is most noticeable when moving a large number of small volumes
(since you must wait an extra 3+ seconds per volume, where nothing is
happening).
To avoid this, create a new variant of rxi_FlushWrite, called
rxi_FlushWriteLocked. This just assumes the call lock is already held
by the caller, and avoids one extra lock/unlock pair. This is not the
only place where we unlock/lock the call during the rx_EndCall
situation described above, but it seems to be easiest to solve, and
it's enough (for me) to avoid the 3-second delay in the 'vos move'
scenario. Ideally, Rx should be able to atomically 'end' a call while
sending out this last packet, but for now, this commit is easy to do.
Note that rxi_FlushWrite previously didn't do much of note before
locking the call. It did call rxi_FreePackets without the call lock,
but calling that with the call lock is also fine; other callers do
that.
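The "locked variant" pattern can be sketched in user space with a pthread mutex. The call structure and every function name here are hypothetical simplifications; the real Rx call object and locking macros are considerably more involved.

```c
#include <pthread.h>

/* Hypothetical, much-simplified call object for illustration. */
struct call_sketch {
    pthread_mutex_t lock;
    int pending;     /* packets waiting to be sent */
    int sent;        /* packets already sent */
};

/* Caller must already hold c->lock. */
void flush_write_locked(struct call_sketch *c)
{
    c->sent += c->pending;
    c->pending = 0;
}

/* Convenience wrapper for callers that do not hold the lock. */
void flush_write(struct call_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    flush_write_locked(c);
    pthread_mutex_unlock(&c->lock);
}

/* End the call in one critical section: flush and finish together. */
int end_call(struct call_sketch *c)
{
    int total;

    pthread_mutex_lock(&c->lock);
    flush_write_locked(c);
    total = c->sent;
    pthread_mutex_unlock(&c->lock);
    return total;
}
```

Because end_call never drops the lock between flushing and finishing, there is no window in which another thread can observe the call as still busy after its last packet has gone out.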
Andrew Deason [Mon, 9 Mar 2015 23:01:29 +0000 (18:01 -0500)]
LINUX: Don't compile syscall code with keyrings
osi_syscall_init() is not currently called if we have kernel keyrings
support, since we don't need to set up or alter any syscalls if we
have kernel keyrings (we track PAGs by keyrings, and we use ioctls
instead of the AFS syscall now).
Since we don't call it, this commit makes us also not compile the
relevant syscall-related code. This allows new platforms to be added
without needing to deal with any platform-specific code for handling
32-bit compat processes and such, since usually we don't need to deal
with intercepting syscalls.
To do this, we just define osi_syscall_init and osi_syscall_cleanup as
noops if we have keyrings support. This allows us to reduce the #ifdef
clutter in the actual callers.
Note that the 'afspag' module does currently call osi_syscall_init
unconditionally, but this seems like an oversight. With this change,
the afspag module will no longer alter syscalls when we have linux
keyrings support.
Change-Id: I219b92d89303975765743712587ff897b55a2631
Reviewed-on: https://gerrit.openafs.org/11936
Reviewed-by: Chas Williams <3chas3@gmail.com>
Reviewed-by: Perry Ruiter <pruiter@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Marcio Barbosa [Mon, 28 Nov 2016 14:42:44 +0000 (09:42 -0500)]
afs: release the packets used by rx on shutdown
When the OpenAFS client is unmounted on DARWIN, the blocks of packets
allocated by RX are released. Historically, the memory used by those
packets was never properly released.
As we can see, ‘rx_mallocedP’ is a global pointer that stores the
first address of the last allocated block of packets. As a result, when
‘rxi_FreeAllPackets’ is called, only the last block is released.
However, 230dcebcd61064cc9aab6d20d34ff866a5c575ea moved the global
pointer in question to the end of the last block. As a result, when the
OpenAFS client is unmounted on DARWIN, the ‘rxi_FreeAllPackets’
function releases the wrong block of memory. This problem was exposed
on OS X 10.12 Sierra where the system crashes when the OpenAFS client
is unmounted.
To fix this problem, store the address of every single block of packets
in a queue and release one by one when the OpenAFS client is unmounted.
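Recording every block so that each can be released might be sketched as below. A singly linked list is used here in place of the queue mentioned above, and all names are hypothetical; this is not the Rx allocator.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping node: one per allocated block of packets. */
struct packet_block {
    struct packet_block *next;
    void *mem;                 /* start address of the block */
};

struct packet_block *packet_blocks = NULL;

/* Allocate a block and remember its start address for later release. */
void *alloc_packet_block(size_t bytes)
{
    struct packet_block *b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->mem = malloc(bytes);
    if (b->mem == NULL) {
        free(b);
        return NULL;
    }
    b->next = packet_blocks;
    packet_blocks = b;
    return b->mem;
}

/* Release every recorded block; returns how many were freed. */
int free_all_packet_blocks(void)
{
    int freed = 0;
    while (packet_blocks != NULL) {
        struct packet_block *b = packet_blocks;
        packet_blocks = b->next;
        free(b->mem);
        free(b);
        freed++;
    }
    return freed;
}
```

Since each node stores the address returned by the allocator, the release path no longer depends on a single global pointer that other code may have advanced.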
Mark Vitale [Mon, 7 Nov 2016 19:16:50 +0000 (14:16 -0500)]
dir: do not leak contents of deleted directory entries
Deleting an AFS directory entry (afs_dir_Delete) merely removes the
entry logically by updating the allocation map and hash table. However,
the entry itself remains on disk, in both the cache manager's cache
partition and the fileserver's vice partitions.
This constitutes a leak of directory entry information, including the
object's name and MKfid (vnode and uniqueid). This leaked information
is also visible on the wire during FetchData requests and volume
operations.
Modify afs_dir_Delete to clear the contents of deleted directory
entries.
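In outline, clearing an entry on delete amounts to the following. The entry layout and function name are hypothetical and much simpler than the real on-disk directory format.

```c
#include <string.h>

/* Hypothetical, simplified directory entry; not the on-disk layout. */
struct dir_entry_sketch {
    unsigned char in_use;
    unsigned int vnode;
    unsigned int unique;
    char name[20];
};

/* Delete an entry: wipe its contents instead of only marking it free. */
void delete_entry(struct dir_entry_sketch *e)
{
    memset(e, 0, sizeof(*e));
}
```

Zeroing the whole entry removes the name as well as the vnode and uniqueid, so nothing identifying survives in the freed slot.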
Patchset notes:
This commit only prevents leaks for newly deleted entries. Another
commit in this patchset prevents leaks of partial object names upon
reuse of pre-existing deleted entries. A third commit in this
patchset prevents yet another kind of directory entry leak, when
internal buffers are reused to create or enlarge existing directories.
All three patches are required to prevent new leaks. Two additional
salvager patches are also included to assist administrators in the
cleanup of pre-existing leaks.
[kaduk@mit.edu: style nit for sizeof() argument]
Change-Id: Iabaafeed09a2eb648107b7068eb3dbf767aa2fe9
Reviewed-on: https://gerrit.openafs.org/12460
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Mon, 7 Nov 2016 05:29:22 +0000 (23:29 -0600)]
afs: do not leak stale data in buffers
Similar to the previous commit, zero out the buffer when fetching
a new slot, to avoid the possibility of leaving stale data in
a reused buffer.
We are not supposed to write such stale data back to a fileserver,
but this is an extra precaution in case of bugs elsewhere -- memset
is not as expensive as it was in the 1980s.
Change-Id: I344e772e9ec3d909e8b578933dd9c6c66f0a8cf6
Reviewed-on: https://gerrit.openafs.org/12459
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Mark Vitale [Fri, 13 May 2016 04:01:31 +0000 (00:01 -0400)]
dir: fileserver leaks names of file and directories
Summary:
Due to incomplete initialization or clearing of reused memory,
fileserver directory objects are likely to contain "dead" directory
entry information. These extraneous entries are not active - that is,
they are logically invisible to the fileserver and client. However,
they are physically visible on the fileserver vice partition, on the
wire in FetchData replies, and on the client cache partition. This
constitutes a leak of directory information.
Characterization:
There are three different kinds of "dead" residual directory entry
leaks, each with a different cause:
1. There may be partial name data after the null terminator in a live
directory entry. This happens when a previously used directory entry
becomes free, then is reused for a directory entry with a shorter name.
This may be addressed in a future commit.
2. "Dead" directory entries are left uncleared after an object is
deleted or renamed. This may be addressed in a future commit.
3. Residual directory entries may be inadvertently picked up when a new
directory is created or an existing directory is extended by a 2 KiB
page. This is the most severe problem and is addressed by this commit.
This third kind of leak is the most severe because the leaked
directory information may be from _any_ other directory residing on the
fileserver, even if the current user is not authorized to see that
directory.
Root cause:
The fileserver's directory/buffer package shares a pool of directory
page buffers among all fileserver threads for both directory reads and
directory writes. When the fileserver creates a new directory or
extends an existing one, it uses any available unlocked buffer in the
pool. This buffer is likely to contain another directory page recently
read or written by the fileserver. Unfortunately the fileserver only
initializes the page header fields (and the first two "dot" and "dotdot"
entries in the case of a new directory). Any residual entries in the
rest of the directory page are now logically "dead", but still
physically present in the directory. They can easily be seen on the
vice partition, on the wire in a FetchData reply, and on the cache
partition.
Note:
The directory/buffer package used by the fileserver is also used by the
salvager and the volserver. Therefore, salvager activity may also leak
directory information to a certain extent. The volserver vos split
command may also contribute to leaks. Any volserver operation that
creates volumes (create, move, copy, restore, release) may also have
insignificant leaks. These less significant leaks are addressed by this
commit as well.
Exploits:
Any AFS user authorized to read directories may passively exploit this
leak by capturing wire traffic or examining his/her local cache as he/she
performs authorized reads on existing directories. Any leaked data will
be for other directories the fileserver had in the buffer pool at the
time the authorized directories were created or extended.
Any AFS user authorized to write a new directory may actively exploit
this leak by creating a new directory, flushing cache, then re-reading
the newly created directory. Any leaked data will be for other
directories the fileserver had in the buffer pool within the last few
seconds. In this way an authorized user may sample current fileserver
directory buffer contents for as long as he/she desires, without being
detected.
Directories already containing leaked data may themselves be leaked,
leading to multiple layers of leaked data propagating with every new or
extended directory.
The names of files and directories are the most obvious source of
information in this leak, but the FID vnode and uniqueid are leaked as
well. Careful examination of the sequences of leaked vnode numbers and
uniqueids may allow an attacker to:
- Discern each layer of old directories by observing breaks in
consecutive runs of vnode and/or uniqueid numbers.
- Infer which objects may reside on the same volume.
- Discover the order in which objects were created (vnode) or modified
(uniqueid).
- Know whether an object is a file (even vnode) or a directory (odd
vnode).
Prevent new leaks by always clearing a pool buffer before using it to
create or extend a directory.
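The rule of clearing a reused pool buffer before initializing it can be sketched as follows. The two-byte header and the function name are assumptions for illustration; the real page header has more fields.

```c
#include <string.h>

#define DIR_PAGE_SIZE 2048   /* size of one directory page in this sketch */

/*
 * Prepare a reused pool buffer for a new directory page: clear the whole
 * page first, then write the (here, two-byte and hypothetical) header.
 */
void init_dir_page(unsigned char *buf, unsigned short tag)
{
    memset(buf, 0, DIR_PAGE_SIZE);           /* drop any residual entries */
    buf[0] = (unsigned char)(tag & 0xff);    /* then write the header */
    buf[1] = (unsigned char)(tag >> 8);
}
```

Clearing before the header is written means that whatever directory page last occupied the buffer cannot show through in the new page.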
Existing leaks on the fileserver vice partitions may be addressed in a
future commit.
Change-Id: Ia980ada6a2b1b2fd473ffc71e9fd38255393b352
Reviewed-on: https://gerrit.openafs.org/12458
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Sun, 6 Nov 2016 21:06:02 +0000 (15:06 -0600)]
bos: re-add -salvagedirs for use with -all
The MR-AFS support code had a -salvagedirs option that was passed
through to the salvager (when running, and when -all was used),
that was removed in commit a9301cd2dc1a875337f04751e38bba6f1da7ed32
along with the rest of the MR-AFS commands and options.
However, it is useful in its own right, so add it back and allow
the use of -salvagedirs -all to rebuild every directory on the server.
Change-Id: Ifc9c0e4046bf049fe04106aec5cad57d335475e3
Reviewed-on: https://gerrit.openafs.org/12457
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Sun, 6 Nov 2016 20:31:22 +0000 (14:31 -0600)]
dafs: honor salvageserver -salvagedirs
Do not ignore the -salvagedirs option when given to the salvageserver.
When the salvageserver is running with this option, all directories will
be rebuilt by salvages spawned by the dafs salvageserver, including all
demand attach salvages and salvages of individual volumes initiated by
bos salvage.
This does not affect the whole partition salvages initiated by bos
salvage -all.
Change-Id: I4dd515ffa8f962c61e922217bee20bbd88bcd534
Reviewed-on: https://gerrit.openafs.org/12456
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Anders Kaseorg [Sat, 5 Nov 2016 00:17:32 +0000 (20:17 -0400)]
Remove NULL checks for AFS_NONNULL parameters
Recent GCC warns about opr_Assert(p != NULL), where p is an
__attribute__((__nonnull__)) parameter, just like clang did before those
clang warnings were silenced by 11852, 11853.
Now, we could go and add more autoconf tests and pragmas to silence the
GCC versions of these warnings. However, I maintain that silencing the
warnings is the wrong approach. The asserts in question have no
purpose. They do not add any safety, because GCC and clang are
optimizing them away at compile time (without proof!—they take the
declaration at its word that NULL will never be passed). Just remove
them.
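
A sketch of what a function looks like once the redundant assert is gone.
The function below is hypothetical (loosely modeled on a lowercase string
copy) and is not the code touched by this commit.

```c
#include <ctype.h>
#include <stddef.h>

/* The attribute promises GCC and clang that d and s are never NULL. */
static char *demo_lcstring(char *d, const char *s, size_t n)
    __attribute__((__nonnull__));

static char *
demo_lcstring(char *d, const char *s, size_t n)
{
    size_t i;

    /* No "assert(d != NULL && s != NULL)" here: the compiler would
     * drop the check anyway, and GCC 6 warns about the comparison. */
    if (n == 0)
        return d;
    for (i = 0; i + 1 < n && s[i] != '\0'; i++)
        d[i] = (char)tolower((unsigned char)s[i]);
    d[i] = '\0';
    return d;
}
```

Callers that may hold a NULL pointer still have to check it themselves
before calling; the attribute documents the contract, it does not enforce it.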
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
In file included from casestrcpy.c:17:0:
casestrcpy.c: In function ‘opr_lcstring’:
casestrcpy.c:26:31: error: nonnull argument ‘d’ compared to NULL [-Werror=nonnull-compare]
opr_Assert(s != NULL && d != NULL);
^
/…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^~
casestrcpy.c:26:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(s != NULL && d != NULL);
^~~~~~~~~~
casestrcpy.c:26:18: error: nonnull argument ‘s’ compared to NULL [-Werror=nonnull-compare]
opr_Assert(s != NULL && d != NULL);
^
/…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^~
casestrcpy.c:26:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(s != NULL && d != NULL);
^~~~~~~~~~
casestrcpy.c: In function ‘opr_ucstring’:
casestrcpy.c:46:31: error: nonnull argument ‘d’ compared to NULL [-Werror=nonnull-compare]
opr_Assert(s != NULL && d != NULL);
^
/…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^~
casestrcpy.c:46:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(s != NULL && d != NULL);
^~~~~~~~~~
casestrcpy.c:46:18: error: nonnull argument ‘s’ compared to NULL [-Werror=nonnull-compare]
opr_Assert(s != NULL && d != NULL);
^
/…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^~
casestrcpy.c:46:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(s != NULL && d != NULL);
^~~~~~~~~~
casestrcpy.c: In function ‘opr_strcompose’:
/…/openafs/include/afs/opr.h:28:12: error: nonnull argument ‘buf’ compared to NULL [-Werror=nonnull-compare]
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^
/…/openafs/include/afs/opr.h:37:25: note: in expansion of macro ‘__opr_Assert’
# define opr_Assert(ex) __opr_Assert(ex)
^~~~~~~~~~~~
casestrcpy.c:98:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(buf != NULL);
^~~~~~~~~~
kalocalcell.c: In function ‘ka_CellToRealm’:
/…/openafs/include/afs/opr.h:28:12: error: nonnull argument ‘realm’ compared to NULL [-Werror=nonnull-compare]
do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0)
^
/…/openafs/include/afs/opr.h:37:25: note: in expansion of macro ‘__opr_Assert’
# define opr_Assert(ex) __opr_Assert(ex)
^~~~~~~~~~~~
kalocalcell.c:117:5: note: in expansion of macro ‘opr_Assert’
opr_Assert(realm != NULL);
^~~~~~~~~~
Dave Botsch [Thu, 17 Nov 2016 18:22:17 +0000 (13:22 -0500)]
Mac OS Sierra deprecates syscall()
The syscall() function has been deprecated in Mac OS 10.12 (Sierra). After
discussions with developers, it appears that syscall() is no longer
needed, so we can do away with it.
Dave Botsch [Thu, 3 Nov 2016 16:22:21 +0000 (12:22 -0400)]
Define OSATOMIC_USE_INLINED to get usable atomics on DARWIN
In Mac OS 10.12, legacy interfaces for atomic operations have been
deprecated. Defining OSATOMIC_USE_INLINED gets us inline implementations
of the OSAtomic interfaces in terms of the <stdatomic.h> primitives.
This is a transition convenience.
Also indent preprocessor directives within the main DARWIN block to
improve readability.
Michael Meffie [Sat, 5 Nov 2016 16:42:19 +0000 (12:42 -0400)]
SOLARIS: convert from ancient _depends_on to ELF dependencies
The ancient way of declaring module dependencies with _depends_on has
been deprecated since SunOS 2.6 (circa 1996). The presence of the old
_depends_on symbol triggers a warning message on the console starting
with Solaris 12, and the kernel runtime loader (krtld) feature of using
the _depends_on symbol to load dependencies may be removed in a future
version of Solaris.
Convert the kernel module from the ancient _depends_on method to modern
ELF dependencies. Remove the old _depends_on symbol and specify the -dy
and -N <name> linker options to set the ELF dependencies at link time,
as recommended in the Solaris device driver developer guidelines [1].
This commit does not change the declared dependencies, which may be
vestiges of ancient afs versions.
Mark Vitale [Thu, 4 Aug 2016 22:42:27 +0000 (18:42 -0400)]
LINUX: do not use d_invalidate to evict dentries
When working within the AFS filespace, commands which access large
numbers of OpenAFS files (e.g., git operations and builds) may result in
active files (e.g., the current working directory) being evicted from the
dentry cache. One symptom of this is the following message upon return
to the shell prompt:
"fatal: unable to get current working directory: No such file or
directory"
Starting with Linux 3.18, d_invalidate returns void because it always
succeeds. Commit a42f01d5ebb13da575b3123800ee6990743155ab adapted
OpenAFS to cope with the new return type, but not with the changed
semantics of d_invalidate. Because d_invalidate can no longer fail with
-EBUSY when invoked on an in-use dentry, OpenAFS must no longer trust it
to preserve in-use dentries.
Modify the dentry eviction code to use a method (d_prune_aliases) that
does not evict in-use dentries.
Change-Id: I1826ae2a89ef4cf6b631da532521bb17bb8da513
Reviewed-on: https://gerrit.openafs.org/12363
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Mark Vitale [Wed, 18 May 2016 04:36:12 +0000 (00:36 -0400)]
salvager: fix error message for invalid volumeid
If the specified volumeid is invalid (e.g. volume name was specified
instead of volume number), the error is reported via Log(). However,
commit 24fed351fd13b38bfaf9f278c914a47782dbf670 moved the log opening
logic from before this check to after it, effectively making this Log()
call a no-op.
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
rxperf.c: In function ‘rxperf_server’:
rxperf.c:930:4: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (ptr && *ptr != '\0')
^~
rxperf.c:932:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
break;
^~~~~
rxperf.c: In function ‘rxperf_client’:
rxperf.c:1102:4: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (ptr && *ptr != '\0')
^~
rxperf.c:1104:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
break;
^~~~~
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
curseswindows.c: In function ‘gator_cursesgwin_drawchar’:
curseswindows.c:574:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (params->highlight)
^~
curseswindows.c:576:9: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
if (code)
^~
curseswindows.c:579:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (params->highlight)
^~
curseswindows.c:581:9: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
if (code)
^~
curseswindows.c: In function ‘gator_cursesgwin_drawstring’:
curseswindows.c:628:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (params->highlight)
^~
curseswindows.c:630:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
if (code)
^~
curseswindows.c:633:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (params->highlight)
^~
curseswindows.c:635:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
if (code)
^~
Anders Kaseorg [Sat, 5 Nov 2016 00:44:00 +0000 (20:44 -0400)]
src/afsd/afsd.c: Fix misleading indentation
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
afsd.c: In function ‘afsd_run’:
afsd.c:2176:6: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (enable_rxbind)
^~
afsd.c:2178:3: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
afsd_syscall(AFSOP_ADVISEADDR, code, addrbuf, maskbuf, mtubuf);
^~~~~~~~~~~~
afsd.c:2487:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (afsd_debug)
^~
afsd.c:2490:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
afsd_syscall(AFSOP_GO, 0);
^~~~~~~~~~~~
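
For readers unfamiliar with this warning, the pattern and its repair can be
sketched as follows. All names are hypothetical; this is not the afsd.c
code, only the same shape of problem.

```c
/* Hypothetical example of the pattern GCC flags, and its repair. */
static int demo_calls;

static void
demo_log(void)
{
    demo_calls++;
}

static int
demo_step(int debug)
{
    /* Flagged form (same behavior, misleading layout):
     *
     *     if (debug)
     *         demo_log();
     *         demo_calls += 10;     <- looks guarded, is not
     */
    if (debug)
        demo_log();
    demo_calls += 10;   /* unconditional, and now indented to show it */
    return demo_calls;
}
```
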
Anders Kaseorg [Sat, 5 Nov 2016 00:39:34 +0000 (20:39 -0400)]
src/ubik/uinit.c: Fix misleading indentation
Fixes this warning (error with --enable-checking) from GCC 6.2:
uinit.c: In function ‘internal_client_init’:
uinit.c:96:2: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (code)
^~
uinit.c:98:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
return code;
^~~~~~
Anders Kaseorg [Sat, 5 Nov 2016 00:38:08 +0000 (20:38 -0400)]
src/rx/rx_packet.c: Fix misleading indentation
Fixes these warnings (errors with --enable-checking) from GCC 6.2:
rx_packet.c: In function ‘rxi_ReceiveDebugPacket’:
rx_packet.c:2009:9: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (rx_stats_active)
^~
rx_packet.c:2011:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
s = (afs_int32 *) & rx_stats;
^
rx_packet.c:2017:9: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (rx_stats_active)
^~
rx_packet.c:2019:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
rxi_SendDebugPacket(ap, asocket, ahost, aport, istack);
^~~~~~~~~~~~~~~~~~~
Anders Kaseorg [Sat, 5 Nov 2016 00:36:51 +0000 (20:36 -0400)]
src/rxgen/rpc_parse.c: Fix misleading indentation
Fixes this warning (error with --enable-checking) from GCC 6.2:
rpc_parse.c: In function ‘analyze_ProcParams’:
rpc_parse.c:861:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
if (tokp->kind != TOK_RPAREN)
^~
rpc_parse.c:863:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
*tailp = decls;
^
Anders Kaseorg [Sat, 5 Nov 2016 00:18:52 +0000 (20:18 -0400)]
regen.sh: Use libtoolize -i, and .gitignore generated build-tools
Recent libtoolize actually deletes build-tools/missing, which Git was
treating as a change to the working copy. Besides, we should let
libtoolize copy in its more recent version of config.guess, config.sub,
and install-sh.
Mark Vitale [Thu, 4 Aug 2016 22:18:15 +0000 (18:18 -0400)]
LINUX: split dentry eviction from osi_TryEvictVCache
To make osi_TryEvictVCache clearer, and to prepare for a future change
in dentry eviction, split the dentry eviction logic into its own routine
osi_TryEvictDentries.
No functional difference should be incurred by this commit.
Change-Id: I5b255fd541d09159d70f8d7521ca8f2ae7fe5c2b
Reviewed-on: https://gerrit.openafs.org/12362
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Joe Gorse <jhgorse@gmail.com>
Mark Vitale [Thu, 20 Oct 2016 04:49:37 +0000 (00:49 -0400)]
Linux 4.9: inode_change_ok() becomes setattr_prepare()
Linux commit 31051c85b5e2 "fs: Give dentry to inode_change_ok() instead
of inode" renames and modifies inode_change_ok(inode, attrs) to
setattr_prepare(dentry, attrs).
Mark Vitale [Fri, 16 Sep 2016 23:01:19 +0000 (19:01 -0400)]
Linux 4.9: inode_operation rename now takes flags
In Linux 3.15 commit 520c8b16505236fc82daa352e6c5e73cd9870cff,
inode_operation rename2() was added. It takes the same arguments as
rename(), with an added flags argument supporting the following values:
RENAME_NOREPLACE: if "new" name exists, fail with -EEXIST. Without
this flag, the default behavior is to replace the "new" existing file.
RENAME_EXCHANGE: exchange source and target; both must exist.
OpenAFS never implemented a .rename2() routine because it was optional
when introduced at Linux v3.15.
In Linux 4.9-rc1 the following commits remove the last in-tree uses of
.rename() and convert .rename2() to .rename():
aadfa8019e81 vfs: add note about i_op->rename changes to porting
2773bf00aeb9 fs: rename "rename2" i_op to "rename"
18fc84dafaac vfs: remove unused i_op->rename
1cd66c93ba8c fs: make remaining filesystems use .rename2
e0e0be8a8355 libfs: support RENAME_NOREPLACE in simple_rename()
f03b8ad8d386 fs: support RENAME_NOREPLACE for local filesystems
With these changes, it is now mandatory for OpenAFS afs_linux_rename()
to accept a 5th flag argument.
Add an autoconfig test to determine the signature of .rename(). Use this
information to implement afs_linux_rename() with the appropriate number
of arguments. Implement "toleration support" for the flags option by
treating a zero flag as a normal rename; if any flags are specified,
return -EINVAL to indicate the OpenAFS filesystem does not yet support
any flags.
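
The "toleration support" described above can be sketched in user-space C.
The function names and the reduced argument list are assumptions made for
illustration; the real afs_linux_rename() takes kernel inode and dentry
arguments that are not reproduced here.

```c
#include <errno.h>

/* Stand-in for the real rename work; always succeeds in this sketch. */
static int
demo_do_rename(const char *oldname, const char *newname)
{
    (void)oldname;
    (void)newname;
    return 0;
}

/* Accept the extra flags word, but tolerate only the value 0.
 * Any rename flag is rejected with -EINVAL. */
static int
demo_rename(const char *oldname, const char *newname, unsigned int flags)
{
    if (flags != 0)
        return -EINVAL;
    return demo_do_rename(oldname, newname);
}
```
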
Mark Vitale [Wed, 14 Sep 2016 22:01:22 +0000 (18:01 -0400)]
Linux 4.9: deal with demise of GROUP_AT
Linux commit 81243eacfa40 "cred: simpler, 1D supplementary groups"
refactors the group_info struct, removing some members (which OpenAFS
references only through the GROUP_AT macro) and adding a gid member.
The GROUP_AT macro is also removed from the tree.
Add an autoconfigure test for the new group_info member gid and define a
replacement GROUP_AT macro to do the right thing under the new regime.
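
A sketch of such a replacement accessor, using a mock structure so it can
be compiled outside the kernel. The type, the array size, and the DEMO_
names are assumptions; only the idea of indexing the new gid member comes
from the text above.

```c
/* Mock of the refactored structure: a flat gid array. The layout here
 * is a simplified assumption, not the kernel's definition. */
typedef unsigned int demo_gid_t;

struct demo_group_info {
    int ngroups;
    demo_gid_t gid[4];
};

/* Replacement accessor in the spirit of the removed macro. */
#define DEMO_GROUP_AT(gi, i) ((gi)->gid[(i)])
```

Keeping the accessor a macro means existing callers that read or assign
through it can stay unchanged.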
Anders Kaseorg [Sun, 9 Oct 2016 10:39:12 +0000 (06:39 -0400)]
tests/util/ktime-t.c: Specify EST offset in TZ
This fixes test failures observed on new Debian build servers that no
longer install tzdata by default. As the tests expect, EST is defined
as UTC−05:00 with no daylight saving time.
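
The effect can be sketched as follows. The exact TZ string used by the
test is not shown in this log; "EST5" below is an assumption, chosen
because a POSIX-style TZ value carries its own offset and does not depend
on the tzdata files.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <time.h>

/* Return the local hour for a UTC timestamp after pinning TZ to a
 * POSIX-style string: "EST5" means "EST, 5 hours west of UTC", and
 * names no daylight saving zone. */
static int
demo_est_hour(time_t when)
{
    struct tm tm;

    setenv("TZ", "EST5", 1);
    tzset();
    localtime_r(&when, &tm);
    return tm.tm_hour;
}
```
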
Michael Meffie [Mon, 22 Aug 2016 23:53:34 +0000 (19:53 -0400)]
tests: avoid passing NULL strings to vprintf
Some libc implementations will crash when NULL string arguments are given to
*printf. Avoid passing NULL string arguments in the make check tests that did
so, and pass the string "(null)" instead.
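
A minimal sketch of the substitution; the helper name is hypothetical.

```c
#include <stddef.h>

/* Return a printable string: the argument itself, or the literal
 * "(null)" when the argument is NULL. */
static const char *
demo_printable(const char *s)
{
    return (s != NULL) ? s : "(null)";
}
```

A call such as printf("%s\n", demo_printable(name)) is then safe whether
or not name is NULL.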
Michael Meffie [Sat, 6 Aug 2016 14:41:24 +0000 (10:41 -0400)]
afsd: fix afsd -help crash
afsd crashes after the usage is displayed with the -help option.
$ afsd -help
Usage: ./afsd [-blocks <1024 byte blocks in cache>] [-files <files in cache>]
...
Segmentation fault (core dumped)
The backtrace shows the crash occurs when calling afsconf_Open() with an
invalid pointer argument, even though afsconf_Open() is not even needed
when -help is given.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:32
#1 0x00007ffff726fc36 in *__GI___strdup (s=0x0) at strdup.c:42
#2 0x0000000000408383 in afsconf_Open (adir=0x0) at cellconfig.c:444
#3 0x00000000004054d5 in afsd_run () at afsd.c:1926
#4 0x0000000000407dc5 in main (argc=2, argv=0x7fffffffe348) at afsd_kernel.c:577
afsconf_Open() is called with an uninitialized pointer because commit
d72df5a18e0bb8bbcbf23df3e8591072f0cdb770 changed the libcmd
cmd_Dispatch() to return 0 after displaying the command usage when the
-help option is specified. (That fix was needed for scripts which use
the -help option to inspect command options.)
The afsd_kernel main function then incorrectly calls the afsd_run()
function, even though mainproc(), which sets up the afsd option
variables, was not called. afsconf_Open() is the first function we call
in afsd_run().
Commit f77c078a291025d593f3170c57b6be5f257fc3e5 split afsd into afsd.c
and afsd_kernel.c to support libuafs (and fuse). This split the parsing
of the command line arguments and the running of the afsd command into
two functions. The mainproc(), which originally did both, was split
into two functions; one (still called mainproc) to check the option
values given and setup/auto-tune values, and another (called afsd_run)
to do the actual running of the afsd command. The afsd_parse() function
was introduced as a wrapper around cmd_Dispatch() which "dispatches"
mainproc.
With this fix, take the opportunity to rename mainproc() to the now more
accurately named CheckOptions() and change afsd_parse() to parse the
command line options with cmd_Parse(), instead of abusing
cmd_Dispatch().
Change the main function to avoid running afsd_run() when afsd_parse()
returns the CMD_HELP code which indicates the -help option was given.
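
The control flow described above can be sketched with stand-in functions.
Everything prefixed demo_ or DEMO_ is hypothetical, including the numeric
value used for the help code; the real parser and run functions are not
reproduced here.

```c
#include <string.h>

#define DEMO_CMD_HELP 1   /* hypothetical value; stands in for CMD_HELP */

static int demo_run_called;

/* Stand-in parser: reports DEMO_CMD_HELP if -help was given. */
static int
demo_parse(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-help") == 0)
            return DEMO_CMD_HELP;
    }
    return 0;
}

/* Stand-in for the function that does the real work. */
static int
demo_run(void)
{
    demo_run_called = 1;
    return 0;
}

static int
demo_main(int argc, char **argv)
{
    int code = demo_parse(argc, argv);

    if (code == DEMO_CMD_HELP)
        return 0;       /* usage was shown; nothing to run */
    if (code != 0)
        return code;
    return demo_run();
}
```
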
afsd.fuse splits the command line arguments into afsd recognized options
and fuse options (everything else), so only afsd recognized arguments
are passed to afsd_parse(), via uafs_ParseArgs(). The -help argument is
processed as part of that splitting of arguments, so afsd.fuse never
passes -help as an argument to afsd_parse(). This means we do not need
to check for CMD_HELP as a return value from uafs_ParseArgs(). But
since this is all a bit confusing, at least check the return value in
uafs_ParseArgs().
Michael Meffie [Tue, 2 Aug 2016 20:52:42 +0000 (16:52 -0400)]
revert: "LINUX: Fix oops during negative dentry caching"
Commit fd23587a5dbc9a15e2b2e83160b947f045c92af1 was done to fix an oops
when parent_vcache_dv() was called without the GLOCK held. Since the
lockless code paths have been removed, and parent_vcache_dv() is always
called with the GLOCK held, revert the extra locked flag argument and
the calls to obtain and release the GLOCK within parent_vcache_dv().
This commit made it possible to execute afs_linux_dentry_revalidate
without taking the GLOCK under some circumstances. However, it
achieved this by examining structure members outside of the GLOCK that
were previously only examined under the GLOCK (such as vcp->f.states
and vcp->f.m.DataVersion).
While that does of course improve performance, it is not known to be
completely safe. Revert this commit so we may implement a fastpath
through afs_linux_dentry_revalidate using more trusted lockless
techniques (atomics, RCU, etc).
Andrew Deason [Thu, 26 Jun 2014 22:47:46 +0000 (15:47 -0700)]
afs: Create afs_SetDataVersion
Several different places in the codebase change avc->f.m.DataVersion
for a particular vcache, when we've noticed that the DV for the vcache
has changed. Consolidate all of these occurrences into a single
afs_SetDataVersion function, to make it easier to change what happens
when we notice a change in DV number.
This should incur no behavior change; it is just simple code
reorganization.
Change-Id: I5dbf2678d3c4b5a2fbef6ef045a0b5bfa8a49242
Reviewed-on: https://gerrit.openafs.org/11791
Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com>
Reviewed-by: Daria Phoebe Brashear <shadow@your-file-system.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Thomas Keiser <tkeiser@gmail.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Andrew Deason [Mon, 23 May 2016 02:54:30 +0000 (21:54 -0500)]
ubik: Return an error from ContactQuorum when inquorate
Currently, when we need to contact all other servers in the ubik
quorum (to create a write transaction, and send db changes, etc), we
call the ContactQuorum_* family of functions. To contact each server,
those functions follow an algorithm like the following pseudocode:
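{
    int rcode = 0;
    int code;
    int okcalls = 0;

    for (ts = ubik_servers; ts; ts = ts->next) {
        if (ts->up) {
            code = contact_server(ts);
            if (code) {
                rcode = code;
            } else {
                okcalls++;
            }
        }
    }
    /* decision step, as described in the next paragraph */
    if (okcalls + 1 >= ubik_quorum) {
        return 0;
    } else {
        return rcode;
    }
}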
This means that if we successfully contact a majority of ubik sites,
we return success, even if some sites returned an error. If most sites
fail, then we return an error (we arbitrarily pick the last error we
got).
This means that in most situations, a successful write transaction is
guaranteed to have been transmitted to a majority of ubik sites, so
the written data cannot be lost (at least one of the sites that got
the new data will be in a future elected quorum).
However, if a site is already known to be down (ts->up is 0), then we
skip trying to contact that site, but we also don't set any errors.
This means that if a majority of sites are already known to be down
(ts->up is 0), then we can indicate success for a write transaction,
even though the relevant data has not been written to a majority of
sites. In that situation, it is possible to lose data.
Most of the time this is not possible, since a majority of sites must
be 'up' for the sync site to be elected and to allow write
transactions at all. There are a few ways, though, in which we can get
into a situation where most other sites are 'down', but we still let a
write transaction go through.
An example scenario:
Say we have sites A, B, and C. All 3 sites come up at the same time,
and A is the lowest IP so it starts an election (after around BIGTIME
seconds). Right after A is elected the sync site, sites B and C will
have 'lastYesState' set to 0, since site A hasn't yet sent out a
beacon as the sync site.
A client can then start a write to the ubik database on site A, which
site A will allow since it's the sync site (and presumably all the
relevant recovery flags are set). Site A will try to contact sites B
and C for a DISK_Begin call, but lastYesState is set to 0 on those
sites. This will cause DISK_Begin to return UNOQUORUM
(urecovery_AllBetter will return 0, because uvote_HaveSyncAndVersion
will return 0, because lastYesState is not set).
So site A will get a UNOQUORUM error from sites B and C, and so site A
will set 'ts->up' to 0 for sites B and C, and will return UNOQUORUM to
the client. The client may then try to retry the call (because
UNOQUORUM is not treated as a 'global' error in ubikclient.c's
ubik_Call_New), or another client write request could come in. Now
that 'ts->up' is unset for both sites B and C, we skip trying to
contact any remote sites, and the ContactQuorum functions will return
success. So the ubik write will go through successfully, but the new
data will only be on site A.
At this point, if site A crashes, then sites B and C will elect a
quorum, and will not have the modifications that were written to site
A (so the data written to site A is lost). If site A stays up, then it
will go through database recovery, sending the entire database file to
sites B and C.
In addition, it's very possible in this scenario for a client to write
to the database, and then try to read back data and confusingly get a
different result. For example, if someone issues the following two
commands while triggering the above scenario:
$ pts createuser testuser
$ pts examine testuser
If the second command contacts site B or C, then it will always fail,
saying that the user doesn't exist (even though the first command
succeeded). This is because sites B and C don't have the new data
written to site A, at least temporarily. While this confusing behavior
is not completely avoidable in ubik (this can always happen
'sometimes' due to network errors and such), with the scenario
described here, it happens 100% of the time.
The general scenario described above can also happen if sites B and C
are suddenly legitimately unreachable from site A, instead of throwing
the UNOQUORUM error. All of the steps are pretty much the same, but
there is a bit of a delay while we wait for the DISK_Begin call to
fail.
To fix this, do not let 0 be returned if a quorum has not been
reached. In some sense, UNOQUORUM could *always* be returned in
that case, but it is more in keeping with historical behavior to
return a "real" error if there is one available.
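
The corrected decision can be sketched as a self-contained simulation. The
structure, the function name, and the numeric value of DEMO_UNOQUORUM are
placeholders for illustration; only the rule (never return 0 without a
quorum, prefer a real error when one was seen) is taken from the text.

```c
#define DEMO_UNOQUORUM 5376   /* placeholder value for illustration */

struct demo_site {
    int up;          /* nonzero if the site is believed reachable */
    int result;      /* what contacting it would return (0 = success) */
};

/* Contact every remote site marked up, then decide the return code.
 * nsites counts remote sites only; the local site is the "+ 1". */
static int
demo_contact_quorum(const struct demo_site *sites, int nsites, int quorum)
{
    int rcode = 0;
    int okcalls = 0;
    int i;

    for (i = 0; i < nsites; i++) {
        if (!sites[i].up)
            continue;               /* skipped sites set no error */
        if (sites[i].result != 0)
            rcode = sites[i].result;
        else
            okcalls++;
    }
    if (okcalls + 1 >= quorum)
        return 0;
    /* No quorum: never return 0; prefer a real error if we saw one. */
    return (rcode != 0) ? rcode : DEMO_UNOQUORUM;
}
```

With two remote sites both marked down and a quorum of 2, this returns
DEMO_UNOQUORUM, where the older logic would have returned 0.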
It is somewhat questionable whether we should even be propagating
errors received from calls like DISK_Begin/DISK_Commit to the ubik
client (e.g. if we get a -1 from trying to contact a remote site, we
return -1 to the client, so the client may think it couldn't reach the
site at all). But this commit does not change any of that logic, and
should only change behavior when a majority of sites have 'ts->up'
unset. A later commit might effect the change to always return
UNOQUORUM and ignore the actual error values from the DISK_ calls,
but that is not needed to fix the immediate issue.
An important note:
Before this commit, there was a window of about 15 seconds after a
sync site is elected where a write to the ubik db would appear to be
successful, but would only modify the ubik db on the sync site.
(Details described above.) With this commit, writes during that
15-second window will instead fail, because we cannot guarantee that
we won't lose that data. If someone relies on 'udebug' data from the
sync site to let them know when writes will go through successfully,
this commit could appear to cause new errors.
[kaduk@mit.edu: transfer long commit message describing the issue
from an alternative fix, and tidy up accordingly]
Change-Id: If6842d7122ed4d137f298f0f8b7f20350b1e9de6
Reviewed-on: https://gerrit.openafs.org/12289
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
There are some variations here and there, but all locations usually
involve at least some code like that. But they all do the same general
thing: invalidate a vcache so we hit the net the next time we need
that vcache.
In order to make it easier to modify what happens when we invalidate a
vcache, and just to improve the code, take all of these instances and
put the functionality in a single function, called afs_StaleVCache,
which marks the vcache as 'stale'.
To handle a few different situations that must be handled, we have
some flags that can also be passed to the new function. These are
primarily necessary to handle variations in the circumstances under
which we hit this code path; for instance, we may already have
afs_xcbhash locked, or we may be invalidating the entire osidnlc (if
we're invalidating vcaches in bulk, for example).
This should result in the same general behavior in all cases. The only
slight difference in a few cases is that we hold locks for a few more
operations than we used to; for example, we may clear an osidnlc entry
while holding the vcache lock. But these are minor and shouldn't
result in any actual differences in behavior.
So, this commit should just be code reorganization and should incur no
behavior change. However, this reorganization is complex, and should
not be considered a simple risk-free refactoring.
[kaduk@mit.edu: implement Tom Keiser's suggestion of a third argument
to afs_StaleVCacheFlags, add AFS_STALEVC_CLEARCB and
AFS_STALEVC_SKIP_DNLC_FOR_INIT_FLUSHED]
Change-Id: I2b2f606c56d5b22826eeb98471187165260c7b91
Reviewed-on: https://gerrit.openafs.org/11790
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Wed, 17 Aug 2016 14:57:48 +0000 (10:57 -0400)]
CODING: one-line if statements should not have braces
Update the style guide with a declaration of the prevailing and
preferred brace style for one-line if statements and loops. Provide an
example and counter-example.
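
An illustration of the style in question. The function is hypothetical and
is not the example added to the CODING file.

```c
/* Preferred: no braces around a one-line body. */
static int
demo_clamp(int v, int max)
{
    if (v > max)
        v = max;
    return v;
}

/* Counter-example (same behavior, but not the preferred layout):
 *
 *     if (v > max) {
 *         v = max;
 *     }
 */
```
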
Change-Id: Iafeea977203b76c0e67385779fb4ed57f3c6699a
Reviewed-on: https://gerrit.openafs.org/12370
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Thu, 11 Jun 2015 15:25:51 +0000 (11:25 -0400)]
libafs: update the volume setup time when the vldb is rechecked
The vldb is rechecked when the fileserver returns certain error codes,
such as VMOVED. When the vldb is rechecked, update the volume
setupTime to reflect the most recent time the volume vldb information
is known to be correct.
Be sure the VRecheck flag is cleared after checking the vldb, since
the volume write lock was dropped after finding the volume.
Change-Id: I0ba389ee408de602e0059fbe8013012501c337d3
Reviewed-on: https://gerrit.openafs.org/11897
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Andrew Deason [Sat, 8 Aug 2015 21:13:54 +0000 (16:13 -0500)]
afs: Make ONEGROUP_ENV not Linux-specific
The functionality in AFS_LINUX26_ONEGROUP_ENV does not really need to
be Linux-specific (it's just only implemented for Linux right now).
Rename it to AFS_PAG_ONEGROUP_ENV, and remove some Linux-specific
checks when checking for "onegroup" PAG GIDs.
[mmeffie@sinenomine.net: Move AFS_PAG_ONEGROUP_ENV to param.h]
Michael Meffie [Wed, 29 Apr 2015 16:00:24 +0000 (12:00 -0400)]
afs: add afsd -inumcalc option
This commit adds the afsd -inumcalc command line switch to specify the
inode number calculation method in a platform neutral way.
Inode numbers reported for files within the AFS filesystem are generated
by the cache manager using a calculation which derives a number from a
FID. Long ago, a new type of calculation was added which generates inode
numbers using an MD5 message digest of the FID. The MD5 inode number
calculation variant is computationally more expensive but greatly
reduces the chances for inode number collisions.
The MD5 calculation can be enabled on the Linux cache manager using the
Linux sysctl interface. Other than the sysctl method of selecting the
inode calculation type, the MD5 inode number calculation method is not
specific to Linux.
This change introduces a command-line option which accepts a value to
indicate the calculation method, instead of a simple flag to enable MD5
inode numbers. This should allow for new inode calculation methods
in the future without the need for additional afsd command-line flags.
Two values are currently accepted for -inumcalc. The value of 'compat'
specifies the legacy inode number calculation. The value 'md5' indicates
that the new MD5 calculation is to be used.
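
Accepting a value rather than a flag can be sketched as follows. The enum,
the function name, and the -1 error return are assumptions; only the two
accepted strings come from the text above.

```c
#include <string.h>

enum demo_inumcalc {
    DEMO_INUMCALC_COMPAT = 0,
    DEMO_INUMCALC_MD5 = 1
};

/* Map the option value to a method. Returns 0 on success, or -1 for
 * an unrecognized value (how afsd reports that is not described here). */
static int
demo_parse_inumcalc(const char *value, enum demo_inumcalc *method)
{
    if (strcmp(value, "compat") == 0) {
        *method = DEMO_INUMCALC_COMPAT;
        return 0;
    }
    if (strcmp(value, "md5") == 0) {
        *method = DEMO_INUMCALC_MD5;
        return 0;
    }
    return -1;
}
```

Adding a future method would then mean one more string comparison and one
more enum member, with no new command-line flag.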
Benjamin Kaduk [Wed, 13 Jul 2016 23:23:50 +0000 (18:23 -0500)]
Document minimum supported compiler versions
Pick some fairly old versions of clang and gcc and document them
as the minimum supported version. This will let us make assumptions
about compiler features that are available when using those compilers.
Change-Id: Ibb8df72c9b12cc7adff39ece9708a428975ba703
Reviewed-on: https://gerrit.openafs.org/12331 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Anders Kaseorg [Tue, 26 Jul 2016 01:04:59 +0000 (21:04 -0400)]
Linux 4.7: Follow key_alloc API change
Linux v4.7-rc1~124^2~2^2^2~9 adds an eighth optional argument
restrict_link. The same commit adds a KEY_ALLOC_BYPASS_RESTRICTION
macro, which we test so we can avoid adding another configure test.
Change-Id: I83e27b54ba5711124dccaa41de7155be77054f47
Reviewed-on: https://gerrit.openafs.org/12345 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Mark Vitale [Fri, 27 May 2016 20:44:17 +0000 (16:44 -0400)]
SOLARIS: corrupted content of mmap'd files over 4GiB
Many Solaris programs and utilities (notably mdb and cp) use mmap() in
their implementation. When AFS files exceeding 4GiB are mmap'd, the
contents of the file will be incorrectly mapped into memory. Starting at
4GiB + 1, the first 4GiB will be repeated for the remainder of the file.
If the mmap'd file is written back to storage (AFS or otherwise), the
newly created file will also be corrupted.
This is due to a bug in the afs_map() routine that supports mmap() of
AFS files on Solaris. The segvn_crargs.offset passed to the Solaris
virtual memory APIs is incorrectly cast to u_int, causing it to wrap at
4GiB.
Although Solaris passes the offset from fop_map() to afs_map() as type
offset_t, the destination segvn_crargs.offset is actually type
u_offset_t. Existing examples of other Solaris filesystems (e.g.
zfs_map() ) cast the offset from offset_t to u_offset_t when assigning to
segvn_crargs.offset. If it's good enough for ZFS, it's good enough for
AFS.
Correctly cast the offset to u_offset_t.
Thanks to Robert Milkowski for the report and diagnosis.
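The wrap at 4GiB can be reproduced outside the kernel with a minimal
sketch. The typedefs below are stand-ins for offset_t and u_offset_t
(assumed to be 64-bit), uint32_t stands in for u_int, and the function
names are made up for this example; this is not the afs_map() code.

```c
#include <stdint.h>

/* Stand-ins for the Solaris types named above (assumed 64-bit here). */
typedef int64_t  demo_offset_t;     /* plays the role of offset_t */
typedef uint64_t demo_u_offset_t;   /* plays the role of u_offset_t */

/* Buggy pattern: narrowing through a 32-bit unsigned type wraps at 4GiB. */
demo_u_offset_t
offset_via_u_int(demo_offset_t off)
{
    return (demo_u_offset_t)(uint32_t)off;
}

/* Fixed pattern: cast straight to the 64-bit unsigned type. */
demo_u_offset_t
offset_via_u_offset_t(demo_offset_t off)
{
    return (demo_u_offset_t)off;
}
```

With an offset of 4GiB + 4096, the first function yields 4096, which is
why the start of the file reappears past the 4GiB mark, while the second
preserves the full offset.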
Change-Id: Id25363255ec011f2ad7e003ca3e4a1385bebff7e
Reviewed-on: https://gerrit.openafs.org/12292 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Mark Vitale [Thu, 26 May 2016 20:53:47 +0000 (16:53 -0400)]
SOLARIS: support mmap() over 4GiB
When mmap() is issued for exactly 4GiB of a large AFS-resident file,
mmap() fails with ENOMEM. This is because the AFS code is handling the
requested length as u_int instead of size_t, resulting in a 0 being
passed back to the caller.
When mmap() is issued for a length over 4GiB that is not a multiple of
4GiB, the resulting mapping will not contain all the requested pages,
for the same reason: the mapped size has been truncated to 32 bits.
This results in SIGSEGV when accessing the non-mapped page(s).
Fix the signature of afs_map() to specify the correct type for the length.
Thanks to Robert Milkowski for the report and diagnosis.
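Both failure modes follow from the same 32-bit truncation of the length,
which a small sketch can show. The function name is hypothetical,
uint32_t stands in for u_int, and uint64_t stands in for a 64-bit size_t;
this is not the afs_map() signature.

```c
#include <stdint.h>

/* Models a length parameter declared as a 32-bit unsigned integer. */
uint64_t
len_seen_as_u_int(uint64_t requested)
{
    return (uint32_t)requested;     /* keeps only the low 32 bits */
}
```

A request for exactly 4GiB is seen as a length of 0, and a request for
5GiB is seen as 1GiB, so only part of the requested range is mapped.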
Change-Id: I8a9f0cb04ff9b80de5516e14d0679b06ef0b3f9a
Reviewed-on: https://gerrit.openafs.org/12291 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
The automatically generated pkgbuild.sh file should not be tracked by
git. To fix this problem, add the name of this file to the proper
.gitignore file.
Change-Id: I9bdbad8e7cc02926de61e337ccb94d8a2c27ae43
Reviewed-on: https://gerrit.openafs.org/12343 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Andrew Deason [Sun, 1 May 2016 16:24:30 +0000 (11:24 -0500)]
ubik: Don't RECFOUNDDB if can't contact most sites
Currently, the ubik recovery code will always set UBIK_RECFOUNDDB
during recovery, after asking all other sites for their dbversions.
This happens regardless of how many sites we were actually able to
successfully contact, even if we couldn't contact any of them.
This can cause problems when we are unable to contact a majority of
sites with DISK_GetVersion. If we haven't contacted a majority of
sites, we cannot say with confidence that we know what the best db
version available is (which is what UBIK_RECFOUNDDB represents: that
we've found which database is the one we should be using). This can
also result in UBIK_RECHAVEDB in a similar situation, indicating that
we have the best db version locally, even though we never actually
asked anyone else what their db version was.
For example, say site A is the sync site going through recovery, and
DISK_GetVersion fails for the only other sites B and C. Site A will
then set UBIK_RECFOUNDDB, and will claim that site A has the best db
version available (UBIK_RECHAVEDB). This allows site A to process ubik
write transactions (causing the db to be labelled with a new epoch),
or possibly to send the db to the other sites via DISK_SendFile, if
they quickly become available during recovery. Ubik write transactions
can succeed in this situation, because our ContactQuorum_* calls will
succeed if we never try to contact a remote site ('rcode' defaults to
0).
This situation should be rather rare, because normally a majority of
sites must be reachable by site A for site A to be voted the sync site
in the first place. However, it is possible for site A to lose
connectivity to all other sites immediately after sync site election.
It is also possible for site A to proceed far enough in the recovery
process to set UBIK_RECHAVEDB before it loses its sync site status.
As a result of all of this, if a site with an old database comes
online and there are network connectivity problems between the other
sites and a ubik write request comes in, it's possible for the "old"
database to overwrite the "new" database. This makes it look as if the
database has "rolled back" to an earlier version.
This should be possible with any ubik database, though how to actually
trigger this bug can change due to different ubik servers setting
different network timeouts. It is probably the most likely with the
VLDB, because the VLDB is typically the most frequently written
database.
If a VLDB reverts to an earlier version, existing volumes can appear
not to exist in the VLDB, and new volumes can re-use volume IDs from
existing volumes. This can result in rather confusing errors.
To fix this, ensure that we have contacted a majority of sites with
DISK_GetVersion before indicating that we have located the best db
version. If we've contacted a majority of sites, then we are
guaranteed (under ubik assumptions) that we've found the best version,
since previous writes to the database should be guaranteed to hit a
majority of sites (otherwise they wouldn't be successful).
If we cannot reach a majority of sites, we just don't set
UBIK_RECFOUNDDB, and the recovery process restarts. Presumably on the
next iteration we'll be able to contact them, or we'll lose sync site
status if we can't reach the other sites for long enough.
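The majority test described above can be sketched as follows. This is a
simplified, hypothetical helper and not the ubik recovery code: it
assumes the local site counts toward the majority, and it ignores any
tie-breaking rules ubik may apply for an even number of sites.

```c
/*
 * Hypothetical helper, not the ubik source: decide whether recovery may
 * treat the best database version as found.
 *
 * nsites     - total number of ubik sites, including the local one
 * nresponses - remote sites that answered DISK_GetVersion successfully
 */
int
contacted_majority(int nsites, int nresponses)
{
    int reached = nresponses + 1;   /* assumption: count the local site */

    return reached * 2 > nsites;    /* strict majority of all sites */
}
```

In the example with sites A, B and C, a failure of DISK_GetVersion to
both B and C gives contacted_majority(3, 0) == 0, so UBIK_RECFOUNDDB
would not be set and recovery would restart.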
Andrew Deason [Fri, 13 May 2016 02:34:31 +0000 (21:34 -0500)]
vlserver: rx_SetRxDeadTime before ubik init
Currently, vlserver calls rx_SetRxDeadTime to set the default rx
deadtime to 50 seconds, but it does so after calling
ubik_ServerInitByInfo. ubik_ServerInitByInfo creates several rx
connections before it returns, and so these connections get the
default rx deadtime (12 seconds), instead of the 50 seconds vlserver
tries to set.
When ubik detects that a remote site is down, ubik recreates the rx
connections for that site, and this new connection gets the new
deadtime of 50 seconds.
This means that ubik behavior can have different timings in the
vlserver, depending on whether any remote sites have ever been detected
as being 'down'. This can result in seemingly-inconsistent or
confusing behavior, since some sequences of operations that appear
identical can produce different results, depending on whether the
12-second timeout or the 50-second timeout is being used.
This behavior is not directly to blame for any problems, but it can be
very confusing, especially when trying to diagnose or reproduce bugs.
So to make things more consistent, just call rx_SetRxDeadTime earlier,
so all conns always get the 50-second timeout.
In order to do this, though, we must also ensure that rx_Init is
called before rx_SetRxDeadTime (otherwise, rx_Init will overwrite our
configured deadtime). So also call rx_Init earlier; rx_Init is
idempotent, so it's okay that it may be called again after or before
this.
Note that vlserver is currently the only ubik server that sets a
deadtime of 50 seconds, and it's not clear why. Another way to solve
this is to just remove the call to rx_SetRxDeadTime, to make vlserver
behave more like ptserver. But this commit takes a conservative
approach to result in a deadtime that is probably the most common in
current use, since most long-running vlservers will probably
eventually lose contact with remote sites at one time or another, and
so will eventually use a deadtime of 50 seconds.
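The ordering problem can be illustrated with a toy model in which each
connection captures the default deadtime at creation time. The toy_*
names are invented for this sketch and are not the rx API; only the 12
and 50 second values come from the description above.

```c
/* Toy model only: these are NOT the rx functions, just stand-ins. */
static int toy_default_deadtime = 12;   /* seconds; the default cited above */

struct toy_conn {
    int deadtime;   /* captured when the connection is created */
};

/* Change the default used by connections created from now on. */
void
toy_set_default_deadtime(int secs)
{
    toy_default_deadtime = secs;
}

/* Create a connection that inherits the current default. */
struct toy_conn
toy_new_conn(void)
{
    struct toy_conn c;

    c.deadtime = toy_default_deadtime;
    return c;
}
```

In this model, a connection created before toy_set_default_deadtime(50)
keeps a deadtime of 12, and one created afterwards gets 50, which mirrors
the mixed timeouts described above. Setting the default first removes the
mismatch.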
Change-Id: I49430144d9a62eb8cad1509c1aeafc9fcc927f8e
Reviewed-on: https://gerrit.openafs.org/12285 Tested-by: Andrew Deason <adeason@dson.org> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
macos: use pkgbuild to build the package on 10.10/10.11
PackageMaker is no longer part of OS X. As a result, it
is not possible to build the package on OS X 10.10 and
OS X 10.11 using the existing code.
To solve this problem, a new script, along with a couple
of new files, is provided.
- pkgbuild.sh
This script uses the command line tools pkgbuild and
productbuild to build the package on OS X 10.10 and
OS X 10.11. By default, the package built by this
script is not signed; signing is optional.
- Distribution.xml
This file is nothing more than an XML file used by
productbuild. It is mainly used to configure how the
installer will look and behave.
- conclusion.txt
Contains the text that is displayed by Installer at
the end of the installation process. Only used on
El Capitan and later.
- Uninstall.14.15
This script can be used by OS X 10.10/10.11 users
to uninstall OpenAFS.
Notes:
- This work is based on a patch made by Brandon Allbery
<ballbery@sinenomine.net> with fixes and updates from
Andrew Deason <adeason@dson.org>.
- El Capitan and later prevent us from touching
/usr/bin directly. As a result, /opt is used.
- If the package is not signed, the user will have
to disable the OS X security protections. Otherwise,
the client will not work.
- Now we have two different scripts to build the
package on OS X. For OS X 10.10 and newer versions,
pkgbuild.sh will be used. For older versions,
the existing buildpkg.sh will be used.