Michael Meffie [Thu, 10 Sep 2015 01:26:23 +0000 (21:26 -0400)]
salvager: convert salvager and salvageserver to libutil logging
Use the libutil logging facility in the salvager and DAFS salvageserver
in order to have consistent logging features and time stamp formats with
the other OpenAFS servers.
Michael Meffie [Thu, 27 Aug 2015 17:06:05 +0000 (13:06 -0400)]
afs: shake harder in shake-loose-vcaches
Linux based cache managers will allocate vcaches on demand and
deallocate batches of vcaches in the background. This feature is called
dynamic vcaches.
Vcaches to be deallocated are found by traversing the vcache LRU list
(VLRU) from the oldest vcache to the newest. Up to a target number of
vcaches are attempted to be evicted. The afs_xvcache lock protecting
the VLRU may be dropped and re-acquired while attempting to evict a
vcache. When this happens, it is possible the VLRU may have changed, so
the traversal of the VLRU is restarted. This restarting of the VLRU
traversal is limited to 100 iterations to avoid looping indefinitely.
Vcaches which are busy cannot be evicted and remain in the VLRU. When a
busy vcache was not evicted and the afs_xvcache lock was dropped, the VLRU
traversal is restarted from the end of the VLRU. When the busy vcache is
encountered on the retry, it will trigger additional retries until the
loop limit is reached, at which point the target number of vcaches will
not be deallocated.
This can leave a very large number of unbusy vcaches which are never
deallocated. On a busy machine, tens of millions of unused vcaches can
remain in memory. When the busy vcache at the end of the VLRU is finally
evicted, the log jam is broken, and the background daemon will hold the
afs_xvcache lock for an excessively long time, hanging the system.
Fix this by moving busy vcaches to the head of the VLRU before
restarting the VLRU traversal. These busy vcaches will be skipped when
retrying the VLRU traversal, allowing the cache manager to make progress
deallocating vcaches down to the target level.
This was already done on the Mac OS X platform while attempting to evict
vcaches. Move the code to move busy vcaches to the head of the VLRU up
to the platform-agnostic caller.
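The traversal described above can be sketched roughly as follows. This is
only an illustration, not the cache manager code: the struct layout and the
names shake_loose, vlru_unlink and vlru_push_head are hypothetical, locking
is omitted, and every busy entry is treated as if the lock had been dropped.

```c
#include <stddef.h>

#define MAX_RESTARTS 100        /* restart limit quoted in the text above */

/* Hypothetical, simplified vcache: prev points toward the head (newest). */
struct vc {
    int busy;
    struct vc *prev, *next;
};

struct vlru {
    struct vc *head;            /* newest */
    struct vc *tail;            /* oldest */
};

void vlru_unlink(struct vlru *l, struct vc *v)
{
    if (v->prev) v->prev->next = v->next; else l->head = v->next;
    if (v->next) v->next->prev = v->prev; else l->tail = v->prev;
    v->prev = v->next = NULL;
}

void vlru_push_head(struct vlru *l, struct vc *v)
{
    v->prev = NULL;
    v->next = l->head;
    if (l->head) l->head->prev = v; else l->tail = v;
    l->head = v;
}

/* Evict up to 'target' vcaches, oldest first; returns the number evicted. */
int shake_loose(struct vlru *l, int target)
{
    int evicted = 0, restarts = 0;
    struct vc *v = l->tail;

    while (v != NULL && evicted < target && restarts < MAX_RESTARTS) {
        if (v->busy) {
            /* Move the busy entry out of the way, then restart at the tail. */
            vlru_unlink(l, v);
            vlru_push_head(l, v);
            restarts++;
            v = l->tail;
        } else {
            struct vc *next_candidate = v->prev;
            vlru_unlink(l, v);
            evicted++;
            v = next_candidate;
        }
    }
    return evicted;
}
```

Because the busy entry no longer sits at the tail, each restart begins at an
evictable entry instead of hitting the same busy vcache again.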
Thanks to Andrew Deason for the initial version of this patch.
Change-Id: I7768d00604e56d8d5369ac5215f7c2ab7996c4eb
Reviewed-on: https://gerrit.openafs.org/11654 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@dson.org> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Thu, 25 Feb 2016 23:49:20 +0000 (18:49 -0500)]
LINUX: hold vcache while dropping dcache refs
Hold a reference on a vcache while attempting to evict the inode from
the dcache. Since the afs_xvcache lock is dropped, it could be possible
for the vcache to be flushed during this time, making it unsafe to use
the vcache after the eviction attempt.
Change-Id: I9d91db98387b7aaa986ed915420c6cafb4f12438
Reviewed-on: https://gerrit.openafs.org/12206 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@dson.org> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Stephan Wiesand [Thu, 7 Apr 2016 08:58:30 +0000 (10:58 +0200)]
Linux: Fix misleading indentation and other whitespace
Commit 7edc6694e7632c9736bd1516935604a638165313 introduced a
misleading indentation of a line in afs_linux_prefetch. Correct
it, and, while here, remove trailing whitespace throughout the file.
Stephan Wiesand [Tue, 8 Mar 2016 13:15:17 +0000 (14:15 +0100)]
Linux 4.4: Do not use splice()
splice() may return -ERESTARTSYS if there are pending signals, and
it's not even clear how this should be dealt with. This potential
problem has been present for a long time, but as of Linux 4.4
(commit c725bfce7968009756ed2836a8cd7ba4dc163011) seems much more
likely to happen.
Until resources are available to fix the code to handle such errors,
avoid the riskier uses of splice().
If there is a default implementation of file_splice_{write,read},
use that; on somewhat older kernels where it is not available,
use the generic version instead.
[kaduk@mit.edu: add test for default_file_splice_write]
Change-Id: Ib4477cdfb2cd0f49f516da75edc3cb9d1a8817dc
Reviewed-on: https://gerrit.openafs.org/12217 Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Michael Laß [Mon, 18 Jan 2016 17:29:00 +0000 (18:29 +0100)]
Linux 4.4: key_payload has no member 'value'
In Linux 4.4 (146aa8b1453bd8f1ff2304ffb71b4ee0eb9acdcc) type-specific and
payload data have been merged. The payload is now accessed directly and has
no 'value' member anymore.
FIXES 132677
Change-Id: Id26c40c80314a0087ecc0735029412787058ef07
Reviewed-on: https://gerrit.openafs.org/12169 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Chas Williams [Mon, 23 Nov 2015 19:15:08 +0000 (14:15 -0500)]
rxgen: Don't use size_t in struct rx_opaque with XDR
OpenAFS's XDR doesn't support size_t at this time. For now, use a
temporary stack variable to avoid 32/64-bit issues and copy back the
returned value upon success.
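A minimal sketch of the temporary-variable approach, under stated
assumptions: struct opaque and decode_u32 below are hypothetical stand-ins
(not the rx_opaque definition or an OpenAFS XDR routine), and the decoder
simply reads one big-endian 32-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a structure whose length field is a size_t. */
struct opaque {
    size_t len;
    void *val;
};

/* Hypothetical 32-bit decoder: reads one big-endian value, returns 1 on success. */
int decode_u32(const unsigned char *buf, uint32_t *out)
{
    *out = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    return 1;
}

/*
 * Decode into a 32-bit temporary rather than pointing the decoder at the
 * size_t field, which may be 64 bits wide; copy back only on success.
 */
int decode_opaque_len(const unsigned char *buf, struct opaque *o)
{
    uint32_t tmp;

    if (!decode_u32(buf, &tmp))
        return 0;
    o->len = tmp;
    return 1;
}
```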
Change-Id: Ia3dd8abd665a19e04aa611f940728d088a8f87b7
Reviewed-on: https://gerrit.openafs.org/12115 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Fri, 18 Mar 2016 14:22:33 +0000 (10:22 -0400)]
doc: fs examine no longer requires read rights on the volume root vnode
Update the man page to reflect the current access rights required for fs
examine. Historically, fs examine required read access on the root
vnode of the volume housing the directory or file being examined. This
access check was relaxed in commit d2d591caf2c9b4cf2ebae708cc9b4c8b78ca5a5a,
since the information returned by the file server is already available
anonymously by other means.
Change-Id: If62b625bce8a260b98fb56a6feec49c674f2de53
Reviewed-on: https://gerrit.openafs.org/12223 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Tue, 15 Mar 2016 04:15:20 +0000 (23:15 -0500)]
OPENAFS-SA-2016-002 ListAddrByAttributes information leak
The ListAddrByAttributes structure is used as an input to the GetAddrsU
RPC; it contains a Mask field that controls which of the other fields
will actually be read by the server during the RPC processing.
Unfortunately, the client only wrote to the fields indicated by the
mask, leaving the other fields uninitialized for transmission on the
wire, leaking some contents of client memory.
Plug the information leak by zeroing the entire structure before use.
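The pattern can be illustrated with a small sketch. The structure, the mask
bit and the function name below are hypothetical and do not reproduce the
real ListAddrByAttributes layout; the point is only that the whole structure
is cleared before the fields selected by the mask are filled in.

```c
#include <stdint.h>
#include <string.h>

#define ATTR_MASK_IPADDR 0x1    /* hypothetical mask bit */

/* Hypothetical request layout; not the real ListAddrByAttributes. */
struct list_attrs {
    int32_t mask;
    int32_t ipaddr;
    int32_t index;
    unsigned char spare[16];
};

void build_ipaddr_query(struct list_attrs *req, int32_t ipaddr)
{
    memset(req, 0, sizeof(*req));   /* no stale bytes reach the wire */
    req->mask = ATTR_MASK_IPADDR;
    req->ipaddr = ipaddr;
}
```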
Benjamin Kaduk [Tue, 15 Mar 2016 04:15:20 +0000 (23:15 -0500)]
OPENAFS-SA-2016-002 VldbListByAttributes information leak
The VldbListByAttributes structure is used as an input to several
RPCs; it contains a Mask field that controls
which of the other fields will actually be read by the server
during the RPC processing. Unfortunately, the client only
wrote to the fields indicated by the mask, leaving the other
fields uninitialized for transmission on the wire, leaking
some contents of client memory.
Plug the information leak by zeroing the entire structure before use.
Benjamin Kaduk [Tue, 15 Mar 2016 04:15:20 +0000 (23:15 -0500)]
OPENAFS-SA-2016-002 AFSStoreVolumeStatus information leak
The AFSStoreVolumeStatus structure is used as an input to the
RXAFS_SetVolumeStatus RPC; it contains a Mask field that controls
which of the other fields will actually be read by the server
during the RPC processing. Unfortunately, the client only
wrote to the fields indicated by the mask, leaving the other
fields uninitialized for transmission on the wire, leaking
some contents of kernel memory.
Plug the information leak by zeroing the entire structure before use.
Benjamin Kaduk [Sun, 13 Mar 2016 17:56:24 +0000 (12:56 -0500)]
OPENAFS-SA-2016-002 AFSStoreStatus information leak
Marc Dionne reported that portions of the AFSStoreStatus structure
were not written to before being sent over the network for
operations such as create, symlink, etc., leaking the contents
of the kernel stack to observers. Which fields in the request
are used are controlled by a flags field, and so if a field was
not going to be used by the server, it was sometimes left
uninitialized.
Fix the information leak by zeroing out the structure before use.
Jeffrey Altman [Thu, 10 Mar 2016 02:38:10 +0000 (20:38 -0600)]
OPENAFS-SA-2016-001 group creation by foreign users
CVE-2016-2860:
In AFS 3.3 as part of the addition of the cross-cell support for foreign
user auto-registration a bug was introduced that permits foreign users
to create arbitrary groups as if they were system administrators. This
permits the groups to be created without any group quota checks, and
using group names that non-administrators would not normally be able to
create, such as groups with the "system:" prefix or groups with no colon
(that is, in the namespace for users).
Additionally, all entries created using the auto-registration service
were marked as being created by system:administrators. This behavior
should not be changed on the stable release branch, but for the next
release the behavior will change to show these entries as being
self-created, to better reflect reality.
FIXES 132822
[kaduk@mit.edu: reword commit message, minor style adjustments]
Jeffrey Altman [Thu, 10 Mar 2016 04:34:55 +0000 (22:34 -0600)]
ptserver: fix pt_util creation of groups
In commit 53ac98931adf9f04c150d9bc084cae31f3913476 the adjustment of
owner id was moved from CreateEntry() into CreateGroupName(). This was
done for two reasons:
1. to reuse the computation of "is administrator" within
CreateGroupName() in order to permit the owner id to be set
to the invalid values 0 and ANONYMOUSID.
2. to allow the owner id to be altered in ChangeEntry().
Unfortunately, CreateEntry() needs to be able to alter the owner id
when creating users, not only groups.
This change moves the computation of "is administrator" and the
owner id assignment to CreateEntry() and ChangeEntry().
Michael Meffie [Wed, 24 Feb 2016 21:57:11 +0000 (16:57 -0500)]
LINUX: ifconfig is deprecated
ifconfig is deprecated and is no longer installed by default on RHEL 7 and
CentOS 7. Use the replacement ip command in the init script for Linux.
Fall back to ifconfig in the event the ip command is not available.
Thanks to Ben Kaduk for pointing out the hash built-in command.
Change-Id: I7ffe272eb712cd83a70a7d880d239f72b40cb5df
Reviewed-on: http://gerrit.openafs.org/12192 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Note: This crash was exposed by other bugs (to be addressed in future
commits) in OpenAFS large volume support. However, there may
be other failure paths (unrelated to large volumes) that expose
this error as well.
When VAllocVnode() must allocate a new vnode but fails while
updating the vnode index file (e.g. an "addled bitmap" due to other
bugs in working with a vnode index larger than 2^31 bytes), it branches
to common recovery logic at label error_encountered:.
Part of this recovery is to call VFreeBitmapEntry_r(). Commit 08ffe3e81d875b58ae5fe4c5733845d5132913a0 added a VOL_FREE_BITMAP_WAIT
flag to VFreeBitmapEntry() in order to prevent races with VAllocBitmapEntry().
If the caller specifies VOL_FREE_BITMAP_WAIT, VFreeBitmapEntry_r will
call VCreateReservation_r() and VWaitExclusiveState_r(). However, the
exit from VFreeBitmapEntry_r() calls VCancelReservation_r() unconditionally.
This works correctly with the majority of callers to VFreeBitmapEntry_r,
which do specify the VOL_FREE_BITMAP_WAIT flag.
However, the VAllocVnode() error_encountered logic must specify 0 for
this flag because the thread is already in an exclusive state
(VOL_STATE_VNODE_ALLOC). This correctly causes VFreeBitmapEntry_r() to
forgo both the reservation and wait-for-exclusive-state. However, before
exit it erroneously calls VCancelReservation_r(). We now have unbalanced
reservations (nWaiters); this causes an assert when the VAllocVnode()
error_encountered recovery code later calls VCancelReservation_r()
for what it believes is its own prior reservation.
Modify VFreeBitmapEntry_r() to make its final VCancelReservation_r()
conditional on flag VOL_FREE_BITMAP_WAIT.
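The balancing rule can be shown with a toy model. Everything here is a
hypothetical simplification (a bare counter instead of the volume package
state, FREE_BITMAP_WAIT standing in for VOL_FREE_BITMAP_WAIT); it only
illustrates that the cancel must be conditional on the same flag as the
create.

```c
#include <assert.h>

#define FREE_BITMAP_WAIT 0x1    /* stand-in for VOL_FREE_BITMAP_WAIT */

/* Hypothetical volume with only a reservation counter. */
struct vol {
    int nWaiters;
};

static void create_reservation(struct vol *vp)
{
    vp->nWaiters++;
}

static void cancel_reservation(struct vol *vp)
{
    assert(vp->nWaiters > 0);   /* an unbalanced cancel would trip here */
    vp->nWaiters--;
}

void free_bitmap_entry(struct vol *vp, int flags)
{
    if (flags & FREE_BITMAP_WAIT)
        create_reservation(vp);

    /* ... the bitmap bit would be cleared here ... */

    /* Cancel only the reservation this function itself created. */
    if (flags & FREE_BITMAP_WAIT)
        cancel_reservation(vp);
}
```

A caller that already holds its own reservation and passes 0 keeps its
count unchanged, so its later cancel still balances.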
Change-Id: Id6cf6b1279b11e6dfc4704bba5739912f663beca
Reviewed-on: http://gerrit.openafs.org/11983 Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Mark Vitale [Sat, 18 Jul 2015 05:12:51 +0000 (01:12 -0400)]
bozo: allow start of fs and dafs bnodes with options
fs_create() and dafs_create() issue stat() to verify
the existence of each executable specified in the bnode.
However, commit fda2bc874751ca479365dc6389c0eebb41a0bda1
inadvertently removed the code that stripped any command
arguments before the stat(). Therefore, any bnode that
specifies arguments (e.g. /usr/afs/bin/dafileserver -d 5),
causes the stat() to fail and the bnode will not start.
Rename function AppendExecutableExtension() to a less
"Windows-ish" name: PathToExecutable().
Modify the Windows version of PathToExecutable()
to properly strip arguments.
Reimplement the Unix macro as function PathToExecutable()
that properly strips arguments.
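As a rough illustration of stripping arguments before the existence check,
here is a sketch. The name path_to_executable is hypothetical and this is
not the bozo implementation; it assumes the command has no leading
whitespace and that the executable path itself contains no spaces or tabs.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of the executable path from a command
 * line, i.e. everything before the first space or tab. The caller frees
 * the result and can then pass it to stat().
 */
char *path_to_executable(const char *cmd)
{
    size_t len = strcspn(cmd, " \t");
    char *path = malloc(len + 1);

    if (path == NULL)
        return NULL;
    memcpy(path, cmd, len);
    path[len] = '\0';
    return path;
}
```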
Change-Id: I04f7ce2afb8211bd12b9063db1335738bff1cc1e
Reviewed-on: http://gerrit.openafs.org/11934 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Mon, 8 Feb 2016 15:10:32 +0000 (10:10 -0500)]
test: skip buserror test when SIGBUS is not defined in perl POSIX module
Older versions of the perl POSIX module do not define the SIGBUS symbol, which
causes the opr/softsig-t perl test to fail to compile. Instead of trying to
define SIGBUS, which may be platform dependent, skip the buserror unit test on
these older platforms.
Michael Meffie [Fri, 30 Jan 2015 17:20:10 +0000 (12:20 -0500)]
volser: detect eof in dump stream while reading acl
Detect an EOF condition while reading the ACL in a dump stream
and return a restore error, instead of filling the ACL with
0xFF and then failing the restore due to an invalid tag.
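A small sketch of the idea, using a hypothetical in-memory stream rather
than the volser dump I/O layer. It assumes a getc-style reader that returns
EOF at end of stream; storing such a value unchecked into a byte would
typically produce 0xFF, so the check comes before the store.

```c
#include <stddef.h>
#include <stdio.h>              /* for EOF */

/* Hypothetical in-memory dump stream. */
struct dump_stream {
    const unsigned char *data;
    size_t len, pos;
};

int stream_getc(struct dump_stream *s)
{
    return s->pos < s->len ? s->data[s->pos++] : EOF;
}

/* Returns 0 on success, -1 if the stream ends before 'n' bytes are read. */
int read_acl(struct dump_stream *s, unsigned char *acl, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int c = stream_getc(s);
        if (c == EOF)
            return -1;          /* report the error; do not store EOF */
        acl[i] = (unsigned char)c;
    }
    return 0;
}
```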
Benjamin Kaduk [Sun, 22 Nov 2015 20:23:49 +0000 (14:23 -0600)]
cellconfig: check for invalid dotted quads
IP addresses entered into the CellServDB with components larger
than 255 would silently be truncated down to 8-bit unsigned integer
representations. This could cause confusing behavior with
occasional hangs.
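A strict check of this kind might look like the sketch below. It is an
illustration only, not the cellconfig parser: parse_dotted_quad is a
hypothetical name, the address is returned in host byte order, and only
plain decimal components are accepted.

```c
#include <stdint.h>

/*
 * Parse "a.b.c.d" with each component in 0..255. Returns 0 and stores the
 * address (host byte order) on success, or -1 for malformed input.
 */
int parse_dotted_quad(const char *s, uint32_t *addr)
{
    uint32_t result = 0;

    for (int part = 0; part < 4; part++) {
        unsigned int value = 0;
        int digits = 0;

        while (*s >= '0' && *s <= '9') {
            value = value * 10 + (unsigned int)(*s - '0');
            if (value > 255)
                return -1;      /* reject instead of truncating to 8 bits */
            s++;
            digits++;
        }
        if (digits == 0)
            return -1;
        if (part < 3) {
            if (*s != '.')
                return -1;
            s++;
        }
        result = (result << 8) | value;
    }
    if (*s != '\0')
        return -1;
    *addr = result;
    return 0;
}
```

With 8-bit truncation a component such as 256 would wrap to 0; here the
whole entry is rejected instead.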
FIXES 131794
Change-Id: I44834cb4662e178fdb4be2eeb03ad58d2fa7c556
Reviewed-on: http://gerrit.openafs.org/12109 Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Andrew Deason [Sun, 12 Apr 2015 01:51:09 +0000 (20:51 -0500)]
afs: Log abnormally large chunk files
Any chunk in our cache for a regular file should be smaller than or
equal to our configured chunksize. If someone sets a chunk to be
larger than that, it is very strange and may cause other confusing
issues. Specifically, afs_DoPartialWrite determines if our cache is
"too full" by counting the number of dirty chunks. If we have a dirty
chunk that is much larger than the chunksize, it can throw off the
afs_DoPartialWrite calculation.
This is only true for dcaches backing regular files, though. For
directories, we fetch the entire directory into a single chunk file,
and the size of a directory blob can easily exceed the chunksize
without issues. The aforementioned issue with afs_DoPartialWrite does
not apply, since directory chunks cannot be dirty (we only locally
modify the chunk if we modify the dir on the server, and the DVs
match).
Anyway, it should not be possible to get a chunk for a regular file
larger than the chunksize. Log a message if it does occur, to assist
anyone tracking down the resulting issues.
[mmeffie@sinenomine.net remove unnecessary casts in afs_warn args.]
Change-Id: I5cf58e3659dc04255c62fa56b044d5bc1c7ce877
Reviewed-on: http://gerrit.openafs.org/11831 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Chas Williams [Sat, 25 Apr 2015 20:38:12 +0000 (16:38 -0400)]
opr: Disable some warnings during opr assertions
Detect _Pragma(), a C99 extension for inline #pragma's, and use it to
disable certain warnings during the use of opr_Verify() and
opr_Assert().
Because some versions of clang support _Pragma, do not have support
for -Wtautological-pointer-compare, and do set -Werror and -Wunknown-pragmas,
we must explicitly check for pragma support for -Wtautological-pointer-compare
as well.
Change-Id: Id3d5ee347f320a366a0571572b58414aa7044bf7
Reviewed-on: http://gerrit.openafs.org/11852 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Andrew Deason [Fri, 10 Apr 2015 02:26:25 +0000 (21:26 -0500)]
afs: Log weird 'size' fetchdata errors
There are a couple of situations that should never happen when issuing
a fetchdata, but cause errors when they do:
- The fileserver responds with more than 2^32 bytes of data
- The fileserver responds with more data than requested (but still
smaller than 2^32)
While these should normally never be encountered, it can be very
confusing when they are, since they cause file fetches to fail. To give
the user or investigating developer some hope of figuring out what is
going on, log a warning in these situations to at least indicate the
area in which something is breaking.
Only log these once, in case something causes these conditions to be
hit, e.g., every fetch. Once is at least enough to say this is
happening.
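The log-once behavior can be sketched with a static flag. The function name
and the warn_count counter are hypothetical, and fprintf stands in for the
kernel-side warning routine; thread safety is ignored here.

```c
#include <stdio.h>

int warn_count = 0;             /* hypothetical counter, for illustration */

/* Emit the warning only on the first call; later calls are silent. */
void warn_once_large_fetch(long long length)
{
    static int warned;

    if (warned)
        return;
    warned = 1;
    warn_count++;
    fprintf(stderr, "fetch: unexpected length %lld\n", length);
}
```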
[mmeffie@sinenomine.net remove unneeded casts in afs_warn args and
explicit static initializers.]
Andrew Deason [Wed, 8 Apr 2015 03:10:53 +0000 (22:10 -0500)]
afs: Fix fetchInit for negative/large lengths
Currently, the 'length64' variable in rxfs_fetchInit is almost
completely unused (it just goes into an icl logging function). For the
length that we actually use ('*alength'), we just take the lower 32
bits of the length that the fileserver told us. This method is
incorrect in at least the following cases:
- If the fileserver returns a length that is larger than 2^32-1,
we'll just take the lower 32 bits of the 64-bit length the
fileserver told us about. The client currently never requests a
fetch larger than 2^32-1, so this would be an error, but if this
occurred, we would not detect it until much later in the fetch.
- If the fileserver returns a length that is larger than 2^31-1, but
smaller than 2^32, we'll interpret the length as negative (which we
assume is just 0, due to bugs in older fileservers). This is also
incorrect.
- If the fileserver returns a negative length smaller than -2^31+1,
we may interpret the given length as a positive value instead of a
negative one. Older fileservers can do this if we fetch data beyond
the file's EOF (this was fixed in the fileserver in commit 529d487d65d8561f5d0a43a4dc71f72b86efd975). This positive length
will cause an error (usually), instead of proceeding without error
(which is what would happen if we correctly interpreted the length
as negative).
On Solaris, this can manifest as a failed write, when writing to a
location far beyond the file's EOF from the fileserver's point of
view, because Solaris writes can trigger a fetch for the same area.
Seeking to a location far beyond the file's EOF and writing can
trigger this, as can a normal copy into AFS, if the file is large
enough and the cache is large enough. To explain in more detail:
When copying a file into AFS, the cache manager will buffer the dirty
data in the disk cache until the file is synced/closed, or we run out
of cache space. While this data is buffering, the application will
write into an offset, say, 3GiB into the file. On Solaris, this can
trigger a read for the same region, which will trigger a fetch from
the fileserver at the offset 3GiB into the file. If the fileserver
does not contain the fix in commit 529d487d65d8561f5d0a43a4dc71f72b86efd975, it will respond with a large
negative number, which we interpret as a large positive number; much
larger than the requested length. This will cause the fetch to fail,
which then causes the whole write() call to fail. Specifically this
will fail with EINVAL on Solaris, since that is the error code we
return from afs_GetOnePage when we fail to acquire a dcache. If the
cache is small enough, this will not happen, since we will flush data
to the fileserver before we have a large amount of dirty data,
e.g., 3GiB. (The actual error occurs closer to 2GiB, but this is just
for illustrative purposes.)
To fix this, detect the various ranges of values mentioned above, and
handle them specially. Lengths that are too large will yield an error,
since we cannot handle values over 2^31-1 in the rxfs_* framework
currently.
For lengths that are negative, just act as if we received a length of
0. Do this for both the 64-bit codepath and the non-64-bit codepath,
just so they remain identical.
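The range handling described above could be sketched as follows. This is a
simplified illustration, not rxfs_fetchInit itself: check_fetch_length and
its -1 error return are hypothetical, and the comparison is done directly
on the 64-bit value before anything is narrowed.

```c
#include <stdint.h>

/*
 * Classify a 64-bit length from the fileserver.
 * Returns 0 and stores a usable 32-bit length, or -1 if it is too large.
 */
int check_fetch_length(int64_t length64, int32_t *alength)
{
    if (length64 > INT32_MAX)
        return -1;              /* beyond 2^31-1: cannot be handled */
    if (length64 < 0) {
        /*
         * Negative lengths are treated as 0. Taking only the low 32 bits
         * would be wrong: -3 GiB has low 32 bits equal to +1 GiB.
         */
        *alength = 0;
        return 0;
    }
    *alength = (int32_t)length64;
    return 0;
}
```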
[mmeffie@sinenomine.net: directly use 64 bit comparisons, don't mask
end call error code, commit nits.]
Change-Id: I7e8f2132d52747b7f0ce4a6a5ba81f6641a298a8
Reviewed-on: http://gerrit.openafs.org/11829 Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Most of the time, this is fine. However, if 'position' is more than
2GiB greater than file_length, 'size' will be calculated to be smaller
than -2GiB. Since 'size' in this code is a signed 32-bit integer, this
can cause 'size' to underflow, and result in a value closer to
(positive) 2GiB.
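One way to avoid this kind of wraparound is to do the subtraction in 64 bits
and clamp before narrowing. The sketch below assumes that 'size' is derived
from file_length minus position, which the text above implies but does not
state outright; fetch_size is a hypothetical helper, not the change made in
afs_GetDCache.

```c
#include <stdint.h>

/*
 * Bytes to request for a chunk starting at 'position'.
 * The difference is computed in 64 bits and clamped to [0, chunksize]
 * before it is narrowed to 32 bits.
 */
int32_t fetch_size(int64_t file_length, int64_t position, int32_t chunksize)
{
    int64_t remaining = file_length - position;

    if (remaining <= 0)
        return 0;               /* at or beyond EOF: nothing to fetch */
    if (remaining > chunksize)
        return chunksize;
    return (int32_t)remaining;
}
```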
This has two potential effects:
The afs_AdjustSize call in afs_GetDCache will cause the underlying
cache file for this dcache to be very large (if our offset is around
2GiB larger than the file size). This can confuse other parts of the
client, since our cache usage reporting will be incorrect (and can even be
far larger than the maximum configured cache size).
This will also cause a read request to the fileserver that is larger
than necessary. Although 'size' will be capped at our chunksize, it
should be 0 in this situation, since we know there is no data to
fetch. At worst, this currently can just result in worse performance
in rare situations, but it can also just be very confusing.
Note that an afs_GetDCache request beyond EOF can currently happen in
non-race conditions on at least Solaris when performing a file write.
For example, with a chunksize of 256KiB, something like this will
trigger the overflow in 'size' in most cases:
Michael Meffie [Thu, 21 Jan 2016 22:55:37 +0000 (17:55 -0500)]
doc: afsd -settime and -nosettime are obsolete
Update the afsd man page -settime and -nosettime options, which are obsolete
and no longer have any effect. Use the same wording as the other obsolete
options in the afsd man page. Keep the recommendations to use the time keeping
daemons provided by the operating system to maintain the system time.
Change-Id: I08a1bd5ae0b2d6618b3e212ebcbb98f470e33820
Reviewed-on: http://gerrit.openafs.org/12175 Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Mon, 23 Nov 2015 00:22:58 +0000 (18:22 -0600)]
Fix optimized IRIX kernel module builds
Commit 9f94892f8d996a522e7801ef6088a13769bee7c2 (from 2006)
introduced per-file CFLAGS, using $(CFLAGS-$@); this construct
is not parsed well by IRIX make, which ends up attempting to
expand '$@)' and finding mismatched parentheses.
Commit 5987e2923a2670a27a801461dc9668ec88ed7d2a (from 2007) followed,
fixing the IRIX build but only for the NOOPT case. This left the
problematic expression in CFLAGS_OPT until 2013, when another RT
ticket was filed reporting the continued breakage. That ticket
was then ignored until 2015 (now) with no particular cries of
outrage on the mailing lists. Perhaps this gives some indication
of the size and/or mindset of the IRIX userbase. (There have
been successful IRIX installations during this time period, so
presumably it was discovered that disabling optimizations helped
the build along.)
FIXES 131621
Change-Id: Id5298103221b016239723aa08ebe0dc54bdadc5e
Reviewed-on: http://gerrit.openafs.org/12111 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Chas Williams [Thu, 24 Dec 2015 22:58:32 +0000 (17:58 -0500)]
LINUX: don't cache negative entries for dynroot
The dynroot volume lacks any callbacks that would invalidate the directory
or change the data version. Further, the data version for the dynroot
is only updated when a new cell is found or added (a positive lookup).
Change-Id: If0b022933de7335d3d94aafc77c50b85b99f4116
Reviewed-on: http://gerrit.openafs.org/12140 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Marcio Barbosa [Tue, 29 Dec 2015 13:31:43 +0000 (10:31 -0300)]
afs: do not allow two shutdown sequences in parallel
Often, 'afsd -shutdown' is called right after 'umount'.
Both commands hold the glock before calling 'afs_shutdown'.
However, one of the functions called by 'afs_shutdown', namely,
'afs_FlushVCBs', might drop the glock when the global
'afs_shuttingdown' is still equal to 0. As a result, a scenario
with two shutdown sequences proceeding in parallel is possible.
To fix the problem, the global 'afs_shuttingdown' is used as an
enumerated type to make sure that the second thread will not run
'afs_shutdown' while the first one is stuck inside 'afs_FlushVCBs'.
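The guard can be modeled in a few lines. The state names, globals and
functions below are hypothetical, and the "second thread" is simulated by a
nested call made at the point where the real code could drop the lock; no
actual locking is shown.

```c
/* Hypothetical state names; the text only says the global becomes an enum. */
enum shutdown_state { STATE_RUNNING = 0, STATE_SHUTTING_DOWN, STATE_SHUT_DOWN };

enum shutdown_state current_state = STATE_RUNNING;
int teardown_runs = 0;          /* how many times the teardown body ran */
int nested_attempt = -1;        /* result of the simulated second caller */

int try_shutdown(void);

/* Stand-in for a step that drops the lock: a second caller sneaks in here. */
void flush_callbacks(void)
{
    nested_attempt = try_shutdown();
}

/* Returns 1 if this caller performed the shutdown, 0 if it backed off. */
int try_shutdown(void)
{
    if (current_state != STATE_RUNNING)
        return 0;
    current_state = STATE_SHUTTING_DOWN;    /* claim before the lock can drop */
    flush_callbacks();
    teardown_runs++;
    current_state = STATE_SHUT_DOWN;
    return 1;
}
```

Since the state leaves STATE_RUNNING before the lock can be released, the
second caller sees a shutdown in progress and returns without doing the
teardown again.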
Change-Id: Iffa89d82278b0df5fb90fc35608af66d8e8db29e
Reviewed-on: http://gerrit.openafs.org/12016 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Brian Torbich [Thu, 21 Jan 2016 15:08:27 +0000 (10:08 -0500)]
redhat: Correct permissions on systemd unit files
Change the systemd unit file permissions created via
openafs.spec to be 0644 instead of 0755. Having the
systemd unit files be executable will trigger a systemd
warning.
FIXES 132662
Change-Id: I9f5111c855941528193aaabeb42bf1b732246a7e
Reviewed-on: http://gerrit.openafs.org/12174 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Stephan Wiesand [Mon, 22 Jun 2015 08:44:11 +0000 (10:44 +0200)]
redhat: Avoid bogus dependencies when building the srpm
By default the spec defines that both userland and kernel module
packages should be built. This results in a dependency of the form
"kernel-devel-`uname -m` = `uname -r`" being added to the source
package created by makesrpm.pl, which is bogus because the uname
values are from the system on which the srpm is built and needn't
apply to the system where it is used. While rpm and rpmbuild ignore
such dependencies of source packages, other tools don't and may fail.
Some versions of rpmbuild will also enforce those requirements when
building the srpm itself, which is pointless too.
Avoid both problems by pretending not to attempt building modules
and ignoring any dependencies when makesrpm.pl invokes rpmbuild -bs.
Change-Id: I0134e1936638c7d9c3fd9ff0ccf1cba36710d0d3
Reviewed-on: http://gerrit.openafs.org/11903 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Wed, 4 Feb 2015 15:11:29 +0000 (10:11 -0500)]
Update extra-iput configure argument description
Commit 15260c7fdc5ac8fe9fb1797c8e383c665e9e0ccd did not function
as advertised, since the conditional which attempted to make
the configure option --(en|dis)able-linux-d_splice_alias-extra-iput
mandatory on linux checked a variable for the system type which
was not set at the time the check ran.
Subsequent discussion of this behavior produced a consensus that
there is not a need to make the configure option mandatory,
due to the narrow range of kernels affected by the bug in question,
so this follow-up commit just fixes the documentation and removes
the ineffective code.
Change-Id: I36d1f8801d355f33c3132fcab166ea76faab8e87
Reviewed-on: http://gerrit.openafs.org/11710 Reviewed-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>
Simon Wilkinson [Mon, 4 Mar 2013 16:15:37 +0000 (16:15 +0000)]
compile_et: Don't overflow input file buffer
Don't overflow the buffer that's used for the input filename by
copying in too much with sprintf. Use asprintf to dynamically
allocate a buffer instead.
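For illustration, the same effect (a buffer sized to fit the formatted name)
can be had with a two-pass snprintf from standard C. This is a substitute
technique shown for portability, not the asprintf call the commit uses, and
make_filename is a hypothetical helper name.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Build "<base><suffix>" in a freshly allocated buffer of exactly the
 * required size. Returns NULL on failure; the caller frees the result.
 */
char *make_filename(const char *base, const char *suffix)
{
    int needed = snprintf(NULL, 0, "%s%s", base, suffix);
    char *buf;

    if (needed < 0)
        return NULL;
    buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return NULL;
    snprintf(buf, (size_t)needed + 1, "%s%s", base, suffix);
    return buf;
}
```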
Link roken for rk_asprintf where needed.
Build compile_et with libtool, to ensure that it is linked statically,
as is needed for build tools such as compile_et. (This requires
a preceding change to set a buildtool_roken make variable.)
Benjamin Kaduk [Wed, 25 Feb 2015 23:46:28 +0000 (18:46 -0500)]
Provide a buildtool_roken make variable
When using roken in build tools, i.e., binaries which must be
executed during the build stage, the roken library must be usable
prior to the 'install' stage. In particular, if the internal
rokenafs is used, the shared library will not be installed and
the runtime linker will not be able to find it, causing execution
of the build tool to fail. To avoid this failure, librokenafs
must be linked statically into these build tools.
Unfortunately, the way we currently use libtool is not very
well aligned to libtool's model of how it should be used. As a result,
it does not seem feasible to cause libtool to link librokenafs
statically without breaking other parts of the build.
Libtool peeks at the compiler command-line arguments to affect its
behavior when invoked as a linker. The flags -static, -all-static,
and -static-libtool-libs can affect whether dynamic or static linkage
is used for various libraries being linked into the executable.
Passing -all-static tells libtool to not do any dynamic linking at
all, but is silently a no-op if static linking is not possible (the
default situation on most modern Linuxen, OS X, and Solaris).
Passing -static causes libtool to not do any dynamic linking of
libtool libraries which have not been installed, and passing
-static-libtool-libs causes libtool to not do any dynamic linking
of libtool libraries at all.
In order to get libtool to actually link statically in all cases,
we should pass -all-static, not just -static. However, because
too many platforms disallow static linking by default, this is
not a viable option.
If we retain the libtool archive librokenafs.la in the linker search
path, libtool then records the library dependency of libafshcrypto on
librokenafs in its metadata and refuses to install libafshcrypto.la to
any path other than the configured prefix. This restriction of
libtool is incompatible with our use in 'make dest', and it is not
feasible to desupport 'make dest' before the 1.8 release.
The most appropriate workaround seems to be to just pass the
path to librokenafs.a on the linker command line when linking
build tools. As such, provide a new make variable buildtool_roken
which is appropriate for linking roken into build tools -- this
variable will be set to the path to librokenafs.a when the internal
roken is used, and the normal -lrokenafs when an external roken
is used.
Change-Id: I079fc6de5d0aa6403eb1071f3d58a248b1777853
Reviewed-on: http://gerrit.openafs.org/11763
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Chas Williams <3chas3@gmail.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Anders Kaseorg [Fri, 31 Jul 2015 05:42:55 +0000 (01:42 -0400)]
rxkad: Resolve warnings in ticket5.c
Resolves these warnings:
ticket5.c: In function ‘tkt_MakeTicket5’:
ticket5.c:574:33: warning: pointer targets in passing argument 1 of ‘_rxkad_v5_encode_EncTicketPart’ differ in signedness [-Wpointer-sign]
code = encode_EncTicketPart(encodebuf, allocsiz, &data, &encodelen);
^
In file included from ticket5.c:80:0:
v5gen-rewrite.h:43:30: note: expected ‘unsigned char *’ but argument is of type ‘char *’
#define encode_EncTicketPart _rxkad_v5_encode_EncTicketPart
^
v5gen.c:1889:1: note: in expansion of macro ‘encode_EncTicketPart’
encode_EncTicketPart(unsigned char *p, size_t len, const EncTicketPart * data, size_t * size)
^
ticket5.c:602:33: warning: pointer targets in passing argument 1 of ‘_rxkad_v5_encode_EncryptedData’ differ in signedness [-Wpointer-sign]
code = encode_EncryptedData(ticket + *ticketLen - 1, *ticketLen, &encdata, &tl);
^
In file included from ticket5.c:80:0:
v5gen-rewrite.h:16:30: note: expected ‘unsigned char *’ but argument is of type ‘char *’
#define encode_EncryptedData _rxkad_v5_encode_EncryptedData
^
v5gen.c:690:1: note: in expansion of macro ‘encode_EncryptedData’
encode_EncryptedData(unsigned char *p, size_t len, const EncryptedData * data, size_t * size)
^
ticket5.c: In function ‘tkt_DecodeTicket5’:
ticket5.c:320:10: warning: ‘plainsiz’ may be used uninitialized in this function [-Wmaybe-uninitialized]
code = decode_EncTicketPart((unsigned char *)plain, plainsiz, &decr_part, &siz);
^
Marcio Barbosa [Thu, 24 Dec 2015 20:23:23 +0000 (17:23 -0300)]
viced: do not overwrite possible failure
The function ‘hpr_Initialize’ overwrites the code
returned by ‘ubik_ClientInit’. As a result, ‘hpr_Initialize’
will not report any failure triggered by ‘ubik_ClientInit’.
To fix this problem, store the code returned by ‘rxs_Release’
in a new variable. Only return this code if the function
‘ubik_ClientInit’ worked properly. Otherwise, return the code
provided by ‘ubik_ClientInit’.
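A minimal sketch of the error-preserving pattern described above. The function name and its two integer parameters are hypothetical stand-ins for the results of the initialization call and the later release call; this is not the actual viced code.

```c
/* Hypothetical illustration: combine the result of an initialization
 * call with the result of a later release call without losing the
 * first failure. */
int
combine_init_and_release(int init_code, int release_code)
{
    /* A failure from initialization always takes precedence. */
    if (init_code != 0)
        return init_code;
    /* Initialization worked, so the release result is what matters. */
    return release_code;
}
```

With this shape, a failing initialization is reported even when the release step succeeds.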
Mark Vitale [Fri, 7 Aug 2015 15:56:16 +0000 (11:56 -0400)]
afs: pioctl kernel memory overrun
CVE-2015-8312:
Any pioctl with an input buffer size (ViceIoctl->in_size)
exactly equal to AFS_LRALLOCSIZ (4096 bytes) will cause
a one-byte overwrite of its kernel memory working buffer.
This may crash the operating system or cause other
undefined behavior.
The attacking pioctl must be a valid AFS pioctl code.
However, it need not specify valid arguments (in the ViceIoctl),
since only rudimentary checking is done in afs_HandlePioctl.
Most argument validation occurs later in the individual
pioctl handlers.
Nor does the issuer need to be authenticated or authorized
in any way, since authorization checks also occur much later,
in the individual pioctl handlers. An unauthorized user
may therefore trigger the overrun by either crafting his
own malicious pioctl, or by issuing a privileged
command, e.g. 'fs newalias', with appropriately sized but
otherwise arbitrary arguments. In the latter case, the
attacker will see the expected error message:
"fs: You do not have the required rights to do this operation"
but in either case the damage has been done.
Pioctls are not logged or audited in any way (except those
that cause loggable or auditable events as side effects).
root cause:
afs_HandlePioctl() calls afs_pd_alloc() to allocate
two afs_pdata structs, one for input and one for output.
The memory for these buffers is based on the requested
size, plus at least one extra byte for the null terminator
to be set later:
requested size       allocated
=================    ==================================
>  AFS_LRALLOCSIZ    osi_Alloc(size+1)
<= AFS_LRALLOCSIZ    afs_AllocLargeSize(AFS_LRALLOCSIZ)
afs_HandlePioctl then adds a null terminator to each buffer,
one byte past the requested size. This is safe in all cases
except one: if the requested in_size was _exactly_
AFS_LRALLOCSIZ (4096 bytes), this null is one byte beyond
the allocated storage, zeroing a byte of kernel memory.
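The off-by-one can be reproduced with simple arithmetic. The sketch below uses hypothetical helper names and a local constant standing in for AFS_LRALLOCSIZ; alloc_size_with_room() is only one possible correction, shown as an assumption and not necessarily the change made by this commit.

```c
#include <stddef.h>

#define LRALLOCSIZ 4096		/* stand-in for AFS_LRALLOCSIZ */

/* Allocation size following the rule in the table above. */
size_t
alloc_size_described(size_t requested)
{
    return (requested > LRALLOCSIZ) ? requested + 1 : LRALLOCSIZ;
}

/* One possible correction (an assumption): always leave room for
 * the terminator byte. */
size_t
alloc_size_with_room(size_t requested)
{
    return (requested + 1 > LRALLOCSIZ) ? requested + 1 : LRALLOCSIZ;
}

/* The terminator is written at index 'requested', so it fits only
 * if that index lies inside the allocation. */
int
terminator_fits(size_t requested, size_t allocated)
{
    return requested < allocated;
}
```

Only a request of exactly 4096 bytes fails terminator_fits() under the described rule; 4095 and 4097 both pass.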
Commit 6260cbecd0795c4795341bdcf98671de6b9a43fb introduced
the null terminators and they were correct at that time.
But the commit message warns:
"note that this works because PIGGYSIZE is always less than
AFS_LRALLOCSIZ"
Commit f8ed1111d76bbf36a466036ff74b44e1425be8bd introduced
the bug by increasing the maximum size of the buffers but
failing to account correctly for the null terminator in
the case of input buffer size == AFS_LRALLOCSIZ.
Commit 592a99d6e693bc640e2bdfc2e7e5243fcedc8f93 (master
version of one of the fixes in the recent 1.6.13 security
release) is the fix that drew my attention to this new
bug. Ironically, 592a99 (combined with this commit) will
make it possible to eliminate the "offending" null termination
line altogether since it will now be performed automatically by
afs_pd_alloc().
[kaduk@mit.edu: adjust commit message for CVE number assignment,
reduce unneeded churn in the diff.]
Michael Meffie [Fri, 30 Jan 2015 17:12:03 +0000 (12:12 -0500)]
volser: range check acl header fields during dumps and restores
Perform range checks on the acl header fields when reading an
acl from a dump stream and when writing an acl to a dump
stream.
Before this change, a bogus value in the total, positive, or
negative acl fields from a dump stream could cause an out of
bounds access of the acl entries table, crashing the volume
server.
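A rough sketch of such a range check follows. The struct, the field types, and the MAX_ACL_ENTRIES capacity are hypothetical; the real volser structures and limits may differ.

```c
#define MAX_ACL_ENTRIES 20	/* hypothetical capacity of the entries table */

/* Hypothetical header as read from a dump stream. */
struct acl_header {
    int total;
    int positive;
    int negative;
};

/* Return 1 if the header fields cannot index past the entries table,
 * 0 otherwise. */
int
acl_header_in_range(const struct acl_header *h)
{
    if (h->total < 0 || h->total > MAX_ACL_ENTRIES)
	return 0;
    if (h->positive < 0 || h->negative < 0)
	return 0;
    /* Compare by subtraction so positive + negative cannot overflow. */
    if (h->positive > h->total || h->negative > h->total - h->positive)
	return 0;
    return 1;
}
```

A reader or writer would reject the acl, and fail the dump or restore, whenever this check returns 0.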
Change-Id: Ic7d7f615a37491835af8d92f3c5f1b6a667d9d01
Reviewed-on: http://gerrit.openafs.org/11702
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Sun, 22 Nov 2015 19:24:43 +0000 (13:24 -0600)]
volser: set error, not code, before rfail
The rfail cleanup handler overwrites 'code' ~unconditionally, but
does use an existing 'error' value if present. Since the intent
is to return failure to the caller, preserve the code in the error
variable and do so.
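The interaction between the two variables might look roughly like this. The function names, the cleanup stub, and the error value are all hypothetical; only the shape of the rfail-style handler is taken from the description above.

```c
/* Hypothetical cleanup step; returns 0 on success. */
int
end_transaction_stub(void)
{
    return 0;
}

/* Sketch of a routine with an rfail-style cleanup label. */
int
do_operation(int step_fails)
{
    int code = 0;
    int error = 0;

    if (step_fails) {
	error = 42;		/* set 'error', not 'code' */
	goto rfail;
    }
    return 0;

  rfail:
    code = end_transaction_stub();	/* overwrites 'code' */
    if (error)
	code = error;		/* an existing 'error' is preserved */
    return code;
}
```

Had the failure been stored in 'code' instead, the cleanup call would have replaced it with 0 and the caller would have seen success.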
Michael Meffie [Mon, 30 Mar 2015 17:20:42 +0000 (13:20 -0400)]
dafs: remove the salvageserver -showlog option
Remove the salvageserver option to print log messages to stdout. This
was a carry over from the stand-alone salvager and is not appropriate for
a daemon.
configure now checks for the standard getmaxyx() macro; failing that,
it looks for the older but pre-standardization getmaxx() and getmaxy(),
then falls back to the 4.2BSD curses _maxx and _maxy fields; if all
else fails, gtx building is disabled.
gtx now defines getmaxyx() itself if necessary, based on the above.
This also fixes a bug in gtx with all ncurses versions > 1.8.0 on
platforms other than NetBSD and OS X: gtx was using the _maxx and
_maxy fields, which starting with ncurses 1.8.1 were off by 1 from
the expected values. As such, behavior of scout and/or afsmonitor
may change on most ncurses-using platforms.
Change-Id: I49778e87adacef2598f0965e09538dfc3d840dcc
Reviewed-on: http://gerrit.openafs.org/12107
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Chas Williams <3chas3@gmail.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Chas Williams [Wed, 2 Dec 2015 15:38:42 +0000 (10:38 -0500)]
Open syscall emulation file O_RDONLY
As reported on the -info mailing list, docker is now exporting the
/proc filesystem as read only. ioctl() doesn't need write permissions
to do its work, so change O_RDWR to O_RDONLY.
Michael Meffie [Mon, 30 Mar 2015 17:17:25 +0000 (13:17 -0400)]
dafs: remove the salvageserver -datelogs option
Remove the undocumented -datelogs option from the salvageserver, which
was a carry over from the standalone salvager program, but is not
appropriate for a daemon.
Change-Id: Ia382d6550e0641edcba55a414e00323755487e18
Reviewed-on: http://gerrit.openafs.org/11814
Reviewed-by: Perry Ruiter <pruiter@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Benjamin Kaduk [Sun, 22 Nov 2015 22:34:16 +0000 (16:34 -0600)]
Fix ptserver -default_access parsing
Commit 0b9986c8758c13a1de66b8bdae51b11abaea6cf3 converted ptserver
to use libcmd for parsing, but erroneously listed the
-default_access argument as CMD_SINGLE instead of CMD_LIST, since
two arguments are needed. This made it impossible to use
-default_access at all, since libcmd would reject an extra argument
and the later argument processing would notice that the second
argument was missing.
Christof Hanke [Wed, 18 Nov 2015 13:02:50 +0000 (14:02 +0100)]
tabular_output: allocate footer-line when set for the first time
If the footer line is not allocated, programs segfault at runtime.
The printFooter functions should check if the footer
is allocated before printing it.
Commit a14e791541bf19c6c377e68bc2f978fba34f94b1
refactored and corrected the counting of requests and aborts.
However, it inadvertently introduced a new undercount for
VL_GetEntryByName* requests, counting them only if
NameIsId(volname), e.g. volname="536870911".
Ensure that the normal case of a non-"numeric" volname is
also counted.
Stephan Wiesand [Tue, 17 Nov 2015 14:03:03 +0000 (15:03 +0100)]
writeconfig: emit error messages again in VerifyEntries
Before commit e4a8a7a38dbf29e89bc1a7b6b017447a6aa0c764 an error message
was printed if looking up a server hostname failed. Restore this, and
also print a message in the now detected case that the lookup returns
loopback addresses only.
Change-Id: Idf7c3133ab5c83e081335ba1dc8fcbddb7da7329
Reviewed-on: http://gerrit.openafs.org/12097
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Wed, 21 Jan 2015 19:42:47 +0000 (14:42 -0500)]
bozo: create a syslog connection only if the -syslog option is given
Fix a minor bug in which an unnecessary syslog connection is opened when
the BosLog is not present (typically, the first time the bosserver is
started) or when the BosLog is a named pipe, even if the -syslog option
was not given.
Michael Meffie [Wed, 18 Feb 2015 02:54:46 +0000 (21:54 -0500)]
prdb_check: fix out of bounds array access in continuation entries
A continuation entry (struct contentry) contains 39 id elements, however
a regular entry (struct prentry) contains only 10 id elements.
Attempting to access more than 10 elements of a regular entry is
undefined behavior.
Use a struct contentry when processing continuation entries in
prdb_check. This is done to safely traverse the id arrays of the
continuation entries. Use the new pr_PrintContEntry to print
continuation entries.
The undefined behavior manifests as a segmentation violation in
WalkNextChain() when built with GCC 4.8 with optimization enabled.
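To illustrate why the type matters, here is a simplified sketch. The two structs are cut-down stand-ins (the real prentry and contentry have more fields), and the convention that an unused slot holds 0 is an assumption made only for this example.

```c
#include <stdint.h>

#define REGULAR_IDS 10		/* id slots in a regular entry, per the text */
#define CONT_IDS    39		/* id slots in a continuation entry, per the text */

/* Simplified stand-ins for the on-disk entry layouts. */
struct regular_entry {
    int32_t flags;
    int32_t next;
    int32_t entries[REGULAR_IDS];
};

struct cont_entry {
    int32_t flags;
    int32_t next;
    int32_t entries[CONT_IDS];
};

/* Count the ids in use. Because the parameter is a cont_entry, the
 * loop bound of CONT_IDS stays inside the declared array. */
int
count_cont_ids(const struct cont_entry *c)
{
    int i, n = 0;

    for (i = 0; i < CONT_IDS; i++) {
	if (c->entries[i] != 0)	/* assumption: 0 marks an unused slot */
	    n++;
    }
    return n;
}
```

Running the same 39-element loop over a regular_entry would index past its 10-element array, which is the undefined behavior the commit removes.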
Michael Meffie [Wed, 18 Feb 2015 01:58:27 +0000 (20:58 -0500)]
prdb_check: check for continuation entries in owner chains
Continuation entries may not be in owner chains. Fix the
comments in WalkOwnerChain (which were probably copied from
WalkNextChain) and add a check and error message for
continuation entries found on owner chains.
Michael Meffie [Wed, 18 Feb 2015 02:11:50 +0000 (21:11 -0500)]
libprot: add pr_PrintContEntry function
A continuation entry (struct contentry) contains 39 id elements, however
a regular entry (struct prentry) contains only 10 id elements. Attempting
to access more than 10 elements of a regular entry is undefined
behavior.
Add a new function to safely print continuation entries and change
pr_PrintEntry to avoid accessing the entries array out of bounds.
The pr_PrintEntry function is at this time only used by the prdb_check
and ptclient debugging utilities.
Michael Meffie [Wed, 21 May 2014 21:27:47 +0000 (17:27 -0400)]
doc: document the version subcommand
Document the built-in version sub-command which displays
the OpenAFS version string. This sub-command is provided
by the cmd library.
Document the switch style -version option provided by the cmd
library for the initcmd based commands: afsmonitor, scout,
xstat_fs_test, and xstat_cm_test.
Michael Meffie [Tue, 13 Oct 2015 02:16:54 +0000 (22:16 -0400)]
afs: fix for return an error from afs_readdir when out of buffers
Commit 9b0d5f274fe79ccc5dd0e4bba86b3f52b27d3586 added a return code to
BlobScan to allow afs_readdir to return an error when afs_newslot failed
to allocate a buffer. Unfortunately, that change introduced a false
EIO error.
Originally, BlobScan would return a blob number of 0 to indicate the end
of the file has been reached while traversing the directory blobs.
Restore that behavior by changing the cache manager's DRead function to
return ENOENT instead of the generic EIO error to indicate the page to
be read is out of bounds, and change BlobScan to return a blob of zero
to indicate to callers the last blob has been reached. All callers
already check for a blob number of zero, which is out of range.
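The restored contract can be sketched as follows. read_page() and scan_blob() are hypothetical stand-ins for DRead and BlobScan, and the page count and blob numbering are invented for the example.

```c
#include <errno.h>

#define PAGE_COUNT 4		/* hypothetical number of pages in the directory */

/* Stand-in for a page read: 0 on success, ENOENT when the page is
 * out of bounds, EIO for a genuine read failure. */
int
read_page(int page, int simulate_io_failure)
{
    if (simulate_io_failure)
	return EIO;
    if (page < 0 || page >= PAGE_COUNT)
	return ENOENT;
    return 0;
}

/* Stand-in for the scan step: an out-of-bounds page means "no more
 * blobs" (blob 0, success); any other failure is passed up. */
int
scan_blob(int page, int simulate_io_failure, int *blob_out)
{
    int code = read_page(page, simulate_io_failure);

    if (code == ENOENT) {
	*blob_out = 0;		/* end of directory, not an error */
	return 0;
    }
    if (code != 0)
	return code;
    *blob_out = page + 1;	/* hypothetical nonzero blob number */
    return 0;
}
```

Callers can then keep treating a blob number of zero as the end of the directory while still seeing real I/O errors.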
Change-Id: I5baae8e5377dd49dcca6765b7a4ddc89cca70738
Reviewed-on: http://gerrit.openafs.org/12058
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Michael Meffie [Fri, 6 Nov 2015 16:56:31 +0000 (11:56 -0500)]
vos: reinstate the -localauth option for vos setaddrs
Commit d1d411576cf39c4bc55918df0eb64327718d566c added the vos remaddrs
subcommand, but unfortunately stole the common parameters from
setaddrs. Fix this bug and remove the extra blank line between
the subcommand syntax and the common params macro.
Tim Creech [Mon, 2 Nov 2015 13:12:32 +0000 (08:12 -0500)]
Make libuafs safe for parallel make
In src/libuafs, "make" with a large number of jobs (e.g., "make -j16")
can fail because some of the LT_objs depend on make_h_tree having been
called already.
Make "h" (the libuafs header subdirectory) a dependency of all of
LT_objs.
Jeffrey Altman [Fri, 9 Oct 2015 02:22:12 +0000 (22:22 -0400)]
rx: OPENAFS-SA-2015-007 "Tattletale"
CVE-2015-7762:
The CMU/Transarc/IBM definition of rx_AckDataSize(nAcks) was mistakenly
computed from sizeof(struct rx_ackPacket) and inadvertently added three
octets to the computed ack data size due to C language alignment rules.
When constructing ack packets these three octets are not assigned a
value before writing them to the network.
Beginning with AFS 3.3, IBM extended the ACK packet with the "maxMTU" ack
trailer value which was appended to the packet according to the
rx_AckDataSize() computation. As a result the three unassigned octets
were unintentionally cemented into the ACK packet format.
In OpenAFS commit 4916d4b4221213bb6950e76dbe464a09d7a51cc3 Nickolai
Zeldovich <kolya@mit.edu> noticed that the size produced by the
rx_AckDataSize(nAcks) macro was dependent upon the compiler and processor
architecture. The rx_AckDataSize() macro was altered to explicitly
expose the three octets that are included in the computation.
Unfortunately, the failure to initialize the three octets went unnoticed.
The Rx implementation maintains a pool of packet buffers that are reused
during the lifetime of the process. When an ACK packet is constructed
three octets from previously received or transmitted packets will be
leaked onto the network. These octets can include data from a
received packet that was encrypted on the wire and then decrypted.
If the received encrypted packet is a duplicate or if it is outside the
valid window, the decrypted packet will be used immediately to construct
an ACK packet.
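The effect of alignment on a sizeof-based length can be seen in a small sketch. The struct below is a hypothetical layout chosen to resemble an ack header, not a copy of struct rx_ackPacket, and the helper names are invented. On common ABIs where a 32-bit integer is 4-byte aligned, the compiler appends three bytes of tail padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ACKS 255		/* hypothetical maximum number of ack bytes */

/* Hypothetical ack header; the last member is a byte array. */
struct ack_header {
    uint16_t buffer_space;
    uint16_t max_skew;
    uint32_t first_packet;
    uint32_t previous_packet;
    uint32_t serial;
    uint8_t reason;
    uint8_t n_acks;
    uint8_t acks[MAX_ACKS];
};

/* Bytes the compiler adds after the last member. */
size_t
ack_tail_padding(void)
{
    return sizeof(struct ack_header)
	- (offsetof(struct ack_header, acks) + MAX_ACKS);
}

/* Length computed from sizeof; silently includes the tail padding. */
size_t
ack_wire_size(size_t n_acks)
{
    return sizeof(struct ack_header) - MAX_ACKS + n_acks;
}

/* Clear every byte that will be sent before filling in the fields,
 * so a reused buffer cannot leak old contents. */
void
ack_prepare(unsigned char *buf, size_t n_acks)
{
    memset(buf, 0, ack_wire_size(n_acks));
}
```

Zeroing the full computed length is one simple way to make sure the padding octets carry no stale data; whether the actual fix clears the whole region or assigns the octets individually is not stated above.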
CVE-2015-7763:
In OpenAFS commit c7f9307c35c0c89f7ec8ada315c81ebc47517f86 the ACK packet
was further extended in an attempt to detect the path MTU between two
peers. When the ACK reason is RX_ACK_PING a variable number of octets is
appended to the ACK following the ACK trailers.
The implementation failed to initialize all of the padding region.
A variable amount of data from previous packets can be leaked onto the
network. The padding region can include data from a received packet
that was encrypted on the wire and then decrypted.
OpenAFS 1.5.75 through 1.5.78 and all 1.6.x releases (including release
candidates) are vulnerable.
Credits:
Thanks to John Stumpo for identifying both vulnerabilities.
Thanks to Simon Wilkinson for patch development.
Thanks to Ben Kaduk for managing the security release cycle.
Jeffrey Altman [Mon, 12 Oct 2015 13:56:07 +0000 (09:56 -0400)]
Windows: CM_ERROR_INEXACT_MATCH is not a fatal error
cm_BPlusDirLookup() and cm_Lookup() can return CM_ERROR_INEXACT_MATCH
which is not a fatal error. Instead it is an indication that the returned
cm_scache object was not a case sensitive match. Do not fail the request
and do not leak the cm_scache reference.
Windows: cm_Lookup return ambiguous filename to caller
cm_Lookup() must not mask a CM_ERROR_AMBIGUOUS_FILENAME error by
converting it to CM_ERROR_BPLUS_NOMATCH. Doing so results in the
redirector believing that the object does not exist instead of
there being a STATUS_OBJECT_NAME_COLLISION.
Windows: fix RDR detection of ambiguous directory entries
The redirector is supposed to reject access to file objects if there
is no case exact match and multiple entries match in a case insensitive
comparison. The check was only present in the AFSLocateNameEntry()
function and not elsewhere.
Fix the AFSLocateNameEntry() call and add the missing checks.
Jeffrey Altman [Mon, 19 Oct 2015 00:32:06 +0000 (20:32 -0400)]
Windows: rdr pioctl operations are opaque
Although pioctl operations are delivered through the redirector the
contents of the operations are opaque to the redirector. Therefore,
the cm_req must not be initialized as a redirector operation. If they
are the necessary invalidation notifications for symlink and mount point
operations will not be delivered.
Jeffrey Altman [Fri, 9 Oct 2015 14:20:41 +0000 (10:20 -0400)]
Windows: if no known IP addrs, query the addr list
If cm_noIPAddrs == 0, then no servers will be probed. If
syscfg_GetIFInfo() fails then cm_noIPAddrs is set to 0. Therefore,
also set cm_LanAdapterChangeDetected to non-zero if syscfg_GetIFInfo()
fails so that the interface info can be queried again prior to a server
probe attempt.
In cm_CheckServersMulti() if cm_ConnByServer() fails or if cm_noIPAddrs is
zero then a cm_server.pingCount will be leaked. This can result in
servers being marked down and never restored to an up state.
This change adds the necessary pingCount decrement and moves the
assignment of the cm_server_t pointer to serversp[] to make it clear
that the cm_server_t will not be in the array if a failure occurs.
Only objects in the array will have the pingCount decremented after
the RPCs are issued.
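A simplified, single-threaded sketch of the balanced counting follows. The struct, the function names, and the conn_ok array are hypothetical, and a plain int is used where the real code would need an interlocked variable.

```c
/* Hypothetical server record with an outstanding-ping counter. */
struct server {
    int ping_count;
};

/* Take a ping reference on each server and queue it for probing.
 * If the connection step fails, drop the reference again and leave
 * the server out of the queue. Returns the number queued. */
int
queue_pings(struct server *all, int n, const int *conn_ok,
	    struct server **queued)
{
    int i, nq = 0;

    for (i = 0; i < n; i++) {
	all[i].ping_count++;
	if (!conn_ok[i]) {
	    all[i].ping_count--;	/* undo; this server is not queued */
	    continue;
	}
	queued[nq++] = &all[i];	/* added only after success */
    }
    return nq;
}

/* After the probes are issued, release the reference of every
 * queued server and of no other. */
void
finish_pings(struct server **queued, int nq)
{
    int i;

    for (i = 0; i < nq; i++)
	queued[i]->ping_count--;
}
```

Every increment is matched by exactly one decrement, either on the failure path or after the probes complete.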
Windows: Replace CM_SERVERFLAG_PINGING with pingCount
Instead of relying upon a server flag use a pingCount interlocked
variable to track whether active ping operations are being performed
and whether or not to wake sleeping threads.
The cm_GetCell_Gen() function permits cells to be searched for by
prefix. The idea is to permit "cs.cmu.edu" to be abbreviated "cs"
when at CMU. There are two problems with the current behavior:
1. the existing match rules will accept "cs.c" and "cs.cmu.ed" as
valid prefix matches. By not restricting the prefix matching
to full components the Freelance symlink list can become
cluttered.
2. the existing match rules will accept the first cell that
matches even if more than one cell would match.
This can result in unpredictable behavior since the ordering
of the cells is not guaranteed.
Instead, fail requests for cell prefixes that are not full component
matches or that would be ambiguous.
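The two rules can be sketched in a few lines. The function names and return conventions are hypothetical, and the comparison here is case-sensitive for brevity, which may not match how cm_GetCell_Gen() compares names.

```c
#include <string.h>

/* Return 1 if 'prefix' matches 'cell' up to a full component
 * boundary, i.e. the next character in 'cell' is '.' or the end. */
int
is_component_prefix(const char *prefix, const char *cell)
{
    size_t n = strlen(prefix);

    if (n == 0 || strncmp(prefix, cell, n) != 0)
	return 0;
    return cell[n] == '\0' || cell[n] == '.';
}

/* Return the index of the single matching cell, -1 if none match,
 * or -2 if the prefix is ambiguous. */
int
find_cell_by_prefix(const char *prefix, const char *const *cells,
		    int ncells)
{
    int i, found = -1;

    for (i = 0; i < ncells; i++) {
	if (!is_component_prefix(prefix, cells[i]))
	    continue;
	if (found >= 0)
	    return -2;		/* a second match: ambiguous */
	found = i;
    }
    return found;
}
```

With these rules "cs" still matches "cs.cmu.edu", while "cs.c" and "cs.cmu.ed" do not, and a prefix shared by two cells is rejected regardless of list order.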
Jeffrey Altman [Mon, 4 May 2015 17:25:04 +0000 (13:25 -0400)]
Windows: Network Provider registration at service start
Windows 8, 8.1 and pre-releases of 10 have a horrible bug as part
of the upgrade process. All non-Microsoft network provider services
are removed from the NetworkProvider "Order" registry value. For
OpenAFS this has the side effect of breaking integrated logon and
all drive letter mappings to \\AFS.
During service start add code to:
1. Add "AFSRedirector" before "LanmanWorkstation" if not present
2. Add "TransarcAFSDaemon" to the end of the list if not present
Jeffrey Altman [Sun, 28 Jun 2015 19:06:34 +0000 (15:06 -0400)]
Windows: cm_Analyze mark server down for misc rx errors
In cm_Analyze() replace the token error retry logic for miscellaneous
rx errors and simply mark the server down. The most common error
that will be seen in this category is RX_INVALID_OPERATION which would
be received if the Rx service id or security class is not recognized
by the peer. This could happen if an AuriStor server is replaced by
an AFS3 server or if a packet is reflected.
A side effect of this change is that V* and CM_ERROR_* errors will
once again be retried. This will permit proper failover to occur.
Jeffrey Altman [Sun, 28 Jun 2015 18:56:47 +0000 (14:56 -0400)]
Windows: avoid vldb lookup race with network stack
If a VLDB query attempt occurs when there is no current cell db server
list then the VLDB query won't actually occur but the last query time
would be set. This prevents a query from taking place again on the volume
for 60 seconds. If the volume in question is the root.cell volume then
the redirector will be forced to return device not ready for the share
(aka \\afs\cell).
Check for a failure of cm_UpdateCell() and only set the last update time
for the volume if there was success or if the VLDB responded with volume
unknown.
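The timestamp rule might be sketched like this. The enum, the struct, and the function are hypothetical simplifications; the actual code works with cm_UpdateCell() and the volume structures.

```c
/* Hypothetical outcomes of a volume location lookup. */
enum lookup_result {
    LOOKUP_OK,			/* the VLDB answered with the volume */
    LOOKUP_VOLUME_UNKNOWN,	/* the VLDB answered: no such volume */
    LOOKUP_NOT_ATTEMPTED	/* no server list, so no query was sent */
};

struct volume_state {
    long last_update;		/* time of the last real VLDB answer */
};

/* Record the query time only when the VLDB actually responded, so a
 * skipped query does not suppress the next attempt. */
void
record_lookup(struct volume_state *v, enum lookup_result r, long now)
{
    if (r == LOOKUP_OK || r == LOOKUP_VOLUME_UNKNOWN)
	v->last_update = now;
}
```

A lookup that never reached the VLDB leaves last_update untouched, so the next request can retry immediately.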