Stephan Wiesand [Tue, 4 Aug 2015 11:28:35 +0000 (13:28 +0200)]
vlserver: Use the right variable for error code in SVL_GetStats
Commit 6c9fe7f80e4b5d9fb21609ee6743470d39dfb8f5 missed one instance
of "code" (as used on the master branch) that should have been changed
to "errorcode" (as used on the 1.6 branch) as part of the cherry-pick.
Fix this so that the right varlue is returned.
This is a 1.6-only change.
Change-Id: I97d9ac5961836843b617bab007d0c4d8bed82fef
Reviewed-on: http://gerrit.openafs.org/11970 Reviewed-by: Daria Phoebe Brashear <shadow@your-file-system.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Wed, 10 Dec 2014 19:07:14 +0000 (14:07 -0500)]
Handle backupDate of zero
In older versions of OpenAFS (prior to 2001), the backupDate was
never set. Try to provide somewhat more reasonable behavior in
this case, by using a different date in that case.
This patch modifies a patch committed as 1e6fb1b7b7, the dumpTimes.to is now
set to creationDate for R/O volumes. The old value copyDate is wrong, if the
R/O volumes is re-cloned. This does not happen with "vos dump -clone", but
may happen with dumping a R/O volume directly: "vos dump <R/O volume>".
Volume dumps can be created from backup volumes, cloned volumes, or
directly from RW volumes. The beginning and end of the time range
covered by the dump is recorded in the DumpHeader. The end time is
based on the type of the volume. Use backupDate for backup volumes,
use copyDate for cloned volumes, and updateDate for RW volumes.
Reviewed-on: http://gerrit.openafs.org/11389 Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit 1e6fb1b7b7ed32e2035452db9fc221f38a8b4956)
Jeffrey Altman [Fri, 9 Oct 2015 02:22:12 +0000 (22:22 -0400)]
rx: OPENAFS-SA-2015-007 "Tattletale"
CVE-2015-7762:
The CMU/Transarc/IBM definition of rx_AckDataSize(nAcks) was mistakenly
computed from sizeof(struct rx_ackPacket) and inadvertently added three
octets to the computed ack data size due to C language alignment rules.
When constructing ack packets these three octets are not assigned a
value before writing them to the network.
Beginning with AFS 3.3, IBM extended the ACK packet with the "maxMTU" ack
trailer value which was appended to the packet according to the
rx_AckDataSize() computation. As a result the three unassigned octets
were unintentionally cemented into the ACK packet format.
In OpenAFS commit 4916d4b4221213bb6950e76dbe464a09d7a51cc3 Nickolai
Zeldovich <kolya@mit.edu> noticed that the size produced by the
rx_AckDataSize(nAcks) macro was dependent upon the compiler and processor
architecture. The rx_AckDataSize() macro was altered to explicitly
expose the three octets that are included in the computation.
Unfortunately, the failure to initialize the three octets went unnoticed.
The Rx implementation maintains a pool of packet buffers that are reused
during the lifetime of the process. When an ACK packet is constructed
three octets from a previously received or transmitted packets will be
leaked onto the network. These octets can include data from a
received packet that was encrypted on the wire and then decrypted.
If the received encrypted packet is a duplicate or if it is outside the
valid window, the decrypted packet will be used immediately to construct
an ACK packet.
CVE-2015-7763:
In OpenAFS commit c7f9307c35c0c89f7ec8ada315c81ebc47517f86 the ACK packet
was further extended in an attempt to detect the path MTU between two
peers. When the ACK reason is RX_ACK_PING a variable number of octets is
appended to the ACK following the ACK trailers.
The implementation failed to initialize all of the padding region.
A variable amount of data from previous packets can be leaked onto the
network. The padding region can include data from a received packet
that was encrypted on the wire and then decrypted.
OpenAFS 1.5.75 through 1.5.78 and all 1.6.x releases (including release
candidates) are vulnerable.
Credits:
Thanks to John Stumpo for identifying both vulnerabilities.
Thanks to Simon Wilkinson for patch development.
Thanks to Ben Kaduk for managing the security release cycle.
Benjamin Kaduk [Mon, 8 Sep 2014 17:47:33 +0000 (13:47 -0400)]
Tweak AFSDIR_PATH_MAX definition
On recent Debian, we run into runtime errors in the test suite
because _POSIX_PATH_MAX is only 256, and that buffer is too small
for a call to realpath(). Use PATH_MAX if it's available and larger
than _POSIX_PATH_MAX, in a way that should be safe even when PATH_MAX
is not defined.
Reviewed-on: http://gerrit.openafs.org/11453 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com> Tested-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit ec2382e060753dfdcaf84b9ac03e1534c65fcdbc)
Andrew Deason [Mon, 27 Oct 2014 21:39:34 +0000 (16:39 -0500)]
rx: Reset lastSendData when resetting call
Currently we use call->lastSendData to attempt to detect a stalled
call, if it's been too long since the last time the call sent any
data. However, we never initialize lastSendData to anything when
creating a new call.
This means that when rx_NewCall (or rxi_NewCall) returns, lastSendData
can be nonzero. This can happen if we reuse a DALLY call, or if we
pull a call off of rx_freeCallQueue. This can be a time very far in
the past, since the lastSendData time has not changed since the last
time the call was used; it will remain unchanged until a user of the
new call writes something to the call stream.
This can be a problem between the time when a caller creates a new
call with rx_NewCall and when the caller actually writes something to
the stream. Between those two times, if lastSendData happens to be set
to a time in the past, we may call rxi_CheckCall on that call, and
abort the call for being idle. The call will thus be aborted before it
even sent any data on the wire.
This is of particular concern for multi_Rx calls, since those can
create a large number of call structures, possibly introducing a delay
between calling rx_NewCall and writing anything to the stream (if one
of the later rx_NewCall invocations blocks waiting for an open call
channel, for instance, all of the previous allocated calls will stick
around unused for potentially a long time).
One such multi_Rx call is done by the cache manager, where it
periodically uses multi_Rx to call RXAFS_GetCapabilities to probe
fileservers for reachability. If this issue occurs during that
operation you can see a large number of servers get marked down for
code -9 (RX_CALL_IDLE), and then get marked as coming back up.
To fix this, set lastSendData to 0 when resetting a call, along with
most of the other fields in a call, to indicate that the call has
never sent any data. As long as lastSendData is 0, the call will never
get aborted with RX_CALL_IDLE, and this situation will be avoided.
This ensures that this issue cannot happen, since rxi_ResetCall is
guaranteed to be called at some point whenever we reuse a call
structure for any reason.
Reviewed-on: http://gerrit.openafs.org/11557 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 8c78a44cf5197ceee6907e947074973138c442f0)
Russ Allbery [Sat, 29 Jun 2013 21:27:55 +0000 (14:27 -0700)]
Fix restorevol crash on corrupt nDumpTimes value
If the number of dump times claimed in the volume header was greater
than MAXDUMPTIMES, restorevol would happily write over random stack
memory and crash. Sanity-check the loaded value and cap it to
MAXDUMPTIMES with a warning.
Bug found by Mayhem and reported by Alexandre Rebert.
In various places where we intentionally ignore the return values of system
calls and standard library routines, this changes the way in which we do so,
to avoid compiler warnings when building on Ubuntu 12.10, with gcc 4.7.2 and
eglibc 2.15-0ubuntu20.1.
Reviewed-on: http://gerrit.openafs.org/9980 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 73cad3be0a3489237ab7e66d3b12c52ffb0b67d0)
Michael Meffie [Tue, 18 Feb 2014 20:23:54 +0000 (15:23 -0500)]
vos: cross-device link error message
Print a better diagnostic message for cross-device link errors, which
happens when a clone volume is not in the same partition as the
parent read-write volume.
Reviewed-on: http://gerrit.openafs.org/10850 Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit da1597d74a0f56e35a156ec27df231f965934910)
Change-Id: I30cb0e87612732bfbce2c001831324d1a9e54409
Reviewed-on: http://gerrit.openafs.org/11587 Reviewed-by: Daria Phoebe Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Tue, 18 Feb 2014 18:59:59 +0000 (13:59 -0500)]
volser: log message for cross-device link errors
Add a log entry to the volume server to help diagnose those pesky
'Invalid cross-link device' errors returned by vos, which occur when
a clone volume is located in a different partition than the parent
read-write volume, or when a read-only volume is on the incorrect
partition on the server.
With this change, a new log entry is added when the volume server
fails to create a clone or a read-write volume because a volume with
the target volume id already exists on a different partition. For a
clone volume, this would be a different partition than the
read-write volume. For a read-only volume, this would be a different
partition than indicated in the vldb.
Examples:
Volume foobar is on /vicepb, but foobar.backup is incorrectly on
partition /vicepa.
$ vos backup foobar
Failed to clone the volume 536870934
: Invalid cross-device link
VolserLog:
VCreateVolume: volume 536870936 for parent 536870934 found on /vicepa; unable to create volume on /vicepb.
1 Volser: Clone: Couldn't create new volume 536870936 for parent 536870934; clone aborted
...
The vldb indicates a read-only volume should be on /vicepa on a
remote site, but the actual volume is on /vicepb.
$ vos release xyzzy
Failed to create the ro volume: : Input/output error
The volume 536870921 could not be released to the following 1 sites:
mantis /vicepa
VOLSER: release could not be completed
...
VolserLog on mantis:
VCreateVolume: volume 536870922 for parent 536870921 found on /vicepb; unable to create volume on /vicepa.
...
Reviewed-on: http://gerrit.openafs.org/10849 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit 21a85792c44e2145eea6d10dc31d58028ba933b8)
Michael Meffie [Fri, 12 Jun 2015 00:28:43 +0000 (20:28 -0400)]
libafs: reset all the volumes with fs flushall
Fix a logic bug in fs flushall in which only the first volume in each
hash chain is reset (invalidated). Instead, reset all the volumes in
the volume hash.
Also, when flushing a single volume with fs flushvolume, don't bother
searching all the hash chains, instead start on the hash chain
containing the volume being flushed.
Reviewed-on: http://gerrit.openafs.org/11892 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 82e02157fec248293e7336f0e0b3d1c9da545228)
Change-Id: I5dddbaed265ee1ce5dc14e88e22abcb29d96db58
Reviewed-on: http://gerrit.openafs.org/11894 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Mon, 10 Feb 2014 22:23:07 +0000 (16:23 -0600)]
bozo: Constify bozo_Log 'format' argument
We clearly do not need to modify the format string; declare it const.
This makes the signature of bozo_Log identical to FSLog, which can
make it easier to use these functions interchangeably.
Reviewed-on: http://gerrit.openafs.org/10830 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit ed1b1df3c8acf9a2c5d4dface88ac15dcb8d7a2e)
Change-Id: I29fb3df82866dc8457d92a0b88eb02ae50879db7
Reviewed-on: http://gerrit.openafs.org/11931 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Fri, 26 Jun 2015 13:09:18 +0000 (09:09 -0400)]
doc: bosserver runs in the background
Since OpenAFS 1.0 bosserver automatically puts itself into the
background and removes it's controlling terminal. Update the examples in
the Admin and Quick Start Guides to remove the unneeded '&' on the
command line to start the bosserver.
Reviewed-on: http://gerrit.openafs.org/11906 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Tested-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 4ef47f787a64dc5c8ebb73a454b0851c86d7c06b)
Change-Id: Ife77c07bb5a00244ef346c0eb70782685c2bb2b1
Reviewed-on: http://gerrit.openafs.org/11932 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Stephan Wiesand [Fri, 14 Aug 2015 06:46:36 +0000 (08:46 +0200)]
Make OpenAFS 1.6.14.1
Update configure version strings for 1.6.14.1. Note that macos kext
can be of form XXXX.YY[.ZZ[(d|a|b|fc)NNN]] where d dev, a alpha,
b beta, f final candidate so we have no way to represent 1.6.14.1.
Switch to 1.6.15 dev 1 for macos.
Change-Id: I733de0ef5d359bffdb7ffe6a7c12cf60f18618c0
Reviewed-on: http://gerrit.openafs.org/11982 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Marc Dionne [Wed, 29 Jul 2015 12:03:14 +0000 (09:03 -0300)]
Linux: Only use automount for volume roots
As long as we avoid using directory aliases when crossing
a mount point (at the volume root), we should always get
to a given non root directory with the same dentry.
The mechanism added by commit de381aa0 ("Linux: Make dir
dentry aliases act like symlinks") is therefore only really
necessary for a volume root.
With kernel 4.2 it is not possible to tweak the "total link
count", resulting in ELOOP errors when looking up a path
with 40 or more directories that are being looked up for
the first time. With this change, only mountpoints will
count against the limit.
Reviewed-on: http://gerrit.openafs.org/11945 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Daria Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 05f64de7d723a8d5430d9b5928c2025838a6fa52)
Change-Id: I16e855c8322174604288b7d440b342951dd3a015
Reviewed-on: http://gerrit.openafs.org/11989 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Add a new macro to check the signature of a particular
operation against a provided typed argument list.
One of the arguments is an arbitrary label that is used
to construct the pre-processor define name. This will
allow for testing of different forms for the same
operation.
This can be used to replace many of the remaining odd
checks in src/cf/linux_test4.m4.
Marc Dionne [Mon, 6 Jul 2015 14:00:13 +0000 (11:00 -0300)]
Linux 4.2: total_link_count is no longer accessible
The value is now stored in the nameidata structure which
is private to fs/namei.c, so we can't modify it here.
The effect is that using a path that contains 40+ directories
may fail with ELOOP, depending on which directories in the
path were previously used. After a directory is accessed once
its D_AUTOMOUNT flag is reset and it will no longer count
against the symlink limit in later path lookups.
Simon Wilkinson [Sun, 17 Apr 2011 22:43:51 +0000 (23:43 +0100)]
Linux CM: Use kernel allocator directly
In another few locations within the Linux portion of the cache
manager, directly use the kernel allocator. We can do so here
because we can guarantee that the amount of memory being allocated
is less than the page size, and there is a kfree() in all of the
exit paths, so we don't need the magic freeing behaviour, either.
Reviewed-on: http://gerrit.openafs.org/4752 Reviewed-by: Derrick Brashear <shadow@dementia.org> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Tested-by: Derrick Brashear <shadow@dementia.org>
(cherry picked from commit 7a70c2907b0435653098a611a140fea1ac0b2fac)
Change-Id: I72fd6a2109022af5e14d90ce147705da7ccec587
Reviewed-on: http://gerrit.openafs.org/11933 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Jeffrey Altman [Sat, 1 Aug 2015 13:32:35 +0000 (09:32 -0400)]
vlserver: ListAttributesN2 volume name safety
The vlserver ListAttributesN2 RPC permits filtering the result set
by volume name in addition by site or volume id.
Two issues identified by Andrew Deason (Sine Nomine Associates) are
addressed by this patch. First, the size of the volumename[] buffer
is insufficient to store the valid input read over the network. The
buffer needs to be able to store VL_MAXNAMELEN characters of the volume
name, two characters for the regular expression '^' and '$', and the
trailing NUL.
Second, sprintf() is used to write to the buffer and even with valid
input from the caller SVL_ListAttributesN2 can overflow the buffer
when ".backup" and ".readonly" are appended to the volume name. If
there is an overflow the search name is invalid and there can not be
a valid match.
This patch increases the size of volumename[] to VL_MAXNAMELEN+3.
It also uses snprintf() instead of sprintf() and performs error
checking. The error VL_BADNAME is returned when the network input is
invalid.
D Brashear [Fri, 18 Jul 2014 20:00:12 +0000 (16:00 -0400)]
vlserver: limit use of regex to admins always
allow regexes only if the querying user is a superuser.
if the superuser uses up all the resources, well, they could just do
whatever damage directly anyway. means even in unrestricted mode
we are not vulnerable
Reviewed-on: http://gerrit.openafs.org/11968 Reviewed-by: Daria Brashear <shadow@your-file-system.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 049323e7e03c64f534a73ff452d218f19d5b8132)
Change-Id: I1e3f11bd14b071be69eb6e00c26ea2209596c82a
Reviewed-on: http://gerrit.openafs.org/11975 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Mark Vitale [Wed, 8 Jul 2015 18:28:50 +0000 (14:28 -0400)]
Solaris: setpag should verify that ngroups will not overflow
Our ngroups management (since PAGs are still encoded as 2 groups) needs
to ensure that we do not overflow what we are prepared to handle,
and do not panic due to misheld mutexes if we have to return an error
when handling it.
Marc Dionne [Wed, 22 Apr 2015 18:06:12 +0000 (15:06 -0300)]
Linux: mmap: Apply recursion check only to recursion cases
The CPageWrite flag was originally added to prevent a scenario
where a thread doing "writepage" would realize that the cache
was too full and that some of its contents need to be written
back to the server. Before writing back it would ask the OS to
flush any dirty VM associated with the vcache entries that are
to be written, to make sure the data is not stale. This flush
could itself trigger writeback, leading to deadly recursion.
One such scenario is a process doing mmap writes to a file larger
than the cache.
With some kernel versions and some callers of writepage, this
can cause the mapping to be marked as being in an error state,
leading to EIO errors passed back to user space.
Make the recursion check more specific to only bail when the
calling thread is one that was originally seen writing. A list
of current writers is maintained instead of a single state flag.
This lets other threads (like the flusher thread) go on with
writeback to the same file, and limits the WRITEPAGE_ACTIVATE
return case to call sites that can deal with it.
In testing this helps avoid EIO errors when writing large
chunks of data through mmap.
Thanks to Yadav Yadavendra for extensive analysis and testing.
Marc Dionne [Mon, 20 Apr 2015 13:41:53 +0000 (10:41 -0300)]
Linux 4.1: Don't define or use ->write directly
We no longer have to define a ->write operation, and we can't
expect the underlying disk cache filesystem to have one. Use
the new __vfs_read/write helpers that will select the operation
to use based on what's available for that particular filesystem.
Reviewed-on: http://gerrit.openafs.org/11849 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 5c1237432edf4600111845d175c92252430d5f76)
Change-Id: I21bca85637e07d0e03ef471896d0454eeef68a14
Reviewed-on: http://gerrit.openafs.org/11873 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Daria Brashear <shadow@your-file-system.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Marc Dionne [Mon, 20 Apr 2015 13:37:40 +0000 (10:37 -0300)]
Linux 4.1: No need for do_sync_read
Make the test here a bit more specific. do_sync_read no longer
exists, but we don't use it for new kernels. Trying to define it
here in terms of generic_file_read is not helpful as that doesn't
exist anymore.
Reviewed-on: http://gerrit.openafs.org/11848 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit fcfa5ae2468d878db962a93d6013fcd3042e6c13)
Change-Id: I87bf0fc856d244d15bdae300f0cd6b80ecb63797
Reviewed-on: http://gerrit.openafs.org/11872 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Daria Brashear <shadow@your-file-system.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Wed, 20 May 2015 14:57:53 +0000 (10:57 -0400)]
afsio: switch BreakUpPath to strdup
The current version of BreakUpPath is slightly broken, since
commit 4e68282e26b0c4569d25d076d54274f0da47a691 -- it has two
output parameters but takes only one length parameter for the
size of the output buffers passed in. The callers ended up using
the shorter of the buffer lengths in question, so there is not
a risk of a buffer overrun, but long paths would not be properly
handled.
There is not really any need to pass in a length at all, since
what is going on is conceptually strdup, and there is no real
need to use strlcpy at all. Make the change from strlcpy to
str(n)dup, and adjust callers to free the outputs as appropriate.
While here, convert writeFile() to use goto and a cleanup handler
to avoid leaks.
Jeffrey Altman [Mon, 12 Nov 2012 03:00:07 +0000 (22:00 -0500)]
afsio: process windows file paths consistently
Windows file paths can use either '\' or '/' as a path
separator. libafscp on the other hand requires '/' and argv[0]
will always use '\'.
Introduce a new function ConvertAFSPath() which converts the
input path to '/' and converts \\afs to /afs. A future commit
should access the registry and make use of the NetbiosName and
MountRoot values to perform the conversion correctly.
Simon Wilkinson [Fri, 30 Mar 2012 18:35:51 +0000 (19:35 +0100)]
venus: Make clang happy with strlcpy use
clang now expects that strlcpy will always be used to prevent overflow
of the destination string, and gives a warning if the size parameter is
based solely on the length of the source string.
Modify the BreakUpPath function so that it takes the size of the
destination string as an argument, and uses this to limit the amount of
data pasted into it.
Ben Kaduk [Wed, 11 Feb 2015 22:47:10 +0000 (17:47 -0500)]
Remove spurious NULL checks
clang 3.5 is more aggressive about these checks than the previous
FreeBSD system compiler, so new warnings (which became errors)
appeared on FreeBSD 11-CURRENT.
In afs_dcache.c, checking &tdc->f for NULL-ness has no effect.
The struct fcache f member of struct dcache is an ordinary structure
element; its address will be the value of tdc plus the offset of
f within struct dcache, which will not be NULL even if tdc is NULL.
In ubik_db_if.c, udbHandle is a file-scope global and thus has
allocated storage; the address of a member variable will never
be NULL. The 0 it was compared against was spelled RX_SECIDX_NULL,
which shows the intended check, which is for the value of the
uh_scIndex member variable, not its address.
In afscp_server.c, srv->conns can never be NULL since conns is a member
variable of struct afscp_server (of array type, containing pointers
to struct rx_connection). Comparing the array member variable against
NULL is comparing the address of the array, which is never NULL since
it is not allocated separately from struct afscp_server.
In fssync-debug.c, state.vop->partName is never NULL because
common_volop_prolog always allocates for state.vop, and the
partName member variable of struct fssync_state is of array type,
and thus is not separately allocated from the containing structure.
Reviewed-on: http://gerrit.openafs.org/11739 Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit fb499c2406450fa5dc423a0b038266d3b8e79e33)
Change-Id: I13799a3362508672136f8c603eabdfc0f3ee072d
Reviewed-on: http://gerrit.openafs.org/11843 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Wed, 22 Apr 2015 17:43:43 +0000 (13:43 -0400)]
kauth: fix clock skew detection
Commit 5b3c1042969daec38ccb260e61d665eda0c713ea changed/removed some
uses of abs() on unsigned time values. While the previous use of abs()
was indeed incorrect, the result wasn't necessarily much better, even
though it built with recent compilers, since it only checked for skew
in one direction.
Define and use a macro to correctly evaluate the conditionals in 64-bit
precision, avoiding C's integer promotion rules which prefer unsigned types
(Date) to signed types of the same width (time_t on 32-bit systems).
Reviewed-on: http://gerrit.openafs.org/11850 Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 810f0ccd0354dac30af024ca7b5acf3ebabf5f4b)
Change-Id: I29337e1ecd410fcf7733408287930c50c055ff90
Reviewed-on: http://gerrit.openafs.org/11863 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Daria Brashear <shadow@your-file-system.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Ben Kaduk [Fri, 13 Feb 2015 14:47:20 +0000 (09:47 -0500)]
Fix incorrect uses of abs()
abs(3) is a function of one variable of type int returning int.
labs(3) is a function of one variable of type long returning long.
labs(3) should be used when the input is of type long, as in
kaprocs.c.
Calling anything from the abs(3) family on a variable of unsigned
type is a bogus type pun, and a logical operation which is a no-op.
(Unsigned values are never negative and thus the absolute value
function is the identity over the entire range of values representable
in an unsigned type.) Just remove the use of abs() for unsigned
values, as in kaprocs.c, krb_udp.c, and vldb_check.c
While in kaprocs.c, wrap a long line that was touched for the
conversion to labs(3), spell the argument to time(3) as NULL
instead of 0, remove unneeded parentheses, and correct the spelling
of "reserved".
Reviewed-on: http://gerrit.openafs.org/11745 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 5b3c1042969daec38ccb260e61d665eda0c713ea)
Change-Id: I82038e41346479dad39466907b95f2d7540f6258
Reviewed-on: http://gerrit.openafs.org/11842 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Jeffrey Altman [Thu, 22 Jan 2015 06:14:28 +0000 (01:14 -0500)]
ubik: SDISK_Begin no quorum, wrong db, no transaction
When processing an DISK_Begin RPC verify that there is an active quorum
and that the local database is current. Otherwise, fail the RPC with
a UNOQUORUM error.
The returned error must be UNOQUORUM instead of USYNC becase the returned
error code will be returned by the coordinator's ContactQuorum_iterate()
to the client that triggered the write transaction. Most ubik clients
will only retry if the error is UNOQUORUM.
FIXES 131997
Change-Id: Icaa30e6aca82e7e7d33e9171a4f023970aba61df
Reviewed-on: http://gerrit.openafs.org/11689 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Daria Brashear <shadow@your-file-system.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit d47beca13236c64ed935fabeff9d1001e8a8871f)
Reviewed-on: http://gerrit.openafs.org/11773 Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Michael Meffie [Fri, 14 Nov 2014 03:28:08 +0000 (22:28 -0500)]
libafs: remove "Please install afsd with check server daemon" warning
Apparently, ancient versions of afsd did not start the check server
daemon (AFSOP_START_CS). The afs_Daemon tries to detect when the check
server daemon is not running and issues a warning to upgrade afsd. The
afs_Daemon waits for the cache initialization to complete (AFSOP_GO)
before detecting if the cache server daemon is started.
Unfortunately, when running with memcache, the cache initialization is
fast enough to race with the start of the check server daemon, and the
"Please install afsd with check server daemon" message is sometimes
printed to the syslog.
Since all modern versions of afsd do start the check server daemon, this
error message is no longer needed, so just remove the message and the
flag used to print it on only once.
Reviewed-on: http://gerrit.openafs.org/11602 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 8ce37d0d4aa4e6107f79efaf5027f31ea5a17604)
Change-Id: I292052c9ba629c85ddc4b76c4b3db7d54ce1d852
Reviewed-on: http://gerrit.openafs.org/11680 Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Tue, 10 Jun 2014 19:47:31 +0000 (14:47 -0500)]
doc: Document fs listquota 2TB partition limit
We have previously documented that volumes over 2TB can result in
inaccuracies, but this documentation does not say how the 'partition'
field in "fs listquota" can be inaccurate. It is confusing to see a
usage of 0% for a partition that you know is being used, so try to
briefly explain in what way this field is inaccurate.
The reason we _under_-report the partition usage is that the
fileserver actually gives back PartBlocksAvail and PartMaxBlocks (not
"blocks used" and "blocks total"). So 1TB used and 4TB total is
truncated to 2TB and given back as 2TB free and 2TB total. One we hit
3TB used we'll report it as 1TB free 2TB total (50%) when the actual
usage is 75%.
Reviewed-on: http://gerrit.openafs.org/11245 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit cd8f24d9a1ba8563c6bef2b8d30885a753e8d30c)
Change-Id: I2bd72cca994414a88073d26d44bef49e9cac3be1
Reviewed-on: http://gerrit.openafs.org/11626 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Stephan Wiesand [Wed, 1 Apr 2015 13:52:46 +0000 (15:52 +0200)]
Make OpenAFS 1.6.11.1
Update configure version strings for 1.6.11.1. Note that macos kext
can be of form XXXX.YY[.ZZ[(d|a|b|fc)NNN]] where d dev, a alpha,
b beta, f final candidate so we have no way to represent 1.6.11.1.
Switch to 1.6.12 dev 1 for macos.
Anders Kaseorg [Mon, 23 Feb 2015 04:43:49 +0000 (23:43 -0500)]
Treat Linux 4 (and greater) as Linux 2.6/3
In an age where Linux version numbers are determined by Google+ polls,
it’s clear that they aren’t going to be very useful for marking major
API compatibility boundaries like they were in the days of 2.2/2.4.
Reviewed-on: http://gerrit.openafs.org/11755 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit a5b091e1ec69d4a43d6f1b1efc93134ef7ed2167)
Change-Id: I5b0da6b43e3cbf5d9a6fa883a09deccb359e53e9
Reviewed-on: http://gerrit.openafs.org/11760 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Daria Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Benjamin Kaduk [Wed, 25 Mar 2015 04:26:42 +0000 (00:26 -0400)]
FBSD: do not set -mno-align-long-strings
The new clang imported for FreeBSD 10.1 has stopped accepting
this argument as a no-op. Fix the kernel module build by
stopping passing it on the compiler command line.
Andrew Deason [Wed, 4 Feb 2015 16:25:38 +0000 (10:25 -0600)]
rx: Zero unitialized uio structs
We use some uio structures that were allocated on the stack, but we
only initialize them by initializing individual fields. On some
platforms (Solaris is one known example, but probably not the only
one), there are additional fields we do not initialize. Since we
cannot be certain of what any additional fields there may be, just
zero the whole thing.
This is basically the same change as
I0eae0b49a70aee19f3a9ec118b03cfb3a6bd03a3, but in the rx subtree.
Reviewed-on: http://gerrit.openafs.org/11711 Tested-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Reviewed-by: Daria Brashear <shadow@your-file-system.com>
(cherry picked from commit a762e6871ad6837ee126cec9e63d99388b4bf119)
Change-Id: Ie6a2cce500d6a0a7a09c305296f4b34d122d3108
Reviewed-on: http://gerrit.openafs.org/11714 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Marc Dionne [Thu, 18 Dec 2014 13:43:22 +0000 (08:43 -0500)]
Linux: d_splice_alias may drop inode reference on error
d_splice_alias now drops the inode reference on error, so we
need to grab an extra one to make sure that the inode doesn't
go away, and release it when done if there was no error.
For kernels that may not drop the reference, provide an
additional iput() within an ifdef. This could be hooked up
to a configure option to allow building a module for a kernel
that is known not to drop the reference on error. That hook
is not provided here. Affected kernels should be the early
3.17 ones (3.17 - 3.17.2); 3.16 and older kernels should not
return errors here.
[kaduk@mit.edu add configure option to control behavior, which
is mandatory on non-buildbot linux systems]
Reviewed-on: http://gerrit.openafs.org/11643 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 15260c7fdc5ac8fe9fb1797c8e383c665e9e0ccd)
Andrew Deason [Fri, 30 Jan 2015 19:29:57 +0000 (13:29 -0600)]
afs: Zero uninitialized uio structs
In several places in the code, we allocate a 'struct uio' on the
stack, or allocate one from non-zeroed memory. In most of these
places, we initialize the structure by assigning individual fields to
certain values. However, this leaves any remaining fields assigned to
random garbage, if there are any additional fields in the struct uio
that we don't know about.
One such platform is Solaris, which has a field called uio_extflg,
which exists in Solaris 11, Solaris 10, and possibly further back.
One of the flags defined for this field in Solaris 11 is UIO_XUIO,
which indicates that the structure is actually an xuio_t, which is
larger than a normal uio_t and contains additional fields. So when we
allocate a uio on the stack without initializing it, it can randomly
appear to be an xuio_t, depending on what garbage was on the stack at
the time. An xuio_t is a kind of extensible structure, which is used
for things like async I/O or DMA, that kind of thing.
One of the places we make use of such a uio_t is in afs_ustrategy,
which we go through for cache reads and writes on most Unix platforms
(but not Linux). When handling a read (reading from the disk cache
into a mapped page), a copy of our stack-allocated uio eventually gets
passed to VOP_READ. So VOP_READ for the cache filesystem will randomly
interpret our uio_t as an xuio_t.
In many scenarios, this (amazingly) does not cause any problems, since
generally, Solaris code will not notice if something is flagged as an
xuio_t, unless it is specifically written to handle specific xuio_t
types. ZFS is one of the apparent few filesystem implementations that
can handle xuio_t's, and will detect and specially handle a
UIOTYPE_ZEROCOPY xuio_t differently than a regular uio_t.
If ZFS gets a UIOTYPE_ZEROCOPY xuio_t, it appears to ignore the uio
buffers passed in, and supplies its own buffers from its cache. This
means that our VOP_READ request will return success, and act like it
serviced the read just fine. However, the actual buffer that we passed
in will remain untouched, and so we will return the page to the VFS
filled with garbage data.
The way this typically manifests is that seemingly random pages will
contain random data. This seems to happen very rarely, though it may
not always be obvious what is going on when this occurs.
It is also worth noting that the above description on Solaris only
happens with Solaris 11 and newer, and only with a ZFS disk cache.
Anything older than Solaris 11 does not have the xuio_t framework
(though other uio_extflg values can cause performance degradations),
and all known non-ZFS local disk filesystems do not interpret special
xuio_t structures (networked filesystems might have xuio_t handling,
but they shouldn't be used for a cache).
Bugs similar to this may also exist on other Unix clients, but at
least this specific scenario should not occur on Linux (since we don't
use afs_ustrategy), and newer Darwin (since we get a uio allocated for
us).
To fix this, zero out the entire uio structure before we use it, for
all instances where we allocate a uio from the stack or from
non-zeroed memory. Also zero out the accompanying iovec in many
places, just to be safe. Some of these may not actually need to be
zeroed (since we do actually initialize the whole thing, or a platform
doesn't have any additional unknown uio fields), but it seems
worthwhile to err on the side of caution.
Thanks to Oracle for their assistance on this issue, and thanks to the
organization experiencing this issue for their patience and
persistence.
1.6 note: This differs noticeably from the master commit in two
places:
- src/afs/NBSD/osi_vnodeops.c: On master there is no stack-allocated
uio struct here.
- src/afs/VNOPS/afs_vnop_write.c and afs_vnop_read.c: On master,
these code paths are structured quite differently, and are handled
in afs_osi_uio.c instead.
Reviewed-on: http://gerrit.openafs.org/11705 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Daria Brashear <shadow@your-file-system.com>
(cherry picked from commit 5ef1de5eddccce0e7b135bb9ed4decaa62fc19ce)
Change-Id: I8dbf60637774dff81ff839ccd78f58b3b1e85c5b
Reviewed-on: http://gerrit.openafs.org/11713 Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Fri, 30 Jan 2015 19:08:19 +0000 (13:08 -0600)]
SOLARIS: Avoid uninitialized caller_context_t
Currently we pass a caller_context_t* to some of Solaris' VFS
functions (VOP_SETATTR, VOP_READ, VOP_WRITE, VOP_RWLOCK,
VOP_RWUNLOCK), but the pointer we pass is to uninitialized memory.
This code was added in commit 51d76681, and this particular argument
is mentioned in
<https://lists.openafs.org/pipermail/openafs-info/2004-March/012657.html>,
where the author doesn't really know what the argument is for.
Over 10 years later, it's still not obvious what this argument does,
since I cannot find any documentation for it. However, browsing
publicly-available Illumos/OpenSolaris source suggests this is used
for things like non-blocking operations for network filesystems, and
is only interpreted by certain filesystems in certain codepaths.
In any case, it's clear that we're not supposed to be passing in an
uninitialized structure, since the struct has actual members that are
sometimes interpreted by lower levels. Other callers in
Illumos/OpenSolaris source seem to just pass NULL here if they don't
need any special behavior. So, just pass NULL.
I am not aware of any issues caused by passing in this uninitialized
struct, and browsing Illumos source and discussing the issue with
Oracle engineers suggest there would currently not be any issues with
the cache filesystems we would be using.
However, it's always possible that issues could arise from this in the
future, or there are issues we don't know about. Any such issues would
almost certainly appear to be non-deterministic and be a nightmare to
track down. So just pass NULL, to avoid the potential issues.
Reviewed-on: http://gerrit.openafs.org/11704 Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Daria Brashear <shadow@your-file-system.com>
(cherry picked from commit b9647ac1062509d6a3997ca575ab1542d04677a2)
Change-Id: I5d247cfa6ada3773d20e3938957dcc31c8664bb2
Reviewed-on: http://gerrit.openafs.org/11712 Reviewed-by: Perry Ruiter <pruiter@sinenomine.net> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
When a corrupt directory is discovered, scanning stops immediately and
readdir returns ENOENT. Currently, the vcache lock is unlocked and the
dcache containing the directory is released, but that's not enough.
It's also necessary to unlock the dcache, on which we hold a read lock,
and to clear the vcache state which records an in-progress readdir.
Andrew Deason [Mon, 1 Dec 2014 16:23:23 +0000 (10:23 -0600)]
LINUX: Avoid mvid NULL deref in check_bad_parent
check_bad_parent dereferences vcp->mvid, assuming it is not NULL (vcp
is a root vcache here, so mvid refers to the parent fid). However, in
some situations, vcp->mvid can be NULL.
When we first afs_GetVCache the fid, we try to set mvid by setting
mvid to the 'dotdot' structure in the volume struct. But we get that
volume struct from afs_GetVolume, which can fail (at the very least,
this can fail on network failure when looking up vldb information). If
it fails, then we do not set the mvid parent. On future lookups for
the fid, afs_GetVCache will return early for a fastpath, if the vcache
is already in memory. So, mvid will never get set in such a situation.
We also set the mvid parent fid in afs_lookup if we resolved a
mountpoint to the root vcache. However, this is skipped if CMValid is
not set on the vcache, so if CMValid is cleared right after resolving
the mountpoint (say, perhaps done by some other thread e.g. a callback
break or other reasons), then the mvid parent fid will not be set.
To avoid crashing in these situations, if vcp->mvid is NULL in
check_bad_parent, don't check the mvid, and assume it does not match
(since we don't know what it is).
Marc Dionne [Mon, 5 Jan 2015 12:03:16 +0000 (07:03 -0500)]
Linux 3.19: No more f_dentry
Back in kernel 2.6 .20 struct file lost its f_dentry field
which was replaced by f_path.To ease transition f_dentry
was defined as f_dpath.dentry in the same header.This
define finally gets removed with kernel 3.19.
Keep using f_dentry in the code, but add a configure test
for the presence of f_path and the absence of the f_dentry
macro so we can add it if its missing.
Marc Dionne [Thu, 18 Dec 2014 12:13:46 +0000 (07:13 -0500)]
Linux: d_alias becomes d_u.d_alias
The fields in struct dentry are re-arranged so that d_alias
shares space wth d_rcu inside the d_u union. Some references
need to change from d_alias to d_u.d_alias.
The kernel change was introduced for 3.19 but was also backported
to the 3.18 stable series in 3.18.1, so this commit is required
for 3.19 and current 3.18 kernels.
Reviewed-on: http://gerrit.openafs.org/11642 Reviewed-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de> Reviewed-by: Daria Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit d6f29679098aff171e69511823b340ccf28e5c31)
Marc Dionne [Thu, 18 Dec 2014 11:57:22 +0000 (06:57 -0500)]
Linux: Move code to reset the root to afs/LINUX
Move the Linux specific bit of code to reset the root to
afs/LINUX platform specific files. Things that play with
the Linux vfs internals should not be exposed here.
No functional change, but this helps cleanup some ifdef
mess.
Reviewed-on: http://gerrit.openafs.org/11641 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de> Reviewed-by: Daria Brashear <shadow@your-file-system.com>
(cherry picked from commit 6ca324e565c34d9d04f3c553b7d0febe675ae538)
Andrew Deason [Sun, 14 Sep 2014 19:10:11 +0000 (14:10 -0500)]
afs: Fix some afs_conn overcounts
The usual pattern of using afs_Conn looks like this:
do {
tc = afs_Conn(...);
if (tc) {
code = /* ... */
} else {
code = -1;
}
} while (afs_Analyze(...));
The afs_Analyze call, amongst other things, puts back the reference to
the connection obtained from afs_Conn. If anything inside the do/while
block exits that block without calling afs_Analyze or afs_PutConn, we
will leak a reference to the conn.
A few places currently do this, by jumping out of the loop with
'goto's. Specifically, in afs_dcache.c and afs_bypasscache.c. These
locations currently leak references to our connection object (and to
the underlying Rx connection object), which can cause problems over
time. Specifically, this can cause a panic when the refcount overflows
and becomes negative, causing a panic message that looks like:
afs_PutConn: refcount imbalance 0xd34db33f -32768
To avoid this, make sure we afs_PutConn in these cases where we 'goto'
out of the afs_Conn/afs_Analyze loop. Perhaps ideally we should cause
afs_Analyze itself to be called in these situations, but for now just
fix the problem with the least amount of impact possible.
FIXES 131885
Reviewed-on: http://gerrit.openafs.org/11464 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Daria Brashear <shadow@your-file-system.com> Tested-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit 54c0ee608f4afd2b178c9b60eabfc3564293d996)
Marc Dionne [Fri, 19 Dec 2014 15:11:53 +0000 (10:11 -0500)]
Unix CM: Avoid using stale DV in afs_StoreAllSegments
It was reported in RT 131976 that on Linux some file
corruption was observed when doing mmap writes to
a file substantially larger than the cache size.
osi_VM_StoreAllSegments drops locks and asks the OS to flush
any dirty pages in the file 's mapping. This will trigger
calls into our writepage op, and if the number of dirty
cache chunks is too high (as will happen for a file larger
than the cache size), afs_DoPartialWrite will recursively
call afs_StoreAllSegments and some chunks will be written
back to the server. After potentially doing this several
times, control will return to the original afs_StoreAllSegments.
At that point the data version that was stored before
osi_VM_StoreAllSegments is no longer correct, leading to
possible data corruption.
Triggering this bug requires writing a file larger than the
cache so that partial stores are done, and writing enough
data to exceed the system's maximum dirty ratio and cause
it to initiate writeback.
To fix, just wait until after osi_VM_StoreAllSegments to
look at and store the data version
Andrew Deason [Tue, 26 Apr 2011 19:32:25 +0000 (14:32 -0500)]
Fix --without-krb5
Currently, specifying --without-krb5 causes the AM_CONDITIONAL
KRB5_USES_COM_ERR to not be defined, which makes configure refuse to
run successfully. Fix this by forcing KRB5_USES_COM_ERR to always be
false if we are running explicitly without krb5.
Benjamin Kaduk [Wed, 27 Aug 2014 16:23:20 +0000 (12:23 -0400)]
Appease compile_et for objdir builds
The argument we pass to -p needs to be in the source tree, not
the object tree -- compile_et will not find the input files it
wants in the objdir tree.
For tbudb we can do this as is done on master, by just including
it in the local variable BUDB, but for tptserver and tvlserver
that is a rather invasive change.
Benjamin Kaduk [Thu, 30 Oct 2014 23:51:29 +0000 (19:51 -0400)]
Build fix for recent FreeBSD -current
r273707 added a flags argument to syscall_register(), so
add the appropriate version check in param.generic_fbsd.h
and ues that in the main code.
Reviewed-on: http://gerrit.openafs.org/11565 Tested-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit f39dd54e11dff5e2b4da3eec419ae7c0825c210f)
Note that the 1.6 branch does not have param.generic_fbsd.h, so
the patch was modified for backport to just use AFS_FBSD110_ENV
as the conditional.
Change-Id: Id2934d83940caff4f64c2a7f2187b0eca5881c8f
Reviewed-on: http://gerrit.openafs.org/11610 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Wed, 5 Nov 2014 16:22:00 +0000 (10:22 -0600)]
LINUX: Avoid check for key_type.match existence
Commit b5de4a9f removed our key_type 'match' function for kernels that
do not have such a 'match' function pointer. However, this added a
configure test where we are supposed to fail for the "new" behavior,
which is discouraged.
This causes an actual problem, because this test will fail on at least
RHEL5, due to arguably unrelated reasons (the header file for the
relevant struct is in key.h instead of key-type.h). And so, in that
situation we avoid defining a 'match' function callback, meaning our
'match' function callback is NULL, which causes a panic when we try to
actually look up keys for a PAG.
To fix this, transform the 'match' config test into one where we
succeed for the "new" behavior. We do this by testing for the
existence of the new functionality that replaced the old 'match'
function, which is the match_preparse function (specifically, the
'cmp' field in the structure accepted by match_preparse). This should
cause unrelated compilation errors to cause us to revert to the "old"
behavior instead of the "new" behavior. At worst, this should cause
build issues if we get the config test wrong (since we will try to use
the 'match' function definition that does not exist), instead of
panicing at runtime.
Note that while we test for key_type.match_preparse, we don't actually
use that function, since our 'match' functionality is the same as the
default behavior (according to b5de4a9f). So, we can avoid defining
any such function for newer kernels.
Thanks to Stephan Wiesand for bisecting this issue.
Reviewed-on: http://gerrit.openafs.org/11589 Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
(cherry picked from commit a9a3cb2efff7e6c020be4687b004d157bc070ac6)
Change-Id: I59f40258c5ea35a59681f436095922d111e344f6
Reviewed-on: http://gerrit.openafs.org/11595 Tested-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Anders Kaseorg <andersk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Marc Dionne [Thu, 23 Oct 2014 15:27:55 +0000 (11:27 -0400)]
Linux 3.18: key_type no longer has a match op
Structure key_type no longer has a match op, and
overriding the default matching has to be done
differently.
Our current match op doesn't do anything special so there's
no need to try to override the defaults; just remove the
assignment of .match and the associated function.
Reviewed-on: http://gerrit.openafs.org/11563 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit b5de4a9f42bb83ae03f2f647b11a1200a502d013)
Change-Id: I7baca4a7f02eac45671e1e9ebf48534cdd5830be
Reviewed-on: http://gerrit.openafs.org/11570 Reviewed-by: Anders Kaseorg <andersk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Tue, 28 Oct 2014 05:10:56 +0000 (00:10 -0500)]
LINUX: Avoid d_revalidate failure on mtpt mismatch
Currently, if afs_linux_dentry_revalidate is given an inode that
corresponds to a mtpt vcache ('vcp'), it resolves the mtpt to its root
dir if it's easy to do so (mvid and CMValid are set). Later on, we run
afs_lookup to see if looking up our dentry's name returns the same
vcache that we got; afs_lookup presumably will also resolve the mtpt
if it's easy to do so.
However, it is possible that afs_linux_dentry_revalidate and
afs_lookup will make different decisions as to whether or not they
resolve a mtpt to a dir. Specifically, if CMValid is cleared after
afs_linux_dentry_revalidate checks for it, but before afs_lookup does,
then afs_lookup will return a different vcache than
afs_linux_dentry_revalidate is expecting, even though the relevant
directory entry has not changed. That is, tvc is not equal to vcp, but
tvc could be a mtpt that resolves to vcp, or vice versa. CMValid can
be cleared by another thread at virtually any time, since this is
cleared in some situations when we're not sure if the mtpt resolution
is still valid (callbacks are broken, vldb cache entries expire, etc).
afs_linux_dentry_revalidate interprets this situation to mean that the
directory entry has changed, and so it eventually d_drop's the
associated dentry. The way that this manifests to users is that a
"fakestatted" mtpt can appear to be deleted effectively randomly, even
when nothing has changed. This can be a problem because this causes
the getcwd() syscall to return ENOENT when the working directory
involves such an affected directory.
To fix this situation, we just detect if afs_lookup returned either
'vcp' (our possibly-resolved vcache), or the original inode associated
with the dentry we are revalidating. If the returned vcache matches
either of these, then the entry is okay and we don't need to
invalidate or drop anything.
FIXES 131780
Reviewed-on: http://gerrit.openafs.org/11559 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: D Brashear <shadow@your-file-system.com>
(cherry picked from commit ba1cc838ab4a80b7a7787c441a79aca31d84808c)
Change-Id: I3273cc15ebe7fd94f3127840fdc5316bd7458e7c
Reviewed-on: http://gerrit.openafs.org/11568 Reviewed-by: Anders Kaseorg <andersk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>