Jeffrey Altman [Sun, 7 Aug 2011 18:11:17 +0000 (14:11 -0400)]
Windows: make osi_Log macro safe for if..else
wrap the osi_Log macro's internal if statement with
a do {...} while(0) block in order to ensure that
it is safe for use in if..else controls without bracing.
Jeffrey Altman [Thu, 4 Aug 2011 21:25:01 +0000 (17:25 -0400)]
Windows: adjust scache LRU postion upon deletion
If the object represented by a scache object is deleted,
update the LRU position of the scache object to make it
the first object in the LRU queue to be recycled. This
preserves the cached objects for those that might prove
useful in the future.
Instead of using malloc() and free() to allocation lock reference
structures, cache allocated objects in a free list. This reduces
memory fragmentation.
Jeffrey Altman [Thu, 4 Aug 2011 21:08:45 +0000 (17:08 -0400)]
Windows: after dir enum adjust dir scache LRU
During a directory enumeration the directory scache object
is reference counted so it can't be recycled. However, if
there are more directory entries than the maximum number
of cached scache objects the directory scache object will
end up being the next object to be recycled after the refcount
is dropped. Since the directory is clearly a hot object, before
dropping the reference, adjust the scache LRU position so that
it is the last object to be recycled.
Fix the variable name for the directory scache to be 'dscp'
for consistency.
Will Maier [Sun, 31 Jul 2011 13:24:12 +0000 (14:24 +0100)]
RedHat: Return status values from client init
The init script provided with OpenAFS always returns 0 when the status
subcommand is called, even if the service is not running.
For example:
$ sudo service afs status; echo $?
afsd is stopped
0
This change makes sure the init script exits with the value returned
by the status function from /etc/init.d/functions. With this patch,
the afs init script behaves as expected when used, for example, in a
Chef service resource:
$ sudo service afs status; echo $?
afsd is stopped
3
Andrew Deason [Fri, 29 Jul 2011 21:44:11 +0000 (16:44 -0500)]
SOLARIS: Do not release NULL root vp on unmount
When we unmount, and afs_globalVp is NULL (e.g. because root.afs was
unavailable when the client was started), we will panic the machine if
we try to release it. So, if afs_globalVp is NULL when we hit our
unmount handler, don't touch it.
Simon Wilkinson [Wed, 3 Aug 2011 17:45:01 +0000 (18:45 +0100)]
volser doesn't depend on tviced, but on vlserver
Nothing within the volser/ directory depends on tviced, so remove the
unecessary dependency. Add an explicit dependency on vlserver, so that
libvldb is available to us.
This is required to get rid of some potential circular loops when we
start including volser objects in libafsauthent
Garrett Wollman [Sun, 7 Aug 2011 03:15:14 +0000 (23:15 -0400)]
vos: don't free stack garbage on error
If wantExtendedInfo is true, then pntr is used uninitialized.
In the other case, UV_ListVolumes will have set it to NULL
before doing anything (even if it returns an error), so this
free() is dead anyway.
Garrett Wollman [Sun, 7 Aug 2011 03:49:10 +0000 (23:49 -0400)]
butc: avoid testing stack garbage; remove dead initializer
"code" is unconditionally set early in saveDbToTape() so there's
no need to initialize it. On the other hand, dumpEntry.id is used
before dumpEntry is initialized, so set it to what appears to be
the expected value before any non-local exits could cause it to be
inspected.
Set the executable bits on the libraries installed in libdir. This
change is important because it causes 'rpmbuild' to generate Provide
tag metadata for the libraries in the package, which is necessary now
that some binaries in other packages have generated Requires tags for
libraries packaged in the base package. 'rpmbuild' will not generate
the Provides tag if the libraries lack executable permission.
Andy Cobaugh [Fri, 15 Jul 2011 16:06:12 +0000 (12:06 -0400)]
rpm: remove postinstall message from openafs-client
Printing out information on how to configure cacheinfo and ThisCell
is a bit noisy, and pam_afs.so is probably not what most people
want to use nowadays.
Reviewed-on: http://gerrit.openafs.org/5026 Reviewed-by: Derrick Brashear <shadow@dementia.org> Reviewed-by: Simon Wilkinson <sxw@inf.ed.ac.uk> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 30cd8dafa73d90a943f00af05e4841699bc18534)
Simon Wilkinson [Tue, 12 Jul 2011 00:45:10 +0000 (01:45 +0100)]
rpms: Fix handling of x86 architectures
Once upon a time, our specfile would assume that if you were
building for i386 you were building userspace, and that i586 or i686
implied doing a kernel only build. This is no longer the case, and
now everything on modern Fedora is built for i686, so we should adapt
the spec file for this.
Garrett Wollman [Tue, 9 Aug 2011 01:18:15 +0000 (21:18 -0400)]
kdb: don't dereference a null pointer on corrupt database
When iterating through the database, kdb would dereference a
null pointer if it encountered an error retrieving the value
or if the value was not the right length, in code that was clearly
cut-and-pasted from the other branch of an "if" statement where a
specific entry was requested on the command line. Print the name
of the entry with the problem as was apparently intended.
Garrett Wollman [Wed, 10 Aug 2011 04:18:28 +0000 (00:18 -0400)]
FBSD: catch up with the disappearance of VOP_GETVOBJECT
The vnode operation VOP_GETVOBJECT disappeared in FreeBSD 6.0, an
embarrassingly long time ago. Six years ago, a kluge was added
to emulate its behavior, but it did not correctly emulate the
return value of the old VOP implementation. As a result,
osi_VM_StoreAllSegments() could never actually do anything. Since we
don't support FreeBSD before 8.0, remove all references to VOP_GETVOBJECT
and examine vp->v_object directly instead.
This has the result that osi_VM_StoreAllSegments() will actually do
something now, which may not be desirable. (Previously, if somehow
the vnode had no associated VM object, it would crash, and otherwise
it would do nothing at all.)
Jeffrey Altman [Mon, 1 Aug 2011 15:05:51 +0000 (11:05 -0400)]
Windows: conditionalize mappings of error values
Visual Studio 10 adds a large number of additional POSIX C99
error values to errno.h. Wrap each mapping with #ifndef to ensure
that we do not redefine the C runtime errno.h definition.
Jeffrey Altman [Mon, 1 Aug 2011 15:00:55 +0000 (11:00 -0400)]
Windows: unified afs errors must use nt mapping
On Windows, error.h does not provide a complete list of POSIX
C99 error values. OpenAFS fills in the gaps with a private
error mapping table afs/errmap_nt.h (src/util/errmap_nt.h).
If errmap_nt.h is not included prior to processing unified_afs.h,
values such as ELOOP will be mapped to EIO instead of the unique
value defined by errmap_nt.h.
If a cm_BkgDaemon thread finds a queued request whose cm_scache_t
has the CM_SCACHEFLAG_DELETED flag set, do not execute the request
and fail it immediately with CM_ERROR_BADFD. Any attempt to execute
the request will fail with VNOVNODE from the file server.
Windows: cm_BkgDaemon should not do cm_SyncOp's job
cm_SyncOp is designed to synchronize operations among multiple
threads. The background daemon threads should not filter requests
based upon cm_SyncOp states. Doing so is racy and does not produce
better performance.
If the cm_scache_t flags include CM_SCACHEFLAG_DELETED, do not
bother releasing an outstanding file lock to the file server.
The lock went away when the file was deleted. Any attempt to
release will fail with VNOVNODE which is translated locally into
CM_ERROR_BADFD.
If a RXAFS_ReleaseLock RPC fails with VNOVNODE, treat it as
success.
Add PERL variable to the build system. If not specified
externally the variable will be set to 'perl'. However,
ActiveState Perl should be used and not Cygwin Perl. The build
environment should indicate that by specifying a PERL setting.
On Windows, the git repository is checked out as CR-LF.
Tell perl to open the pod file with cr-lf as the end of line.
On Windows, the input file names are of the form podX\foo.pod.in.
Cygwin perl cannot parse the directory for the file name unless
the path separator is converted from \ to /.
The userrealm string in KFW_AFS_get_cred() should not include
the '@' symbol from the user principal. Including the '@' produces
an invalid realm name.
Add KTC and PT error messages to those that can be
translated within afskfw.lib. This improves the error
logging for afslogon.dll, afscreds.exe, and afssrvadm.exe
When computing whether or not to perform an offline volume
check it is critical that the 'lastBusyVolCheck' variable
be assigned the current time instead of 'lastVolCheck'.
By setting the wrong variable a new offline volume check is
performed every 10 seconds which is undesireable.
Add an explicit message that the shutdown sequence is complete.
This is necessary because during a Windows OS shutdown, the service
is frequently killed prior to the memory mapped file is fully released.
because of how the root fid is created we can end up being dumb.
turns out we never want to bypass doing the full pass for root anyway
so just force fixup to not happen.
the shimmed heimdal in Lion crashes on this call now.
the shim also exports diddly squat. fine, we pick over what
IS exported and use only calls available to us.
Simon Wilkinson [Wed, 13 Jul 2011 10:53:57 +0000 (11:53 +0100)]
Add make dist and make srpm targets
Add targets to generate distribution tarballs, and srpms, from a tree.
These will generate packages for whatever the current HEAD of the tree
is - if the HEAD is a release tag, then the packages will be named for
that release, if the HEAD is between releases, then git describe will
be used to create an appropriate version identifier.
The tarballs are generated from the current git repository contents,
anything not checked in will not be included.
Simon Wilkinson [Wed, 13 Jul 2011 13:35:48 +0000 (14:35 +0100)]
vol: Initialise list before error exit when cloning
The inode list wasn't being initialised before the first call into the
error handler. This makes it possible that we end up trying to discard
items from an uninitialised list, with all the chaos that would cause.
Fix things so that this list is correctly set up.
Simon Wilkinson [Wed, 13 Jul 2011 13:33:57 +0000 (14:33 +0100)]
volser: Actually return errors from ListOneVolume
The return code from GetVolInfo was being thrown away, and success
returned to the caller, regardless of the success of this function.
As GetVolInfo's exit codes aren't suitable for sending over the wire,
just return ENODEV if this function returns failure.
Throughout cm_server.c, input parameters to functions that
are protected by cm_serverLock are dereferenced by assignment
during variable initialization prior to the cm_serverLock being
obtained. As a result there is a race which can result in
either list corruption or dereferencing freed memory.
When rx was converted to use pthreads, the code that allocates
a call to a connection channel in rxi_ReceivePacket() was not
made thread safe. The code prior to this patchset permitted a race
in the server connection case. The rx_connection channel assignment
in rxi_ReceivePacket() and the call destruction in rxi_FreeCall()
and rxi_DestroyConnectionNoLock() did not consistently protect the
rx_connection channel array using the conn_call_lock.
This race could result in rxi_ReceivePacket() operating on a
rx_call which was disconnected from the previously assigned
rx_connection.
In addition, the code in rxi_ReceivePacket() that was intended
to protect the allocation of a call using rxi_NewCall() to the
connection channel array was racy with itself.
This patchset consistently applies the conn_call_lock to protect
the allocation / deallocation of calls to the connection channel
array and in the process simplifies the logic in rxi_ReceivePacket()
as it is no longer necessary to protect against a null call pointer
since the race can no longer be lost.
OpenBSD: Add <sys/queue.h> header for <sys/lockf.h>
On OpenBSD, the <sys/lockf.h> header requires the TAILQ_* macros
which are defined in <sys/queue.h>. The latter is not automatically
included by <sys/lockf.h> . This patch makes sure that it is
available by putting it into the OpenBSD-specific param.h files
(so as not to impact any other OS).
Windows: always open dscp in smb_ReceiveNTTranCreate
There were two code paths in smb_ReceiveNTTranCreate that included
asserts in case the directory cm_scache_t object had not been
evaluated. RT129299 contains a report that at least one of
them had been tripped in production. There is no reason to avoid
evaluating the directory scp. It must exist in the cache and
obtaining a reference in all cases simplifies the logic of this
overly complex function.
Windows: Refactor cm_Unlock*() to avoid code duplication
cm_Unlock() and cm_UnlockByKey() duplicate a significant amount
of code. Refactor it into a new static function, cm_IntUnlock()
which handles the process of downgrading or releasing a file
server lock depending upon the lock state of the cm_scache_t
object.
Windows: Do not probe new servers from cm_UpdateVolumeLocation
cm_NewServer() can result in a call to cm_UpdateVolumeLocation()
if a server probe is performed. In order to avoid recursive
calls to cm_UpdateVolumeLocation() do not probe new servers from
within cm_UpdateVolumeLocation().
Andrew Deason [Fri, 1 Jul 2011 21:58:06 +0000 (16:58 -0500)]
vol: Don't always FDH_REALLYCLOSE on linktable ops
If we dec a linktable entry or get a free tag from the link table,
there is no reason to FDH_REALLYCLOSE the linktable fd handle.
FDH_REALLYCLOSE is the same as FDH_CLOSE, except that it tells the
ihandle package that the file handle will not be used again soon. If
we dec a linktable entry or get a free tag, there is no reason to
think that, so just FDH_CLOSE the handle instead.
Andrew Deason [Fri, 1 Jul 2011 19:25:05 +0000 (14:25 -0500)]
DAFS: Do not clear salv state on fssync salvage
When a volume is put into an error state via the FSYNC_VOL_FORCE_ERROR
command, we clear the salvage state informaton on it, since we're
forcing it offline and thus inaccessible. However, if we are forcing
it to an error state because the volume needs salvaging, we just
salvage it. In this case, do not clear the salvage state, since we
need to know if we've already requested or scheduled a salvage so we
can correctly keep track of the number of salvages performed.
Andrew Deason [Wed, 29 Jun 2011 18:51:22 +0000 (13:51 -0500)]
SOLARIS: Granular multiPage detection
Currently, a struct vcache has a multiPage counter, indicating how
many afs_getpage requests are in-flight for that vcache that involve
retrieving multiple pages. Any dcache associated with such vcaches are
then avoided when choosing dcache entries to evict from the cache,
since we may deadlock when trying to evict a dcache entry from one of
the earlier afs_GetOnePage calls in a particular afs_getpage request.
This behavior can cause the client to become unusable if the cache
becomes full, and the only items in the cache are dcache entries in a
file that has an in-flight multi-page afs_getpage request. Since, in
that case, we cannot kick out any entries from the cache, and so we
wait forever to wait for the cache utilization to go down.
To prevent this from occurring, record exactly which ranges in the
file have in-flight multi-page afs_getpage requests, and just avoid
dcache entries in those ranges. This way afs_GetDownD can evict dcache
entries in the same file, but still avoid entries that would cause a
deadlock.
Also add some comments explaining this situation a bit more.
Revert "Rx: When call receive is done, send ack all packet"
This reverts commit 3cd3715e608b801b4848399e42cb47464e6e3cc3,
which replaces an ack with an ackall; ackall processing does
not actually mark all packets acked when it is received, so
it is insufficient.
Andrew Deason [Tue, 21 Jun 2011 21:25:14 +0000 (16:25 -0500)]
DAFS: Do not attach a specialStatus'd vol
If we encounter a preattached volume during GetVolume, we currently
ignore vp->specialStatus before trying to attach. However, we will
generally always fail to attach due to a conflicting vol op, but even
if we don't, GetVolume always returns an error later on if
vp->specialStatus is set. So, same some processing and attempted
attachments by bailing out sooner if vp->specialStatus is set.
Andrew Deason [Tue, 21 Jun 2011 23:08:21 +0000 (18:08 -0500)]
salvager: Clear summary in RecordHeader
Not every field in the summary header in RecordHeader is set, leaving
some used uninitialized when we copy to the given volumeSummaryp (like
'deleted'). Zero out the header before we do anything.
Andrew Deason [Tue, 21 Jun 2011 22:51:32 +0000 (17:51 -0500)]
Build a separate copy of vlib for dasalvager
Currently dasalvager links to vlib.a. But vlib.a is built without any
DAFS defines, and so the size of a struct DiskPartition64 is different
(since dasalvager is built with AFS_DEMAND_ATTACH_UTIL). Build our own
copies of the volume package files instead, with
AFS_DEMAND_ATTACH_UTIL defined.
Andrew Deason [Tue, 21 Jun 2011 19:58:42 +0000 (14:58 -0500)]
vol: Do not overwrite specialStatus in attach2
attach2 wants to set specialStatus to VBUSY in certain conditions
(such as, it discovers a conflicting vol op where VVolOpSetVBusy_r is
true). However, specialStatus may already be set to something else,
like VMOVED if the volume is being moved off of the server. This can
happen if the volserver has checked out and FSYNC_VOL_MOVE'd a
preattached volume but hasn't deleted or checked the volume back in
yet.
So, if specialStatus is already set, don't touch it, so we don't start
reporting VBUSY errors to clients when we should be reporting VMOVED,
or some other error code previously set.
Simon Wilkinson [Sat, 18 Jun 2011 14:50:08 +0000 (15:50 +0100)]
rx: Exit fast restart on non-duplicate ACK
The current code only exits fast restart when we receive an ACK
packet that contains no missing chunks at all. On a network that is
dropping a reasonable chunk of its packets, this means that we spend
most of the call in fast recovery. (I originally found this by running
with the intentionally drop packets feature set to 10%)
TCP's fast retransmit behaviour is that we stay in fast recovery until
we receive our first non-duplicate acknowledgement. In TCP that means an
acknowledgement that moves the window. In RX, it is an acknowledgment
that ACKs a new packet.
Simon Wilkinson [Sat, 18 Jun 2011 12:17:07 +0000 (13:17 +0100)]
rx: Don't limit the # of packets sent in recovery
The RX transmit engine limits the number of packets sent whilst in
loss recovery to one per invocation of the transmit engine. As the
engine cannot be called by the application thread whilst in recovery,
this means that we end up being limited to one packet per ACK received,
which means that despite a growing congestion window we'll only send
one packet per RTT (in effect, a congenstion window of 1).
This will remain the case until we exit recovery, and all of a sudden
can send a large number of packets. If this is larger than the current
capacity of the network, we'll probably end straight back in recovery
again.
Let the congestion window do its job, by removing this arbitrary limit.
Simon Wilkinson [Sat, 18 Jun 2011 12:01:35 +0000 (13:01 +0100)]
rx: Don't wait for TQ busy when entering recovery
Two different threads can cause a call to enter recovery. The event
thread will move a call into recovery as a result of a timeout, or
the listener thread will move it there following a fast retransmit.
In both of these cases, recovery looks different. In the case of
a timeout, we enter slow start, starting as if we were begininning
transmission for the first time. Following fast retransmit, we enter
fast recovery, with different starting parameters than those coming
from slow start.
As a reslt, the current behaviour, where either call sitting in
FAST_RECOVERY_WAIT causes the other to simply return is inappropriate.
Further investigation indiciates that FAST_RECOVER_WAIT is actually
uncessary. There is no harm caused to a thread which is currently
blocked on the network in the middle of a transmit, in adjusting the
window size underneath it. As both of these states collapse the window,
that thread will simply cease sending earlier.
So, simplify the code, and remove the potential race between event and
listener by removing the FAST_RECOVER_WAIT state.
Simon Wilkinson [Sat, 18 Jun 2011 11:43:44 +0000 (12:43 +0100)]
rx: Enter loss recovery when we retransmit
Since I mistakenly wrote commit 36e2d13b, RX hasn't entered congestion
avoidance when a loss event occurs. This is bad, because on todays
networks the majority of packet losses are due to some form of
congestion.
Now that the timeout code has been restructured, the chances of entering
the retransmit routine in error are much much smaller, so this code
needs to be restored.
This change reverts 36e2d13b55085c996d38b30d003296c602ef8ee3. However,
the original RX code has the problem that it assumes that all forms of
fast recovery are the same - in particular, that the call settings that
result from entering fast recovery due to a fast retransmit are
identical to those resulting from a timeout. This is not the case, and
this will be fixed in a later change.
Simon Wilkinson [Sat, 18 Jun 2011 10:58:57 +0000 (11:58 +0100)]
rx: Add Karn-style backoffs to RX retransmits
When we retransmit a packet, we may be doing so because the RTT of the
connection has grown dramatically larger than earlier within the call.
However, RX doesn't permit all ACKs to retransmitted packets to be
counted within the RTT calculation.
So, adopt the same approach as Karn developed for TCP, and as described
in detail in RFC2988. When a retransmit event occurs, backoff the
connection RTT by doubling its value, and hold at this doubled value
until either another retransmit occurs (in which case we back off again,
up to a predetermined ceiling), or we receive an ACK packet which we
can use within the RTT calculation, in which case we drop back down to
the newly measured value.
This change replaces the per-packet backoff strategy originally
implemented in RX (which, whilst allowing resent packets more chance of
arriving, doesn't help with computing a correct RTT).
Simon Wilkinson [Sat, 18 Jun 2011 10:48:45 +0000 (11:48 +0100)]
rx: Make clock_Add correctly add to itself
With the existing clock_Add code, the following:
struct clock a = {2, 800000};
clock_Add(&a, &a);
gives a clock value of {6, 600000}, rather than the expected {5, 60000}.
This is because the ordering of instructions leads it to double count
the carry on the seconds field. Reorder the instructions so that the
carry is correctly applied.
Simon Wilkinson [Sat, 18 Jun 2011 10:35:30 +0000 (11:35 +0100)]
rx: Remove resending logic into its own function
Create a new function, rxi_Resend, which is the entry point to running
the transmit queue as a result of a resend event. This concentrates all
of the resend logic into one place, removes the need for
rxi_StartUnlocked, and means that rxi_Start's arguments don't need to
match those of an event handler.
Simon Wilkinson [Mon, 25 Oct 2010 09:14:12 +0000 (10:14 +0100)]
rx: Don't let timeouts force fast recovery
The current RX implementation goes into fast recovery whenever a
timeout occurs. This is incredibly wasteful, particularly on fast
connections. So, remove this in favour of TCP style behaviour.
Simon Wilkinson [Mon, 25 Oct 2010 08:16:09 +0000 (09:16 +0100)]
rx: Fix resend accounting
rxi_Start flagged itself as 'resending' whenever it flushed the
transmit queue due to a resend event. However, it would flush the
entire transmit queue at this point, rather than only transmitting
packets that require a resend. When running with large window sizes
this results an a large number of packets erroneously being marked
as resent.
Instead, let SendXmitList decide whether a packet is being
retransmitted by using the presence of a serial number. This takes
advantage of the fact that a retransmitted packet must be the only
entry in a packet list - we just flag the packet list, instead of
having to maintain counters for each individual packet.
Jeffrey Altman [Tue, 12 Oct 2010 14:53:43 +0000 (10:53 -0400)]
Rx: Consolidate wait for tq busy and make its use uniform
rxi_WaitforTQBusy() is now used wherever a wait for the transmit
queue is required. It returns either when the transmit queue is
no longer busy or when the call enters an error state.
Having made this change it is clear that call->currentPacket is
not always validated when the call->lock is reacquired which may be
true when rxi_WaitforTQBusy() is called.