Simon Wilkinson [Mon, 11 Oct 2010 17:14:02 +0000 (13:14 -0400)]
Rx: Reject out of order ACK packets
Our RX implementation virtually guarantees that we will see out of
order ACK packets, even on well behaved networks, as we send acks
simultaneously from multiple threads.
Currently we only reject out-of-order ACKS which change the window
position (so a window that advances, can never go back). However,
we fail to deal with the explicit acknowledgement portion of the ACK
packet in the same way...
For example, if we have a packet A that acknowledges packets 1 and 2,
and then a packet B acknowledging 1,2,3 and 4. If B arrives before A,
then we mark 1, 2, 3, 4 as acknowledged, and then treat the arrival of
A as nAcking 3 and 4. This has the same effect as an explicitly stated
nack, triggers an early and unnecessary resend and may, in some situations,
cause the call to go into congestion avoidance.
We can solve this using the previousPacket field of the ACK. This
indicates the last packet seen by the peer. In the same way as
firstPacket, this should never go backwards, and so can be used to
detect out of order acknowledgements, and reject them.
Andrew Deason [Fri, 15 Oct 2010 21:35:32 +0000 (16:35 -0500)]
pts: Specifically check for group id 0
For consistency with the code checking user ids in createuser, check
for a specified group id of 0 specifically and give a slightly
different error message for it.
Russ Allbery [Thu, 14 Oct 2010 20:41:45 +0000 (13:41 -0700)]
Return SRV record ports in network byte order
Convert the port extracted from the SRV record return to network byte
order before assigning it to the port array.
The port in a SRV record is extracted by pulling out the high byte
and low byte and then mathematically combining them, which implicity
converts from network byte order to host byte order. However, the
callers of afsconf_LookupServer expect the port array to be returned
in network byte order since ports are assigned without modification
to the .sin_port field of a struct sockaddr_in. See also the byte
order of the default afsdbPort value.
Reported by Jan Christoph Nordholz (Debian Bug#600228).
Derrick Brashear [Fri, 15 Oct 2010 15:28:34 +0000 (11:28 -0400)]
add objc build rules to make-type makefiles
sadly this needs to be here unless we want os-specific includes
of e.g. shared, lwp, pthreads makefiles for extra rules. as long
as no .m files are built in generic makefiles, this is a reasonable
approach.
Jeffrey Altman [Thu, 14 Oct 2010 21:24:33 +0000 (17:24 -0400)]
Rx: use osi_Assert/osi_Panic instead of assert
Avoid using the openafs src/util/assert.h implementation for Rx
and Rx security classes. Use the built-in osi_Assert() and osi_Panic()
functionality instead. This avoids all references to assert.h except
for rx_pthread.c (Unix only) which requires it for the assert()
references in the src/util/pthread_nosigs.h macros.
Jeffrey Altman [Thu, 14 Oct 2010 22:18:40 +0000 (18:18 -0400)]
Util: include assert.h in pthreads_nosig.h when required
If assert() will be used within pthreads_nosig.h, include assert.h.
Also, permit assert() to be a macro that is a no-op by always evaluating
the expression.
Marc Dionne [Thu, 14 Oct 2010 22:45:32 +0000 (18:45 -0400)]
LINUX/osi_vnodeops.c: minor coding style fixes
Re-indent and correct a few coding style issues in this section
of code. In particular, it clears up possible confusion on the
scope of the preceding if statement.
Purely cosmetic, no functional changes.
Change-Id: Id6dea6326c9878b41f821de00267f75195fea394
Reviewed-on: http://gerrit.openafs.org/2989 Reviewed-by: Matt Benjamin <matt@linuxbox.com> Reviewed-by: Jeffrey Altman <jaltman@openafs.org> Tested-by: Jeffrey Altman <jaltman@openafs.org>
Andrew Deason [Thu, 11 Mar 2010 18:19:47 +0000 (12:19 -0600)]
Parallel I/O extensions to namei backend
This adds the ability for certain namei operations (currently only
ListViceInodes) to occur across multiple different threads in
parallel. Currently this is only enabled when built with the
not-yet-existant AFS_SALSRV_ENV.
Marc Dionne [Wed, 13 Oct 2010 23:11:25 +0000 (19:11 -0400)]
Linux: fix statfs configure test
The change to the statfs configure test that was made for 2.6.36
broke the test for older kernels. The new test is based on a call,
and that will generate a warning but not an error when the arguments
don't match the prototype.
Take another tack, and revert to the old style test, but with the
simple_statfs function instead of vfs_statfs.
Tom Keiser [Wed, 13 Oct 2010 05:10:09 +0000 (01:10 -0400)]
don't release Volume lightweight ref too early
FSYNC_com_VolOff was releasing its lightweight ref before the error handling
code for VGetVolumeByVp_r was executed; this code needs to dereference the
Volume pointer for some of its logic. This was unsafe since
VCancelReservation_r() could have resulted in the Volume object being freed.
Move VCancelReservation_r() below the error handling block. NB: the error
handling block now relies upon the goto done/deny to cancel its lightweight
ref.
Marc Dionne [Wed, 13 Oct 2010 01:05:45 +0000 (21:05 -0400)]
Linux: fix aklog -setpag to work with ktc_SetTokenEx
The bit of code that allows aklog -setpag to work with recent
linux needed to be moved along with the change from ktc_SetToken
to ktc_SetTokenEx.
While we're in this bit of code, make it depend on the definition
of the syscall in the user space headers instead of relying on a
kernel configure test.
Simon Wilkinson [Mon, 11 Oct 2010 18:08:00 +0000 (14:08 -0400)]
rx: Don't count unknown packets as missing
Just because a packet is in the transmit queue, don't assume that
the other side has instantly seen it! Currently, if we receive an
ACK packet which doesn't include the entire transmit queue, then we
will end backing off, even if we haven't sent the packets.
Restrict this behaviour to packets which are implicitly acked (or
otherwise) by the sender.
Jeffrey Altman [Tue, 12 Oct 2010 14:53:43 +0000 (10:53 -0400)]
Rx: Consolidate wait for tq busy and make its use uniform
rxi_WaitforTQBusy() is now used wherever a wait for the transmit
queue is required. It returns either when the transmit queue is
no longer busy or when the call enters an error state.
Having made this change it is clear that call->currentPacket is
not always validated when the call->lock is reacquired which may be
true when rxi_WaitforTQBusy() is called.
Simon Wilkinson [Sun, 10 Oct 2010 12:04:41 +0000 (08:04 -0400)]
rx: Don't malloc the xmit list
Building the transmit list happens in a time critical section of
code. Using malloc to allocate the list which holds the packets to
be transmitted slows down this critical section. Instead, just
allocate the space as part of the call structure.
Locking of xmitList is somewhat tricksy, as the call->lock is
dropped over calls to sendmsg(). However, the xmitList is protected
by the TQ_BUSY call flag, which prevents multiple threads from
usign the transmit queue, and hence the xmitList, simultaneously.
Andrew Deason [Fri, 8 Oct 2010 16:51:30 +0000 (11:51 -0500)]
RX: Force sane timeout values
Currently we do not check the specified timeout values when someone
changes a connection's dead, idle, or hard dead time. However, if the
conn's dead time is larger than the other two times, a loss of network
activity will result in one of the other timeouts getting triggered
first.
To prevent this and possibly other problems from happening, force a
connection's timeouts to always obey the relationship
secondsUntilDead <= idleDeadTime <= hardDeadTime, by checking these
values whenever they are changed.
Andrew Deason [Wed, 6 Oct 2010 22:24:02 +0000 (17:24 -0500)]
RX: Adjust all timeouts for RTT
Previously only the deadTime RX network timeout was getting adjusted
for the peer's rtt and rtt_dev values. Do this for the idle and hard
timeouts as well, since a higher RTT is going to make everything
potentially take longer.
Tom Keiser [Wed, 13 Oct 2010 06:15:36 +0000 (02:15 -0400)]
update fssync-debug to handle the VOL_LOCKED flag
Allow fssync-debug to dump the VOL_LOCKED flag, rather than the
current behavior of printing absolutely nothing when this flag
is asserted. In addition, increase the flag buffer size since
it turns out we would truncate if all nine flags were asserted
at once.
There doesn't seem to be a need to limit the rx message size when
using rx_WritevAlloc. If there arent enough rx buffers to hold
the entire message at once, it will simply return less space.
Simon Wilkinson [Tue, 5 Oct 2010 20:21:38 +0000 (21:21 +0100)]
rx: Don't call gettimeofday for every packet ack
Every time we receive an ACK packet, we call gettimeofday() for
every entry in the transmit queue that's permanently ack'd by that
packet. Instead, just make a note of the time when we start
processing the packet queue, and use it for every packet in the
queue.
This shaves around 5% off rxperf's runtime with a window size of 128.
Jeffrey Altman [Mon, 11 Oct 2010 19:11:52 +0000 (15:11 -0400)]
Windows: Build hcrypto shared library
Build a single afshcrypto.dll shared library on Windows.
There are no lwp vs pthread differences on Windows due to
the use of the Windows random data sources.
Jeffrey Altman [Mon, 11 Oct 2010 19:00:08 +0000 (15:00 -0400)]
Windows: Cleanup build scripts; no include\afs or include\rx
As part of the build system cleanup, minimize the number of
directories in which include\afs and include\rx paths are included
by default. To acheive this goal the windows openafs dirent.h is
moved from include\afs to include, references whenever possible to
openafs headers included in include\afs or include\rx are prefixed
with afs\ or rx\ as appropriate.
Some source files or directories have a broad range of interdependencies
that make separation quite challenging. For those directories or files
the inclusion of the path is added at the smallest possible level.
At some point in the future the WINNT\afsd\ headers should be moved
from include\afs to include\WINNT and should be installed there first
and then referenced internally from that location instead of from the
WINNT\afsd directory. That will permit further cleanup to be performed.
Jeffrey Altman [Sat, 9 Oct 2010 07:06:07 +0000 (03:06 -0400)]
Windows: Do not issue RXAFS change RPCs on known RO volumes
If the cm_scache_t is known to be on a RO volume, do not permit
RXAFS_xxx RPCs that would attempt to make a change to the volume
to be issued to the file server. Instead, return CM_ERROR_READONLY
immediately. This avoids triggering the abort threshold for
the current connection on the file server.
Simon Wilkinson [Mon, 4 Oct 2010 12:49:16 +0000 (13:49 +0100)]
Unix: Rework build system
Rework the unix build system so that we support taking CFLAGS and
LDFLAGS from the command line, and don't replace them with our own
settings. Also, take the opportunity to bring some sanity and
consistency into our Makefiles.
The standard Makefile.config now defines rules for LWP, pthreaded
and shared library builds. The CFLAGS settings for these are
called LWP_CFLAGS, PTH_CFLAGS and SHD_CFLAGS, respectively.
Similarly named variables are provided for LDFLAGS.
A module may select to use a particular build type for its suffix
rule by including either Makefile.lwp, Makefile.pthread or
Makefile.shared from src/config. This creates an appropriate .c.o
suffix rule, defines AFS_CFLAGS and AFS_LDFLAGS as appropriate, and
creates two rules AFS_CCRULE and AFS_LDRULE, which can be used to
build, and link objects. For example:
foo.o: foo.c
$(AFS_CCRULE) foo.c
foo: foo.o
$(AFS_LDRULE) foo.o
If a you wish to override the CFLAGS or LDFLAGS for an object build
using these rules (or through the .c.o suffix rule) you can do so,
by defining CFLAGS_<object> or LDFLAGS_<object>. For example:
CFLAGS_foo.o= -DDEBUG
LDFLAGS_foo = -ldebugging
A module may also alter the behaviour of the compile and link steps
module wide by defining MODULE_CFLAGS or MODULE_LDFLAGS.
This functionality is now used throughout the tree:
*) Suffix rules are used wherever possible, removing a number of
unecessary build rules.
*) All link steps are replaced with AFS_LDRULE
*) All standard compile steps are replaced with AFS_CCRULE
*) Unusal compile steps are defined, as far as possible, int
terms of the LWP_ PTH_ and SHD_ variables.
*) The use of $? has been removed entirely, as it makes it
impossible to provide build rules with dependency information
Andrew Deason [Thu, 29 Jul 2010 16:06:28 +0000 (11:06 -0500)]
fssync-debug: exec DAFS version if DAFS detected
If the user requests something that differs depending on whether the
server is DAFS or not, try to exec the DAFS-enabled fssync-debug
(dafssync-debug) for them.
Phillip Moore [Thu, 7 Oct 2010 23:25:09 +0000 (19:25 -0400)]
Extract the .version file when building the srpm file
If you are building the source and binary rpms from a released
tarball, instead of a real git repo, the .version file is required by
build-tools/git-version. With out this, the version defaults to
UNKNOWN, and although the source rpm will build, it won't compile.
Phillip Moore [Tue, 5 Oct 2010 16:46:35 +0000 (12:46 -0400)]
Quick Start Guide updated for RHEL rpms, and CLI syntax
The names of the RHEL rpms have been updated to reflect what is
actually published on openafs.org, and the CLI syntax of the commands
run using -noauth have had the unnecessary -cell options removed.
Christof Hanke [Tue, 5 Oct 2010 15:01:41 +0000 (17:01 +0200)]
volserver: Do not return ENOMEM on AIX from XVolListPartitions
When calling "vos partinfo" or "vos listpart" towards a server
running AIX with no partitions attached, it would return a
ENOMEM, because unlike on linux, malloc(0) returns NULL on AIX.
Thus, just don't do any malloc, when we have no partitions anyway.
Simon Wilkinson [Sat, 2 Oct 2010 03:17:56 +0000 (23:17 -0400)]
rx: Reduce dependence on call->lock
This patch reduces our dependence on call->lock, by allowing more
of the reader thread to run lock free. Doing so requires that
call->mode only be set by the reader thread. As a result, call->mode
can only be set to RX_CALL_ERROR by rxi_CallError(). The mode is
set to RX_CALL_ERROR by the reader thread immediately after regaining
the call->lock when it has been dropped.
Simon Wilkinson [Tue, 5 Oct 2010 00:20:32 +0000 (01:20 +0100)]
hcrypto: Fix builds on Irix
The recent hcrypto pullup added a depedence on the errx() function,
which isn't present on Irix. Solve this by pulling in a load more
of libroken, in order to provide this function.
In the long term, libroken should get split out into its own
directory, and the ability to use a previously installed
libroken should be added. For now, this will hopefully get Irix on
its feet again.
Simon Wilkinson [Tue, 5 Oct 2010 08:01:00 +0000 (09:01 +0100)]
Irix: Make compiler less chatty
Supress a few of our errors from the Irix compiler and linker, so its
output is a little less verbose.
This change suppresses the function declared but not used and
multiple declaration errors that we get due to our static_inline fudge
and the paramater declared but not used errors.
Other error suppression is possible - you just need the number
immediately after the 'cc-' in the build logs to say which number to
add to the -woff line.
Simon Wilkinson [Thu, 30 Sep 2010 12:58:26 +0000 (13:58 +0100)]
Kill AFS_64BIT_ENV
The AFS_64BIT_ENV was being pretty much universally defined. So,
remove the definition, and the conditional code that it controlled.
From this point on, all platforms are assumed to be capable of
handling 64bit values.
Simon Wilkinson [Mon, 4 Oct 2010 19:22:50 +0000 (20:22 +0100)]
Heimdal: Fix 32bit build problems
The earlier inclusion of sha512 from Heimdal broke the build on
32bit platforms, because this file doesn't flag 64bit constants as
being such.
The correct place for this fix is in Heimdal, and it will be replaced
with a fix from Heimdal as soon as one is available. For now though,
this gets our build going again.
Andrew Deason [Sun, 3 Oct 2010 23:27:19 +0000 (18:27 -0500)]
vol: Log ignored dirs that look like partitions
If we see a /vicep*-like directory when we VAttachPartitions, and we
ignore it because it lacks an AlwaysAttach, log that we ignored it.
This may make things less confusing to admins that just try to create
a /vicep* directory and don't know why it doesn't work.
Simon Wilkinson [Mon, 4 Oct 2010 11:33:24 +0000 (12:33 +0100)]
configure: Restore saved CFLAGS
When we test for whether the C compiler can take the
-fno-strength-reduce flag, we add the flag to CFLAGS to do so.
However, we were not restoring the old value of this flag when we
completed the test, and so we were always setting -fno-strength-reduce
in the userspace compile.
Previously, this was harmless, as we always overwrote CFLAGS, but if
we're moving to a world where we honour the user's setting of CFLAGS,
we need to not leak changes in this way.
Andrew Deason [Fri, 10 Sep 2010 20:52:34 +0000 (15:52 -0500)]
vos release: Force full dump on RO_DONTUSE sites
When releasing a volume, currently we perform an incremental dump on
RO_DONTUSE sites if the volume exists on the remote site. Since
RO_DONTUSE implies that we do not expect a site to exist there, delete
the extant volume if we find one and force a full dump.
If we perform an incremental dump, we run the risk of incrementally
dumping to a temporary volume that the administrator is not aware of,
and doesn't have any actual data but has a last update time late
enough that it may be missing some data after the incremental dump. So
to avoid that, force a full dump every time.
sin_family isnt sent over the network and therefore doesnt need
htons(). sin_family is essential the same as domain, and no one
does socket(htons(AF_INET), ...)
Windows: Ensure that cm_NameI errors are acted upon promptly
There are many cases in the SMB server where an error from cm_NameI()
was either ignored or not acted upon until several other operations
are performed that could result in the same error being repeated.
This is a mistake which did not have negative side effects until
additional checks for callback status were added recently.
At present, if a CM_ERROR_ACCESS error is returned and ignored,
subsequent attempts to operate on the same cm_scache_t will result
in additional queries to the file server that will also end in an
abort response. This can trigger the file server to delay responses
to the client.
This patchset ensures that all cm_NameI() errors are acted upon
promptly.
Windows: Fix Parent(path) computation to permit mp and symlink creation
The parent path computation was leaving trailing slashes on the
path names which prevented the creation of mount points and symlinks
when UNC paths were used that contained mount points.
Jeffrey Altman [Sat, 2 Oct 2010 03:47:11 +0000 (23:47 -0400)]
Rx: raise rx_minPeerTimeout to 20ms
At 2ms it is possible for the packet we are sending to be resent
just about immediately as the retryTime computation occurs before
the send takes place and not afterwards. If the network send blocks,
the retryTime may have already expired.
We do not want rx_minPeerTimeout to be too large though because the
value will end up being used as the backoff time period when the
actual RTT for the connection is less than the rx_minPeerTimeout
value. Extensive testing shows 20ms to be an adequate compromise
that avoids the vast majority of unnecessary resends without
unnecessarily slowing down the connection if a packet is in fact
lost.
Rx: When call receive is done, send ack all packet
When all of the packets for a call have been received,
immediately send an ack all packet to the peer. This permits
the peer to free the contents of the transmit queue and cancel
all pending resend events.
Rx: protect rx_conn and rx_call refCount field with rx_refcnt_mutex
Add a new global mutex rx_refcnt_mutex to protect the conn->refCount
and call->refCount in place of relying upon the conn->conn_data_lock
and the call->lock.
This will relieve some lock contention with rx_ReceivePacket().
Jeffrey Altman [Sat, 2 Oct 2010 04:49:38 +0000 (00:49 -0400)]
Windows: Pass Volume Root Fid to cm_Analyze after RXAFS_GetVolumeStatus
RXAFS_GetVolumeStatus can return VNOVOL, VMOVED, etc. In order to
process them and update volume state a fid must be passed to cm_Analyze().
Use the volume root fid.
Andrew Deason [Thu, 11 Mar 2010 16:39:56 +0000 (10:39 -0600)]
Provide an abstract work queue object
Add some routines for specifying chunks of work to be done. The idea
is to be able to pass these to different threads, and specify
dependencies between them, wait on them completing, etc.
This adds the afs_wq* family of functions. Originally written by Tom
Keiser.
Andrew Deason [Tue, 14 Sep 2010 14:45:10 +0000 (10:45 -0400)]
DAFS: Raise LogLevel for per-chain vol stats
Only report detailed per-chain volume statistics on shutdown/SIGXCPU
if LogLevel is 125 (or 25 for smaller per-chain stats). If a
fileserver is configured with a large -vhashsize, printing out stats
for each chain can take awhile and use up a nontrivial amount of disk
space for logging, so only print out these stats if we're asked for
them.
configure: --with-linux-kernel-packaging should default to disabled
the test for this build feature is reversed. by default, the value for
with_linux_kernel_packaging will not be defined which makes the existing
test pick MPS='SP' instead of LINUX_WHICH_MODULES. based on the configure
help messages, this would appear to be an opt-in not an opt-out.
...
Optional Packages:
...
--with-linux-kernel-packaging
use standard naming conventions to aid Linux kernel
build packaging (disables MPS, sets the kernel
module name to openafs.ko, and installs kernel
modules into the standard Linux location)
...
Simon Wilkinson [Tue, 28 Sep 2010 23:21:43 +0000 (00:21 +0100)]
rx: Don't have different args for rxi_FreeCall
rxi_FreeCall changes its number of arguments depending on whether
locks are enabled or not. That's a little bit nasty to read, so just
change it so that it always takes two arguments, and ignores the
second when it doesn't need it.
Simon Wilkinson [Tue, 28 Sep 2010 23:11:53 +0000 (00:11 +0100)]
rx: Make statistics interface use Atomics
Make the rx_statistics statistics gathering infrastructure use
atomics for all of its counters. This significantly reduces
lock contention. However, it also (potentially) changes the format
of the rx_stats variable which has been used by callers in the past.
To simplify this process, and to aid with future changes, we remove
direct access to rx_stats. Instead, two additional API functions
rx_GetStatistics and rx_FreeStatistics are provided. These give the
caller access to an 'normal' rx_statistics structure.
Tom Keiser has suggested that we should explore using thread-local
statistics structures, and just aggregating them when we are asked
to report. This is a fine idea, and is equally possible with the
new interface that this patch introduces.
Simon Wilkinson [Tue, 28 Sep 2010 22:48:50 +0000 (23:48 +0100)]
rx: Add rx_NewThreadId function
The fileserver and the fsync server were locking an internal RX
mutex, and incrementing an internal counter in order to obtain fake
pthread thread IDs. Instead of letting them muck around in the
internals of RX, provide an API that can be called to obtain a
ThreadId counter, and use that API throughout the code.
Simon Wilkinson [Tue, 28 Sep 2010 22:37:54 +0000 (23:37 +0100)]
rx: Add atomic operations code
Add support for an atomic type, and atomic operators for RX. This
builds on work which has already been done for Windows, where
InterlockedOperations are used for statistics gathering.
A new opaque type, rx_atomic_t is introduced so that normal arithmetic
operations will fail on atomic data.
An implementation using native atomic methods is provided for Darwin,
Solaris and Windows. A native kernel implementation is used for Linux.
Where OpenAFS is built with a sufficiently modern gcc, gcc's atomic
primitives will be used. Sadly, gcc's builtin operations are not
available for i386, they will only be used with builds the set
-march=i486 (or later).
Otherwise, we fall back to a single mutex which protects all atomic
operations.
Simon Wilkinson [Sun, 26 Sep 2010 14:48:54 +0000 (15:48 +0100)]
RX: Tidy reader data locking
Data which is accessed only by the reader thread doesn't need to be
protected by call->lock
Remove the call->lock protection where it isn't required, which makes
certain read/write calls lock free.
Stop rx_ResetCall from manipulating reader thread data. This data will
be zero'd and cleared when the reader thread calls rx_EndCall, and
doesn't need to be reset by the Listener thread.
The change which made rx_ResetCall reset reader thread information
was originally part of 559ea99b. It caused race conditions that were
fixed by adding additional lock protection in d0cc6e, 4dadd2 and 423ab97e. This commit reverts portions of all of those changes. It
is safe to not clear the iovc in ResetCall because any NewCall must
be balanced by a corresponding EndCall in the reader thread, and
EndCall does the appropriate freeing of reader elements.
Ben Kaduk [Wed, 29 Sep 2010 00:03:25 +0000 (20:03 -0400)]
More FBSD syscall tweaking
We're now properly registered in syscalls.master for HEAD
(i.e. proto-9.0) and RELENG_8 (proto-8.2), which means that
afs3_syscall is prototyped in sys/sysproto.h . Accordingly,
don't declare it in afs_prototypes.h for those cases.
Also add FBSD82_ENV checks for the new syscall-registration code,
and cast afs3_syscall to sy_call_t* for the sysent structure.
Simon Wilkinson [Mon, 27 Sep 2010 22:50:23 +0000 (23:50 +0100)]
rx: Limit window size to max acks
The RX ack packet can only acknowledge 255 packets at once. In the
current implementation, this limits our maximum window size to 255,
as we can't acknowledge any packets we receive outside of that window
size.