Jeffrey Altman [Thu, 28 Oct 2010 04:43:26 +0000 (00:43 -0400)]
vol: Use OSI_NULLSOCKET and not -1 to indicate invalid fssync fd
The FSync file descriptor is an osi_socket which has an invalid
value of OSI_NULLSOCKET which is not necessarily -1. Be sure to
compare against OSI_NULLSOCKET and not -1 when checking an invalid
value.
Jeffrey Altman [Thu, 28 Oct 2010 04:40:32 +0000 (00:40 -0400)]
vol: Always use INVALID_FD to indicate an invalid fd
File descriptors on Windows are not ints and therefore
cannot be safely compared against -1. Always use INVALID_FD
which is -1 on UNIX and INVALID_HANDLE_VALUE on Windows.
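A minimal sketch of the sentinel comparison described above. The names demo_fd_t, DEMO_INVALID_FD and demo_fd_is_valid are hypothetical stand-ins, not the OpenAFS definitions; the point is only that the check goes through a per-platform macro instead of a literal -1.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; the real type and macro
 * come from the OpenAFS headers and differ per platform. */
#ifdef _WIN32
#include <windows.h>
typedef HANDLE demo_fd_t;
#define DEMO_INVALID_FD INVALID_HANDLE_VALUE
#else
typedef int demo_fd_t;
#define DEMO_INVALID_FD (-1)
#endif

/* Compare against the platform sentinel, never against a literal -1. */
static bool demo_fd_is_valid(demo_fd_t fd)
{
    return fd != DEMO_INVALID_FD;
}
```

Callers then write demo_fd_is_valid(fd) and stay correct on both platforms.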
Simon Wilkinson [Fri, 29 Oct 2010 11:40:31 +0000 (12:40 +0100)]
shlib-build: Add ignore option
Add an option to shlib-build to ignore missing symbols in the map file.
This is already the default on some platforms, but others (such as
Darwin) require that all symbols in the mapfile be present in the
objects. This is a pain for libraries such as libroken, which will
have different symbols on different platforms.
Specifying -i adds the necessary magic to Darwin's ld to relax this
check. Changes may also be necessary for other platforms, but I
don't currently have those available for testing.
Marc Dionne [Wed, 27 Oct 2010 00:33:41 +0000 (20:33 -0400)]
bucoord: parallel make fix
Fix an instance of a Makefile rule with multiple targets.
This can cause a parallel make to fail when two instances of
compile_et compete to write the same output files.
Spotted by a build failure with a corrupt bc.h header.
Rod Widdowson [Thu, 28 Oct 2010 17:34:41 +0000 (18:34 +0100)]
Windows: fix built in touch
Recent versions of Windows add a whole bunch of attributes above
_A_ARCH. (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED was what bit me, but
encryption or compression would do it.)
This makes ~_A_ARCH not a good choice for testing nonwritability
of a file - so files with these new attributes just get silently ignored.
Using an explicit mask is much better. So do that.
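The difference between the two tests can be sketched as follows. The bit values match the Windows attribute constants, but the DEMO_ names and the particular choice of bits in DEMO_BLOCKING_MASK are assumptions for illustration, not the mask used by the actual change.

```c
#include <stdbool.h>

/* Attribute bits (values as in the Windows headers; names are local). */
#define DEMO_A_RDONLY            0x0001u
#define DEMO_A_HIDDEN            0x0002u
#define DEMO_A_SYSTEM            0x0004u
#define DEMO_A_SUBDIR            0x0010u
#define DEMO_A_ARCH              0x0020u
#define DEMO_NOT_CONTENT_INDEXED 0x2000u

/* Old-style test: any bit other than "archive" makes the file be skipped. */
static bool demo_skipped_old(unsigned attrib)
{
    return (attrib & ~DEMO_A_ARCH) != 0;
}

/* Explicit mask (hypothetical selection): only these bits cause a skip. */
#define DEMO_BLOCKING_MASK \
    (DEMO_A_RDONLY | DEMO_A_HIDDEN | DEMO_A_SYSTEM | DEMO_A_SUBDIR)

static bool demo_skipped_new(unsigned attrib)
{
    return (attrib & DEMO_BLOCKING_MASK) != 0;
}
```

With the old test, a plain file that merely carries the not-content-indexed bit is skipped; with the explicit mask it is not.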
Simon Wilkinson [Sun, 24 Oct 2010 10:50:25 +0000 (11:50 +0100)]
Improve commit messages for git imports
Improve the quality of the commit messages produced by git import
by adding an explicit author (obtained from the $module-author file),
and by including a list of all of the upstream changes that are being
imported.
Derrick Brashear [Mon, 18 Oct 2010 03:39:47 +0000 (23:39 -0400)]
down with assert, up with osi_Assert
because NDEBUG breaks things which happen inside an assert,
be done with that. instead, call osi_Assert wherever possible.
doesn't work for code which builds before rx; those cases we handle
by ensuring no operations happen inside the assert(). side effect:
move all pthread operations wrapped in asserts to MUTEX_mumble and
CV_mumble calls where those exist, so the assertions happen all in
one set of macros.
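To illustrate the NDEBUG problem: the standard assert() compiles to nothing when NDEBUG is defined, so any work done inside its argument silently disappears. The sketch below uses a hypothetical DEMO_ASSERT macro in the spirit of osi_Assert (it is not the OpenAFS definition) that always evaluates its expression.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical always-on assertion: the expression is evaluated
 * whether or not NDEBUG is defined. */
#define DEMO_ASSERT(expr) \
    do { \
        if (!(expr)) { \
            fprintf(stderr, "assertion failed: %s\n", #expr); \
            abort(); \
        } \
    } while (0)

static int demo_counter;

/* An operation with a side effect, standing in for e.g. a mutex lock. */
static int demo_step(void)
{
    return ++demo_counter;
}

/* With assert() and NDEBUG, demo_step() would never run and the
 * counter would stay at 0; with DEMO_ASSERT it always runs. */
static int demo_run(void)
{
    demo_counter = 0;
    DEMO_ASSERT(demo_step() == 1);
    return demo_counter;
}
```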
Simon Wilkinson [Mon, 25 Oct 2010 09:14:12 +0000 (10:14 +0100)]
rx: Don't let timeouts force fast recovery
The current RX implementation goes into fast recovery whenever a
timeout occurs. This is incredibly wasteful, particularly on fast
connections. So, remove this in favour of TCP style behaviour.
Simon Wilkinson [Mon, 25 Oct 2010 19:50:29 +0000 (20:50 +0100)]
rx: Fix starting of transmit timers
The code used to start the transmit timer once for every set of packets
that it sends. However, these packets might be sent individually or in
clumps, with blocking for sendmsg, and on peer->lock, between each set
of packet sends. This has the effect of, even on a very stable network,
producing a high degree of variation in RTTs and timeouts. This is a
particular issue where the connection size is larger, as the number of
packets being sent individually under the one timer grows too.
Fix this by moving timer initialisation to SendList. This already takes
the peer lock, so obtain the timeout value here too. This means that
each jumbo gram, or individual packet (where jumbograms are disabled)
is sent with its own start time, and stabilises RTTs.
Simon Wilkinson [Mon, 25 Oct 2010 08:16:09 +0000 (09:16 +0100)]
rx: Fix resend accounting
rxi_Start flagged itself as 'resending' whenever it flushed the
transmit queue due to a resend event. However, it would flush the
entire transmit queue at this point, rather than only transmitting
packets that require a resend. When running with large window sizes
this results in a large number of packets erroneously being marked
as resent.
Instead, let SendXmitList decide whether a packet is being
retransmitted by using the presence of a serial number. This takes
advantage of the fact that a retransmitted packet must be the only
entry in a packet list - we just flag the packet list, instead of
having to maintain counters for each individual packet.
Ben Kaduk [Sun, 24 Oct 2010 04:29:07 +0000 (00:29 -0400)]
FBSD: band-aid vnode locking in lookup
The lock order requires that we acquire vnode locks from the root
towards the leaf. When looking up "..", this requires that we
unlock the directory before locking the child, otherwise we
are susceptible to deadlock.
This is only a band-aid, as afs_vop_lookup should be rewritten.
Ben Kaduk [Tue, 26 Oct 2010 02:15:49 +0000 (22:15 -0400)]
Fix build on systems with .y.o rules
On systems with system .y.o rules (such as FreeBSD), the system
rule for making error_table.o from error_table.y can bypass
AFS_CCRULE and thus fail to pull in the necessary include paths
for compilation. Present an explicit dependency on error_table.c
to force that file to be generated, and then our .c.o rule gets
used as desired.
Many conditionals involving osi_fsplock were changed to depend
on AFS_PRIVATE_OSI_ALLOCSPACES instead of constants or other
things (like AFS_FBSD_ENV). The condition on the initialization
in afs_init was changed but not the declaration in afs_prototypes.h,
breaking the build on FBSD.
Use the same conditional in afs_prototypes.h, fixing the FBSD build.
Asanka C. Herath [Mon, 18 Oct 2010 08:52:34 +0000 (04:52 -0400)]
Windows: Backup and restore configuration across installs
The MSI installer for OpenAFS does not preserve configuration data
across installs. This patch creates a backup of specific
configuration registry values when uninstalling OpenAFS and uses this
backup when subsequently installing OpenAFS.
Simon Wilkinson [Sat, 23 Oct 2010 14:07:42 +0000 (15:07 +0100)]
rx: Tidy up variables in RTT calc
We used to do rttp = &thisRtt, and then use rttp and thisRtt to
interchangeably refer to the same data. This is just confusing, and
unnecessary. Replace all of the occurrences of rttp with &thisRtt.
Take the opportunity to use the Clock_IsZero macro rather than doing
an explicit zero clock check.
Simon Wilkinson [Sat, 23 Oct 2010 13:51:56 +0000 (14:51 +0100)]
rx: More improvements to RTT calculation
Move the decision about whether a packet contributes to the peer's
round trip time into the CalculateRoundTripTime function, and improve
the criteria used.
Previously, we only computed the RTT if we had not retransmitted. This
is bad, because it means that places where we have backed off in order
to retransmit never actually lengthen the RTT, and so the RTT is kept
artificially low, and we see a large number of retransmits. Instead,
use the serial of the ACK packet to determine which transmission is
being acknowledged, and if it is the first, or the last, transmission
use the appropriate sent time to calculate the RTT.
If we have no serial in the ACK (for a delayed ack, for example), or
if the serial doesn't match (where a single acknowledgement is soft
acking a number of packets), fall back to only using the ack if the
packet has not been retransmitted.
Also, avoid multiple counting of packets which have arrived as part
of a jumbogram by only permitting the last packet in a jumbogram to
contribute to the RTT. This avoids giving the RTT of jumbograms more
weight than those of normal packets - doing so would pull down the
RTT, as it in effect favours packets which have not been retransmitted.
Jeffrey Altman [Thu, 21 Oct 2010 18:13:03 +0000 (14:13 -0400)]
Rx: Treat rx_minPeerTimeout not as a minimum but as padding
An improved RTT and timeout calculation algorithm is being
developed but until we have it, treat rx_minPeerTimeout not as
a minimum value for the timeout but as padding to be added to
the measured RTT when computing the peer timeout value.
With this change rx does not begin to send large numbers of
resends when the RTT begins to exceed the rx_minPeerTimeout
value. Timeout triggered resends at the moment can force rx
into fast recovery mode which in turn kills performance. It
is better to avoid that problem for now.
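The two interpretations can be compared with a deliberately simplified sketch. Both function names are hypothetical, times are in milliseconds, and the RTT deviation terms of the real calculation are left out.

```c
/* Old interpretation: the configured value is a floor on the timeout. */
static unsigned demo_timeout_floor(unsigned rtt_ms, unsigned min_ms)
{
    return rtt_ms > min_ms ? rtt_ms : min_ms;
}

/* New interpretation: the configured value is padding on top of the RTT. */
static unsigned demo_timeout_padded(unsigned rtt_ms, unsigned pad_ms)
{
    return rtt_ms + pad_ms;
}
```

Once the measured RTT exceeds the configured value, the floor version leaves no headroom at all (timeout equals RTT), while the padded version keeps the same margin at any RTT.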
Jeffrey Altman [Thu, 21 Oct 2010 18:23:18 +0000 (14:23 -0400)]
Rx: Fix socket() handling so errors are properly detected
socket() returns an osi_socket which on Windows is an
unsigned type (HANDLE). Therefore, tests of osi_socket < 0
will never identify when the INVALID_SOCKET value is returned.
On Windows, the OSI_NULLSOCKET is assigned to INVALID_SOCKET.
Replace all comparisons of (osi_socket < 0) with
(osi_socket == OSI_NULLSOCKET) as a means of detecting errors.
In addition, do not pass socket() the protocol value 0 when
IPPROTO_UDP is what is desired.
Finally, perror() on Windows never reports any error from Winsock.
perror() is a CRT function. To get the real socket error
WSAGetLastError() must be called and its value be written to
stderr.
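A small sketch of why the "< 0" test cannot work on an unsigned socket type. The type demo_socket_t and the macro DEMO_NULLSOCKET are hypothetical models (an unsigned integer with an all-ones sentinel, similar in shape to INVALID_SOCKET), not the real osi_socket definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an unsigned socket handle. */
typedef uintptr_t demo_socket_t;

/* All-ones sentinel, similar in shape to Winsock's INVALID_SOCKET. */
#define DEMO_NULLSOCKET ((demo_socket_t)~(demo_socket_t)0)

static bool demo_socket_failed(demo_socket_t s)
{
    /* "s < 0" would always be false here because demo_socket_t is
     * unsigned, so the failure must be detected by equality. */
    return s == DEMO_NULLSOCKET;
}
```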
Jeffrey Altman [Sun, 17 Oct 2010 04:35:36 +0000 (00:35 -0400)]
Windows: Use rx_Readv / rx_Writev
When USE_RX_IOVEC is defined, cm_BufWrite() will utilize rx_Writev()
instead of rx_Write() and cm_GetBuffer() will use rx_Readv() instead
of rx_Read() to improve throughput.
maxDgramPackets is initially assigned this value after correcting
for the wire endian. This compare is harmless on little endian
since the network endian value will typically be huge and redundant
on big endian machines.
Allow private implementations of osi_AllocSmall/LargeSpace
NBSD seemed to already do this at one point but was partly disabled.
This patch generalizes this feature by adding a define to disable the
standard pool macros. Linux's slab based allocator should outperform
this single threaded allocator/pool.
Simon Wilkinson [Mon, 11 Oct 2010 17:25:38 +0000 (13:25 -0400)]
rx: Simplify round trip time calculation
Move the logic for deciding whether to compute RTT out of PeerNetStats
and into the callers. This means that we can share decisions about
whether a packet is ACK'd or not, and avoid unnecessary multiple tests
and function calls.
This change also stops us from computing RTT times for packets outside
of the set of explicit ACKs that we have received. This means that we
no longer compute RTTs for packets that are on the transmit queue, but
not yet on the wire.
Jeffrey Altman [Sat, 16 Oct 2010 17:14:03 +0000 (13:14 -0400)]
Rx: Do not compute RTT on non-last packets of a jumbogram
A jumbogram is constructed as a series of rx packets that are
all sent at once and acknowledged at the same time. Computing the
RTT for all of the packets that make up the jumbogram gives
the jumbogram RTT more weight than that of a non-jumbogram packet.
To restore fairness, only compute the RTT for the last packet of
a jumbogram. The non-last packets will have the RX_JUMBO_PACKET flag
set in the packet header.
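The rule can be sketched as a simple flag test. DEMO_JUMBO_FLAG is a hypothetical stand-in for RX_JUMBO_PACKET (its value here is illustrative only), and demo_rtt_samples is an invented helper that counts how many packets of a group would feed the RTT estimate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RX_JUMBO_PACKET; value is illustrative. */
#define DEMO_JUMBO_FLAG 0x20u

/* Only a packet without the "more follows" flag may contribute. */
static bool demo_contributes_to_rtt(unsigned flags)
{
    return (flags & DEMO_JUMBO_FLAG) == 0;
}

/* Count how many packets of a group would produce an RTT sample. */
static size_t demo_rtt_samples(const unsigned *flags, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (demo_contributes_to_rtt(flags[i]))
            count++;
    }
    return count;
}
```

A three-packet jumbogram (flag set on the first two packets, clear on the last) then yields one sample, the same as a single ordinary packet.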
Simon Wilkinson [Mon, 11 Oct 2010 17:14:02 +0000 (13:14 -0400)]
Rx: Reject out of order ACK packets
Our RX implementation virtually guarantees that we will see out of
order ACK packets, even on well behaved networks, as we send acks
simultaneously from multiple threads.
Currently we only reject out-of-order ACKS which change the window
position (so a window that advances, can never go back). However,
we fail to deal with the explicit acknowledgement portion of the ACK
packet in the same way...
For example, suppose we have a packet A that acknowledges packets 1 and 2,
and then a packet B acknowledging 1, 2, 3 and 4. If B arrives before A,
then we mark 1, 2, 3, 4 as acknowledged, and then treat the arrival of
A as nAcking 3 and 4. This has the same effect as an explicitly stated
nack, triggers an early and unnecessary resend and may, in some situations,
cause the call to go into congestion avoidance.
We can solve this using the previousPacket field of the ACK. This
indicates the last packet seen by the peer. In the same way as
firstPacket, this should never go backwards, and so can be used to
detect out of order acknowledgements, and reject them.
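The check described above might look roughly like this. The structures and the function demo_accept_ack are hypothetical, and sequence number wraparound is ignored for brevity; only the field names firstPacket and previousPacket come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call state: the highest values accepted so far. */
struct demo_call {
    uint32_t first_seen;  /* highest firstPacket accepted */
    uint32_t prev_seen;   /* highest previousPacket accepted */
};

/* Hypothetical subset of the fields carried by an ACK packet. */
struct demo_ack {
    uint32_t firstPacket;
    uint32_t previousPacket;
};

/* Accept an ACK only if neither field moves backwards. */
static bool demo_accept_ack(struct demo_call *call, const struct demo_ack *ack)
{
    if (ack->firstPacket < call->first_seen ||
        ack->previousPacket < call->prev_seen)
        return false;   /* stale, out-of-order ACK: ignore it */

    call->first_seen = ack->firstPacket;
    call->prev_seen = ack->previousPacket;
    return true;
}
```

In the A/B example, B (previousPacket 4) is accepted first, so the late arrival of A (previousPacket 2) is rejected instead of being read as a nack of 3 and 4.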
Andrew Deason [Fri, 15 Oct 2010 21:35:32 +0000 (16:35 -0500)]
pts: Specifically check for group id 0
For consistency with the code checking user ids in createuser, check
for a specified group id of 0 specifically and give a slightly
different error message for it.
Russ Allbery [Thu, 14 Oct 2010 20:41:45 +0000 (13:41 -0700)]
Return SRV record ports in network byte order
Convert the port extracted from the SRV record return to network byte
order before assigning it to the port array.
The port in a SRV record is extracted by pulling out the high byte
and low byte and then mathematically combining them, which implicitly
converts from network byte order to host byte order. However, the
callers of afsconf_LookupServer expect the port array to be returned
in network byte order since ports are assigned without modification
to the .sin_port field of a struct sockaddr_in. See also the byte
order of the default afsdbPort value.
Reported by Jan Christoph Nordholz (Debian Bug#600228).
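A minimal sketch of the byte order issue, assuming a POSIX system for htons(). The helper names are hypothetical; the point is that combining the two wire bytes arithmetically produces a host-order value, which must be converted before it is stored in a field such as sin_port.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* htons, ntohs (POSIX) */

/* Combine the two wire bytes of a port; the result is in HOST order. */
static uint16_t demo_srv_port_host(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Value suitable for direct assignment to sin_port (NETWORK order). */
static uint16_t demo_srv_port_net(const unsigned char *p)
{
    return htons(demo_srv_port_host(p));
}
```

On a big endian host both functions return the same value; on a little endian host they differ, which is where the missing conversion shows up.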
Derrick Brashear [Fri, 15 Oct 2010 15:28:34 +0000 (11:28 -0400)]
add objc build rules to make-type makefiles
sadly this needs to be here unless we want os-specific includes
of e.g. shared, lwp, pthreads makefiles for extra rules. as long
as no .m files are built in generic makefiles, this is a reasonable
approach.
Jeffrey Altman [Thu, 14 Oct 2010 21:24:33 +0000 (17:24 -0400)]
Rx: use osi_Assert/osi_Panic instead of assert
Avoid using the openafs src/util/assert.h implementation for Rx
and Rx security classes. Use the built-in osi_Assert() and osi_Panic()
functionality instead. This avoids all references to assert.h except
for rx_pthread.c (Unix only) which requires it for the assert()
references in the src/util/pthread_nosigs.h macros.
Jeffrey Altman [Thu, 14 Oct 2010 22:18:40 +0000 (18:18 -0400)]
Util: include assert.h in pthreads_nosig.h when required
If assert() will be used within pthreads_nosig.h, include assert.h.
Also, permit assert() to be a macro that is a no-op by always evaluating
the expression.
Marc Dionne [Thu, 14 Oct 2010 22:45:32 +0000 (18:45 -0400)]
LINUX/osi_vnodeops.c: minor coding style fixes
Re-indent and correct a few coding style issues in this section
of code. In particular, it clears up possible confusion on the
scope of the preceding if statement.
Purely cosmetic, no functional changes.
Change-Id: Id6dea6326c9878b41f821de00267f75195fea394
Reviewed-on: http://gerrit.openafs.org/2989
Reviewed-by: Matt Benjamin <matt@linuxbox.com>
Reviewed-by: Jeffrey Altman <jaltman@openafs.org>
Tested-by: Jeffrey Altman <jaltman@openafs.org>
Andrew Deason [Thu, 11 Mar 2010 18:19:47 +0000 (12:19 -0600)]
Parallel I/O extensions to namei backend
This adds the ability for certain namei operations (currently only
ListViceInodes) to occur across multiple different threads in
parallel. Currently this is only enabled when built with the
not-yet-existent AFS_SALSRV_ENV.
Marc Dionne [Wed, 13 Oct 2010 23:11:25 +0000 (19:11 -0400)]
Linux: fix statfs configure test
The change to the statfs configure test that was made for 2.6.36
broke the test for older kernels. The new test is based on a call,
and that will generate a warning but not an error when the arguments
don't match the prototype.
Take another tack, and revert to the old style test, but with the
simple_statfs function instead of vfs_statfs.
Tom Keiser [Wed, 13 Oct 2010 05:10:09 +0000 (01:10 -0400)]
don't release Volume lightweight ref too early
FSYNC_com_VolOff was releasing its lightweight ref before the error handling
code for VGetVolumeByVp_r was executed; this code needs to dereference the
Volume pointer for some of its logic. This was unsafe since
VCancelReservation_r() could have resulted in the Volume object being freed.
Move VCancelReservation_r() below the error handling block. NB: the error
handling block now relies upon the goto done/deny to cancel its lightweight
ref.
Marc Dionne [Wed, 13 Oct 2010 01:05:45 +0000 (21:05 -0400)]
Linux: fix aklog -setpag to work with ktc_SetTokenEx
The bit of code that allows aklog -setpag to work with recent
linux needed to be moved along with the change from ktc_SetToken
to ktc_SetTokenEx.
While we're in this bit of code, make it depend on the definition
of the syscall in the user space headers instead of relying on a
kernel configure test.
Simon Wilkinson [Mon, 11 Oct 2010 18:08:00 +0000 (14:08 -0400)]
rx: Don't count unknown packets as missing
Just because a packet is in the transmit queue, don't assume that
the other side has instantly seen it! Currently, if we receive an
ACK packet which doesn't include the entire transmit queue, then we
will end up backing off, even if we haven't sent the packets.
Restrict this behaviour to packets which are implicitly acked (or
otherwise) by the sender.
Jeffrey Altman [Tue, 12 Oct 2010 14:53:43 +0000 (10:53 -0400)]
Rx: Consolidate wait for tq busy and make its use uniform
rxi_WaitforTQBusy() is now used wherever a wait for the transmit
queue is required. It returns either when the transmit queue is
no longer busy or when the call enters an error state.
Having made this change it is clear that call->currentPacket is
not always validated when the call->lock is reacquired which may be
true when rxi_WaitforTQBusy() is called.
Simon Wilkinson [Sun, 10 Oct 2010 12:04:41 +0000 (08:04 -0400)]
rx: Don't malloc the xmit list
Building the transmit list happens in a time critical section of
code. Using malloc to allocate the list which holds the packets to
be transmitted slows down this critical section. Instead, just
allocate the space as part of the call structure.
Locking of xmitList is somewhat tricksy, as the call->lock is
dropped over calls to sendmsg(). However, the xmitList is protected
by the TQ_BUSY call flag, which prevents multiple threads from
using the transmit queue, and hence the xmitList, simultaneously.
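A single-threaded sketch of the idea: the list storage is a fixed array inside the call structure, and a busy flag decides who may use it. All names, the flag value and the capacity are hypothetical; in the real code the flag is examined while holding call->lock, which this sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_TQ_BUSY  0x1u  /* hypothetical flag value */
#define DEMO_XMIT_MAX 8     /* hypothetical list capacity */

struct demo_call {
    unsigned flags;
    size_t   xmit_len;
    int      xmit_list[DEMO_XMIT_MAX];  /* lives in the call, no malloc */
};

/* Claim the transmit queue; fails if it is already claimed. */
static bool demo_tq_begin(struct demo_call *c)
{
    if (c->flags & DEMO_TQ_BUSY)
        return false;
    c->flags |= DEMO_TQ_BUSY;
    c->xmit_len = 0;
    return true;
}

/* Append a packet id to the embedded list; fails when it is full. */
static bool demo_tq_add(struct demo_call *c, int pkt)
{
    if (c->xmit_len == DEMO_XMIT_MAX)
        return false;
    c->xmit_list[c->xmit_len++] = pkt;
    return true;
}

/* Release the transmit queue for the next user. */
static void demo_tq_end(struct demo_call *c)
{
    c->flags &= ~DEMO_TQ_BUSY;
}
```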
Andrew Deason [Fri, 8 Oct 2010 16:51:30 +0000 (11:51 -0500)]
RX: Force sane timeout values
Currently we do not check the specified timeout values when someone
changes a connection's dead, idle, or hard dead time. However, if the
conn's dead time is larger than the other two times, a loss of network
activity will result in one of the other timeouts getting triggered
first.
To prevent this and possibly other problems from happening, force a
connection's timeouts to always obey the relationship
secondsUntilDead <= idleDeadTime <= hardDeadTime, by checking these
values whenever they are changed.
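One plausible way to enforce the relationship is to raise the larger timeouts whenever a smaller one exceeds them. The structure and function below are a hypothetical sketch; treating a value of 0 as "timeout disabled" is an assumption made here, not something stated above.

```c
/* Hypothetical subset of the connection's timeout fields (seconds). */
struct demo_conn {
    unsigned secondsUntilDead;
    unsigned idleDeadTime;   /* 0 is assumed to mean "disabled" */
    unsigned hardDeadTime;   /* 0 is assumed to mean "disabled" */
};

static unsigned demo_max(unsigned a, unsigned b)
{
    return a > b ? a : b;
}

/* Enforce secondsUntilDead <= idleDeadTime <= hardDeadTime for the
 * timeouts that are enabled, by raising the later ones as needed. */
static void demo_check_timeouts(struct demo_conn *c)
{
    if (c->idleDeadTime)
        c->idleDeadTime = demo_max(c->idleDeadTime, c->secondsUntilDead);
    if (c->hardDeadTime) {
        c->hardDeadTime = demo_max(c->hardDeadTime, c->secondsUntilDead);
        c->hardDeadTime = demo_max(c->hardDeadTime, c->idleDeadTime);
    }
}
```

Calling such a check from every setter keeps the invariant true regardless of the order in which the three values are changed.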
Andrew Deason [Wed, 6 Oct 2010 22:24:02 +0000 (17:24 -0500)]
RX: Adjust all timeouts for RTT
Previously only the deadTime RX network timeout was getting adjusted
for the peer's rtt and rtt_dev values. Do this for the idle and hard
timeouts as well, since a higher RTT is going to make everything
potentially take longer.
Tom Keiser [Wed, 13 Oct 2010 06:15:36 +0000 (02:15 -0400)]
update fssync-debug to handle the VOL_LOCKED flag
Allow fssync-debug to dump the VOL_LOCKED flag, rather than the
current behavior of printing absolutely nothing when this flag
is asserted. In addition, increase the flag buffer size since
it turns out we would truncate if all nine flags were asserted
at once.
There doesn't seem to be a need to limit the rx message size when
using rx_WritevAlloc. If there aren't enough rx buffers to hold
the entire message at once, it will simply return less space.
Simon Wilkinson [Tue, 5 Oct 2010 20:21:38 +0000 (21:21 +0100)]
rx: Don't call gettimeofday for every packet ack
Every time we receive an ACK packet, we call gettimeofday() for
every entry in the transmit queue that's permanently ack'd by that
packet. Instead, just make a note of the time when we start
processing the packet queue, and use it for every packet in the
queue.
This shaves around 5% off rxperf's runtime with a window size of 128.
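The optimisation amounts to hoisting the clock read out of the loop. The packet structure and function below are hypothetical; only the use of gettimeofday() comes from the text.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical transmit queue entry. */
struct demo_packet {
    struct timeval acked_at;
    int acked;
};

/* Mark every entry as acked using a single timestamp for the pass,
 * instead of calling gettimeofday() once per entry. */
static void demo_ack_all(struct demo_packet *q, size_t n)
{
    struct timeval now;
    size_t i;

    gettimeofday(&now, NULL);  /* one clock read for the whole queue */
    for (i = 0; i < n; i++) {
        q[i].acked = 1;
        q[i].acked_at = now;
    }
}
```

The trade-off is that all packets processed in one pass share a timestamp, which is slightly less precise but avoids a system call per packet.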
Jeffrey Altman [Mon, 11 Oct 2010 19:11:52 +0000 (15:11 -0400)]
Windows: Build hcrypto shared library
Build a single afshcrypto.dll shared library on Windows.
There are no lwp vs pthread differences on Windows due to
the use of the Windows random data sources.
Jeffrey Altman [Mon, 11 Oct 2010 19:00:08 +0000 (15:00 -0400)]
Windows: Cleanup build scripts; no include\afs or include\rx
As part of the build system cleanup, minimize the number of
directories in which include\afs and include\rx paths are included
by default. To achieve this goal the Windows openafs dirent.h is
moved from include\afs to include, and references to openafs headers
in include\afs or include\rx are, wherever possible, prefixed
with afs\ or rx\ as appropriate.
Some source files or directories have a broad range of interdependencies
that make separation quite challenging. For those directories or files
the inclusion of the path is added at the smallest possible level.
At some point in the future the WINNT\afsd\ headers should be moved
from include\afs to include\WINNT and should be installed there first
and then referenced internally from that location instead of from the
WINNT\afsd directory. That will permit further cleanup to be performed.
Jeffrey Altman [Sat, 9 Oct 2010 07:06:07 +0000 (03:06 -0400)]
Windows: Do not issue RXAFS change RPCs on known RO volumes
If the cm_scache_t is known to be on a RO volume, do not permit
RXAFS_xxx RPCs that would attempt to make a change to the volume
to be issued to the file server. Instead, return CM_ERROR_READONLY
immediately. This avoids triggering the abort threshold for
the current connection on the file server.
Simon Wilkinson [Mon, 4 Oct 2010 12:49:16 +0000 (13:49 +0100)]
Unix: Rework build system
Rework the unix build system so that we support taking CFLAGS and
LDFLAGS from the command line, and don't replace them with our own
settings. Also, take the opportunity to bring some sanity and
consistency into our Makefiles.
The standard Makefile.config now defines rules for LWP, pthreaded
and shared library builds. The CFLAGS settings for these are
called LWP_CFLAGS, PTH_CFLAGS and SHD_CFLAGS, respectively.
Similarly named variables are provided for LDFLAGS.
A module may select to use a particular build type for its suffix
rule by including either Makefile.lwp, Makefile.pthread or
Makefile.shared from src/config. This creates an appropriate .c.o
suffix rule, defines AFS_CFLAGS and AFS_LDFLAGS as appropriate, and
creates two rules AFS_CCRULE and AFS_LDRULE, which can be used to
build, and link objects. For example:
    foo.o: foo.c
            $(AFS_CCRULE) foo.c
    foo: foo.o
            $(AFS_LDRULE) foo.o
If you wish to override the CFLAGS or LDFLAGS for an object build
using these rules (or through the .c.o suffix rule) you can do so,
by defining CFLAGS_<object> or LDFLAGS_<object>. For example:
    CFLAGS_foo.o = -DDEBUG
    LDFLAGS_foo = -ldebugging
A module may also alter the behaviour of the compile and link steps
module wide by defining MODULE_CFLAGS or MODULE_LDFLAGS.
This functionality is now used throughout the tree:
*) Suffix rules are used wherever possible, removing a number of
unnecessary build rules.
*) All link steps are replaced with AFS_LDRULE
*) All standard compile steps are replaced with AFS_CCRULE
*) Unusual compile steps are defined, as far as possible, in
terms of the LWP_, PTH_ and SHD_ variables.
*) The use of $? has been removed entirely, as it makes it
impossible to provide build rules with dependency information
Andrew Deason [Thu, 29 Jul 2010 16:06:28 +0000 (11:06 -0500)]
fssync-debug: exec DAFS version if DAFS detected
If the user requests something that differs depending on whether the
server is DAFS or not, try to exec the DAFS-enabled fssync-debug
(dafssync-debug) for them.
Phillip Moore [Thu, 7 Oct 2010 23:25:09 +0000 (19:25 -0400)]
Extract the .version file when building the srpm file
If you are building the source and binary rpms from a released
tarball, instead of a real git repo, the .version file is required by
build-tools/git-version. Without this, the version defaults to
UNKNOWN, and although the source rpm will build, it won't compile.