Hartmut Reuter [Tue, 2 Nov 2010 11:15:42 +0000 (12:15 +0100)]
Make osi_fetchstore.c protocol independent
For future use of OSD and vicep-access osi_fetchstore.c should not depend on
the rx-fileserver-protocol but call instead the routines pointed to by ops.
Some code beautification in afs_fetchstore.c to use nBytes instead of code.
New global variable afs_protocols in afs_fetchstore.c which will be used
in RXOSD/VICEP-ACCESS programs in the future.
as discovered by Benjamin Kaduk, we were usually holding rx_refcnt_mutex
but briefly, and here we held it longer, and thus around acquiring freepktQ
mutex. undo it by simply setting STATE_RESET sooner as newcall does.
Jeffrey Altman [Tue, 2 Nov 2010 20:16:20 +0000 (16:16 -0400)]
Windows: Do not leak cm_volume_t objects from the LRU queue
During cm_volume_t object recycling the object is removed
from the LRU to ensure that a single object is not recycled
by multiple threads at the same time. Before cm_FindVolumeByName()
exits the object must be re-inserted into the LRU if it is not
present.
Jeffrey Altman [Thu, 28 Oct 2010 04:37:03 +0000 (00:37 -0400)]
vol: attach2 must always return with VOL_LOCK held
attach2() is required to return with the VOL_LOCK held
even though it is called without it. This must be true
for error conditions as well. Not all error paths are
obtaining the VOL_LOCK before returning. Add out paths for
lock held and lock unheld error cases.
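As an illustration of the two out paths described above, here is a minimal sketch. It is not the attach2() code: demo_lock is a flag standing in for VOL_LOCK, and demo_attach, fail_early and fail_late are hypothetical names used only to exercise the paths.

```c
/* Toy stand-in for VOL_LOCK: a flag instead of a real mutex. */
struct demo_lock { int held; };

void demo_lock_acquire(struct demo_lock *l) { l->held = 1; }

/*
 * Hypothetical attach-style routine: entered WITHOUT the lock held,
 * and required to return WITH it held on every path, errors included.
 */
int demo_attach(struct demo_lock *l, int fail_early, int fail_late)
{
    int code = 0;

    if (fail_early) {           /* error before the lock is taken */
        code = -1;
        goto error_unlocked;
    }

    demo_lock_acquire(l);

    if (fail_late) {            /* error after the lock is taken */
        code = -2;
        goto error_locked;
    }
    return code;                /* success: lock held */

error_unlocked:
    demo_lock_acquire(l);       /* take the lock so the contract holds */
error_locked:
    return code;                /* lock held on both error paths */
}
```

Giving the unlocked error label a fall-through into the locked one keeps a single place where the function returns with an error, so a new error path only has to pick the label that matches its lock state.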
Andrew Deason [Fri, 2 Jul 2010 21:57:42 +0000 (16:57 -0500)]
DAFS: Fix demand-salvages of attached volumes
Currently, when an error is encountered for an attached volume, we
call VRequestSalvage_r, which makes the volume go into the
VOL_STATE_SALVAGING state. This state implies that the volume is
offline, however, which is not necessarily the case if we're calling
VRequestSalvage_r from, for example, VAllocVnode_r or VUpdateVolume_r.
So now, make a new state called VOL_STATE_SALVAGE_REQ to indicate when
a salvage has been requested but the volume is not offline yet (and
thus is not yet ready to give to the salvager). If VCheckSalvage finds
a volume in this state, it offlines the volume first. The FSSYNC
VOL_OFF handler now checks for this state, and if we're giving the
volume to the salvager, we wait for the volume to exit that state.
VRequestSalvage_r also gains a new flag, VOL_SALVAGE_NO_OFFLINE. This
is to ensure that the existing salvaging code paths for unattached
volumes do not change (for when VRequestSalvage_r is called from
attach2). If this flag is passed, we do what we used to do, which is
just salvage the volume without offlining it.
Andrew Deason [Mon, 1 Nov 2010 20:34:26 +0000 (15:34 -0500)]
Cleanup VOffline log message for non-DAFS
Commit fd592c7674d4aa44dda90998b54d7b56947f6ed8 fixed the 'Volume X
(Y) is now offline' message for DAFS, but the same problem persists
for non-DAFS. Fix the non-DAFS case.
Simon Wilkinson [Fri, 29 Oct 2010 11:10:16 +0000 (12:10 +0100)]
Add libroken as its own library
Include libroken as a library in its own right, so that the whole
of the code can benefit from it. This change purely adds libroken
for the Unix build system. It doesn't replace those pieces of
libroken in hcrypto or util, or enable it for Windows.
There is also the option of using a system-install libroken, if one is
found at configure time.
*) If --with-libroken=yes, or is not supplied then a system library
will be used if suitable. Otherwise, we'll use the internal
libroken
*) If --with-libroken=/path/to/installation then the libroken at
that path will be used. If there is no libroken there, or it
is not suitable, an error will be returned
*) If --with-libroken=internal then the internal libroken is used,
regardless of what is present on the system.
We deliberately do not provide installed headers for the internal
libroken. If other applications wish to make use of libroken, then
they should use the Heimdal one, rather than piggybacking on ours.
Phillip Moore [Tue, 19 Oct 2010 16:17:20 +0000 (12:17 -0400)]
Fix fs bypassthreshold to accept a size of -1 to disable
The fs bypassthreshold command assumes a value of -1 means the feature
is disabled, but the CLI refused to accept this argument, since it is
not strictly a digit (according to isdigit()). This patch accepts the
string -1, and makes it possible to both enable AND disable this
feature.
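The kind of check described above can be sketched as follows. This is not the fs command's parser: demo_parse_threshold is a hypothetical helper that accepts either a string of decimal digits or the literal "-1".

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: accepts a non-negative decimal size, or the
 * literal string "-1" meaning "disabled". Returns 0 and stores the
 * value in *out on success; returns -1 on malformed input.
 */
int demo_parse_threshold(const char *arg, long *out)
{
    const char *p;

    if (strcmp(arg, "-1") == 0) {   /* the one non-digit string allowed */
        *out = -1;
        return 0;
    }
    if (*arg == '\0')
        return -1;
    for (p = arg; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return -1;              /* rejects "-2", "1k", " 5", ... */
    }
    *out = strtol(arg, NULL, 10);
    return 0;
}
```

Special-casing the exact string keeps other negative values rejected, which a blanket "allow a leading minus sign" rule would not.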
Phillip Moore [Tue, 19 Oct 2010 15:31:47 +0000 (11:31 -0400)]
fs getfid defaults to '.', like other path-related commands
I noticed that all of the other commands that accept a list of paths
use the SetDotDefault() function to default to ".", when no arguments
are given. This patch adds that call to getfid, making it more
consistent with similar commands.
Phillip Moore [Tue, 19 Oct 2010 15:23:46 +0000 (11:23 -0400)]
fs getfid output changed for consistency with Windows implementation
This patch removes the redundant volume ID from the output of fs
getfid, and replaces it with the cell name, which is what the Windows
implementation provides.
Phillip Moore [Tue, 19 Oct 2010 12:24:41 +0000 (08:24 -0400)]
Makes fs getfid error handling consistent with other fs commands
This patch makes the fs getfid command print errors for paths that
can't be handled correctly, instead of quietly ignoring them, and it
also returns an error code if any such paths are encountered. This
makes the behavior consistent with other fs commands, such as
listquota, whereis, etc.
Ben Kaduk [Fri, 29 Oct 2010 07:18:02 +0000 (03:18 -0400)]
FBSD: correct and simplify vcache eviction routines
osi_VM_FlushVCache and osi_TryEvictVCache were both attempting
to be wrappers around vgone(), with some checks beforehand.
Implement the latter in terms of the former to prevent
code duplication and propagation of incorrect code.
Additionally, correct the locking around vgone(). The
vnode lock must be held, and we must also increase the vnode's
hold count so that it does not disappear out from under us.
As we need the interlock to check the usecount, keep it
locked until we lock the vnode lock, for extra protection.
As an added bonus, we no longer try to call vgonel(), which
is not an exported symbol and merely happened to work due
to the current kernel linker implementation.
Remove some stale comments.
With this change, a parallel buildworld completes on
my four-core machine.
Ben Kaduk [Fri, 29 Oct 2010 16:01:04 +0000 (12:01 -0400)]
FBSD: lock interlock around v_usecount accesses
The FreeBSD vnode locking strategy requires that the vnode
interlock be held for all accesses to v_usecount, such as those
used by our VREFCOUNT and VREFCOUNT_GT macros. Conveniently,
a wrapper function is provided that takes the lock around its
access of the element, vrefcnt(). Use it for our macros.
afs_osi_Alloc_NoSleep() is no longer used by the SOLARIS or IRIX
clients. It is used by the *BSD code in rx, so just let those
platforms define/prototype it in their osi_machdep.h
Rod Widdowson [Fri, 29 Oct 2010 17:01:13 +0000 (18:01 +0100)]
windows: terminate multi_sz correctly
CreateProcess requires a null-terminated list of null-terminated strings
as an environment parameter.
A missing level of indirection was causing the final null to be
missed, meaning that if bosserver ran from somewhere which had an
environment the create process would fail.
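For reference, a minimal sketch of building such a block in portable C. This is not the bosserver code: demo_build_multi_sz is a hypothetical helper, and it handles narrow (char) strings only.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Build a double-NUL-terminated block ("A=1\0BB=22\0\0") from a
 * NULL-terminated array of strings. The caller frees the result.
 * Returns NULL on allocation failure; *len_out receives the size.
 */
char *demo_build_multi_sz(const char *const *vars, size_t *len_out)
{
    size_t total = 1;           /* the final, list-terminating NUL */
    size_t i, off = 0;
    char *block;

    for (i = 0; vars[i] != NULL; i++)
        total += strlen(vars[i]) + 1;
    if (i == 0)
        total++;                /* empty list: emit "\0\0" */

    block = calloc(total, 1);
    if (block == NULL)
        return NULL;

    for (i = 0; vars[i] != NULL; i++) {
        size_t n = strlen(vars[i]) + 1;   /* include each string's NUL */
        memcpy(block + off, vars[i], n);
        off += n;
    }
    /* calloc zeroed the buffer, so block[off] is already the final NUL */
    *len_out = total;
    return block;
}
```

The easy mistake is sizing the buffer for the strings alone; the extra byte counted at the top is what the final terminator needs.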
Jeffrey Altman [Thu, 28 Oct 2010 05:19:17 +0000 (01:19 -0400)]
Windows: Finish converting vol apps to pthread only
The src/vol directory on Windows is one of the rare examples
where a single directory builds both lwp and pthreaded versions
of libraries and executables. With this patchset the executables
are fully converted from lwp to pthread. This requires that
afsrpc.dll include the pthread implementations of the threadname,
fasttime, and lock implementations from the LWP directory.
The inclusion within afsrpc.dll permits the dviced and
dvolser directories to avoid rebuilding those object modules.
Jeffrey Altman [Thu, 28 Oct 2010 04:43:26 +0000 (00:43 -0400)]
vol: Use OSI_NULLSOCKET and not -1 to indicate invalid fssync fd
The FSync file descriptor is an osi_socket which has an invalid
value of OSI_NULLSOCKET which is not necessarily -1. Be sure to
compare against OSI_NULLSOCKET and not -1 when checking an invalid
value.
Jeffrey Altman [Thu, 28 Oct 2010 04:40:32 +0000 (00:40 -0400)]
vol: Always use INVALID_FD to indicate an invalid fd
File descriptors on Windows are not ints and therefore
cannot be safely compared against -1. Always use INVALID_FD
which is -1 on UNIX and INVALID_HANDLE_VALUE on Windows.
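A minimal sketch of the idea, using hypothetical names (demo_fd_t, DEMO_INVALID_FD, demo_fd_is_valid) rather than the definitions in the OpenAFS headers:

```c
#ifdef _WIN32
#include <windows.h>
typedef HANDLE demo_fd_t;                     /* pointer-sized handle */
#define DEMO_INVALID_FD INVALID_HANDLE_VALUE
#else
typedef int demo_fd_t;                        /* plain descriptor */
#define DEMO_INVALID_FD (-1)
#endif

/* Callers test against the sentinel, never against a literal -1. */
int demo_fd_is_valid(demo_fd_t fd)
{
    return fd != DEMO_INVALID_FD;
}
```

With the sentinel behind one name, the same comparison compiles correctly on both platforms and the handle type can differ freely.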
Simon Wilkinson [Fri, 29 Oct 2010 11:40:31 +0000 (12:40 +0100)]
shlib-build: Add ignore option
Add an option to shlib-build to ignore missing symbols in the map file.
This is already the default on some platforms, but others (such as
Darwin) require that all symbols in the mapfile be present in the
objects. This is a pain for libraries such as libroken, which will
have different symbols on different platforms.
Specifying -i adds the necessary magic to Darwin's ld to relax this
check. Changes may also be necessary for other platforms, but I
don't currently have those available for testing.
Marc Dionne [Wed, 27 Oct 2010 00:33:41 +0000 (20:33 -0400)]
bucoord: parallel make fix
Fix an instance of a Makefile rule with multiple targets.
This can cause a parallel make to fail when two instances of
compile_et compete to write the same output files.
Spotted by a build failure with a corrupt bc.h header.
Rod Widdowson [Thu, 28 Oct 2010 17:34:41 +0000 (18:34 +0100)]
Windows: fix built in touch
Recent versions of Windows add a whole bunch of attributes above
_A_ARCH. (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED was what bit me, but
encryption or compression would do it.)
This makes ~_A_ARCH not a good choice for testing nonwritability
of a file - so files with these new attributes just get silently ignored.
Using an explicit mask is much better. So do that.
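The difference between the two tests can be sketched like this. The DEMO_ constants are redefined locally so the sketch compiles anywhere, and the set of bits in the explicit mask is an assumption made for illustration, not the mask the patch uses.

```c
/* Illustrative attribute bits, redefined locally for portability. */
#define DEMO_A_RDONLY       0x0001u
#define DEMO_A_HIDDEN       0x0002u
#define DEMO_A_SYSTEM       0x0004u
#define DEMO_A_ARCH         0x0020u
#define DEMO_A_NOT_INDEXED  0x2000u   /* one of the newer, higher bits */

/* Fragile: every bit other than ARCH counts as "not writable". */
int demo_skip_fragile(unsigned attrib)
{
    return (attrib & ~DEMO_A_ARCH) != 0;
}

/* Explicit: only the named bits cause the file to be skipped. */
int demo_skip_explicit(unsigned attrib)
{
    return (attrib & (DEMO_A_RDONLY | DEMO_A_HIDDEN | DEMO_A_SYSTEM)) != 0;
}
```

A file carrying DEMO_A_ARCH | DEMO_A_NOT_INDEXED is skipped by the first test and processed by the second, because the inverted mask silently includes every bit defined later.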
Simon Wilkinson [Sun, 24 Oct 2010 10:50:25 +0000 (11:50 +0100)]
Improve commit messages for git imports
Improve the quality of the commit messages produced by git import
by adding an explicit author (obtained from the $module-author file),
and by including a list of all of the upstream changes that are being
imported.
Derrick Brashear [Mon, 18 Oct 2010 03:39:47 +0000 (23:39 -0400)]
down with assert, up with osi_Assert
because NDEBUG breaks things which happen inside an assert,
be done with that. instead, call osi_Assert wherever possible.
doesn't work for code which builds before rx; those cases we handle
by ensuring no operations happen inside the assert(). side effect:
move all pthread operations wrapped in asserts to MUTEX_mumble and
CV_mumble calls where those exist, so the assertions happen all in
one set of macros.
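The hazard can be shown with a small sketch. DEMO_ASSERT_NDEBUG mimics how assert() expands when NDEBUG is defined, and DEMO_ALWAYS_ASSERT is a hypothetical always-on check in the spirit of osi_Assert; neither is the OpenAFS macro.

```c
#include <stdio.h>
#include <stdlib.h>

/* How assert(expr) behaves under NDEBUG: the expression is discarded. */
#define DEMO_ASSERT_NDEBUG(expr) ((void)0)

/* Hypothetical always-on check: the expression is always evaluated. */
#define DEMO_ALWAYS_ASSERT(expr)                                  \
    do {                                                          \
        if (!(expr)) {                                            \
            fprintf(stderr, "assertion failed: %s\n", #expr);     \
            abort();                                              \
        }                                                         \
    } while (0)

int demo_counter = 0;

/* Stands in for an operation with a side effect, e.g. taking a lock. */
int demo_step(void) { return ++demo_counter; }

int demo_run_ndebug(void)
{
    demo_counter = 0;
    DEMO_ASSERT_NDEBUG(demo_step() == 1);   /* call is compiled out */
    return demo_counter;                    /* side effect never happened */
}

int demo_run_always(void)
{
    demo_counter = 0;
    DEMO_ALWAYS_ASSERT(demo_step() == 1);   /* call always runs */
    return demo_counter;
}
```

Where an always-on macro is not available, the alternative described above is to perform the operation first and assert only on its saved result.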
Simon Wilkinson [Mon, 25 Oct 2010 09:14:12 +0000 (10:14 +0100)]
rx: Don't let timeouts force fast recovery
The current RX implementation goes into fast recovery whenever a
timeout occurs. This is incredibly wasteful, particularly on fast
connections. So, remove this in favour of TCP style behaviour.
Simon Wilkinson [Mon, 25 Oct 2010 19:50:29 +0000 (20:50 +0100)]
rx: Fix starting of transmit timers
The code used to start the transmit timer once for every set of packets
that it sends. However, these packets might be sent individually or in
clumps, with blocking for sendmsg, and on peer->lock, between each set
of packet sends. This has the effect of, even on a very stable network,
producing a high degree of variation in RTTs and timeouts. This is a
particular issue where the connection size is larger, as the number of
packets being sent individually under the one timer grows too.
Fix this by moving timer initialisation to SendList. This already takes
the peer lock, so obtain the timeout value here too. This means that
each jumbo gram, or individual packet (where jumbograms are disabled)
is sent with its own start time, and stabilises RTTs.
Simon Wilkinson [Mon, 25 Oct 2010 08:16:09 +0000 (09:16 +0100)]
rx: Fix resend accounting
rxi_Start flagged itself as 'resending' whenever it flushed the
transmit queue due to a resend event. However, it would flush the
entire transmit queue at this point, rather than only transmitting
packets that require a resend. When running with large window sizes
this results in a large number of packets erroneously being marked
as resent.
Instead, let SendXmitList decide whether a packet is being
retransmitted by using the presence of a serial number. This takes
advantage of the fact that a retransmitted packet must be the only
entry in a packet list - we just flag the packet list, instead of
having to maintain counters for each individual packet.
Ben Kaduk [Sun, 24 Oct 2010 04:29:07 +0000 (00:29 -0400)]
FBSD: band-aid vnode locking in lookup
The lock order requires that we acquire vnode locks from the root
towards the leaf. When looking up "..", this requires that we
unlock the directory before locking the child, otherwise we
are susceptible to deadlock.
This is only a band-aid, as afs_vop_lookup should be rewritten.
Ben Kaduk [Tue, 26 Oct 2010 02:15:49 +0000 (22:15 -0400)]
Fix build on systems with .y.o rules
On systems with system .y.o rules (such as FreeBSD), the system
rule for making error_table.o from error_table.y can bypass
AFS_CCRULE and thus fail to pull in the necessary include paths
for compilation. Present an explicit dependency on error_table.c
to force that file to be generated, and then our .c.o rule gets
used as desired.
Many conditionals involving osi_fsplock were changed to depend
on AFS_PRIVATE_OSI_ALLOCSPACES instead of constants or other
things (like AFS_FBSD_ENV). The condition on the initialization
in afs_init was changed but not the declaration in afs_prototypes.h,
breaking the build on FBSD.
Use the same conditional in afs_prototypes.h, fixing the FBSD build.
Asanka C. Herath [Mon, 18 Oct 2010 08:52:34 +0000 (04:52 -0400)]
Windows: Backup and restore configuration across installs
The MSI installer for OpenAFS does not preserve configuration data
across installs. This patch creates a backup of specific
configuration registry values when uninstalling OpenAFS and uses this
backup when subsequently installing OpenAFS.
Simon Wilkinson [Sat, 23 Oct 2010 14:07:42 +0000 (15:07 +0100)]
rx: Tidy up variables in RTT calc
We used to do rttp = &thisRtt, and then use rttp and thisRtt to
interchangeably refer to the same data. This is just confusing, and
unnecessary. Replace all of the occurrences of rttp with &thisRtt.
Take the opportunity to use the Clock_IsZero macro rather than doing
an explicit zero clock check.
Simon Wilkinson [Sat, 23 Oct 2010 13:51:56 +0000 (14:51 +0100)]
rx: More improvements to RTT calculation
Move the decision about whether a packet contributes to the peer's
round trip time into the CalculateRoundTripTime function, and improve
the criteria used.
Previously, we only computed the RTT if we had not retransmitted. This
is bad, because it means that places where we have backed off in order
to retransmit never actually lengthen the RTT, and so the RTT is kept
artificially low, and we see a large number of retransmits. Instead,
use the serial of the ACK packet to determine which transmission is
being acknowledged, and if it is the first, or the last, transmission
use the appropriate sent time to calculate the RTT.
If we have no serial in the ACK (for a delayed ack, for example), or
if the serial doesn't match (where a single acknowledgement is soft
acking a number of packets), fall back to only using the ack if the
packet has not been retransmitted.
Also, avoid multiple counting of packets which have arrived as part
of a jumbogram by only permitting the last packet in a jumbogram to
contribute to the RTT. This avoids giving the RTT of jumbograms more
weight than those of normal packets - doing so would pull down the
RTT, as it in effect favours packets which have not been retransmitted.
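The selection rule for the sent time can be sketched roughly as below. This is a reading of the description above, not CalculateRoundTripTime itself: struct demo_sent, its fields, and the use of serial 0 to mean "no serial in the ACK" are assumptions made for the example, and the jumbogram rule is left out.

```c
#include <stdint.h>

/* Hypothetical per-packet bookkeeping; times are in milliseconds. */
struct demo_sent {
    uint32_t first_serial, last_serial;  /* first and latest transmission */
    long first_sent_ms, last_sent_ms;    /* when each of those was sent */
    int retransmitted;                   /* nonzero if sent more than once */
};

/*
 * Pick the sent time that matches the transmission being acknowledged.
 * Returns 1 and stores a sample in *rtt_ms, or 0 if the ACK must not
 * contribute to the RTT.
 */
int demo_rtt_sample(const struct demo_sent *p, uint32_t ack_serial,
                    long now_ms, long *rtt_ms)
{
    if (ack_serial != 0 && ack_serial == p->first_serial) {
        *rtt_ms = now_ms - p->first_sent_ms;
        return 1;
    }
    if (ack_serial != 0 && ack_serial == p->last_serial) {
        *rtt_ms = now_ms - p->last_sent_ms;
        return 1;
    }
    /* No usable serial: only trust packets that were sent exactly once. */
    if (!p->retransmitted) {
        *rtt_ms = now_ms - p->first_sent_ms;
        return 1;
    }
    return 0;
}
```

Matching on the serial is what lets a retransmitted packet lengthen the estimate: the sample is measured from the transmission the peer actually saw.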
Jeffrey Altman [Thu, 21 Oct 2010 18:13:03 +0000 (14:13 -0400)]
Rx: Treat rx_minPeerTimeout not as a minimum but as padding
An improved RTT and timeout calculation algorithm is being
developed but until we have it, treat rx_minPeerTimeout not as
a minimum value for the timeout but as padding to be added to
the measured RTT when computing the peer timeout value.
With this change rx does not begin to send large numbers of
resends when the RTT begins to exceed the rx_minPeerTimeout
value. Timeout triggered resends at the moment can force rx
into fast recovery mode which in turn kills performance. It
is better to avoid that problem for now.
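The change in meaning can be sketched in a few lines. The real peer timeout is computed from more than a single RTT figure, so the functions below are a simplified illustration with hypothetical names and millisecond units.

```c
/* Old reading: the configured value is a floor on the timeout. */
unsigned demo_timeout_as_minimum(unsigned rtt_estimate_ms, unsigned min_ms)
{
    return rtt_estimate_ms > min_ms ? rtt_estimate_ms : min_ms;
}

/* New reading: the configured value is headroom added to the estimate. */
unsigned demo_timeout_as_padding(unsigned rtt_estimate_ms, unsigned pad_ms)
{
    return rtt_estimate_ms + pad_ms;
}
```

With an estimate of 400 ms and a configured value of 350 ms, the floor yields 400 ms, leaving no headroom once the RTT exceeds the configured value, while the padded form yields 750 ms.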
Jeffrey Altman [Thu, 21 Oct 2010 18:23:18 +0000 (14:23 -0400)]
Rx: Fix socket() handling so errors are properly detected
socket() returns an osi_socket which on Windows is an
unsigned type (HANDLE). Therefore, tests of osi_socket < 0
will never identify when the INVALID_SOCKET value is returned.
On Windows, the OSI_NULLSOCKET is assigned to INVALID_SOCKET.
Replace all comparisons of (osi_socket < 0) with
(osi_socket == OSI_NULLSOCKET) as a means of detecting errors.
In addition, do not pass socket() the protocol value 0 when
IPPROTO_UDP is what is desired.
Finally, perror() on Windows never reports any error from Winsock.
perror() is a CRT function. To get the real socket error
WSAGetLastError() must be called and its value be written to
stderr.
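Why the sign test fails can be seen in a small sketch. demo_socket_t and DEMO_NULLSOCKET are stand-ins defined here for illustration; they are not the osi_socket definitions.

```c
#include <stdint.h>

/* Unsigned stand-in for a socket handle type on Windows. */
typedef uintptr_t demo_socket_t;

/* Stand-in for the invalid-socket sentinel: all bits set. */
#define DEMO_NULLSOCKET ((demo_socket_t)~(demo_socket_t)0)

/* Broken check: an unsigned value is never less than zero.
 * Some compilers warn that this comparison is always false. */
int demo_failed_by_sign(demo_socket_t s)
{
    return s < 0;
}

/* Correct check: compare against the sentinel itself. */
int demo_failed_by_sentinel(demo_socket_t s)
{
    return s == DEMO_NULLSOCKET;
}
```

Passing DEMO_NULLSOCKET to the first function reports success, which is how a failed socket() call went undetected.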
Jeffrey Altman [Sun, 17 Oct 2010 04:35:36 +0000 (00:35 -0400)]
Windows: Use rx_Readv / rx_Writev
When USE_RX_IOVEC is defined, cm_BufWrite() will utilize rx_Writev()
instead of rx_Write() and cm_GetBuffer() will use rx_Readv() instead
of rx_Read() to improve throughput.
maxDgramPackets is initially assigned this value after correcting
for the wire endian. This compare is harmless on little endian
since the network endian value will typically be huge and redundant
on big endian machines.
Allow private implementations of osi_AllocSmall/LargeSpace
NBSD seemed to already do this at one point but was partly disabled.
This patch generalizes this feature by adding a define to disable the
standard pool macros. Linux's slab based allocator should outperform
this single threaded allocator/pool.
Simon Wilkinson [Mon, 11 Oct 2010 17:25:38 +0000 (13:25 -0400)]
rx: Simplify round trip time calculation
Move the logic for deciding whether to compute RTT out of PeerNetStats
and into the callers. This means that we can share decisions about
whether a packet is ACK'd or not, and avoid unnecessary multiple tests
and function calls.
This change also stops us from computing RTT times for packets outside
of the set of explicit ACKs that we have received. This means that we
no longer compute RTTs for packets that are on the transmit queue, but
not yet on the wire.
Jeffrey Altman [Sat, 16 Oct 2010 17:14:03 +0000 (13:14 -0400)]
Rx: Do not compute RTT on non-last packets of a jumbogram
A jumbogram is constructed as a series of rx packets that are
all sent at once and acknowledged at the same time. Computing the
RTT for all of the packets that make up the jumbogram provides
the jumbogram RTT more weight than for a non-jumbogram packet.
To restore fairness, only compute the RTT for the last packet of
a jumbogram. The non-last packets will have the RX_JUMBO_PACKET flag
set in the packet header.
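A sketch of the filtering rule follows. DEMO_JUMBO_FLAG is a hypothetical bit standing in for RX_JUMBO_PACKET (its value here is an assumption), and struct demo_packet is not the rx packet structure.

```c
/* Hypothetical "more packets follow in this jumbogram" header bit. */
#define DEMO_JUMBO_FLAG 0x20u

struct demo_packet { unsigned flags; };

/* Count how many of the given packets may feed the RTT estimate. */
int demo_rtt_samples(const struct demo_packet *pkts, int n)
{
    int i, samples = 0;

    for (i = 0; i < n; i++) {
        if (pkts[i].flags & DEMO_JUMBO_FLAG)
            continue;           /* non-last member of a jumbogram: skip */
        samples++;              /* last member, or an ordinary packet */
    }
    return samples;
}
```

A three-packet jumbogram and a single ordinary packet each contribute exactly one sample, which is the fairness property the change is after.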
Simon Wilkinson [Mon, 11 Oct 2010 17:14:02 +0000 (13:14 -0400)]
Rx: Reject out of order ACK packets
Our RX implementation virtually guarantees that we will see out of
order ACK packets, even on well behaved networks, as we send acks
simultaneously from multiple threads.
Currently we only reject out-of-order ACKS which change the window
position (so a window that advances, can never go back). However,
we fail to deal with the explicit acknowledgement portion of the ACK
packet in the same way...
For example, if we have a packet A that acknowledges packets 1 and 2,
and then a packet B acknowledging 1,2,3 and 4. If B arrives before A,
then we mark 1, 2, 3, 4 as acknowledged, and then treat the arrival of
A as nAcking 3 and 4. This has the same effect as an explicitly stated
nack, triggers an early and unnecessary resend and may, in some situations,
cause the call to go into congestion avoidance.
We can solve this using the previousPacket field of the ACK. This
indicates the last packet seen by the peer. In the same way as
firstPacket, this should never go backwards, and so can be used to
detect out of order acknowledgements, and reject them.
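The monotonicity check can be sketched as follows. struct demo_call and demo_accept_ack are hypothetical, sequence-number wraparound is ignored, and treating an equal previousPacket as acceptable is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical per-call record of the newest ACK accepted so far. */
struct demo_call {
    uint32_t first_seen;   /* highest firstPacket accepted */
    uint32_t prev_seen;    /* highest previousPacket accepted */
};

/* Returns 1 if the ACK is accepted, 0 if rejected as out of order. */
int demo_accept_ack(struct demo_call *c, uint32_t first, uint32_t prev)
{
    if (first < c->first_seen || prev < c->prev_seen)
        return 0;           /* either field went backwards: stale ACK */
    c->first_seen = first;
    c->prev_seen = prev;
    return 1;
}
```

In the example above, B (previousPacket 4) is accepted first, so A (previousPacket 2) is rejected instead of being read as a negative acknowledgement of packets 3 and 4.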
Andrew Deason [Fri, 15 Oct 2010 21:35:32 +0000 (16:35 -0500)]
pts: Specifically check for group id 0
For consistency with the code checking user ids in createuser, check
for a specified group id of 0 specifically and give a slightly
different error message for it.