If a volume lookup returns VL_NOENT or VL_BADNAME, cache the negative
response for five minutes. This prevents volume lookup storms caused
by the same volume lookup being performed repeatedly during a short
time period. This can happen if mount points to volumes that do not
exist are present in a directory that is being evaluated by Windows
Explorer or Common Control File Dialogs.
This functionality is implemented by storing the most recent update
time for the volume group as part of the cm_volume_t. A non-existent
volume group is identified with a new CM_VOLUMEFLAG_NOEXIST flag.
The presence of the lastUpdateTime value also permits volume location
information to expire at lastUpdateTime + lifetime instead of expiring
all volume information simultaneously each lifetime period.
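The per-entry expiry described above can be sketched as a freshness check keyed on the last update time. This is a minimal illustration, assuming a hypothetical `vol_entry` type with a `VOLFLAG_NOEXIST` flag; these are stand-ins, not the real cm_volume_t fields or CM_VOLUMEFLAG_NOEXIST definition.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins; not the real cm_volume_t fields or flags. */
#define VOLFLAG_NOEXIST 0x1u       /* set after a VL_NOENT/VL_BADNAME reply */
#define NEG_CACHE_SECS  (5 * 60)   /* negative answers are kept five minutes */

struct vol_entry {
    unsigned int flags;
    time_t lastUpdateTime;         /* when this entry was last refreshed */
};

/* True if the cached entry can still be trusted at time 'now'.
 * Positive entries expire at lastUpdateTime + lifetime; negative
 * (non-existent) entries expire at lastUpdateTime + NEG_CACHE_SECS. */
static bool vol_entry_fresh(const struct vol_entry *v, time_t now,
                            time_t lifetime)
{
    time_t ttl = (v->flags & VOLFLAG_NOEXIST) ? NEG_CACHE_SECS : lifetime;
    return now < v->lastUpdateTime + ttl;
}

/* Record the outcome of a VL lookup in the entry. */
static void vol_entry_update(struct vol_entry *v, bool exists, time_t now)
{
    if (exists)
        v->flags &= ~VOLFLAG_NOEXIST;
    else
        v->flags |= VOLFLAG_NOEXIST;
    v->lastUpdateTime = now;
}
```

A caller would consult `vol_entry_fresh()` before contacting the VL server; a fresh negative entry lets it return the cached "no such volume" result immediately, which is what stops repeated lookups from Explorer from becoming a storm.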
Andrew Deason [Tue, 14 Sep 2010 14:45:10 +0000 (10:45 -0400)]
DAFS: Raise LogLevel for per-chain vol stats
Only report detailed per-chain volume statistics on shutdown/SIGXCPU
if LogLevel is 125 (or 25 for smaller per-chain stats). If a
fileserver is configured with a large -vhashsize, printing out stats
for each chain can take a while and use up a nontrivial amount of disk
space for logging, so only print out these stats if we're asked for
them.
configure: --with-linux-kernel-packaging should default to disabled
The test for this build feature is reversed. By default, the value for
with_linux_kernel_packaging will not be defined, which makes the existing
test pick MPS='SP' instead of LINUX_WHICH_MODULES. Based on the configure
help messages, this would appear to be an opt-in, not an opt-out.
...
Optional Packages:
...
--with-linux-kernel-packaging
use standard naming conventions to aid Linux kernel
build packaging (disables MPS, sets the kernel
module name to openafs.ko, and installs kernel
modules into the standard Linux location)
...
Simon Wilkinson [Sun, 26 Sep 2010 14:48:54 +0000 (15:48 +0100)]
RX: Tidy reader data locking
Data which is accessed only by the reader thread doesn't need to be
protected by call->lock.
Remove the call->lock protection where it isn't required, which makes
certain read/write calls lock free.
Stop rx_ResetCall from manipulating reader thread data. This data will
be zeroed and cleared when the reader thread calls rx_EndCall, and
doesn't need to be reset by the Listener thread.
The change which made rx_ResetCall reset reader thread information
was originally part of 559ea99b. It caused race conditions that were
fixed by adding additional lock protection in d0cc6e, 4dadd2 and 423ab97e. This commit reverts portions of all of those changes. It
is safe to not clear the iovc in ResetCall because any NewCall must
be balanced by a corresponding EndCall in the reader thread, and
EndCall does the appropriate freeing of reader elements.
Ben Kaduk [Wed, 29 Sep 2010 00:03:25 +0000 (20:03 -0400)]
More FBSD syscall tweaking
We're now properly registered in syscalls.master for HEAD
(i.e. proto-9.0) and RELENG_8 (proto-8.2), which means that
afs3_syscall is prototyped in sys/sysproto.h. Accordingly,
don't declare it in afs_prototypes.h for those cases.
Also add FBSD82_ENV checks for the new syscall-registration code,
and cast afs3_syscall to sy_call_t* for the sysent structure.
Simon Wilkinson [Mon, 27 Sep 2010 22:50:23 +0000 (23:50 +0100)]
rx: Limit window size to max acks
The RX ack packet can only acknowledge 255 packets at once. In the
current implementation, this limits our maximum window size to 255,
as we can't acknowledge any packets we receive outside of that window
size.
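The limit amounts to a simple bound on the window. A minimal sketch, assuming a hypothetical `clamp_window()` helper and `MAX_ACKS_PER_PACKET` constant rather than the actual rx names:

```c
#include <stdint.h>

/* Per the commit message, one ack packet can acknowledge at most 255
 * packets. The constant name here is illustrative. */
#define MAX_ACKS_PER_PACKET 255u

/* Clamp a requested window so that every packet inside it can be
 * described by a single ack packet. */
static uint32_t clamp_window(uint32_t requested)
{
    return requested > MAX_ACKS_PER_PACKET ? MAX_ACKS_PER_PACKET : requested;
}
```

Any window larger than this would contain packets that the receiver has no way to report as received, so the sender could never advance past them.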
Contains DKMS robustness fixes, improvements to the defaults for the
module build, and cleanup of the openafs-client init script. Updates
the build system for the new demand-attach binary naming and for the
changes to supported configure options. Fixes some issues with
afs-newcell. Forces disabling of the Linux syscall probing in kernel
module builds, since no supported Debian kernel allows this and it
causes problems. Updates debhelper to V8, which allows simplification
of debian/rules and debian/module/rules.
Michael Meffie [Thu, 23 Sep 2010 14:15:57 +0000 (10:15 -0400)]
scout: display fetch and store counts as unsigned
Fetches and stores are already defined as unsigned, so format
them as unsigned values when displaying in scout. This fixes
the bug where scout shows those counts as negative values on
busy servers which have been running for a while.
Simon Wilkinson [Thu, 23 Sep 2010 16:41:47 +0000 (17:41 +0100)]
rx: Big windows make us sad
The commit which took our window size to 128 caused rxperf to run
40 times slower than before. All of the recent rx improvements have
reduced this to being around 2x slower than before, but we're still
not ready for large window sizes.
As 1.6 is nearing release, reset back to the old, fast, window size
of 32. We can revisit this as further performance improvements and
restructuring happen on master.
The previous value, 350ms, is historical. Now that networks are
so much faster, an artificially high timeout value when backed off
results in an extremely long delay before communication can resume.
Rx: Do not hold call lock across memcpy in rx_ReadProc/rx_WriteProc
1.4.x does not hold the call lock across memcpy operations in
rx_ReadProc, rx_ReadProc32, rx_WriteProc, rx_WriteProc32. The
claim is that the call curpos, curlen, and nLeft fields which
refer to the current packet being processed will not be touched
by any other thread. Therefore it is safe to drop the call lock
to permit another thread to add packets to the call while the memcpy
is performed in parallel.
This patchset continues to hold the call lock longer than the
original implementation but does drop it for the length of time
it takes to copy data from the packet buffer to the application
buffer.
Windows: Export additional RX debugging variables from afsrpc.dll
Export
rxi_nRecvFrags @2008 DATA
rxi_nSendFrags @2009 DATA
rx_initReceiveWindow @2010 DATA
rx_initSendWindow @2011 DATA
rx_intentionallyDroppedPacketsPer100 @2012 DATA
rx_intentionallyDroppedOnReadPer100 @2013 DATA
so they can be referenced from pthreaded builds of src/rx/test tools.
Exported variables must be present in both FREE and CHECKED builds.
Rx: PrintTheseStats should not be dependent on RXDEBUG
When RXDEBUG is not defined, PrintTheseStats generates an error
even though the statistics are in fact available. The global
variable rx_packetTypes was not being defined without RXDEBUG.
Make rx_packetTypes defined always and permit statistics to
always be printed.
The global dataPacketsReSent statistic should be the sum of all
peer->reSends and dataPacketsSent should not include the count of
resent packets. Prior to this patchset, dataPacketsSent included
the resent packets and dataPacketsReSent was computed as the number
of requests for Ack instead of the number of packets resent.
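The corrected accounting keeps first transmissions and retransmissions in separate counters, so the global resend total always equals the sum of the per-peer counts. A minimal sketch, assuming hypothetical `global_stats` and `peer_stats` structures (the real rx_stats and rx_peer layouts differ):

```c
/* Hypothetical counters; the real rx_stats and rx_peer structures differ. */
struct peer_stats {
    unsigned long reSends;            /* retransmissions to this peer */
};

struct global_stats {
    unsigned long dataPacketsSent;    /* first transmissions only */
    unsigned long dataPacketsReSent;  /* retransmissions only */
};

/* Account for one data packet transmission to 'peer'. A resend bumps
 * both the peer and global resend counters and never dataPacketsSent. */
static void count_data_send(struct global_stats *g, struct peer_stats *peer,
                            int is_resend)
{
    if (is_resend) {
        peer->reSends++;
        g->dataPacketsReSent++;
    } else {
        g->dataPacketsSent++;
    }
}
```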
If a packet is missing, the peer timeout is backed off to provide
a new starting point for timeout computation. The backoff state
must be stored in the peer object to ensure that multiple failures
do not result in more than one backoff before a successfully received
packet is available for recomputation.
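Keeping the backoff state on the peer means repeated failures collapse into a single backoff until a good sample resets it. A minimal sketch, assuming a hypothetical `peer_rto` structure and a doubling backoff capped at `max_ms`; the actual rx backoff factor and field names may differ.

```c
#include <stdbool.h>

/* Hypothetical peer timeout state; field names are illustrative. */
struct peer_rto {
    unsigned int timeout_ms;   /* current retransmit timeout */
    bool backedOff;            /* already backed off since the last good sample */
};

/* Called when a packet is detected as missing. Only the first failure
 * after a good sample doubles the timeout (capped at max_ms); later
 * failures are ignored until a fresh sample arrives. */
static void peer_backoff(struct peer_rto *p, unsigned int max_ms)
{
    if (p->backedOff)
        return;
    p->timeout_ms = (p->timeout_ms * 2 > max_ms) ? max_ms : p->timeout_ms * 2;
    p->backedOff = true;
}

/* Called when a successfully received packet yields a new round-trip
 * sample and the timeout is recomputed. */
static void peer_sample(struct peer_rto *p, unsigned int new_timeout_ms)
{
    p->timeout_ms = new_timeout_ms;
    p->backedOff = false;
}
```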
Rx: only compute peer bytes sent and received if rx_stats_active
Computing the bytes sent and received is an expensive operation.
If rx statistics collection has been disabled we should not collect
the peer data. The most expensive operation is the rx_FindPeer()
call that is performed during rxi_ReadPacket(). rxi_ReadPacket()
is processed by the rx listener thread which must be as fast as
possible.
rxi_ReceiveAckPacket can acquire and drop the conn_data_lock several
times and acquires and drops the peer_lock unnecessarily. This patchset
adds a variable to track whether the conn_data_lock is held in order
to avoid the need to drop it and reacquire it based upon conditional
operations. It also relocates the peer->maxPacketSize computations
in order to consolidate the work performed under the peer_lock.
rxperf made assumptions that it was built against LWP, used buffer
sizes for read/write that were too small, made use of non-portable
types, and set signal handlers that are unsupported.
Andrew Deason [Wed, 15 Sep 2010 16:19:33 +0000 (12:19 -0400)]
libafs: Fix pioctl get/putInt alignment issues
We don't know if the buffer for pioctl data is aligned to anything, so
we can't just dereference the given pointer as an int or anything
else. So, just memcpy the data in for ints and such; conveniently,
afs_pd_getBytes and afs_pd_putBytes can do this for us, so just use
that.
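The pattern is to treat the pioctl buffer as raw bytes and copy integers in and out with memcpy rather than dereferencing a cast pointer. A minimal sketch, assuming a hypothetical `pd_buf` cursor and `pd_*` helpers; afs_pd_getBytes and afs_pd_putBytes fill this role in libafs, but their real signatures are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a pioctl-style data buffer. */
struct pd_buf {
    unsigned char *ptr;   /* current position; may have any alignment */
    size_t remaining;     /* bytes left in the buffer */
};

static int pd_getBytes(struct pd_buf *b, void *dest, size_t len)
{
    if (b->remaining < len)
        return -1;
    memcpy(dest, b->ptr, len);   /* byte copy: no alignment assumption */
    b->ptr += len;
    b->remaining -= len;
    return 0;
}

static int pd_putBytes(struct pd_buf *b, const void *src, size_t len)
{
    if (b->remaining < len)
        return -1;
    memcpy(b->ptr, src, len);
    b->ptr += len;
    b->remaining -= len;
    return 0;
}

/* Instead of '*(int32_t *)b->ptr', which can fault on strict-alignment
 * CPUs, route integer access through the byte-copy helpers. */
static int pd_getInt(struct pd_buf *b, int32_t *val)
{
    return pd_getBytes(b, val, sizeof(*val));
}

static int pd_putInt(struct pd_buf *b, int32_t val)
{
    return pd_putBytes(b, &val, sizeof(val));
}
```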
Marc Dionne [Fri, 10 Sep 2010 23:55:39 +0000 (19:55 -0400)]
vlserver: Set but not used variables
Remove some variables that are set but never used in the vlserver
directory:
- n1, n2, n3 and n4 in vlclient.c appear to have never been used even
in the original IBM code
- some variables in vldb_check.c that are no longer used after some
recent changes
Andrew Deason [Tue, 14 Sep 2010 16:15:22 +0000 (12:15 -0400)]
volser: Delete timed-out temporary volumes
When a transaction times out on a volume, delete the volume if it is a
temporary volume (destroyMe is set). This prevents half-created
volumes from accumulating, which can take up space and screw up
certain vol ops in some versions.
Marc Dionne [Sat, 11 Sep 2010 17:23:11 +0000 (13:23 -0400)]
butc: Set but unused variables
Remove unused variable taskId
writeData() systematically returns 0, so make it void and adapt
call sites that assigned the return value but never used it.
Also move the function up in the file to avoid the need for a
forward declaration, and make it static since it's only used here.
Marc Dionne [Thu, 1 Jul 2010 15:38:20 +0000 (11:38 -0400)]
Linux: normalize error return for emulated syscalls
pagsh and other code expect setpag() and pioctl() to behave like
a regular syscall or pioctl, that is to return -1 on error, with
errno set to the specific error code.
On Linux, the underlying emulation does a straight return of any
error code it gets from the ioctl, and errors are not properly
caught by the callers.
As an example, pagsh won't detect an error from setpag such as
exceeding a keyring quota limit. With this patch, the user
will see this:
$ pagsh
setpag: Disk quota exceeded
sh-4.1$
The code in proc_afs_syscall is modified to set errno to the error
code and to set errorcode to -1 in case of error.
proc_afs_syscall is reindented while we're changing code there.
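The change converts the emulation's raw error code into conventional syscall semantics. A minimal sketch, assuming the raw value is 0 on success or a positive errno-style code on failure; `normalize_syscall_return()` is a hypothetical helper, not the actual proc_afs_syscall code.

```c
#include <errno.h>

/* Hypothetical helper: 'raw' is what the ioctl-based emulation produced
 * (0 on success, a positive error code on failure). Callers such as
 * pagsh expect -1 with errno set, like a regular syscall. */
static int normalize_syscall_return(int raw)
{
    if (raw != 0) {
        errno = raw;
        return -1;
    }
    return 0;
}
```

With this convention, a keyring quota failure surfaces as -1 with errno set to EDQUOT, which perror() renders as "Disk quota exceeded", matching the pagsh output shown above.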
Minimize the impact of Rx packet tracking. In particular, do no
extra queue scans, which means the rest of the state that tracks
where a packet is no longer serves a purpose. Make it possible to
re-enable packet tracking.
Simon Wilkinson [Wed, 8 Sep 2010 07:22:57 +0000 (08:22 +0100)]
Add config.log to gitignore globally
When you end up explicitly regenerating chosen Makefiles by
running ../../config.status Makefile, the working tree ends up littered
with config.log files.
Currently, we only ignore config.log in the top directory - extend this
so that it's ignored across the tree.
Marc Dionne [Fri, 10 Sep 2010 01:02:05 +0000 (21:02 -0400)]
Warning fix for gcc 4.5 "operation may be undefined" warnings
The inc_header_word and set_header_word macros make repeated use of their
argument, which triggers many (~30) warnings with gcc 4.5, like this one:
./ptutils.c:473:6: warning: operation on ‘cheader.foreigncount’ may be undefined
Removing the cast to afs_int32 in the macros gets rid of the warning,
and should be safe since we're just getting a small positive integer value
- the offset of the member in the structure - and passing it to the
pr_Write function which expects an afs_int32.
Update bos create man page for new naming of demand-attach binaries
The demand-attach fileserver binaries now have a "da" prefix. Adjust
the documentation in the man page for bos create accordingly, and add
the new binaries to SEE ALSO.
The two commands are documented identically for right now, so just link
the dafssync-debug man page to the fssync-debug man page. Remove the
incorrect statement in the man page that fssync-debug only works with
demand-attach.
Simon Wilkinson [Thu, 23 Sep 2010 07:58:21 +0000 (08:58 +0100)]
libuafs: Don't #define user
libuafs used to #define user to usr_user, so that any references to
'struct user' would become 'struct usr_user'. However, none of the
kernel code uses struct user, and this #define conflicts with the
definitions in sys/user.h on Linux.
So, just remove it.
Thanks to Russ Allbery for the original problem report.
The file propagation "out of band" changes should not hardcode recovery
on file 0, but should instead work on whichever file the interface is
acting on. Use the provided file number.
Windows: Improve SMB detection of Local System account
Depending on the authentication method, the SMB session's authenticated
name for the "local system" account may be the empty string. In this
case it is impossible to use the name to determine if the authenticated
entity is the "local system" account as required by smb_SetToken.
To work around this problem, smb_AuthenticateUserExt() will now obtain
the Security Identifier (SID) for the authenticated account. The string
representation of the SID will be used in place of the name by
smb_ReceiveV3SessionSetupX() when constructing the smb_user_t object.
A new flag, SMB_USERNAMEFLAG_SID, indicates when the name is in fact
a SID.
smb_userIsLocalSystem() checks for the SMB_USERNAMEFLAG_SID flag and
performs a SID comparison when it is set.
smb_SetToken() will accept either MACHINE\user or a SID string as
the smbname. It will obtain the SID if possible and create a SID-based
smb_user_t.
It is possible that a SYSTEM service will use an anonymous (S-1-5-7)
SMB connection. In that case, we also check the RPC Impersonation
SID to see if it is SYSTEM. If so, the RPC identity supersedes the
SMB identity for SetToken.
smb_IoctlRead, smb_IoctlV3Read and smb_IoctlRawRead are now all
consistent with regard to name processing.
Windows: Modify signature of buf_CleanAsync and buf_CleanAsyncLocked
The buf_CleanAsync() and buf_CleanAsyncLocked() signatures do
not include a cm_scache_t pointer even though buf_CleanAsyncLocked()
needs a pointer to the matching cm_scache_t object. There are
some call sites where the cm_scache_t object is already known. For those
cases it is more efficient to avoid the additional lookup especially
when buf_CleanAsync*() is being called on every buffer associated
with the cm_scache_t object.
At the same time add a flags field and a constant
CM_BUF_WRITE_SCP_LOCKED to permit the lock state of the cm_scache_t
to be passed in.
Finally, fix up the usage in buf_FlushCleanPages() which gains
the most from these changes.
Windows: Permit cm_scache rwlock to be dropped when "Stabilized"
The cm_buf_opts_t cm_BufStabilize() function was implemented
such that the cm_scache_t.rw lock had to be held exclusively
until cm_BufUnstabilize() was called. Unfortunately, this
prevents using Stabilize/Unstabilize to protect the cm_scache_t
during Flush operations as the cm_scache_t.rw lock must be
acquired after the cm_buf_t mutex and not before it.
This patchset reimplements the synchronization logic using
the new CM_SCACHEFLAG_SIZESETTING flag and cm_SyncOp().
Jeffrey Altman [Mon, 30 Aug 2010 03:41:02 +0000 (23:41 -0400)]
Windows: fail cm_CheckNTOpen if READ|DELETE for readonly file
If the readonly file attribute is set (stored as a unix mode)
then a CreateFile operation should fail if the file is opened
for DELETE in combination with any other privilege.
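The rule reduces to a bitmask test on the requested access. A minimal sketch, assuming a hypothetical `open_allowed()` helper; the `ACCESS_*` constants borrow the Windows DELETE (0x00010000) and FILE_READ_DATA (0x0001) values for illustration and are not the cm_CheckNTOpen code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Access-mask bits, using the Windows DELETE and FILE_READ_DATA values
 * for illustration. Names are local to this sketch. */
#define ACCESS_DELETE     0x00010000u
#define ACCESS_READ_DATA  0x00000001u

/* A file with the read-only attribute may not be opened for DELETE
 * together with any other access right. */
static bool open_allowed(bool readonly, uint32_t desired_access)
{
    if (readonly && (desired_access & ACCESS_DELETE) &&
        (desired_access & ~ACCESS_DELETE))
        return false;
    return true;
}
```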
Jeffrey Altman [Thu, 26 Aug 2010 15:33:43 +0000 (11:33 -0400)]
Windows: Add validation for directory buffer contents
If the directory buffer contents are garbage we can crash
the service. Add some simple validation checks to ensure
that cm_dirEntry_t objects have the correct flag value and
that the name strings are not too long.
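A sketch of the two checks, assuming a hypothetical fixed-size `dir_entry` layout with a `DIRENT_FLAG_VALID` marker. The real cm_dirEntry_t field names, flag value and name handling differ (in the AFS directory format a long name can spill into following entries, which this sketch ignores).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; not the real cm_dirEntry_t definition. */
#define DIRENT_FLAG_VALID 1
#define DIRENT_NAME_MAX   16   /* bytes available for the name in this sketch */

struct dir_entry {
    unsigned char flag;
    char name[DIRENT_NAME_MAX];
};

/* Reject entries whose flag is wrong or whose name is not NUL-terminated
 * within the space available, rather than trusting a garbage buffer. */
static bool dir_entry_valid(const struct dir_entry *e)
{
    if (e->flag != DIRENT_FLAG_VALID)
        return false;
    return memchr(e->name, '\0', sizeof(e->name)) != NULL;
}
```

Bounding the name scan with memchr() is the important part: a plain strlen() on a corrupted entry could run off the end of the buffer, which is exactly the crash the validation is meant to prevent.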
Jeffrey Altman [Tue, 24 Aug 2010 20:46:45 +0000 (16:46 -0400)]
Windows: cm_TryBulkStatRPC must process VIO errors
If the bulkStat errorCode indicates that a particular object
is inaccessible due to a VIO error, we must update the server
status appropriately in order to permit failover.
Jeffrey Altman [Tue, 24 Aug 2010 20:42:57 +0000 (16:42 -0400)]
Windows: better handle RX_MSGSIZE errors
An RX_MSGSIZE error is returned by the new PMTU detection
code. It is critical that such an error result in a retry of
the operation that failed. Otherwise, the PMTU detection can't
work and the server will be marked down.
Secondly, it is important that such errors not leak to the
application layer. Map them to CM_ERROR_RETRY in all cases.
Jeffrey Altman [Sat, 21 Aug 2010 04:23:45 +0000 (00:23 -0400)]
Windows: Log cell along with volume id for server errors
When logging server volume instance errors to the windows
application event log, be sure to log the cell as well.
Translating from a server IP address is non-trivial. Make it
easier for administrators triaging issues to plug the volume
and cell info into vos commands.
Andrew Deason [Fri, 3 Sep 2010 20:20:10 +0000 (15:20 -0500)]
vos: Show after effects in dryrun mode
The dryrun mode of operation for 'vos syncvldb' and 'vos syncserv'
does not currently show the "status after" portion of its output, so
they don't really show what the commands will do. Change them so
"status after" is shown for -dryrun when sync'ing servers or
partitions, and count changes towards the count at the end.
Marc Dionne [Sun, 5 Sep 2010 14:48:52 +0000 (10:48 -0400)]
afs_DoBulkStat: don't call afs_Analyze without holding the GLOCK
Limit the scope of the GUNLOCK-GLOCK blocks to cover only the RX
calls. This prevents afs_Analyze from being called without the
GLOCK, which causes an oops in afs_icl_Event4() where there's
an ASSERT_GLOCK.
Andrew Deason [Wed, 1 Sep 2010 16:14:37 +0000 (11:14 -0500)]
RedHat: Do not force krb5-config path
If the %krb5config macro is not defined, do not force using
/usr/kerberos/bin/krb5-config, since sometimes that is not where it is
(RHEL6 puts it in /usr/bin). Instead only specify KRB5_CONFIG if
krb5config is defined; otherwise let configure find krb5-config for
us.
Andrew Deason [Wed, 1 Sep 2010 15:32:53 +0000 (10:32 -0500)]
RedHat: Update openafs.spec for configure changes
We no longer have the configure options --enable-disconnected and
--with-krb5-conf. Remove them from the spec file and instead specify
krb5-config via the KRB5_CONFIG variable.
Andrew Deason [Wed, 1 Sep 2010 15:18:17 +0000 (10:18 -0500)]
RedHat: Use git-version in makesrpm.pl
We no longer have the OpenAFS version in the AM_INIT_AUTOMAKE. Get the
version from the equivalent AC_INIT version, which is determined by
running build-tools/git-version. So, run git-version to get the
version.
Hartmut Reuter [Tue, 31 Aug 2010 11:30:41 +0000 (13:30 +0200)]
Let SRXAFS_GetStatistics64 return correct values for the workstations
h_GetWorkstats was also called for the 64-bit case, which left random
contents in the other half of the 64-bit field. Worse: little- and
big-endian machines filled different parts of the field, so later masking
in fsprobe would not help for all kinds of servers.
Now a small wrapper, h_GetWorkstats64, is called, which calls
h_GetWorkstats correctly.
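The wrapper approach collects 32-bit results into 32-bit temporaries and then widens them, so every byte of the 64-bit fields is defined regardless of byte order. A minimal sketch, assuming hypothetical `get_workstats32()`/`get_workstats64()` functions with a simplified two-counter signature; the real h_GetWorkstats takes different arguments.

```c
#include <stdint.h>

/* Stand-in for the 32-bit routine; values and signature are hypothetical. */
static void get_workstats32(int32_t *nWorkstations, int32_t *nActive)
{
    *nWorkstations = 42;
    *nActive = 7;
}

/* Buggy pattern: passing (int32_t *)&field64 to the 32-bit routine writes
 * only 4 of the 8 bytes, and which 4 depends on endianness. Instead, fill
 * 32-bit temporaries and widen them into the 64-bit outputs. */
static void get_workstats64(int64_t *nWorkstations, int64_t *nActive)
{
    int32_t w = 0, a = 0;

    get_workstats32(&w, &a);
    *nWorkstations = w;
    *nActive = a;
}
```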
Matt Benjamin [Sun, 29 Aug 2010 19:06:22 +0000 (15:06 -0400)]
cache bypass Also increment page refcount in readpage
As noticed by a commenter, afs_linux_bypass_readpage needs
the same get_page operation as in afs_linux_bypass_readpages,
as background page accounting assumes we have done it.
Matt Benjamin [Fri, 27 Aug 2010 23:11:32 +0000 (19:11 -0400)]
DFBSD update dfbsd userland
Add new sysnames. Fix some userland header inclusions,
defend against kernel-mode ioctl interpretation when
building UKERNEL. Add fragments missing from DFBSD
MakefileProto template.
Matt Benjamin [Sun, 29 Aug 2010 21:33:03 +0000 (17:33 -0400)]
FBSD try-relax child vnode locking (recurse only)
In cases where afs_vop_lookup would return a child vnode
locked, continue to take an exclusive lock, but allow
recursion (LK_CANRECURSE). Allow recursion also at
afs_vop_link, where we specifically encountered a conflict
due to recursion.
Matt Benjamin [Sun, 29 Aug 2010 00:43:41 +0000 (20:43 -0400)]
FBSD, DFBSD (future) vnode_pager_setsize updates
Based on review of bundled filesystems on FBSD and DFBSD,
call vnode_pager_setsize in three unhandled cases (getattr,
setattr, and io growing a file; truncation was handled
correctly already). Following up on a suggestion by Ben
Kaduk.
Matt Benjamin [Fri, 27 Aug 2010 02:54:20 +0000 (22:54 -0400)]
FBSD restore old syscall register logic for older kernels
The syscall_register code appears to depend on coordination
with FreeBSD upstream, which hopefully can be completed by
RELENG_9. Use the original code for installation of the AFS
syscall on everything older than that.
Ben Kaduk [Thu, 26 Aug 2010 03:21:30 +0000 (23:21 -0400)]
FBSD: properly register our syscall
Use the provided interface, syscall_register(), instead of
manually tweaking the sysent table.
Starting afsd will still fail at present on FreeBSD HEAD
without an additional kernel patch to syscalls.master.
Matt Benjamin [Wed, 25 Aug 2010 20:19:18 +0000 (16:19 -0400)]
FBSD: give osi_NetReceive time to shutdown, reprise
The delay logic needs to follow soshutdown, and precede
soclose. The thread in osi_NetReceive is racing to do
another soreceive. That thread needs to win the race
and notice the socket is shut down before rx_socket is
torn down.