Andrew Deason [Wed, 14 Dec 2011 20:42:08 +0000 (14:42 -0600)]
afs: Clear VHardMount on ResetVolumeInfo
afs_Analyze sets VHardMount on a volume struct when a hard-mount
scenario is encountered, and clears it after sleeping. However, if the
volume struct has VRecheck set, or if it's not in memory, afs_Analyze
cannot retrieve the volume struct in order to clear VHardMount again.
For the VRecheck case, this can result in VHardMount never getting
cleared, and so hard-mount messages for the volume seem to disappear.
So, clear VHardMount when we set VRecheck so this does not occur.
For the case where the volume struct is not in memory, this is not a
problem, since when we allocate a volume struct again, the VHardMount
state will not be retained.
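As a rough illustration of the fix, the sketch below models the volume state as a bit field. The flag values and the function name reset_volume_info are invented for the example and are not the real OpenAFS definitions; only the idea (drop VHardMount at the moment VRecheck is set) comes from the message above.

```python
# Hypothetical flag values; the real definitions live in the OpenAFS headers.
VRECHECK = 0x01
VHARDMOUNT = 0x02


def reset_volume_info(states):
    """Mark the volume for recheck and drop any stale hard-mount flag."""
    states |= VRECHECK
    states &= ~VHARDMOUNT
    return states
```

Any other bits in the state word are left untouched, so unrelated flags survive the reset.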
Andrew Deason [Wed, 14 Dec 2011 20:16:16 +0000 (14:16 -0600)]
viced: Yell when we GetSomeSpace_r
A GetSomeSpace_r call indicates we don't have enough callbacks
configured. For many people, this can happen without the administrator
realizing anything is wrong, since we never give any indication that
something is amiss, unless the administrator checks the xstat
statistics.
Since this can indicate a serious performance problem, yell in the log
when this happens. Only do it once, so we don't spam the log.
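The "only once" behaviour can be sketched with a simple latch. This is a minimal illustration, not the fileserver code: the logger name, the flag, the function name and the message text are all assumptions made for the example.

```python
import logging

log = logging.getLogger("viced-sketch")  # hypothetical logger name
_warned = False  # hypothetical latch; set after the first warning


def note_get_some_space():
    """Warn on the first call only; return True if a message was logged."""
    global _warned
    if _warned:
        return False
    _warned = True
    # Illustrative wording, not the actual log message.
    log.warning("callback space exhausted; more callbacks may be needed")
    return True
```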
cs_CZ localization cannot be committed to the repository until:
1. Resource DLLs for all components are built in the tree.
2. All built components have been successfully tested so that OpenAFS is not shipping code that causes executable components to crash in the cs_CZ locale.
Michael Meffie [Thu, 29 Sep 2011 18:44:11 +0000 (14:44 -0400)]
bozo: retry start after error stops
After a bnode is stopped because of too many consecutive exits,
delay for some time and attempt to start the bnode again. Continue
to retry on each error stop, doubling the delay for each retry
attempt, up to a maximum number of attempts.
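The retry schedule described here is a plain doubling backoff. The sketch below only computes the delays; the initial delay and the attempt limit are parameters because the message does not state the values bosserver uses, and the function name is hypothetical.

```python
def retry_delays(initial, max_attempts):
    """Return the delay before each retry: initial, 2*initial, 4*initial, ..."""
    return [initial * (2 ** attempt) for attempt in range(max_attempts)]
```

For example, an assumed initial delay of 5 seconds and a limit of 4 attempts gives delays of 5, 10, 20 and 40 seconds.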
Michael Meffie [Fri, 30 Sep 2011 16:22:27 +0000 (12:22 -0400)]
bozo: preserve all options over restart
On unix, save all the bosserver command-line options and reuse
them on bosserver restarts. On Windows, the SCM integrator saves
the argument list, so just use it.
Andrew Deason [Thu, 3 Feb 2011 22:11:38 +0000 (16:11 -0600)]
volser: Do not reset copyDate in ReClone
When we ReClone in the volserver, do not reset the clone's copyDate to
the current time. If we retain the copyDate between ReClone
operations, then we can know when the clone was first created (and
thus make local RO clones more consistent with remote RO sites).
It appears that we don't actually need an interface to set the name
of an arbitrary thread (which Mac OS can't do), so remove the
afs_pthread_setname() interface and promote afs_pthread_setname_self()
to the status of primary.
Michael Meffie [Tue, 26 Jul 2011 13:18:44 +0000 (09:18 -0400)]
volscan: print vnode metadata information
The volscan program prints vnode metadata in a grep/awk/perl friendly
format. Optionally, find the path of each vnode relative to the volume
root. Access control list data can be reported, and is listed as one
access entry per line. Mount point information can be shown, to see which
volumes are mounted from given volumes.
The path lookup code was originally written by Tom Keiser.
Jeffrey Altman [Fri, 9 Dec 2011 23:40:42 +0000 (18:40 -0500)]
Windows: Suspend/Resume for afsd_service
The power mgmt events are received in the service. The service
can block all requests from the redirector from being processed
until it knows that it is safe to process them.
The service will receive a SERVICE_CONTROL_APMSUSPEND just before
the system goes to sleep. The service has two seconds to respond
and it uses that time to attempt to send RXAFS_GiveUpAllCallBacks
to all file servers as an rx_multi with no wait. It also marks
all servers down and updates the callback expirations to be just
after the servers were marked down so that they will be forced to
be refreshed when the server is marked up.
Upon resume the service receives two events. First,
SERVICE_CONTROL_APMRESUMEAUTOMATIC which is used to perform an
SMB lan adapter change detection and perform a probe of all down
servers. The second, SERVICE_CONTROL_APMRESUMESUSPEND is used to
resume SMB listeners, perform a 2nd lan adapter change check (just
in case), check the status of all down servers now that additional
networks may have come up, and finally resume processing of redirector
requests.
With these changes no special logic in the redirector is required.
Jeffrey Altman [Thu, 8 Dec 2011 15:00:57 +0000 (10:00 -0500)]
Windows: increase timeout for extent request retries
The AFS Redirector requests file data extents from the afsd_service.exe. If
it does not receive the requested extent within 10 seconds it issues another
request for that extent. Extent processing in the afsd_service is handled
by background daemons that process tasks serially from a work queue. When
the load on the system is large enough that satisfying the work queue takes
longer than 10 seconds, the redirector would retry the request. This would
increase the length of the work queue and increase lock contention.
Increasing the timeout period for extent retries to two minutes
significantly reduces the number of retry attempts while maintaining
protection against a lost extent request. Two minutes is selected because
that is the rx hard dead call timeout.
Simon Wilkinson [Sun, 20 Nov 2011 23:11:53 +0000 (18:11 -0500)]
rx: Make CALL_RELE and CALL_HOLD lock refcnt mutex
The reference count mutex must always be held when calling CALL_RELE
or CALL_HOLD. Instead of requiring that the caller obtain, and release
the mutex, do so within the HOLD and RELE macros, greatly simplifying
calling code. Provide CALL_RELE_R and CALL_HOLD_R as versions of these
macros which can be used by callers who already hold the reference
count mutex for other purposes.
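The split between the locking and the "_R" variants can be sketched as follows. This is an analogy in Python, not the rx macros: the Call class, the lower-case function names and the single global lock are assumptions made for the example.

```python
import threading

refcnt_mutex = threading.Lock()  # stands in for the reference count mutex


class Call:
    """Minimal stand-in for a call structure with a reference count."""

    def __init__(self):
        self.refcount = 0


def call_hold_r(call):
    """Caller must already hold refcnt_mutex."""
    call.refcount += 1


def call_rele_r(call):
    """Caller must already hold refcnt_mutex."""
    call.refcount -= 1


def call_hold(call):
    """Take the mutex, bump the count, release the mutex."""
    with refcnt_mutex:
        call_hold_r(call)


def call_rele(call):
    """Take the mutex, drop the count, release the mutex."""
    with refcnt_mutex:
        call_rele_r(call)
```

Ordinary callers use call_hold() and call_rele() and never touch the mutex. Code that already holds the mutex for other reasons calls the "_r" forms, which avoids trying to take a non-recursive lock twice.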
Ben Kaduk [Sat, 3 Dec 2011 19:37:09 +0000 (14:37 -0500)]
FBSD: switch afsi_SetServerIPRank implementation
Upstream has removed the ia_net{,mask} elements from
struct in_ifaddr, so we can no longer use them directly.
Switch to passing an rx_ifaddr_t (i.e. struct ifaddr*) in instead,
as that uses a slightly different codepath which still works
for our purposes.
We compile the kernel module with -Werror, so storing a pointer
(memcpy return value) in an int is forbidden, hence the conditional
declaration of 't'.
Simon Wilkinson [Sun, 20 Nov 2011 23:07:41 +0000 (18:07 -0500)]
rx: Helper function for decrementing conn refcnt
The code to lock the reference count mutex, reduce the connection
reference count, then unlock the mutex, is duplicated many times
throughout rx.c. Replace all of these multiple copies with a single
inline function.
Simon Wilkinson [Sun, 20 Nov 2011 16:31:28 +0000 (16:31 +0000)]
rx: Hide the rx_packet.h
Hide the rx_packet.h, and hence the rx_packet structure from
application view. rx_packet.h is currently still installed, and is
included directly by RX security classes, to reduce the per-packet
overhead there.
Simon Wilkinson [Sun, 20 Nov 2011 14:58:28 +0000 (14:58 +0000)]
rx: Make the rx_call structure private
Hide the rx_call structure from public view. Provide accessors for
those elements which are currently accessed by applications.
Note that this change as it currently stands removes the visibility
of the last sent time, and sequence number information, from the
VolMonitor function.
Simon Wilkinson [Tue, 15 Nov 2011 10:40:44 +0000 (10:40 +0000)]
rx: Make struct rx_connection private
Move the rx_connection structure into a private header file, so that
it is only visible from within the rx module. This allows us to use
types within the structure that are not visible to everywhere that
includes rx.h, as well as being a step towards a more stable ABI for
RX.
Add accessor functions for all of the connection members which are
currently used by external callers, and modify those accessors
which were implemented as macros to also be functions.
Change all external access to the connection structures to use these
new functions.
Jeffrey Altman [Sat, 3 Dec 2011 22:49:47 +0000 (17:49 -0500)]
Windows: apply Nat Pings only to cm_rootUser connections
Use CM_UCELLFLAG_ROOTUSER flag to identify the cm_rootUser
connections and only apply Nat pings to those connections
instead of examining the security state of the connection.
Simon Wilkinson [Sat, 3 Dec 2011 21:10:43 +0000 (21:10 +0000)]
rx: Some kernels have no reschedule function
If RXK_TIMEDSLEEP_ENV isn't set, then Unix kernel cache managers
call rxevent_Init without a reschedule function. Check for this so
we don't end up calling a NULL function in these situations.
Change-Id: I5e89f5247aeffc4c27d3f81c0ccabe4979232846
Reviewed-on: http://gerrit.openafs.org/6206
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
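The guard described in this commit amounts to checking an optional callback before invoking it. The sketch below is a toy event queue, not rx: the class, its fields and the rule "call reschedule when the new event becomes the earliest" are assumptions made to give the check some context.

```python
class EventQueue:
    """Toy event queue with an optional reschedule callback."""

    def __init__(self, reschedule=None):
        self.reschedule = reschedule  # may be None, as in some kernels
        self.events = []

    def post(self, when):
        """Queue an event; poke the scheduler if it is now the earliest."""
        self.events.append(when)
        self.events.sort()
        if self.events[0] == when and self.reschedule is not None:
            self.reschedule()
```

Without the "is not None" test, a queue created with no callback would fail on the first posted event, which is the Python analogue of calling a NULL function pointer.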
Jeffrey Altman [Sat, 3 Dec 2011 04:38:01 +0000 (23:38 -0500)]
Windows: npdll connected query returns no usage
In response to a NPEnumResources CONNECTED scope query, the usage
field is always set to zero. If the CONNECTABLE flag is set,
mpr.dll will filter the entry out of the result list.
Simon Wilkinson [Sun, 20 Nov 2011 16:29:55 +0000 (16:29 +0000)]
rx: Refactor MaxMTU error checking
The error checking on the rxMaxMTU parameter was done individually by
every server that sets it, using "internal" RX #defines to do so.
Instead, do the error checking within the function that actually sets
the MTU, reducing both the amount of code duplication, and the amount
of RX knowledge held within the servers.
Andrew Deason [Fri, 2 Dec 2011 20:36:59 +0000 (14:36 -0600)]
salvager: Create link table with volume group id
The link table needs to be created with the VG id or RW vol id, not
the non-RW vol id. Unlike other special inodes, this goes for both the
'parent' and 'volume' volume ids, not just the 'parent' id, since
there is only one link table per VG.
Without this, the salvager can generate invalid linktable special
inodes if it encounters a VG with no inodes for the RW vol.
Andrew Deason [Wed, 30 Nov 2011 23:41:53 +0000 (17:41 -0600)]
DAFS: Ensure logging on attach2 errors
The attach2 error path transitions a volume to VOL_STATE_ERROR, in
case whatever got us to that error path did not already put the volume
in an appropriate state. Log when we do this, to make sure we do not
end up with a volume in VOL_STATE_ERROR state silently.
Andrew Deason [Wed, 30 Nov 2011 23:35:56 +0000 (17:35 -0600)]
DAFS: Avoid unnecessary preattach on FSYNC_VOL_ON
FSYNC_VOL_ON/FSYNC_VOL_ATTACH can be called to "online" a volume that
was actually kept online for the duration of the volume operation.
Avoid calling VPreAttachVolumeByVp_r for such a volume if it's already
attached, in order to avoid an unnecessary log message and to save a
tiny bit of processing.
Andrew Deason [Wed, 30 Nov 2011 23:21:32 +0000 (17:21 -0600)]
DAFS: Log more for VPreAttachVolumeByVp odd states
When we encounter "odd" states in VPreAttachVolumeByVp_r, say what the
actual state we encountered was, along with the attach flags, so we
have a better idea of what's going on.
Andrew Deason [Wed, 30 Nov 2011 23:08:57 +0000 (17:08 -0600)]
DAFS: Ensure GetVolume errors on ERROR volumes
In GetVolume, after we call VAttachVolumeByVp_r, there is no explicit
check to see if vp is in VOL_STATE_ERROR state. Make sure we don't try
to use such a volume, or blindly transition the volume away from that
state.
Andrew Deason [Wed, 30 Nov 2011 20:36:06 +0000 (14:36 -0600)]
DAFS: Do not transition to ERROR on trivial errors
attach2 can result in many different errors; some indicate that the
volume is in an inconsistent state, but many others just indicate that
the volume cannot be attached for benign reasons (such as VNOVOL if
the volume doesn't exist, or VOFFLINE if the volume is being used by a
volume utility). Currently, for DAFS, attach2 transitions the relevant
volume to the VOL_STATE_ERROR state for almost all errors encountered,
even the benign ones. Instead, skip the error state transition for
error handling paths that do not reflect a "broken" volume.
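A minimal sketch of the distinction, assuming states and errors are represented as strings. Only VNOVOL and VOFFLINE are named above as benign; the full list used by attach2 is not given there, and the function name is hypothetical.

```python
# Benign errors named in the commit message; the real list may be longer.
BENIGN_ERRORS = {"VNOVOL", "VOFFLINE"}


def state_after_attach_error(current_state, error):
    """Leave the state alone for benign errors, otherwise go to ERROR."""
    if error in BENIGN_ERRORS:
        return current_state
    return "VOL_STATE_ERROR"
```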
Jeffrey Altman [Fri, 2 Dec 2011 18:41:38 +0000 (13:41 -0500)]
Windows: memset in RDR_RequestFileExtentsAsync
The logic in RDR_RequestFileExtentsAsync() made it possible
for memset() to be called multiple times on a buffer that
is already known to be up to date. Restructure the code to
make things faster.
Jeffrey Altman [Fri, 2 Dec 2011 18:36:01 +0000 (13:36 -0500)]
Windows: cm_MergeStatus redirector invalidation
The redirector maintains its own cached status information which
must be updated when a DV change occurs that is not the result
of a redirector initiated data change.
If the current old DV is BAD, send a DV change notification.
If the DV has changed and request was not initiated by the
redirector, send a DV change notification.
If the request was initiated by the redirector, send a notification
for store and directory operations that result in a DV change greater
than the number of active RPCs or any other operation that results
in an unexpected DV change such as FetchStatus.
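The three rules can be read as one decision function. The sketch below is one possible reading, not the cm_MergeStatus code: the parameter names and operation labels are invented, and it assumes that any DV change on a redirector-initiated operation other than a store or directory operation counts as "unexpected".

```python
def should_notify(old_dv_bad, dv_delta, rdr_initiated, op, active_rpcs):
    """Decide whether to send a DV change notification to the redirector."""
    if old_dv_bad:
        return True  # rule 1: the old DV was BAD
    if dv_delta == 0:
        return False  # nothing changed, nothing to report
    if not rdr_initiated:
        return True  # rule 2: change did not come from the redirector
    if op in ("store", "directory"):
        # rule 3a: only a jump larger than our own in-flight RPCs is a surprise
        return dv_delta > active_rpcs
    return True  # rule 3b: e.g. FetchStatus should not change the DV
```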
Jeffrey Altman [Fri, 2 Dec 2011 18:31:15 +0000 (13:31 -0500)]
Windows: cm_MergeStatus use new DV to purge buffers
When deciding whether or not to purge buffers on a DV change
it is the new DV that matters not the old DV. If the new DV
is 0, there should be no purging because there are no buffers
to purge.
Jeffrey Altman [Fri, 2 Dec 2011 16:21:46 +0000 (11:21 -0500)]
Windows: buf_GetNewLocked should use cleaned cm_buf
buf_GetNewLocked() searches the free buffer list for a buffer
that has a 0 refcnt, is not in the chunk that is being populated,
is not actively having I/O performed on it and is not dirty.
If it comes across a dirty buffer, it calls buf_Clean() with
the assumption that buf_CleanAsync() (as it was previously called)
was in fact asynchronous and would return immediately. Instead
buf_Clean() is synchronous and when it completes the buffer will
in most cases be clean. buf_GetNewLocked() should use the newly
cleaned buffer if it is still available and not continue the
search from the next entry in the free buffer list.
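The changed search can be sketched as below. This is a simplified model rather than the cache manager code: the Buf class and its fields are assumptions, the "chunk being populated" test is omitted, and clean() stands in for the synchronous buf_Clean().

```python
class Buf:
    """Toy buffer with just the fields the search looks at."""

    def __init__(self, name, refcount=0, dirty=False, in_io=False):
        self.name = name
        self.refcount = refcount
        self.dirty = dirty
        self.in_io = in_io


def clean(buf):
    """Synchronous stand-in for buf_Clean(): the buffer is clean on return."""
    buf.dirty = False


def get_new_locked(free_list):
    """Return the first usable buffer, reusing one we just cleaned."""
    for buf in free_list:
        if buf.refcount != 0 or buf.in_io:
            continue
        if buf.dirty:
            clean(buf)
            # Re-check: the buffer may have been taken while it was cleaned.
            if buf.refcount != 0 or buf.in_io or buf.dirty:
                continue
        return buf
    return None
```

The point of the change is the return after the re-check: having paid for the synchronous clean, the search uses that buffer instead of moving on to the next free list entry.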
Jeffrey Altman [Fri, 2 Dec 2011 16:14:11 +0000 (11:14 -0500)]
Windows: buf_CleanAsync is not async; rename it
buf_CleanAsync() calls cm_BufWrite() which stores the dirty
buffers synchronously. There is nothing asynchronous about
buf_CleanAsync() so rename it to buf_Clean() and buf_CleanAsyncLocked()
to buf_CleanLocked(). Update the comments to remove the references
to the asynchronous processing which doesn't exist.
That is not to say that the call to buf_Clean() in buf_GetNewLocked()
should not be asynchronous; it should. There is no such functionality
at the moment. One approach would be to modify buf_IncrSyncer to
trigger on an event set by buf_GetNewLocked() instead of the call
to buf_Clean(). Another approach would be registering a background
store event. In any case, that is for another patchset.
Jeffrey Altman [Thu, 1 Dec 2011 04:29:56 +0000 (23:29 -0500)]
Windows: invalidate rdr for CM_SCACHE_VERSION_BAD
If the cm_scache_t.dataVersion is set to CM_SCACHE_VERSION_BAD,
invalidate the redirector notion of status so that we do not
leak info to users that do not have permission.
If the dataVersion is CM_SCACHE_VERSION_BAD and is updated
with real status info, invalidate the redirector so it attempts
to read the directory contents.
Jeffrey Altman [Tue, 29 Nov 2011 20:02:12 +0000 (15:02 -0500)]
Windows: AFSRDFSProvider log to file
For when logging via OutputDebugString() is insufficient, add
a cheap method of logging to a fixed file: c:\temp\AFSRDFSProvider.log.
Set AFSRedirector\NetworkProvider "Debug" to 0x2.
Jeffrey Altman [Tue, 29 Nov 2011 20:01:00 +0000 (15:01 -0500)]
Windows: NPEnumResources no Printer support
The AFS Redirector does not support printer shares. If the
query is for printers only (or any other query that does not
permit disk shares as a response) return no more entries.
Jeffrey Altman [Tue, 29 Nov 2011 19:55:55 +0000 (14:55 -0500)]
Windows: no drive subst for NPCancelConnection
NPCancelConnection() must use the results of a Get Connection
ioctl to the afs redirector and not the result of Drive Letter
Substitution queries via DosQueryDevice(). Rename NPGetConnection()
to NPGetConnectionCommon() and add a new parameter to indicate
whether drive substitution is ok.
Jeffrey Altman [Mon, 28 Nov 2011 23:42:21 +0000 (18:42 -0500)]
Windows: Wix disable integrated logon by default
One of the significant differences between the NSIS and Wix
installer packages is that NSIS does not activate integrated
logon by default whereas the Wix installer does. Enabling
integrated logon without configuring the cell, CellServDB,
installing Kerberos v5 and configuring krb5.conf can result
in a very long wait at logon. Now that NSIS is no longer
being supported and cannot be supported as a native 64-bit
installer mechanism we must disable integrated logon by
default to prevent more bad end user experiences like
@Lotterleben described on Twitter.
Simon Wilkinson [Sun, 23 Oct 2011 20:21:39 +0000 (21:21 +0100)]
rx: Use a red black tree for the event stack
Instead of the current event stack, which uses a sorted linked
list, use a red/black tree to maintain the timer stack. This
dramatically improves event insertion times, at the expense of
some additional implementation complexity.
This change also adds reference counting to the rxevent
structure. We've always had a race between an event being
fired, and that event being simultaneously cancelled by
the user thread. Reference counting avoids that race resulting
in the structure appearing twice in the free list.
Jeffrey Altman [Mon, 28 Nov 2011 20:13:43 +0000 (15:13 -0500)]
Windows: cache format version change
With the change to the size of the osi_mutex_t and osi_rwlock_t
structures the CM_CONFIG_DATA_VERSION must change to force a
reconstruction of the cache file.
Jeffrey Altman [Sat, 26 Nov 2011 15:55:27 +0000 (10:55 -0500)]
Windows: convert daemons threads to pthreads
The daemon threads make calls to Rx and therefore need to
be created with the pthread package to prevent the threads
from being tracked as 'native' threads by the pthread_thread_shutdown
thread which can only track up to 63 native threads.
Ben Kaduk [Sun, 13 Nov 2011 18:12:50 +0000 (13:12 -0500)]
FBSD: cleanup dvp locking for ISDOTDOT
This is a more correct version of c2ed2577f9c16df3088158fb593d7aab6e8690d0, which was reverted since
it caused build issues on some versions and kernel panics on others.
We do want to always unlock dvp before calling over the network
in the ISDOTDOT case, but be sure to use the proper spelling
for this operation (as the syntax has changed between FreeBSD versions).
This requires not unlocking dvp right after the afs_lookup() call if
it succeeds, letting us just lock the "child" vp (which is actually
the parent starting from '/') first, and then re-lock dvp.
The error case of afs_lookup() was already handled correctly in
this logic, which is to say that it was incorrect before this change,
attempting to recursively lock dvp which causes a panic.
Edward Z. Yang [Sun, 27 Nov 2011 00:32:51 +0000 (19:32 -0500)]
Linux: 3: Update specfile to know about 3.* kernels.
Update spec file to be consistent with acinclude.m4 with regards to
sysnames. We don't bother updating the code inside the legacy kernel
build section, as it doesn't get triggered for 3.* kernels (it should
probably get cleaned up at some point.)
Also, fix a bug in error message printing of unrecognized kernel.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Change-Id: Ife6046db0bec981be59aa053f63ae71458da7167
Reviewed-on: http://gerrit.openafs.org/6120
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Simon Wilkinson <sxw@inf.ed.ac.uk>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Jeffrey Altman [Tue, 22 Nov 2011 21:36:18 +0000 (16:36 -0500)]
Windows: _._AFS_IOCTL_._ size is zero
When replying to a FileStandardInformation query on the pioctl
special file, the size of the file is 0. Failure to return 0
can result in an anti-virus program attempting to read the file
via a paging request which will fail.
Jeffrey Altman [Mon, 21 Nov 2011 18:14:40 +0000 (13:14 -0500)]
Windows: cm_GetSCache do not release unheld lock
If cm_GetNewSCache() fails, an attempt would be made to
release cm_scacheLock which is not held. However, it should
be noted that cm_GetNewSCache() cannot fail without itself
triggering a panic.
Simon Wilkinson [Sun, 20 Nov 2011 23:40:51 +0000 (23:40 +0000)]
opr: Add Bob Jenkins's hash functions
This imports a small subset of Bob Jenkins's lookup3.c hash functions
into the opr library. At present we only import the subset of this
that deals with aligned arrays of integers, as this addresses our
immediate need.
It seems likely that if we're interested in a hash function for string
arrays (or other arbitrary data), that more recent functions such as
SpookyHash (from Bob Jenkins, again) or CityHash (from Google) may be
a better solution.
The immediate use case for this is removing the use of the '%' operator
when indexing speed critical hash tables, as well as ensuring fairer
distribution of entries across these tables.
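One common way to avoid '%' is to keep the table size a power of two and mask the hash. The sketch below shows only that indexing step: mix() is a placeholder integer mixer and is NOT Bob Jenkins's lookup3, and the table size is an arbitrary assumption.

```python
TABLE_SIZE = 256  # must be a power of two for the mask to be valid
MASK = TABLE_SIZE - 1


def mix(key):
    """Placeholder 32-bit mixer for illustration; not lookup3."""
    key &= 0xFFFFFFFF
    key ^= key >> 16
    key = (key * 0x45D9F3B) & 0xFFFFFFFF
    key ^= key >> 16
    return key


def bucket(key):
    """Pick a bucket with a mask instead of the '%' operator."""
    return mix(key) & MASK
```

For a power-of-two table size the mask selects the same bucket as a modulo would, but the quality of the distribution then depends entirely on how well the hash mixes its low bits, which is why a stronger hash function matters here.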
Andrew Deason [Tue, 15 Nov 2011 19:18:48 +0000 (13:18 -0600)]
afs: Leave cellnum alone for explicit mtpt cell
When a mountpoint is given an explicit cell, don't alter cellnum.
Cellnum represents the cell for the parent, and is used for
determining whether or not we're crossing a cell boundary.
Previously, this code forced the mount point to always be treated as
foreign (for a mountpoint prefixed with a cell name), or to always be
treated as local (for a mountpoint prefixed with a cell number).