Jeffrey Altman [Tue, 31 Jan 2012 20:51:34 +0000 (15:51 -0500)]
Windows: Improve AFSNotifyDelete
Do not call AFSNotifyDelete after the reference count on the
DirEntry->ObjectInformation is given up.
Log the Parent FID and file name since those are what is passed
to the service to perform a delete. Log the actual FID of the
object being deleted and not the address of the FID fields.
Tom Keiser [Wed, 1 Feb 2012 08:31:23 +0000 (03:31 -0500)]
com_err: correctly deal with lack of libintl
On machines lacking a libintl, _intlize() currently fails to initialize
the output error string--leading to tools (e.g., translate_et) returning
a null string; make afs_com_err fall back to returning the en/US canonical
error text when we don't have any i18n support...
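The fallback described in the com_err change above can be sketched as follows. This is a minimal illustration, not the actual afs_com_err code: message_or_canonical() and its parameters are hypothetical names, and the translated string stands for whatever an i18n lookup produced.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper. `translated` is the result of an i18n lookup and
 * may be NULL or empty when no libintl is available. `canonical` is the
 * untranslated en/US message text and must always be present.
 */
const char *
message_or_canonical(const char *translated, const char *canonical)
{
    assert(canonical != NULL);

    if (translated == NULL || translated[0] == '\0')
        return canonical;       /* no i18n support: use canonical text */

    return translated;
}
```

A caller that always routes its output through such a helper never hands a null string to the tool printing the error.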
Christof Hanke [Sun, 29 Jan 2012 17:08:57 +0000 (18:08 +0100)]
linux: fix probing for noop_fsync
Commit 267934d0e6910c8d8166a6e78f93c1bab40857b8 introduced
probing code to deal with the renaming of simple_fsync
inside the linux-kernel.
This test does not take the different parameter lists of
noop_fsync and simple_fsync into account.
Fix this.
Jeffrey Altman [Sun, 29 Jan 2012 15:39:28 +0000 (10:39 -0500)]
Windows: Increase size of worker thread pools
The size of the afs redirector worker thread pools should be
made configurable but for now just increase the pool size to
be in parity with the default worker pool created by the
afsd service.
Jeffrey Altman [Sun, 29 Jan 2012 15:37:50 +0000 (10:37 -0500)]
Windows: Run Workers until empty task queue
Do not allow a worker thread to sleep until the task queue is
empty. It is better for the running thread to pick up and process
a task than to sleep this thread and wait for another one to wake
up to perform the work.
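The "run until the queue is empty" rule can be sketched as a worker body that keeps popping tasks and only returns (at which point the caller would sleep) once nothing is left. All names here are hypothetical, the queue is a plain singly linked list, and the locking a real worker pool needs is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    struct task *next;
    int done;
};

struct task_queue {
    struct task *head;
};

/* Remove and return the first task, or NULL if the queue is empty. */
struct task *
queue_pop(struct task_queue *q)
{
    struct task *t;

    assert(q != NULL);
    t = q->head;
    if (t != NULL) {
        q->head = t->next;
        t->next = NULL;
    }
    return t;
}

/*
 * Worker body: process tasks until the queue is empty. Only after this
 * returns would the worker go back to sleep. Returns the number of
 * tasks processed.
 */
int
worker_drain(struct task_queue *q)
{
    int processed = 0;
    struct task *t;

    while ((t = queue_pop(q)) != NULL) {
        t->done = 1;            /* stand-in for the real work */
        processed++;
    }
    return processed;
}
```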
Jeffrey Altman [Fri, 20 Jan 2012 19:43:06 +0000 (14:43 -0500)]
Windows: Stop the thundering herd
The afs redirector used notification events to wake up worker
threads when a task was added to a work queue. Notification
events when signalled wake up all threads instead of just one.
Instead, use synchronization events to wake up a single thread at
a time and restructure the code to permit workers to wake up
additional workers if there is additional work to be performed
or during library shutdown.
Jeffrey Altman [Wed, 25 Jan 2012 16:27:39 +0000 (11:27 -0500)]
Windows: DriveSubstitution handle too small buffer
If the buffer passed to DriveSubstitution is too small the
resulting file path will end up being truncated. At the very
least log the fact that truncation is occurring. In addition
return the fact that truncation occurred to the caller.
In NPGetUniversalName allocate a 4K buffer on the heap instead
of calculating a buffer based on the local name buffer size.
The local name buffer size has no relationship with the required
buffer size for the expanded unc or device path.
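Detecting and reporting truncation, as the DriveSubstitution change describes, can be sketched with snprintf(), whose return value is the length the full result would have needed. This is an illustration only: build_path() is a hypothetical name, and it uses narrow strings rather than whatever string types the Windows code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: concatenate prefix and rest into out.
 * Returns 0 on success and -1 if the result did not fit, so the
 * caller learns about the truncation instead of silently using a
 * shortened path.
 */
int
build_path(char *out, size_t out_len, const char *prefix, const char *rest)
{
    int needed;

    assert(out != NULL && out_len > 0);

    needed = snprintf(out, out_len, "%s%s", prefix, rest);
    if (needed < 0)
        return -1;              /* formatting error */

    if ((size_t)needed >= out_len) {
        /* At the very least, log that truncation occurred. */
        fprintf(stderr, "build_path: truncated, need %d bytes, have %zu\n",
                needed + 1, out_len);
        return -1;
    }
    return 0;
}
```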
Jeffrey Altman [Tue, 24 Jan 2012 22:09:01 +0000 (17:09 -0500)]
Windows: Invalidate all volumes at library init
The afsredirlib.sys library driver is unloaded when the afsd_service
stops and is reloaded when the afsd_service restarts. During the
shutdown window any objects known to the kernel are preserved by
afsredir.sys. When the afsd_service restarts, there are no valid
callbacks on any objects so the afsredirlib.sys must invalidate all
status info to permit the service to request a callback from the
file server on next use.
Jeffrey Altman [Tue, 24 Jan 2012 17:52:12 +0000 (12:52 -0500)]
Windows: Refactor and consolidate afsredir invalidation
Invalidation requests were being processed in an inconsistent
manner because different rules were being applied to volume root
directories and other objects and whether or not the invalidation
was a whole volume invalidation or not.
This patchset consolidates all invalidation logic for an object
in the new AFSInvalidateObject function. AFSInvalidateObject
is then called from AFSInvalidateCache and AFSInvalidateVolume
as necessary.
AFSInvalidateVolume executes AFSInvalidateObject on all objects
in the volume object tree. As a result, whole volume invalidations
whether triggered by the file server or "fs flushvolume" now work.
Marc Dionne [Mon, 23 Jan 2012 02:21:51 +0000 (21:21 -0500)]
vlserver: Consolidate VLDB entry server flag definitions
Group the definitions of server flags for VLDB entries in one place,
and rename VLSERVER_FLAG_UUID to make its name consistent with the
other flags.
This makes it easier to see the complete set of flags and avoid
conflicts.
Simon Wilkinson [Mon, 7 Nov 2011 09:48:14 +0000 (09:48 +0000)]
viced: Remove the LWP fileserver
*) Remove all LWP specific code from the fileserver, and make pthread
the default
*) Build the pthreaded fileserver in the 'viced' directory, rather than
in tviced
*) Move the DAFS specific files from tviced to viced (arguably, these
should move into dviced, but there are currently no source files in
that directory)
*) Remove tviced from the build
Andrew Deason [Fri, 13 Jan 2012 18:43:16 +0000 (13:43 -0500)]
vol: remove SYNC fatal_error processing
Currently SYNC clients will "disable" themselves on certain error
patterns. For example, if the server end closes its file descriptor
too many times, or takes too long and then closes the fd, the SYNC
client will return an error and set fatal_error. On any subsequent
SYNC requests, the request will immediately fail without contacting
the server, often making SYNC client programs effectively useless
until they are restarted.
There isn't really any reason to cause future requests to fail.
Transient problems in the fileserver can easily make this situation
possible (e.g. a fileserver can crash but still take several minutes
to close the SYNC fd while the core is written to disk), and so while
we may return an error for a specific problematic request, future
requests may be fine.
So, just remove everything related to fatal_error, so future SYNC
requests can continue to be attempted. Adjust some log messages to
reflect the new behavior.
Jeffrey Altman [Sun, 15 Jan 2012 16:43:40 +0000 (11:43 -0500)]
Windows: make lock reader history debug only
The lock reader history on osi_rwlock is proving to be too
expensive. Only use it for DEBUG builds. Leave the data
structures the same so that DEBUG builds can be mixed with
a RELEASE build of afsd_service.exe.
Jeffrey Altman [Sun, 22 Jan 2012 23:42:32 +0000 (18:42 -0500)]
Windows: store data verification mode
Over the lifetime of OpenAFS a number of bugs have been discovered
that can result in data corruption. This new mode (Windows only)
will double check that the data received by the file server does
in fact match the data that was written by the cache manager.
After a successful StoreData and status merge but before the BIOD
is released, a fetchdata is issued to read the data written by the
cache manager. If the data fails to match, the StoreData operation
is repeated.
Data verification mode can be queried with "fs getverify" and set
with "fs setverify {on, off}". The default value can be set with
the TransarcAFSDaemon\Parameters DWORD "VerifyData" registry value.
Jeffrey Altman [Sun, 22 Jan 2012 23:33:43 +0000 (18:33 -0500)]
Windows: release BIOD after status merge
Releasing the BIOD permits the accumulated buffers to be accessed.
Releasing the BIOD before the cm_MergeStatus() call creates a
window where the buffer data version is larger than the cm_scache
data version. Release the BIOD after the status merge.
when we are going to hit the backend storage, disable keepalives.
the net effect of this is that no idle dead time is needed; instead,
the normal dead time will result in a connection with no activity
simply dying naturally if i/o blocks forever.
it's important that keepalives be enabled during callback breaks,
so that is done.
Jeffrey Altman [Wed, 18 Jan 2012 00:46:30 +0000 (19:46 -0500)]
Windows: failover and retry for VBUSY
When a file server returns the VBUSY error for an RPC the
cache manager records the 'srv_busy' state in the cm_serverRef_t
structure binding that file server to the active cm_volume_t
object. The 'srv_busy' was never cleared which prevents the
volume from being accessed.
Clear the 'srv_busy' flag whenever cm_Analyze() receives a
CM_ERROR_ALLBUSY error which means that all replicas have
been tried or whenever the error is not VBUSY or VRESTARTING.
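The rule for clearing the busy flag can be sketched as below. The enum values and the server_ref structure are placeholders invented for this example and are not the definitions used by the cache manager. Leaving the flag untouched on VRESTARTING is an assumption; the commit message only states when the flag is cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder error codes, not the real VBUSY/VRESTARTING values. */
enum sk_error {
    SK_ERR_NONE = 0,
    SK_ERR_VBUSY,
    SK_ERR_VRESTARTING,
    SK_ERR_ALLBUSY,
    SK_ERR_OTHER
};

/* Stand-in for the per-volume server reference holding the flag. */
struct server_ref {
    int busy;
};

/* Update the busy flag after an RPC result has been analyzed. */
void
note_rpc_result(struct server_ref *ref, enum sk_error err)
{
    assert(ref != NULL);

    if (err == SK_ERR_VBUSY) {
        ref->busy = 1;          /* server reported the volume busy */
    } else if (err == SK_ERR_VRESTARTING) {
        /* assumption: leave the flag unchanged */
    } else {
        /* ALLBUSY (all replicas tried) or any other outcome: clear */
        ref->busy = 0;
    }
}
```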
Jeffrey Altman [Fri, 25 Nov 2011 14:28:18 +0000 (09:28 -0500)]
Windows: improved idle dead time handling
RX_CALL_IDLE has been treated the same as RX_CALL_DEAD which is
a fatal error that results in the server being marked down. This
is not the appropriate behavior for an idle dead timeout error
which should not result in servers being marked down.
Idle dead timeouts are locally generated and are an indication
that the server:
a. is severely overloaded and cannot process all
incoming requests in a timely fashion.
b. has a partition whose underlying disk (or iSCSI, etc) is
failing and all I/O requests on that device are blocking.
c. has a large number of threads blocking on a single vnode
and cannot process requests for other vnodes as a result.
d. is malicious.
RX_CALL_IDLE is distinct from RX_CALL_DEAD in that idle dead timeout
handling should permit failover to replicas when they exist in a
timely fashion but in the non-replica case should not be triggered
until the hard dead timeout. If the request cannot be retried, it
should fail with an I/O error. The client should not retry a request
to the same server as a result of an idle dead timeout.
In addition, RX_CALL_IDLE indicates that the client has abandoned
the call but the server has not. Therefore, the client cannot determine
whether or not the RPC will eventually succeed and it must discard
any status information it has about the object of the RPC if the
RPC could have altered the object state upon success.
This patchset splits the RX_CALL_DEAD processing in cm_Analyze() to
clarify that only RX_CALL_DEAD errors result in the server being marked
down. Since Rx idle dead timeout processing is per connection and
idle dead timeouts must differ depending upon whether or not replica
sites exist, cm_ConnBy*() are extended to select a connection based
upon whether or not replica sites exist. A separate connection object
is used for RPCs to replicated objects as compared to RPCs to non-replicated
objects (volumes or vldb).
For non-replica connections the idle dead timeout is set to the hard
dead timeout. For replica connections the idle dead timeout is set
to the configured idle dead timeout.
Idle dead timeout events and whether or not a retry was triggered
are logged to the Windows Event Log.
cm_Analyze() is given a new 'storeOp' parameter which is non-zero
when the executed RPC could modify the data on the file server.
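The per-connection timeout choice described above (hard dead timeout for non-replica connections, configured idle dead timeout for replica connections) reduces to a small selection function. The function name and the timeout values used in any call are illustrative assumptions, not taken from the cache manager.

```c
#include <assert.h>

/*
 * Pick the idle dead timeout (in seconds) for a connection.
 * has_replicas is non-zero when the RPC targets a replicated object.
 */
unsigned int
idle_dead_timeout_for(int has_replicas,
                      unsigned int hard_dead_timeout,
                      unsigned int configured_idle_dead_timeout)
{
    assert(hard_dead_timeout > 0);

    if (has_replicas)
        return configured_idle_dead_timeout; /* fail over in a timely way */

    return hard_dead_timeout;   /* nothing to fail over to: wait longer */
}
```

Because Rx applies the idle dead timeout per connection, a scheme like this needs two connection objects per server, one for each case, which matches the split the patchset describes.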
Jeffrey Altman [Mon, 28 Nov 2011 17:58:02 +0000 (12:58 -0500)]
rx: RX_CALL_IDLE and RX_CALL_BUSY
Allocate new Rx error codes for Idle and Busy calls but do not
send these errors on the wire. They are only intended for local
use.
RX_CALL_IDLE is an indication to an application that requests it
that the rx peer is maintaining an open call channel but has not
sent any actual data for the length of the registered idle dead
timeout.
RX_CALL_BUSY is an indication to an application that requests it
that the rx peer believes the selected call channel is in use by
a pre-existing call.
When either RX_CALL_IDLE or RX_CALL_BUSY are assigned as the call
error and an abort must be sent to the rx peer, the errors are
translated to RX_CALL_TIMEOUT. This is necessary because it is
not possible to add new Rx error values in a method that is safe
for peers that are not expecting them.
This patchset also documents which Rx errors defined in rx.h are
used on the wire and which are not.
The Unix and Windows cache managers are updated to build with
these new error codes.
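The translation of local-only errors before an abort is sent can be sketched as follows. The numeric values below are placeholders chosen for the example and should not be read as the values defined in rx.h; wire_error_for() is a hypothetical name.

```c
#include <assert.h>

/* Placeholder values for illustration only. */
enum {
    SK_RX_CALL_DEAD    = -1,
    SK_RX_CALL_TIMEOUT = -3,
    SK_RX_CALL_IDLE    = -1001, /* local use only */
    SK_RX_CALL_BUSY    = -1002  /* local use only */
};

/*
 * Map a call error to the code that may be placed in an abort packet.
 * Local-only codes become the timeout code; everything else is sent
 * unchanged.
 */
int
wire_error_for(int call_error)
{
    int wire;

    switch (call_error) {
    case SK_RX_CALL_IDLE:
    case SK_RX_CALL_BUSY:
        wire = SK_RX_CALL_TIMEOUT;
        break;
    default:
        wire = call_error;
        break;
    }

    /* Local-only codes must never reach the peer. */
    assert(wire != SK_RX_CALL_IDLE && wire != SK_RX_CALL_BUSY);
    return wire;
}
```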
Peter Scott [Thu, 19 Jan 2012 01:42:19 +0000 (18:42 -0700)]
Windows: Asynchronous purging of file content after a DV change
Purge all regions of the file surrounding the extents which are to be
purged. If a failure occurs on the purge due to an existing mapping, flag
for purge during handle close
Jeffrey Altman [Thu, 19 Jan 2012 20:25:44 +0000 (15:25 -0500)]
Windows: cm_buf refcnt must hold buf_globalLock
An assertion in buf_Recycle() was being triggered when a cm_buf_t
object was supposed to be in the free buffer list but wasn't.
buf_Recycle() was racing with another thread. The test for
refCount == 0 was performed while holding the buf_globalLock
exclusively but the InterlockedDecrement(refCount) in buf_Release()
was performed without holding buf_globalLock at all. buf_globalLock
must be held at least as a read lock. Otherwise, the refCount can
reach 0 prior to the thread blocking for exclusive access to the
buf_globalLock. This provides buf_Recycle() which is holding
buf_globalLock the opportunity to race.
The solution is to make sure that buf_Release() always holds
buf_globalLock as a read lock and then use buf_ReleaseLocked()
to perform the actual decrement and test.
Jeffrey Altman [Thu, 19 Jan 2012 06:21:02 +0000 (01:21 -0500)]
Windows: Redesign daemon thread queue management
The daemon thread worker pool has some very poor properties.
The threads spend a significant amount of time polling for
ready to process tasks because so frequently a store/fetch data
request is accompanied by many other requests for the same FID
that would block.
Let's try a new approach. Create one queue for each worker thread
and assign the tasks to a thread by a hash of the FID. This ensures
that all tasks for a single FID are serialized and prevents multiple
threads from attempting to perform the same task only to decide that
the thread would be forced to block.
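Assigning tasks to per-thread queues by a hash of the FID can be sketched as below. The fid layout and the hash function are assumptions made for the example; the commit message does not say which hash is used. The property that matters is that equal FIDs always map to the same queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FID layout for the sketch. */
struct fid {
    uint32_t cell;
    uint32_t volume;
    uint32_t vnode;
    uint32_t unique;
};

/*
 * Return the index of the worker queue that owns this FID. Every task
 * for the same FID lands on the same queue and is therefore handled
 * serially by a single worker thread.
 */
unsigned int
queue_for_fid(const struct fid *f, unsigned int nqueues)
{
    uint32_t h;

    assert(f != NULL && nqueues > 0);

    /* Simple multiplicative mix; any stable hash would do. */
    h = f->cell;
    h = h * 31u + f->volume;
    h = h * 31u + f->vnode;
    h = h * 31u + f->unique;

    return (unsigned int)(h % nqueues);
}
```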
Jeffrey Altman [Wed, 18 Jan 2012 00:43:54 +0000 (19:43 -0500)]
Windows: prevent race assigning Fcb in AFSInitFcb()
AFSInitFcb() is executed when the ObjectInformation->Fcb pointer
is NULL. More than one thread can make that determination at the
same time. Use InterlockedCompareExchangePointer() to detect
a race and permit cleanup to be performed.
Remove the output parameter of AFSInitFcb() to avoid a double
assignment.
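The race detection in AFSInitFcb() relies on the Windows InterlockedCompareExchangePointer() call. A portable sketch of the same pattern, using C11 atomic_compare_exchange_strong() as a stand-in, is shown below. The structures and the init_fcb() function are hypothetical and much simpler than the driver's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct fcb {
    int id;
};

struct object_info {
    _Atomic(struct fcb *) fcb;  /* NULL until an Fcb is installed */
};

/*
 * Allocate an Fcb and try to install it. Returns 1 if this caller
 * installed it, 0 if another caller got there first (the candidate is
 * cleaned up), and -1 on allocation failure.
 */
int
init_fcb(struct object_info *obj, int id)
{
    struct fcb *candidate;
    struct fcb *expected = NULL;

    assert(obj != NULL);

    candidate = malloc(sizeof(*candidate));
    if (candidate == NULL)
        return -1;
    candidate->id = id;

    /* Install only if the pointer is still NULL. */
    if (!atomic_compare_exchange_strong(&obj->fcb, &expected, candidate)) {
        free(candidate);        /* lost the race: undo our work */
        return 0;
    }
    return 1;
}
```

The single compare-and-exchange is the only place the pointer is assigned, which is also why a separate output parameter would risk a double assignment.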
Jeffrey Altman [Sat, 14 Jan 2012 15:32:51 +0000 (10:32 -0500)]
Windows: cm_EndCallbackGrantingCall refactoring
Refactor cm_EndCallbackGrantingCall to prevent assigning a
callback to the cm_scache object in the case where it is going
to be discarded. If the race was lost the callback data was
already discarded by cm_RevokeCallback. By assigning and then
discarding we are forced to issue an additional change notification
to the smb client or afs redirector. Not only is this extra work
but the afs redirector notification can result in a deadlock with
a kernel thread that is waiting for the current thread to complete.
Modify the function signature to return whether or not a race
was lost with a callback revocation.
Rename 'freeFlag' to 'freeRacingRevokes' since that is what
the flag is meant to indicate.
Create a new 'freeServer' flag to indicate when the server
reference should be released. There was a leak of server
references when a race occurred.
Modify all calls to cm_EndCallbackGrantingCall() that provide
an AFSCallBack structure on input to check for a lost race.
If a race occurs, cm_MergeStatus() should not be performed.
The DirectoryNodeHdr.TreeLock must be obtained before the
DirEntry->NonPaged->Lock. In AFSLocateNameEntry(), the
DirEntry lock is obtained before the TreeLock when processing
a symlink object. For that case obtain the TreeLock first.
Drop it if it is not required.
Peter Scott [Wed, 11 Jan 2012 13:49:23 +0000 (06:49 -0700)]
Windows: Performing async work after cache invalidation
The code now queues a work item to perform additional work on extent
processing after a cache invalidation has occurred. This additional work
involves walking the current list of extents and purging/flushing regions of
the system cache based upon the current state of the extent.
Additional changes to filter which invalidation events result in a queued
worker to perform asynchronous work.
Marc Dionne [Wed, 18 Jan 2012 15:06:36 +0000 (10:06 -0500)]
Parallel build fixes
Assorted fixes for issues seen with parallel builds:
- bucoord must depend on butm, since it uses libbutm
- for most object files in roken and hcrypto, headers must be installed
before building
- remove rules with 2 targets in rxkad and ubik
- budb: add dependencies for db_dump.o
Marc Dionne [Wed, 18 Jan 2012 01:19:54 +0000 (20:19 -0500)]
rx: Correctly test for end of call queue
The intention of this condition is to check if the current call
being considered is the last one on the queue, but the test is
incorrect. A null next pointer indicates a removed item, not
the end of the queue.
Use the queue_IsLast macro instead to correctly determine that
this is the last item in the queue and that a call has to be
selected, either the current one or a previously seen good choice.
This can cause calls to get permanently stuck in the call queue
and never get assigned to a thread, even when all threads are
idle.
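The difference between "next pointer is NULL" and "this is the last item" can be illustrated with a small circular doubly linked queue in the style the commit message implies: the last item's next pointer refers back to the queue head, and NULL only appears on an item that has been removed. The helpers below are written for this sketch and are not the definitions from the rx queue header.

```c
#include <assert.h>
#include <stddef.h>

struct qnode {
    struct qnode *prev;
    struct qnode *next;
};

/* An empty queue is a head that points at itself. */
void
q_init(struct qnode *head)
{
    assert(head != NULL);
    head->prev = head->next = head;
}

/* Append item at the tail of the queue. */
void
q_append(struct qnode *head, struct qnode *item)
{
    assert(head != NULL && item != NULL);
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Unlink item; NULL pointers mark it as removed, not as end of queue. */
void
q_remove(struct qnode *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    item->prev = item->next = NULL;
}

/* The last item is the one whose next pointer is the head. */
int
q_is_last(const struct qnode *head, const struct qnode *item)
{
    return item->next == head;
}
```

A loop that tests item->next == NULL to find the end never sees that condition for a queued item, so the "select a call now" branch is skipped.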
Jeffrey Altman [Sat, 14 Jan 2012 15:31:01 +0000 (10:31 -0500)]
Windows: restrict service to 2 cpus by default
Performance drops off considerably when the number of processors
increases due to lock contention and the cm_SyncOp wait processing.
If the MaxCPUs registry value is not set, limit ourselves to two.
Setting MaxCPUs to zero permits use of all CPUs.
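The MaxCPUs default can be sketched as a small function. The function name is hypothetical, and capping an explicit MaxCPUs value at the number of available processors is an assumption that the commit message does not state.

```c
#include <assert.h>

/*
 * Decide how many CPUs the service may use.
 *   value_is_set: non-zero if the MaxCPUs registry value exists
 *   max_cpus:     the configured value (0 means "all CPUs")
 *   available:    number of processors in the machine
 */
unsigned int
effective_cpu_count(int value_is_set, unsigned int max_cpus,
                    unsigned int available)
{
    assert(available > 0);

    if (!value_is_set)
        return available < 2 ? available : 2;   /* default: two CPUs */

    if (max_cpus == 0)
        return available;                       /* zero: use them all */

    /* Assumption: never report more CPUs than the machine has. */
    return max_cpus < available ? max_cpus : available;
}
```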
Andrew Deason [Wed, 11 Jan 2012 15:00:35 +0000 (10:00 -0500)]
vol: Fix VCreateVolume special inode cleanup
In order to dec the relevant special inodes, we need to know the
parent vol id in addition to the vol id itself. Use the appropriate
volume IDs when IH_DEC'ing special inodes after we fail to create the
volume, so we don't leave behind special inodes.
Jeffrey Altman [Thu, 5 Jan 2012 16:52:00 +0000 (11:52 -0500)]
Windows: Avoid file server rpcs on deleted files
If a file has been deleted, do not attempt to issue RPCs
to the file server in response to AFS redirector extent processing.
All RPCs will fail with VNOVNODE which will in turn trigger invalidation
requests to the AFS redirector which can deadlock.
Jeffrey Altman [Wed, 4 Jan 2012 17:13:40 +0000 (12:13 -0500)]
Windows: use local var for interlocked result
Save the result of the interlocked operations for use in
debug logging. Do not reference the incremented or decremented
object in the log messages, it may have changed.
Local assignment is provided even in functions that are currently
not logging to assist with debugging and as a reminder to use
the result variable in future log messages.
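Saving the result of an interlocked operation in a local variable can be sketched with C11 atomics standing in for the Windows Interlocked functions. The structure and function names are hypothetical. Note that atomic_fetch_sub() returns the value before the decrement, so the sketch subtracts one to get the resulting count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

struct object {
    atomic_long ref_count;
};

/* Drop one reference and return the resulting count. */
long
object_release(struct object *obj)
{
    long count;

    assert(obj != NULL);

    /* Capture the result of the atomic operation in a local. */
    count = atomic_fetch_sub(&obj->ref_count, 1) - 1;

    /*
     * Log the local copy. Re-reading obj->ref_count here could report
     * a value already changed by another thread.
     */
    fprintf(stderr, "object_release: %p count now %ld\n",
            (void *)obj, count);

    return count;
}
```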
Jeffrey Altman [Wed, 4 Jan 2012 05:02:42 +0000 (00:02 -0500)]
Windows: Directory Enumeration, DVs, and TreeLocks
Hold the TreeLock exclusively across all operations that
enumerate, validate, or otherwise manipulate directory tree
lists or data versions.
Take the data version into account when deciding what to do
with directory data. If a directory enumeration takes more
than one request to service and the DV has changed from the
time the directory snapshot was taken by the service and the
enumeration completion, merge in the changes and then mark
the directory as requiring verification.
If a directory change operation completes (create, rename, remove)
and the directory DV has changed by more than one, force a full
directory verification.
Set the directory data version to -1 whenever a directory
verification is required. Otherwise, the check to clear the
VERIFY flag will only update the metadata for the directory.
During a directory verification, if a new entry has been discovered
it is added to the directory. Make sure the VALID flag is set so
that the entry will not immediately be removed as invalid.
Change-Id: I6be8d00126fccf88bde8ae5f97e850dfb9a2f60f
Reviewed-on: http://gerrit.openafs.org/6460
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Jeffrey Altman [Wed, 4 Jan 2012 04:39:53 +0000 (23:39 -0500)]
Windows: Permit renames of open files
AFS does not impose a restriction on renames of open files.
Failure to permit the rename can cause problems if an anti-malware
service opens the file immediately after the application performing
the rename does so.
Change-Id: Ib23a6a893c5c575e89b8a817faec4c11300a04b7
Reviewed-on: http://gerrit.openafs.org/6503
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Jeffrey Altman [Wed, 4 Jan 2012 04:36:50 +0000 (23:36 -0500)]
Windows: Do not prime the service directory cache
Performing a directory enumeration is an expensive operation
that we should be attempting to avoid. The current directory
enumeration and evaluate target requests will use inline bulk
status RPCs to the file server which obtain status for 49 items
at a time from a single directory.
Change-Id: I78e08680fec9715c3c446d0c4c5226cd79db80bd
Reviewed-on: http://gerrit.openafs.org/6502
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Jeffrey Altman [Wed, 4 Jan 2012 04:12:34 +0000 (23:12 -0500)]
Windows: do not flush dirty extents without permission
When closing file handles, do not permit dirty extents to be
released back to the service if the current handle (Ccb) does
not have write permission. The cleanup operation will fail with
STATUS_ACCESS_DENIED, the extents will be released and all of the
dirty data will be discarded.
Change-Id: Iceacf5319147d1bd6277ea160bc67d91f1a49d5b
Reviewed-on: http://gerrit.openafs.org/6500
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Marc Dionne [Fri, 6 Jan 2012 22:22:35 +0000 (17:22 -0500)]
libuafs: only rebuild h directory when needed
A few changes to allow a "make all ; sudo make install ; make all..."
workflow to work without manually removing files in between.
Make the rebuilding of the h directory dependent on the source
files scanned to build it. This prevents it from being rebuilt
for every "make install".
While we're here, use -f when removing linktest for the clean target.
This allows "make clean" to remove it without prompting when the user
doesn't have write access to the file, as is the case when make install
rebuilds it as root.
afs: discard cached state when we are unsure of validity
in the event we got a network error, we don't know if the server
completed (or will complete) our operation. we can assume nothing.
a more complicated version of this could attempt to verify that the
state is what we expect it to be, but in extended callbacks universe
this is potentially easier to solve anyway. for now, return the
error to the caller, and mark the vcache unstat'd.
it's actually important this be more than the rx call dead time
so timing out server callbacks to clients don't result in us idle deading
a call to the server when callbacks need to be broken
Marc Dionne [Thu, 5 Jan 2012 00:27:18 +0000 (19:27 -0500)]
Use offsetof() in set_header_word to get field offset
Use offsetof() to replace a few instances where the same logic is
open coded in set_header_word and inc_header_word macros. In cases
where the field name involves a variable as an index to an array,
newer gcc gives a sequence point warning.
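Using offsetof() to locate a header field, instead of open-coded pointer arithmetic, can be sketched as follows. The header layout and the function form of set_header_word() here are hypothetical and only illustrate the technique; they are not the macros touched by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header for the sketch. */
struct header {
    uint32_t version;
    uint32_t flags;
    uint32_t counters[4];
};

/*
 * Store a 32-bit word at the given byte offset inside a header image.
 * The caller computes the offset with offsetof(), so no address
 * subtraction is written out by hand.
 */
void
set_header_word(void *buf, size_t field_offset, uint32_t value)
{
    assert(buf != NULL);
    assert(field_offset + sizeof(value) <= sizeof(struct header));

    memcpy((unsigned char *)buf + field_offset, &value, sizeof(value));
}
```

A call then looks like set_header_word(&h, offsetof(struct header, flags), 7u), where h is a struct header.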
Michael Meffie [Wed, 14 Dec 2011 17:52:51 +0000 (12:52 -0500)]
Unix CM: reset blacklist on hard-mount retry
Reset black-listed servers on a request when retrying due to a
hard-mount retry. When hard-mounts are in effect, a request may
retry indefinitely. If all the servers have been black-listed
due to a transient error, the request may never complete.
Andrew Deason [Fri, 18 Nov 2011 16:25:08 +0000 (10:25 -0600)]
DAFS: Atomically re-hash vnode in VGetFreeVnode_r
VGetFreeVnode_r pulls a vnode off of the vnode LRU, and removes the
vnode from the vnode hash table. In DAFS, we may drop the volume glock
immediately afterwards in order to close the ihandle for the old vnode
structure.
While we have the glock dropped, another thread may try to
VLookupVnode for the new vnode we are creating, find that it is not
hashed, and call VGetFreeVnode_r itself. This can result in two
threads having two separate copies of the same vnode, which bypasses
any mutual exclusion ensured by per-vnode locks, since they will lock
their own version of the vnode. This can result in a variety of
different problems where two threads try to write to the same vnode at
the same time. One example is calling CopyOnWrite on the same file in
parallel, which can cause link undercounts, writes to the wrong vnode
tag, and other CoW-related errors.
To prevent all this, make VGetFreeVnode_r atomically remove the old
vnode structure from the relevant hashes, and add it to the new hashes
before dropping the glock. This ensures that any other thread trying
to load the same vnode will see the new vnode in the hash table,
though it will not yet be valid until the vnode is loaded.
Note that this only solves this race for DAFS. For non-DAFS, the vol
glock is held over the ihandle close, so this race does not exist.
The comments around the callers of VGetFreeVnode_r indicate that
similar extant races exist here for non-DAFS, but they are unsolvable
without significant DAFS-like changes to the vnode package.
Andrew Deason [Tue, 27 Dec 2011 02:22:08 +0000 (21:22 -0500)]
afs: Grab a reference to setp in afs_icl_Event4
We can drop GLOCK in several places in afs_icl_Event4 and the
afs_icl_AppendRecord callee. To ensure that the given afs_icl_set does
not get freed while we have GLOCK dropped, grab a reference to the
set.
Thanks to Ryan C. Underwood for reporting an issue triggered by this.
Geoffrey Thomas [Sun, 1 Jan 2012 00:51:29 +0000 (19:51 -0500)]
linux: fsync on a directory should return 0, not EINVAL
Directory writes are synchronous, so this is fine. There's a
mostly-convenient function in fs/libfs.c that returns 0 that we can use
to do what we want ("mostly" because it was renamed in 2.6.35).
Geoffrey Thomas [Sun, 11 Dec 2011 10:06:24 +0000 (05:06 -0500)]
rpm: Don't attempt to restart on upgrade when using systemd
systemd is actually rather capable of leaving the OpenAFS client in an
incredibly broken state, thanks to its willingness to track services and
kill their processes. We should not attempt to restart the client on
upgrade, whether a normal upgrade or a migration from SysV initscripts.
In the former case, it's fine (and correct) for the old AFS to keep
running; in the latter case, the unit file is capable of correctly
shutting down an initscript-launched client. The same is true for the
OpenAFS server.
This brings the packaging in line with the SysV initscript code in the
specfile, which does not attempt to restart the service, as well as with
e.g. Debian's packaging, which uses --no-restart-on-upgrade.
While we're here, clean up a redundant BuildRequires on systemd-units.
Peter Scott [Fri, 30 Dec 2011 00:30:45 +0000 (17:30 -0700)]
Windows: Handle invalid node types
In the case where the direntry data is invalid, construct an Fcb
of type INVALID so that the direntry can be displayed and the object
deleted even if it cannot be evaluated.
Jeffrey Altman [Sat, 31 Dec 2011 01:09:06 +0000 (20:09 -0500)]
Windows: renames that overwrite existing target
The Windows client up to this point has never correctly implemented
directory renames. For the longest time it assumed that the file
server would not replace a pre-existing target. As a result, when
the target name was already in use the contents of the directory
would end up with the target name existing but its previous file id
associated with it.
A second problem was that lookups for the source and target names
were not performed while the directory (or directories) were exclusively
held to ensure that competing changes could not occur.
This patchset corrects both issues in cm_Rename() and adjusts the
redirector interface to match the new behavior.
Jeffrey Altman [Fri, 30 Dec 2011 06:34:51 +0000 (01:34 -0500)]
Windows: AFSDirEnumResp and AFSDirEnumEntry changes
A directory enumeration is not an atomic operation. The redirector
reads an enumeration a chunk at a time. During the entire enumeration
it is possible that the data version of the directory object has
changed due to entries being added or removed. This patchset adds
two data version values to the AFSDirEnumResp structure.
The first is the snapshot data version which is the dv of the
directory object at the time the entry list snapshot was taken.
The second is the current data version number of the directory
object.
If an object has been removed from the directory after the snapshot
was taken, attempts to fetch status information for the object will
fail with a VNOVNODE (aka CM_ERROR_BADFD aka STATUS_INVALID_HANDLE).
The NTStatus field has been added to the AFSDirEnumEntry structure
to permit notifying the redirector of such failures.
RDR_PopulateCurrentEntry() has been extended with an additional
cm_Error parameter that accepts the errorCode field provided by
the cm_direnum_entry_t object constructed during the enumeration.
Jeffrey Altman [Fri, 30 Dec 2011 06:24:27 +0000 (01:24 -0500)]
Windows: Add AFSFileEvalResultCB
In response to AFS_REQUEST_TYPE_EVAL_TARGET_BY_ID and
AFS_REQUEST_TYPE_EVAL_TARGET_BY_NAME, return the new AFSFileEvalResultCB
instead of a raw AFSDirEnumEntry. AFSFileEvalResultCB includes
the data version number of the parent directory at the time the
node was evaluated.
Jeffrey Altman [Fri, 30 Dec 2011 06:10:08 +0000 (01:10 -0500)]
Windows: Add AFSFileCleanupResultCB
Add AFSFileCleanupResultCB which includes the parent directory
data version number. This is necessary because object deletion occurs
during the Cleanup processing and the redirector needs to know the
resulting data version of the affected directory.
Jeffrey Altman [Tue, 27 Dec 2011 01:44:36 +0000 (20:44 -0500)]
Windows: RequestExtents avoid bufWrite if rdr held
If the cm_buf_t is held by the redirector the buffer cannot
be written back to the file server even if dirty. Therefore,
do not check whether or not the cm_buf_t is dirty until after
it is known that the buffer is not redirector owned.
Jeffrey Altman [Sat, 31 Dec 2011 21:07:00 +0000 (16:07 -0500)]
Windows: avoid race during Fcb cleanup
The worker thread can race with an AFSCleanup() operation and
tear down the Fcb before the AFSCleanup() drops the Fcb->NPFcb->Resource.
Avoid this race by requiring the worker thread to obtain the resource
once before deleting the resource.
Jeffrey Altman [Sat, 31 Dec 2011 21:04:27 +0000 (16:04 -0500)]
Windows: avoid deadlock if bulk error during enum
If the cache manager has a valid callback at the start of a
directory enumeration, the service can begin a bulk status rpc
which can fail. The error code from the rpc is never propagated
to the caller, therefore the caller loops forever attempting to
complete the enumeration with status info.
Jeffrey Altman [Sat, 31 Dec 2011 01:24:49 +0000 (20:24 -0500)]
Windows: AFSInsertHashEntry can fail
If AFSInsertHashEntry() fails, the object information structure
that was being inserted is not in the btree. Therefore, ensure
that the object does not have the AFS_OBJECT_INSERTED_HASH_TREE
or AFS_VOLUME_INSERTED_HASH_TREE flag set (as appropriate).
This permits the unreferenced object to be garbage collected.
Jeffrey Altman [Fri, 30 Dec 2011 03:20:38 +0000 (22:20 -0500)]
Windows: add DV and error status to dir enumerations
The cm_BPlusDirEnum family of functions are atomic when generating
the directory enumeration but are not atomic with respect to the
rest of the system as the enumeration is accessed. Therefore, the
data version of the directory at the time the enumeration is created
may not be the same as the directory version when the enumeration
is fully processed. We therefore store the initial data version in the
cm_direnum_t object.
When the enumeration is fetching status information for each of the
directory entries, it is possible that the fetch status will fail.
We therefore store the fetch status error code in the cm_direnum_entry_t
object. By doing so, the consumer of the enumeration can make a
reasonable decision about the lack of status info. For example,
if the resulting error is CM_ERROR_BADFD it is known that the entry
has been removed from the directory since the initial enumeration.
Jeffrey Altman [Fri, 30 Dec 2011 03:18:59 +0000 (22:18 -0500)]
Windows: protect merge status against dscp == scp
If the directory status object is the same as the object for which
status info is being merged, the object will refer to itself as its
own parent. Do not permit that.
Jeffrey Altman [Fri, 30 Dec 2011 00:58:19 +0000 (19:58 -0500)]
Windows: protect dir ops by CM_SCACHESYNC_STOREDATA
CM_SCACHESYNC_STOREDATA is used to ensure that only one directory
modifying rpc can be issued to the file server at a time on a
single cm_scache_t. However, the local directory modifications
were being made after cm_MergeStatus() and cm_SyncOpDone()
were called. As a result, serialization of changes against the
local directory buffers and b+tree was lost.
Jeffrey Altman [Thu, 29 Dec 2011 17:42:26 +0000 (12:42 -0500)]
Windows: Symlink resolve failure error
If a symlink cannot be resolved, return STATUS_REPARSE_POINT_NOT_RESOLVED
instead of STATUS_ACCESS_DENIED. The symlink is after all a reparse
point. This results in a more meaningful error being delivered to
the end user.
Jeffrey Altman [Wed, 28 Dec 2011 22:08:23 +0000 (17:08 -0500)]
Windows: Make idle dead timeout very long
The idle dead timeout processing must eventually be removed
from Rx for initiators. In the meantime, make the timeout period
ten times longer than the hard dead timeout. This permits eventual
failure when the server doesn't respond in ten minutes but avoids
more transient issues.