Marc Dionne [Wed, 18 Jan 2012 01:19:54 +0000 (20:19 -0500)]
rx: Correctly test for end of call queue
The intention of this condition is to check if the current call
being considered is the last one on the queue, but the test is
incorrect. A null next pointer indicates a removed item, not
the end of the queue.
Use the queue_IsLast macro instead to correctly determine that
this is the last item in the queue and that a call has to be
selected, either the current one or a previously seen good choice.
This can cause calls to get permanently stuck in the call queue
and never get assigned to a thread, even when all threads are
idle.
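The distinction between a removed item and the last item can be sketched as follows. This is a simplified stand-in, not the actual rx queue macros: the demo_ names are hypothetical, and the only behavior assumed from the text above is that removal clears the next pointer while the last item links back to the queue head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified circular queue; not the real rx queue code. */
struct demo_queue {
    struct demo_queue *prev;
    struct demo_queue *next;
};

static void demo_queue_init(struct demo_queue *head)
{
    head->prev = head->next = head;
}

static void demo_queue_append(struct demo_queue *head, struct demo_queue *item)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Removal clears next, so NULL means "removed", not "end of queue". */
static void demo_queue_remove(struct demo_queue *item)
{
    assert(item->next != NULL);     /* must still be queued */
    item->prev->next = item->next;
    item->next->prev = item->prev;
    item->next = NULL;
}

/* The last item is the one whose next pointer leads back to the head. */
static int demo_queue_is_last(const struct demo_queue *head,
                              const struct demo_queue *item)
{
    return item->next == head;
}
```

With this layout, a test of the form "next == NULL" never fires for an item that is still queued, which matches the stuck-call symptom described above.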
Jeffrey Altman [Sat, 14 Jan 2012 15:31:01 +0000 (10:31 -0500)]
Windows: restrict service to 2 cpus by default
Performance drops off considerably when the number of processors
increases due to lock contention and the cm_SyncOp wait processing.
If the MaxCPUs registry value is not set, limit ourselves to two.
Setting MaxCPUs to zero permits use of all CPUs.
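A minimal sketch of the selection rule described above. The function name and parameters are hypothetical, and treating any other MaxCPUs value as a cap is an assumption, not something stated in the commit message.

```c
#include <assert.h>

/*
 * Hypothetical helper, not the afsd_service code. has_value is nonzero
 * when the MaxCPUs registry value exists; available is the CPU count.
 */
static unsigned demo_cpu_limit(int has_value, unsigned max_cpus,
                               unsigned available)
{
    unsigned limit;

    assert(available > 0);
    if (!has_value)
        limit = 2;              /* default when MaxCPUs is not set */
    else if (max_cpus == 0)
        limit = available;      /* zero permits use of all CPUs */
    else
        limit = max_cpus;       /* assumption: other values act as a cap */

    return limit < available ? limit : available;
}
```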
Andrew Deason [Wed, 11 Jan 2012 15:00:35 +0000 (10:00 -0500)]
vol: Fix VCreateVolume special inode cleanup
In order to dec the relevant special inodes, we need to know the
parent vol id in addition to the vol id itself. Use the appropriate
volume IDs when IH_DEC'ing special inodes after we fail to create the
volume, so we don't leave behind special inodes.
Jeffrey Altman [Thu, 5 Jan 2012 16:52:00 +0000 (11:52 -0500)]
Windows: Avoid file server rpcs on deleted files
If a file has been deleted, do not attempt to issue RPCs
to the file server in response to AFS redirector extent processing.
All RPCs will fail with VNOVNODE which will in turn trigger invalidation
requests to the AFS redirector which can deadlock.
Jeffrey Altman [Wed, 4 Jan 2012 17:13:40 +0000 (12:13 -0500)]
Windows: use local var for interlocked result
Save the result of the interlocked operations for use in
debug logging. Do not reference the incremented or decremented
object in the log messages, since it may have changed.
Local assignment is provided even in functions that are currently
not logging to assist with debugging and as a reminder to use
the result variable in future log messages.
Jeffrey Altman [Wed, 4 Jan 2012 05:02:42 +0000 (00:02 -0500)]
Windows: Directory Enumeration, DVs, and TreeLocks
Hold the TreeLock exclusively across all operations that
enumerate, validate, or otherwise manipulate directory tree
lists or data versions.
Take the data version into account when deciding what to do
with directory data. If a directory enumeration takes more
than one request to service and the DV has changed from the
time the directory snapshot was taken by the service and the
enumeration completion, merge in the changes and then mark
the directory as requiring verification.
If a directory change operation completes (create, rename, remove)
and the directory DV has changed by more than one, force a full
directory verification.
Set the directory data version to -1 whenever a directory
verification is required. Otherwise, the check to clear the
VERIFY flag will only update the metadata for the directory.
During a directory verification, if a new entry has been discovered
it is added to the directory. Make sure the VALID flag is set so
that the entry will not immediately be removed as invalid.
Change-Id: I6be8d00126fccf88bde8ae5f97e850dfb9a2f60f
Reviewed-on: http://gerrit.openafs.org/6460
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Jeffrey Altman [Wed, 4 Jan 2012 04:39:53 +0000 (23:39 -0500)]
Windows: Permit renames of open files
AFS does not impose a restriction on renames of open files.
Failure to permit the rename can cause problems if an anti-malware
service opens the file immediately after the application performing
the rename does so.
Change-Id: Ib23a6a893c5c575e89b8a817faec4c11300a04b7
Reviewed-on: http://gerrit.openafs.org/6503
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Jeffrey Altman [Wed, 4 Jan 2012 04:36:50 +0000 (23:36 -0500)]
Windows: Do not prime the service directory cache
Performing a directory enumeration is an expensive operation
that we should be attempting to avoid. The current directory
enumeration and evaluate target requests will use inline bulk
status RPCs to the file server which obtain status for 49 items
at a time from a single directory.
Change-Id: I78e08680fec9715c3c446d0c4c5226cd79db80bd
Reviewed-on: http://gerrit.openafs.org/6502
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Jeffrey Altman [Wed, 4 Jan 2012 04:12:34 +0000 (23:12 -0500)]
Windows: do not flush dirty extents without permission
When closing file handles, do not permit dirty extents to be
released back to the service if the current handle (Ccb) does
not have write permission. The cleanup operation will fail with
STATUS_ACCESS_DENIED, the extents will be released and all of the
dirty data will be discarded.
Change-Id: Iceacf5319147d1bd6277ea160bc67d91f1a49d5b
Reviewed-on: http://gerrit.openafs.org/6500
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Marc Dionne [Fri, 6 Jan 2012 22:22:35 +0000 (17:22 -0500)]
libuafs: only rebuild h directory when needed
A few changes to allow a "make all ; sudo make install ; make all..."
workflow to work without manually removing files in between.
Make the rebuilding of the h directory dependent on the source
files scanned to build it. This prevents it from being rebuilt
for every "make install".
While we're here, use -f when removing linktest for the clean target.
This allows "make clean" to remove it without prompting when the user
doesn't have write access to the file, as is the case when make install
rebuilds it as root.
afs: discard cached state when we are unsure of validity
In the event we got a network error, we don't know if the server
completed (or will complete) our operation. We can assume nothing.
A more complicated version of this could attempt to verify that the
state is what we expect it to be, but in an extended callbacks universe
this is potentially easier to solve anyway. For now, return the
error to the caller, and mark the vcache unstat'd.
It's actually important that this be more than the rx call dead time,
so that timed-out server callbacks to clients don't result in us idle
deading a call to the server when callbacks need to be broken.
Marc Dionne [Thu, 5 Jan 2012 00:27:18 +0000 (19:27 -0500)]
Use offsetof() in set_header_word to get field offset
Use offsetof() to replace a few instances where the same logic is
open coded in set_header_word and inc_header_word macros. In cases
where the field name involves a variable as an index to an array,
newer gcc gives a sequence point warning.
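A minimal sketch of the offsetof() approach. The structure, helper, and macro names are hypothetical and do not reproduce the real set_header_word or inc_header_word macros. For an array element, the sketch adds the element offset to the offset of the array itself, which keeps the index out of the offsetof() expression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout, for illustration only. */
struct demo_header {
    uint32_t magic;
    uint32_t counters[4];
};

/* Store a 32-bit word at a byte offset within a raw header buffer. */
static void demo_set_word(unsigned char *buf, size_t offset, uint32_t value)
{
    assert(offset + sizeof(value) <= sizeof(struct demo_header));
    memcpy(buf + offset, &value, sizeof(value));
}

/* offsetof() replaces hand-written offset arithmetic for a named field. */
#define DEMO_SET_HEADER_WORD(buf, field, value) \
    demo_set_word((buf), offsetof(struct demo_header, field), (value))

/* For an array element, add the element offset to the array's offset. */
static void demo_set_counter(unsigned char *buf, size_t i, uint32_t value)
{
    assert(i < 4);
    demo_set_word(buf, offsetof(struct demo_header, counters)
                  + i * sizeof(uint32_t), value);
}
```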
Michael Meffie [Wed, 14 Dec 2011 17:52:51 +0000 (12:52 -0500)]
Unix CM: reset blacklist on hard-mount retry
Reset black-listed servers on a request when retrying due to a
hard-mount retry. When hard-mounts are in effect, a request may
retry indefinitely. If all the servers have been black-listed
due to a transient error, the request may never complete.
Andrew Deason [Fri, 18 Nov 2011 16:25:08 +0000 (10:25 -0600)]
DAFS: Atomically re-hash vnode in VGetFreeVnode_r
VGetFreeVnode_r pulls a vnode off of the vnode LRU, and removes the
vnode from the vnode hash table. In DAFS, we may drop the volume glock
immediately afterwards in order to close the ihandle for the old vnode
structure.
While we have the glock dropped, another thread may try to
VLookupVnode for the new vnode we are creating, find that it is not
hashed, and call VGetFreeVnode_r itself. This can result in two
threads having two separate copies of the same vnode, which bypasses
any mutual exclusion ensured by per-vnode locks, since they will lock
their own version of the vnode. This can result in a variety of
different problems where two threads try to write to the same vnode at
the same time. One example is calling CopyOnWrite on the same file in
parallel, which can cause link undercounts, writes to the wrong vnode
tag, and other CoW-related errors.
To prevent all this, make VGetFreeVnode_r atomically remove the old
vnode structure from the relevant hashes, and add it to the new hashes
before dropping the glock. This ensures that any other thread trying
to load the same vnode will see the new vnode in the hash table,
though it will not yet be valid until the vnode is loaded.
Note that this only solves this race for DAFS. For non-DAFS, the vol
glock is held over the ihandle close, so this race does not exist.
The comments around the callers of VGetFreeVnode_r indicate that
similar extant races exist here for non-DAFS, but they are unsolvable
without significant DAFS-like changes to the vnode package.
Andrew Deason [Tue, 27 Dec 2011 02:22:08 +0000 (21:22 -0500)]
afs: Grab a reference to setp in afs_icl_Event4
We can drop GLOCK in several places in afs_icl_Event4 and the
afs_icl_AppendRecord callee. To ensure that the given afs_icl_set does
not get freed while we have GLOCK dropped, grab a reference to the
set.
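The hold-across-unlock pattern can be sketched as below. All names are hypothetical, the "freed" flag stands in for actually releasing memory, and the callback simulates what another thread might do while the global lock is dropped.

```c
#include <assert.h>

struct demo_set {
    int refcount;
    int freed;      /* stands in for actually freeing the memory */
};

static void demo_hold(struct demo_set *s)
{
    assert(!s->freed);
    s->refcount++;
}

static void demo_release(struct demo_set *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0)
        s->freed = 1;
}

/* Simulates another thread dropping the owner's reference. */
static void demo_drop_owner_ref(struct demo_set *s)
{
    demo_release(s);
}

/* Returns nonzero if the set was still alive after the unlocked window. */
static int demo_event(struct demo_set *s,
                      void (*while_unlocked)(struct demo_set *))
{
    int alive;

    demo_hold(s);           /* taken before the lock can be dropped */
    while_unlocked(s);      /* the lock is considered dropped here */
    alive = !s->freed;      /* our reference keeps the set valid */
    demo_release(s);
    return alive;
}
```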
Thanks to Ryan C. Underwood for reporting an issue triggered by this.
Geoffrey Thomas [Sun, 1 Jan 2012 00:51:29 +0000 (19:51 -0500)]
linux: fsync on a directory should return 0, not EINVAL
Directory writes are synchronous, so this is fine. There's a
mostly-convenient function in fs/libfs.c that returns 0 that we can use
to do what we want ("mostly" because it was renamed in 2.6.35).
Geoffrey Thomas [Sun, 11 Dec 2011 10:06:24 +0000 (05:06 -0500)]
rpm: Don't attempt to restart on upgrade when using systemd
systemd is actually rather capable of leaving the OpenAFS client in an
incredibly broken state, thanks to its willingness to track services and
kill their processes. We should not attempt to restart the client on
upgrade, whether a normal upgrade or a migration from SysV initscripts.
In the former case, it's fine (and correct) for the old AFS to keep
running; in the latter case, the unit file is capable of correctly
shutting down an initscript-launched client. The same is true for the
OpenAFS server.
This brings the packaging in line with the SysV initscript code in the
specfile, which does not attempt to restart the service, as well as with
e.g. Debian's packaging, which uses --no-restart-on-upgrade.
While we're here, clean up a redundant BuildRequires on systemd-units.
Peter Scott [Fri, 30 Dec 2011 00:30:45 +0000 (17:30 -0700)]
Windows: Handle invalid node types
In the case where the direntry data is invalid, construct an Fcb
of type INVALID so that the direntry can be displayed and the object
deleted even if it cannot be evaluated.
Jeffrey Altman [Sat, 31 Dec 2011 01:09:06 +0000 (20:09 -0500)]
Windows: renames that overwrite existing target
The Windows client up to this point has never correctly implemented
directory renames. For the longest time it assumed that the file
server would not replace a pre-existing target. As a result, when
the target name was already in use the contents of the directory
would end up with the target name existing but its previous file id
associated with it.
A second problem was that lookups for the source and target names
were not performed while the directory (or directories) were exclusively
held to ensure that competing changes could not occur.
This patchset corrects both issues in cm_Rename() and adjusts the
redirector interface to match the new behavior.
Jeffrey Altman [Fri, 30 Dec 2011 06:34:51 +0000 (01:34 -0500)]
Windows: AFSDirEnumResp and AFSDirEnumEntry changes
A directory enumeration is not an atomic operation. The redirector
reads an enumeration a chunk at a time. During the entire enumeration
it is possible that the data version of the directory object has
changed due to entries being added or removed. This patchset adds
two data version values to the AFSDirEnumResp structure.
The first is the snapshot data version which is the dv of the
directory object at the time the entry list snapshot was taken.
The second is the current data version number of the directory
object.
If an object has been removed from the directory after the snapshot
was taken, attempts to fetch status information for the object will
fail with a VNOVNODE (aka CM_ERROR_BADFD aka STATUS_INVALID_HANDLE).
The NTStatus field has been added to the AFSDirEnumEntry structure
to permit notifying the redirector of such failures.
RDR_PopulateCurrentEntry() has been extended with an additional
cm_Error parameter that accepts the errorCode field provided by
the cm_direnum_entry_t object constructed during the enumeration.
Jeffrey Altman [Fri, 30 Dec 2011 06:24:27 +0000 (01:24 -0500)]
Windows: Add AFSFileEvalResultCB
In response to AFS_REQUEST_TYPE_EVAL_TARGET_BY_ID and
AFS_REQUEST_TYPE_EVAL_TARGET_BY_NAME, return the new AFSFileEvalResultCB
instead of a raw AFSDirEnumEntry. AFSFileEvalResultCB includes
the data version number of the parent directory at the time the
node was evaluated.
Jeffrey Altman [Fri, 30 Dec 2011 06:10:08 +0000 (01:10 -0500)]
Windows: Add AFSFileCleanupResultCB
Add AFSFileCleanupResultCB which includes the parent directory
data version number. This is necessary because object deletion occurs
during the Cleanup processing and the redirector needs to know the
resulting data version of the affected directory.
Jeffrey Altman [Tue, 27 Dec 2011 01:44:36 +0000 (20:44 -0500)]
Windows: RequestExtents avoid bufWrite if rdr held
If the cm_buf_t is held by the redirector the buffer cannot
be written back to the file server even if dirty. Therefore,
do not check whether or not the cm_buf_t is dirty until after
it is known that the buffer is not redirector owned.
Jeffrey Altman [Sat, 31 Dec 2011 21:07:00 +0000 (16:07 -0500)]
Windows: avoid race during Fcb cleanup
The worker thread can race with an AFSCleanup() operation and
tear down the Fcb before the AFSCleanup() drops the Fcb->NPFcb->Resource.
Avoid this race by requiring the worker thread to obtain the resource
once before deleting the resource.
Jeffrey Altman [Sat, 31 Dec 2011 21:04:27 +0000 (16:04 -0500)]
Windows: avoid deadlock if bulk error during enum
If the cache manager has a valid callback at the start of a
directory enumeration, the service can begin a bulk status rpc
which can fail. The error code from the rpc is never propagated
to the caller, therefore the caller loops forever attempting to
complete the enumeration with status info.
Jeffrey Altman [Sat, 31 Dec 2011 01:24:49 +0000 (20:24 -0500)]
Windows: AFSInsertHashEntry can fail
If AFSInsertHashEntry() fails, the object information structure
that was being inserted is not in the btree. Therefore, ensure
that the object does not have the AFS_OBJECT_INSERTED_HASH_TREE
or AFS_VOLUME_INSERTED_HASH_TREE flag set (as appropriate).
This permits the unreferenced object to be garbage collected.
Jeffrey Altman [Fri, 30 Dec 2011 03:20:38 +0000 (22:20 -0500)]
Windows: add DV and error status to dir enumerations
The cm_BPlusDirEnum family of functions are atomic when generating
the directory enumeration but are not atomic with respect to the
rest of the system as the enumeration is accessed. Therefore, the
data version of the directory at the time the enumeration is created
may not be the same as the directory version when the enumeration
is fully processed. We therefore store the initial data version in the
cm_direnum_t object.
When the enumeration is fetching status information for each of the
directory entries, it is possible that the fetch status will fail.
We therefore store the fetch status error code in the cm_direnum_entry_t
object. By doing so, the consumer of the enumeration can make a
reasonable decision about the lack of status info. For example,
if the resulting error is CM_ERROR_BADFD it is known that the entry
has been removed from the directory since the initial enumeration.
Jeffrey Altman [Fri, 30 Dec 2011 03:18:59 +0000 (22:18 -0500)]
Windows: protect merge status against dscp == scp
If the directory status object is the same as the object for which
status info is being merged, the object will refer to itself as its
own parent. Do not permit that.
Jeffrey Altman [Fri, 30 Dec 2011 00:58:19 +0000 (19:58 -0500)]
Windows: protect dir ops by CM_SCACHESYNC_STOREDATA
CM_SCACHESYNC_STOREDATA is used to ensure that only one directory
modifying rpc can be issued to the file server at a time on a
single cm_scache_t. However, the local directory modifications
were being made after cm_MergeStatus() and cm_SyncOpDone()
were called. As a result, serialization of changes against the
local directory buffers and b+tree was lost.
Jeffrey Altman [Thu, 29 Dec 2011 17:42:26 +0000 (12:42 -0500)]
Windows: Symlink resolve failure error
If a symlink cannot be resolved, return STATUS_REPARSE_POINT_NOT_RESOLVED
instead of STATUS_ACCESS_DENIED. The symlink is after all a reparse
point. This results in a more meaningful error being delivered to
the end user.
Jeffrey Altman [Wed, 28 Dec 2011 22:08:23 +0000 (17:08 -0500)]
Windows: Make idle dead timeout very long
The idle dead timeout processing must eventually be removed
from Rx for initiators. In the meantime, make the timeout period
ten times longer than the hard dead timeout. This permits eventual
failure when the server doesn't respond in ten minutes but avoids
more transient issues.
Jeffrey Altman [Tue, 27 Dec 2011 01:56:38 +0000 (20:56 -0500)]
Windows: osisleep do not tamper with queues
There is no need to manually remove an entry from a queue before
executing osi_QRemoveHT(). osi_QRemoveHT() removes the item
from the queue and fixes up the pointers correctly. Manual
intervention is a waste of cpu and can be harmful.
Jeffrey Altman [Tue, 27 Dec 2011 01:51:33 +0000 (20:51 -0500)]
Windows: add osi_TWaitExt(), fix osi_TWait()
osi_TWait() was adding new locks to the turnstile at the tail,
which is the end of the queue from which locks are removed. This
implemented LIFO instead of FIFO, when FIFO is the "fair" order
in which to service lock requests.
osi_TWaitExt() is added to permit the Reader to Writer upgrade
request to use LIFO when more than one reader is present.
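The ordering issue can be illustrated with a small doubly linked list in which waiters are always removed from the tail. This is a sketch with hypothetical names, not the osi turnstile code: inserting at the head gives FIFO service, while inserting at the tail gives LIFO.

```c
#include <assert.h>
#include <stddef.h>

struct demo_waiter {
    struct demo_waiter *prev, *next;
    int id;
};

struct demo_turnstile {
    struct demo_waiter *head, *tail;
};

/* FIFO insert: new waiters go to the head, far from the removal end. */
static void demo_add_head(struct demo_turnstile *t, struct demo_waiter *w)
{
    w->prev = NULL;
    w->next = t->head;
    if (t->head)
        t->head->prev = w;
    else
        t->tail = w;
    t->head = w;
}

/* LIFO insert: new waiters go to the tail and are serviced next. */
static void demo_add_tail(struct demo_turnstile *t, struct demo_waiter *w)
{
    w->next = NULL;
    w->prev = t->tail;
    if (t->tail)
        t->tail->next = w;
    else
        t->head = w;
    t->tail = w;
}

/* Waiters are always removed from the tail. */
static struct demo_waiter *demo_remove_tail(struct demo_turnstile *t)
{
    struct demo_waiter *w = t->tail;

    if (w == NULL)
        return NULL;
    t->tail = w->prev;
    if (t->tail)
        t->tail->next = NULL;
    else
        t->head = NULL;
    w->prev = w->next = NULL;
    return w;
}
```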
Jeffrey Altman [Tue, 27 Dec 2011 01:48:24 +0000 (20:48 -0500)]
Windows: use waiters counter instead of osi_TEmpty
The osi_TEmpty() macro examines the values of the turnstile
pointers. Instead use the lock's 'waiters' counter to determine
if there are waiting threads to signal.
Andrew Deason [Thu, 22 Dec 2011 20:48:49 +0000 (15:48 -0500)]
afs: Panic on afs_conn refcount imbalance
An undercounted afs_conn can easily cause a panic and/or memory
corruption later on, since we put an rx_connection reference with each
afs_conn reference. Panic as soon as we detect this, as this indicates
a serious bug.
Andrew Deason [Wed, 21 Dec 2011 22:01:16 +0000 (17:01 -0500)]
afs: Add afs_WriteDCache sanity checks
Writing a non-free non-discarded dcache entry with a zero volume id
can easily cause hash table corruption later on, so make sure we don't
do that. Also log something if the write itself fails, as this usually
indicates an unusual situation involving I/O errors or something.
Andrew Deason [Wed, 21 Dec 2011 21:05:40 +0000 (16:05 -0500)]
afs: Cope with afs_GetValidDSlot errors
Make callers of afs_GetValidDSlot deal with getting a NULL dcache,
which can occur if an error is encountered. Some of these just panic
at least for now, since a code path for recovery is complex, but this
is at least better than dereferencing a NULL pointer.
Andrew Deason [Wed, 21 Dec 2011 20:04:32 +0000 (15:04 -0500)]
afs: Do not always ignore errors in afs_GetDSlot
Currently afs_UFSGetDSlot will silently swallow any error in reading
the specified dslot from disk, and will return a "blank" dcache to the
caller. However, many callers of afs_GetDSlot will be asking for a
dcache that we know exists, and more importantly, we know is on the
global hash table. If a disk error is encountered and we're given a
"blank" dcache, we will erroneously believe the dcache entry is not on
the hash table, causing corruption of the hash table later on.
So instead, modify all callers of afs_GetDSlot to use either
afs_GetValidDSlot or afs_GetNewDSlot. Calling afs_GetValidDSlot
indicates that the given dentry index is known to be valid, and any
error encountered while reading the entry from disk should result in
an error (for disk I/O errors we have no control over, this results in
a NULL dentry returned; for internal consistency errors we panic).
Calling afs_GetNewDSlot indicates that the specified index may not
exist or may not be valid, and so returning a "blank" dentry in that
case is fine.
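The difference between the two lookups can be sketched as follows. The demo_ names, the structure, and the injected read function are hypothetical and only model the I/O-error case; they are not the afs_GetValidDSlot or afs_GetNewDSlot implementations.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dcache {
    int index;
    int blank;      /* nonzero when the entry carries no on-disk state */
};

/* Hypothetical disk read: returns 0 on success, nonzero on I/O error. */
typedef int (*demo_read_fn)(int index, struct demo_dcache *out);

static int demo_read_ok(int index, struct demo_dcache *out)
{
    out->index = index;
    out->blank = 0;
    return 0;
}

static int demo_read_fail(int index, struct demo_dcache *out)
{
    (void)index;
    (void)out;
    return -1;
}

/* The entry is known to exist, so a read error is reported as NULL. */
static struct demo_dcache *demo_get_valid_slot(demo_read_fn rd, int index,
                                               struct demo_dcache *slot)
{
    assert(slot != NULL);
    return rd(index, slot) == 0 ? slot : NULL;
}

/* The entry may not exist yet, so a blank entry is an acceptable result. */
static struct demo_dcache *demo_get_new_slot(demo_read_fn rd, int index,
                                             struct demo_dcache *slot)
{
    assert(slot != NULL);
    if (rd(index, slot) != 0) {
        slot->index = index;
        slot->blank = 1;
    }
    return slot;
}
```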
For memcache, the situation is the same, except any time we go to
"disk" it is an (internal) error, since there is no disk.
Andrew Deason [Wed, 21 Dec 2011 22:25:29 +0000 (17:25 -0500)]
afs: Remove second argument to afs_GetDSlot
All callers of afs_GetDSlot were passing NULL as the second argument
to afs_GetDSlot. So, remove the argument, and behave as if tmpdc was
NULL unconditionally.
Andrew Deason [Thu, 22 Dec 2011 20:01:52 +0000 (15:01 -0500)]
afs: Indicate error from afs_osi_Read/Write better
Currently afs_osi_Read and afs_osi_Write just return -1 on any I/O
error, even though they know the error code given from the OS VFS.
Just return that code instead so the caller can see what the error
was; but negate it, so it's clear that it is an error.
Andrew Deason [Thu, 22 Dec 2011 19:50:09 +0000 (14:50 -0500)]
afs: afs_osi_Read/Write returns negative on error
afs_osi_Read and afs_osi_Write need to return negative values on
error. EIO is not negative; return -EIO so we don't accidentally
return "success" if someone requested to read or write EIO bytes.
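A minimal sketch of the return convention, assuming a hypothetical wrapper that is handed the OS error code and the byte count. A positive return is a byte count and a negative return is a negated errno, so a transfer of exactly EIO bytes cannot be mistaken for a failure or the reverse.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical wrapper: os_error is 0 on success, else a positive errno. */
static long demo_osi_result(int os_error, long bytes_done)
{
    assert(os_error >= 0);
    if (os_error != 0)
        return -(long)os_error;     /* negated so it reads as an error */
    return bytes_done;
}
```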
Andrew Deason [Thu, 22 Dec 2011 18:50:53 +0000 (13:50 -0500)]
klog.krb5: cast get_cred_keylen to unsigned
get_cred_keylen can yield a type besides an unsigned int (such as a
size_t on heimdal). But we are printing it with %u, which causes a
warning, so cast it to an unsigned int.
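As an illustration of the cast, assuming the key length arrives as a size_t (the helper name and message text are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a key length with %u; the cast makes the argument type match. */
static int demo_format_keylen(char *buf, size_t buflen, size_t keylen)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "key length %u", (unsigned int)keylen);
}
```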
Andrew Deason [Thu, 22 Dec 2011 03:00:12 +0000 (22:00 -0500)]
afsd: Parse cacheinfo during argument parsing
Currently we parse cacheinfo in afsd_run, when the client is
initialized and started. Parsing cacheinfo can change
afsd_cacheMountDir, however, which may be of interest to afsd.o users;
in particular, libuafs exposes this via uafs_MountDir(). This means
that if a mount dir is not explicitly specified in the libcmd
arguments to afsd, a libuafs-using program will see the mountpoint as
the empty string if it is queried after afsd_parse but before
afsd_run. For afsd.fuse, this causes the cryptic error message:
fuse: bad mount point `': No such file or directory
since the mountpoint is the empty string if it is not specified
explicitly on the command line.
To fix this, move cacheinfo parsing to effectively near the end of
afsd_parse, so the mountpoint is calculated in afsd_parse().
Andrew Deason [Fri, 2 Dec 2011 22:06:42 +0000 (16:06 -0600)]
fuse: Add -oallow_other by default where possible
By default, fuse mountpoints are only accessible by the same uid as
that which mounted the fuse filesystem. When we're running as root,
specify -oallow_other so by default anyone can access the afs
mountpoint.
Peter Scott [Sat, 24 Dec 2011 00:00:57 +0000 (17:00 -0700)]
Windows: Avoid bottleneck on VolumeLock
The VolumeLock resource was obtained during each AFSParseName()
and held across a wide range of operations including volume
info queries, renames, and extent requests. These operations can
take a long time to complete and as long as the VolumeLock was
held exclusively there could only be one operation in flight at
a time on a given volume. This significantly reduced the parallelism
of operations.
The VolumeLock was not required in almost all cases. This patchset
adjusts the use of the VolumeLock and avoids the bottleneck.
Jeffrey Altman [Sat, 24 Dec 2011 08:15:53 +0000 (03:15 -0500)]
Windows: avoid race in cm_GetNewSCache
The cm_scacheLock is dropped while walking the scache LRU queue.
As a result it is possible for the cm_scache_t that is being
considered for recycling to be accessed and moved to the head
of the queue.
Track the prev and next pointers so it is possible to detect if
the cm_scache_t that is about to be recycled has been moved. If
so, restart the search from the tail.
Jeffrey Altman [Sat, 24 Dec 2011 08:11:04 +0000 (03:11 -0500)]
Windows: cm_BufWrite() must wait in cm_SyncOp()
Now that it is permissible for more than one store data operation
to construct BIOD lists in parallel, cm_BufWrite() must be willing
to wait in cm_SyncOp(). Otherwise, the daemon threads will spin.
Simon Wilkinson [Sat, 24 Dec 2011 17:23:48 +0000 (17:23 +0000)]
rx: Don't adjust non-existent events
If we notice that time has gone backwards (that is, the current
time is older than the time of the last event we fired), then we
reschedule all pending events.
On Windows, immediately after we have resumed from a suspend, this
code path can be executed with an empty event tree, causing an
exception.
Resolve this by checking for an empty tree before we attempt to adjust
event times. If the tree is empty, we just zero the last event time
(so we don't keep running the adjustTimes routine), and continue as
normal.
Jeffrey Altman [Thu, 22 Dec 2011 02:47:56 +0000 (21:47 -0500)]
Windows: AFSCleanup extent processing
1. Perform a CcFlushCache() any time the file is cached
and the Context Control Block indicates that the handle
has FILE_WRITE_DATA permission.
2. Perform an AFSFlushExtents() whenever there are dirty
extents and the handle has FILE_WRITE_DATA permission.
No point flushing the extents if the AuthGroup does not
have write permission. Another Ccb must exist that does
have write permission.
Change-Id: I3ece011b484c12e7dc936b81c272ba6a42f6c7d6
Reviewed-on: http://gerrit.openafs.org/6399
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Jeffrey Altman [Thu, 22 Dec 2011 02:34:14 +0000 (21:34 -0500)]
Windows: AFSRequestExtentsAsync retry with alt authgroup
If AFSRequestExtentsAsync() fails to obtain requested extents
due to STATUS_ACCESS_DENIED using the AuthGroup associated with
the Context Control Block, try to find an alternate AuthGroup
to use to perform the extent request. We have already told
Windows what permissions the application has when the file was
opened. Windows will perform its own validation checks prior
to permitting the data to be accessed or altered.
Jeffrey Altman [Thu, 22 Dec 2011 02:17:33 +0000 (21:17 -0500)]
Windows: Use AuthGroups for extent request error reporting
The afs redirector currently tracks the most recent extent error
in the File Control Block. Prior to this patchset the error
was returned to the requesting thread when the process Id matched
the most recent Process to issue a request. This approach resulted
in a couple of problems.
1. There are multiple threads that can issue an extent request
on the same file at the same time representing different processes.
Resetting the process Id with each new request could clear the
error prior to its receipt.
2. The failure may be due to inappropriate permissions. Permissions
are not associated with processes but with Authentication Groups.
This patchset makes several changes:
1. It enables the afsd_service to track the active authgroup as
part of the cm_user_t structure and associates that object with
the BIOD object to ensure that the active authgroup can be
reported to the afs redirector.
2. It modifies the AFSExtentFailureCB structure to include the
AuthGroup GUID.
3. It tracks the AuthGroup GUID associated with the extent
failure in the non-paged file control block.
4. It converts all tests on Process Id to use AuthGroup instead.
5. It alters the behavior of error delivery such that reported
error is only cleared after it has been reported once to a
thread using the matching AuthGroup.
These changes make the situation better but not perfect as error
states can still be lost. However, it avoids the case most often
seen in production where two processes (an end user process and an
anti-malware process) are fighting over a file and the anti-malware
process has no permission to access the file under its own credentials.
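The delivery rule in item 5 can be sketched as follows. The structures and function names are hypothetical and only model "clear the error once it has been reported to a thread using the matching AuthGroup"; they are not the redirector's data structures.

```c
#include <assert.h>
#include <string.h>

struct demo_guid {
    unsigned char bytes[16];
};

struct demo_fcb {
    int extent_error;               /* 0 means no pending error */
    struct demo_guid error_group;   /* AuthGroup that hit the error */
};

static void demo_record_error(struct demo_fcb *fcb, int error,
                              const struct demo_guid *group)
{
    assert(error != 0);
    fcb->extent_error = error;
    fcb->error_group = *group;
}

/* Returns the pending error for this AuthGroup and clears it, or 0. */
static int demo_take_error(struct demo_fcb *fcb,
                           const struct demo_guid *group)
{
    int error;

    if (fcb->extent_error == 0 ||
        memcmp(&fcb->error_group, group, sizeof(*group)) != 0)
        return 0;       /* other AuthGroups leave the error in place */

    error = fcb->extent_error;
    fcb->extent_error = 0;
    return error;
}
```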
Change-Id: Ia5c3877b8d46de695c86884c4166dc812885a72c
Reviewed-on: http://gerrit.openafs.org/6396
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Jeffrey Altman [Thu, 22 Dec 2011 02:10:45 +0000 (21:10 -0500)]
Windows: Explicit permission check on extent release
When a data extent is released by the afs redirector or the
afsd_service performs an extent claw back during a cleanup
operation, perform an explicit permission check before attempting
to store dirty buffers to the file server. Instead of waiting
for the file server to fail the request, fail it immediately.
The permission check is performed using the currently active
authentication group.
Jeffrey Altman [Thu, 22 Dec 2011 02:08:59 +0000 (21:08 -0500)]
Windows: RDR_CleanupFileEntry restrict extent claw back
Only demand that extents be returned by the afs redirector
if this cleanup is the last open handle or the redirector has
requested that the file be flushed to the file server.
Jeffrey Altman [Thu, 22 Dec 2011 01:49:59 +0000 (20:49 -0500)]
Windows: Bad DV invalidate only when new DV not 0
If the current DV is BAD_VERSION and the new DV is 0, do not send
an invalidation to the redirector. It only results in wasteful work.
If the current DV is BAD_VERSION the object either:
1. was never previously known
2. was recently flushed
3. the cm_scache_t was recycled
In all cases, the redirector does not have knowledge of the object
since either it didn't exist or a previous invalidation was sent.
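A sketch of the decision, under two loudly labeled assumptions: that BAD_VERSION is represented as -1, and that in every other case an invalidation is sent whenever the data version differs. Only the first branch comes from the text above; the names are hypothetical.

```c
/* Assumption: the "bad version" marker is -1. */
#define DEMO_BAD_VERSION (-1LL)

/* Returns nonzero if an invalidation should be sent to the redirector. */
static int demo_should_invalidate(long long current_dv, long long new_dv)
{
    /* Unknown object and new DV of 0: the redirector knows nothing yet. */
    if (current_dv == DEMO_BAD_VERSION && new_dv == 0)
        return 0;

    /* Assumption: otherwise invalidate whenever the DV changed. */
    return current_dv != new_dv;
}
```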
Jeffrey Altman [Thu, 22 Dec 2011 01:45:19 +0000 (20:45 -0500)]
Windows: Define times in terms of AFS_ONE_SECOND
The afs redirector defines the macro AFS_ONE_SECOND to indicate
the number of 100ns units necessary to indicate one second of time.
Use that definition when defining other time values. Also define
AFS_ONE_MILLISECOND and AFS_ONE_MICROSECOND.
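For illustration, time constants expressed in 100ns units could be derived from a single base value as below. The DEMO_ names are hypothetical stand-ins for the redirector's macros, and the helper function is only there to show usage.

```c
#include <assert.h>

/* Illustrative DEMO_ names; the redirector's own macro is AFS_ONE_SECOND. */
#define DEMO_ONE_SECOND       10000000LL   /* 100ns units in one second */
#define DEMO_ONE_MILLISECOND  (DEMO_ONE_SECOND / 1000)
#define DEMO_ONE_MICROSECOND  (DEMO_ONE_MILLISECOND / 1000)

/* Convert whole seconds to 100ns units. */
static long long demo_seconds_to_units(long long seconds)
{
    assert(seconds >= 0);
    return seconds * DEMO_ONE_SECOND;
}
```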
Change-Id: Ie2a173b4037af61e9a1c5aa06129520c36d714bb
Reviewed-on: http://gerrit.openafs.org/6391
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Peter Scott <pscott@kerneldrivers.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Andrew Deason [Mon, 19 Dec 2011 22:11:31 +0000 (17:11 -0500)]
Include afsconfig.h before anything else
afsconfig.h can define various preprocessor symbols that can affect
how system headers behave. For example, the presence of the
_POSIX_PTHREAD_SEMANTICS symbol changes the number of arguments to
getpwnam_r on at least Solaris 8. So, we must include afsconfig.h
before including anything else, to ensure consistency.
Jeffrey Altman [Sun, 18 Dec 2011 23:36:14 +0000 (18:36 -0500)]
Windows: avoid deadlock during SetRenameInformation
The VolumeLock must be held before the Fcb->NPFcb->Resource.
Obtain the VolumeLock in AFSSetFileInformation only in the
rename case instead of obtaining the VolumeLock in AFSSetRenameInformation.
Peter Scott [Wed, 14 Dec 2011 19:27:54 +0000 (12:27 -0700)]
Windows: Track AuthGroup in Context Control Block
Tracking the AuthGroup in the File Control Block proved to be
insufficient to ensure that dirty extents can be stored back
to the file server when an anti-virus service opens a file
in an AuthGroup without 'write' permission immediately after the
application performing a WriteFile() opens it. In this situation
the Fcb ends up with the AuthGroup set to the anti-virus value
and not the one that belongs to the writing application.
Tracking the AuthGroup by Ccb provides the ability to select
an AuthGroup from the list of open handles instead of tracking
the most recent one.
Jeffrey Altman [Sat, 17 Dec 2011 17:08:49 +0000 (12:08 -0500)]
Windows: forget data version only for flushing
The AFS redirector was intentionally forgetting the data version
number for AFS_INVALIDATE_DATA_VERSION events. The point of that
event is to ensure that clean data be purged if the data version
in fact changed. Checking the data version for change cannot be
performed if the data version is reset to -1.
Only when AFS_INVALIDATE_FLUSHED is processed should the data
version be reset to ensure that all of the data is purged.