Andrew Deason [Fri, 4 Nov 2011 17:42:33 +0000 (12:42 -0500)]
DAFS: Deal with exclusive-state volume headers
GetVolumeHeader assumes that headers on the LRU are not associated
with a volume in an exclusive state. This is known to not be true for
some cases when salvage requests are received over FSSYNC, and may
also not be true in other scenarios. It's easy to just skip such
headers, so skip
them.
Andrew Deason [Thu, 3 Nov 2011 18:17:33 +0000 (13:17 -0500)]
salvager: Implement AskDAFS via SYNC flags
Instead of probing the DAFS-ness of the fileserver by probing which
FSSYNC opcodes it supports, detect DAFS-ness by looking at the SYNC
response header flags, which explicitly state whether or not the
endpoint is DAFS. This avoids unnecessary "protocol mismatch" log
messages when the endpoint is not DAFS.
Andrew Deason [Wed, 9 Nov 2011 23:04:09 +0000 (17:04 -0600)]
volser: Preserve needsSalvaged during restore
Some of the routines during a volume restore may set needsSalvaged, if
an inconsistency is detected while writing the given volume data.
However, after the data is read, we set the volume header information
to what was found in the dump stream, ignoring any needsSalvaged that
may have been set.
To ensure that inconsistent volumes in this situation actually get
demand-salvaged (for DAFS) or offlined (non-DAFS), keep the value of
needsSalvaged in the header, if it was set.
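The idea above can be sketched as follows. This is a minimal illustration, not the volser code: struct example_header, its fields, and apply_dump_header() are hypothetical names chosen for the example.

```c
/* Hypothetical, simplified volume header; the real header has many
 * more fields. */
struct example_header {
    int needsSalvaged;
    unsigned int volume_id;
};

/* Overwrite the on-disk header with the one read from the dump stream,
 * but keep needsSalvaged if it was set while the data was written. */
static void
apply_dump_header(struct example_header *on_disk,
                  const struct example_header *from_dump)
{
    int was_flagged = on_disk->needsSalvaged;

    *on_disk = *from_dump;
    if (was_flagged)
        on_disk->needsSalvaged = was_flagged;
}
```

A flag already set in the dump stream is left alone; only a flag raised during the restore is carried over.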
Andrew Deason [Thu, 10 Nov 2011 17:58:12 +0000 (11:58 -0600)]
namei: Remove extraneous rmdir
We just unlinked the file, so we know we won't be able to rmdir() the
same thing. Give a path one level higher to
namei_RemoveDataDirectories, so we start rmdir()ing at the parent dir.
Andrew Deason [Tue, 15 Nov 2011 19:18:48 +0000 (13:18 -0600)]
afs: Leave cellnum alone for explicit mtpt cell
When a mountpoint is given an explicit cell, don't alter cellnum.
Cellnum represents the cell for the parent, and is used for
determining whether or not we're crossing a cell boundary.
Previously, this code forced the mount point to always be treated as
foreign (for a mountpoint prefixed with a cell name), or to always be
treated as local (for a mountpoint prefixed with a cell number).
Michael Meffie [Thu, 3 Nov 2011 21:09:28 +0000 (17:09 -0400)]
vol: rate-limit volume usage updates
Add threshold and time rate-limit parameters for volume usage
updates to disk. This reduces the amount of i/o needed for
volume usage statistics on very busy fileservers. Set the
default to limit updates to one every 5 seconds per volume.
Change-Id: I6b4274476ef6b8f9e4288b109d5a3edbdea6e91c
Reviewed-on: http://gerrit.openafs.org/5803
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Reviewed-by: Tom Keiser <tkeiser@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
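A rough sketch of how a threshold plus a time limit might gate usage writes. The struct, the function name, and the policy itself (write only when both the count threshold and the minimum interval are met) are assumptions made for illustration; the commit message does not spell out how the two parameters combine.

```c
#include <time.h>

/* Hypothetical per-volume limiter state. */
struct usage_limiter {
    unsigned int pending;    /* accesses counted since the last write */
    unsigned int threshold;  /* assumed: minimum count before writing */
    time_t interval;         /* assumed: minimum seconds between writes */
    time_t last_write;       /* time of the last write to disk */
};

/* Record one access. Returns 1 if the caller should write the usage
 * statistics to disk now, 0 if the write should be deferred. */
static int
usage_should_write(struct usage_limiter *ul, time_t now)
{
    ul->pending++;
    if (ul->pending < ul->threshold)
        return 0;
    if (now - ul->last_write < ul->interval)
        return 0;
    ul->pending = 0;
    ul->last_write = now;
    return 1;
}
```

With interval set to 5, a busy volume is written at most once every 5 seconds, which matches the default described above.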
Jeffrey Altman [Thu, 17 Nov 2011 05:30:24 +0000 (00:30 -0500)]
Windows: non-release only worker threads can release
There are two classes of worker threads created by the service
and donated to the afsredir as part of the reverse ioctl processing
model. Normal workers can process any kind of ioctl, and Release
Only workers can process only release extent events.
Use a KeWaitForMultipleObjects in the normal worker case to permit
processing any type of event. The previous implementation excluded
release extent ioctls from the normal workers.
Jeffrey Altman [Wed, 16 Nov 2011 05:29:34 +0000 (00:29 -0500)]
auth: initKeys before first error exit path
In afsconf_OpenInternal() _afsconf_InitKeys() must be called
before the first opportunity to call afsconf_CloseInternal()
or a crash can occur if the CellServDB file cannot be parsed.
Jeffrey Altman [Wed, 16 Nov 2011 15:33:41 +0000 (10:33 -0500)]
Windows: Do not install IBM AFS HLP files
The IBM AFS HLP files are so out of date at this point
that they are simply confusing. They reference tools and screens
that no longer exist and claim the product is "IBM AFS". Incorrect
documentation is worse than no documentation.
The HLP files cannot be updated since we do not have the sources.
HLP file format is no longer supported on Windows Vista or 7.
The afs-nt.hlp file will continue to be installed conditionally
when afscreds.exe is installed but the shortcut to it in the
Start menu is being removed. afscreds.exe is not installed by
default.
Jeffrey Altman [Tue, 15 Nov 2011 23:35:26 +0000 (18:35 -0500)]
Windows: buf_CleanAsyncLocked dirty range only
buf_CleanAsyncLocked() should not instruct cm_BufWrite() to
write a full chunk if the current buffer is the only one that
is dirty. cm_BufWrite() will determine if it is appropriate
to fill a full chunk when storing. Instructing it to check
a full chunk forces it to do more work than necessary.
Jeffrey Altman [Tue, 15 Nov 2011 23:23:46 +0000 (18:23 -0500)]
Windows: create scache->redirMx to reduce contention
Relying on the cm_scache_t.rw lock to protect the cm_scache_t.redirQueue*
results in a large amount of contention between processing extent
requests and releases from the afs redirector and the threads attempting
to read from or write data to the file server. There is no reason why
the same lock must be used. Allocate a dedicated mutex to protect the
queue.
Placing the new mutex after buf_globalLock in the locking
hierarchy permits the lock acquisition logic for extent processing
to be simplified, further reducing cm_scache_t.rw lock transitions.
Jeffrey Altman [Wed, 16 Nov 2011 00:03:14 +0000 (19:03 -0500)]
Windows: Increase default number of daemon threads
With the SMB interface there was little benefit to having
a large background daemon worker pool since it was so rarely
used. Now that the redirector does everything in the background
daemon workers, increase the default from 4 to 16 threads.
Jeffrey Altman [Wed, 16 Nov 2011 00:00:05 +0000 (19:00 -0500)]
Windows: cm_SetupStoreBIOD use firstModOffset chunk
When cm_SetupStoreBIOD attempts to store a chunk to the file
server it should not use *inOffsetp as the start of the range.
There is no guarantee that the buffer at *inOffsetp is dirty.
Instead use firstModOffset which refers to the first known
dirty buffer in the range specified by the caller. Attempt
to fill a chunk of consecutive dirty buffers from that point.
Jeffrey Altman [Tue, 15 Nov 2011 23:40:21 +0000 (18:40 -0500)]
Windows: Fairness for background operations
The background daemon worker pool is responsible for processing
background Store and Fetch operations. With the SMB interface
primary store and fetch operations are performed in the SMB worker
thread which makes sense since those operations must be synchronous
to the incoming request.
With the AFS redirector interface almost all of the work is performed
by the background daemon worker pool. It is therefore critical that
the workers not get stuck in a state that starves applications.
For example, copying a file that is larger than the cache to \\AFS
will result in a background store request for each chunk-sized piece
of the file. If each worker thread grabs one to process, only one will
make progress and the rest will block. If a cleanup operation
(aka handle close) occurs, the entire file will be flushed to the
server synchronously in the redirector worker thread. That thread
will cause all of the background daemon threads to block.
Any subsequent fetch data requests that get queued behind the list
of stores will in turn block until they clear. This behavior is not
fair.
This patchset adds a new test to the cm_BkgDaemon() request
selection loop, cm_RequestWillBlock(). If a request will block it
is skipped. If there are no requests to process that would not have
blocked, the worker will sleep for 25ms instead of the usual 1s.
For BkgStore operations, the CM_SCACHEFLAG_DATASTORING flag is
used to indicate a blocking state.
For BkgFetch and PreFetch operations, the CM_BUF_WRITING and
CM_BUF_READING flags on the first cm_buf_t of the range are used
to indicate a blocking state.
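The selection rule described above might look roughly like this. It is a sketch under assumptions: struct example_request, pick_runnable(), and idle_sleep_ms() are hypothetical, the would_block field stands in for the result of cm_RequestWillBlock(), and treating the 1s sleep as the empty-queue case is an inference from the text.

```c
#include <stddef.h>

/* Hypothetical queued background request. */
struct example_request {
    struct example_request *next;
    int would_block;   /* stands in for cm_RequestWillBlock() */
};

/* Return the first request that can run without blocking, or NULL if
 * every queued request would block (or the queue is empty). */
static struct example_request *
pick_runnable(struct example_request *head)
{
    struct example_request *rq;

    for (rq = head; rq != NULL; rq = rq->next) {
        if (!rq->would_block)
            return rq;
    }
    return NULL;
}

/* Assumed sleep policy when nothing was picked: a short nap if work is
 * queued but blocked, the usual long sleep if the queue is empty. */
static unsigned int
idle_sleep_ms(int queue_empty)
{
    return queue_empty ? 1000 : 25;
}
```

Skipping blocked requests instead of waiting on them is what keeps one large store from tying up every worker.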
Jeffrey Altman [Sat, 12 Nov 2011 18:45:08 +0000 (13:45 -0500)]
Windows: Track active RPCs per scache_t
It has been noticed that multiple RPCs can be active on
a cm_scache_t object at the same time. This is especially
true of directory objects with the redirector. Track the
number of active RPCs and use that number in cm_MergeStatus
when deciding whether or not to discard the cached data for
the object.
Jeffrey Altman [Fri, 28 Oct 2011 15:36:10 +0000 (11:36 -0400)]
Windows: out of date version not in current chunk
In buf_GetNewLocked(), the comparison to decide whether a
cm_buf_t is a member of the current chunk must take the data
version into account. If the data version is out of date, it
is not part of the current chunk and is an object that can be
safely recycled.
Jeffrey Altman [Thu, 27 Oct 2011 21:57:25 +0000 (17:57 -0400)]
Windows: only flush buffers on shutdown if running
If a service shutdown message is received prior to the
service entering the running state, do not attempt to
buf_CleanAndReset() because the required data structures
and locks are not initialized.
Jeffrey Altman [Tue, 25 Oct 2011 19:32:11 +0000 (15:32 -0400)]
Windows: Do not EEXIST exact match during rename
AFS Rename operations on the file server will delete a
target file if it exists. Do not prevent renames because
an exact match of the target name exists in the target
directory.
Instead of dropping the lock for read and reacquiring for write
use lock_ConvertRToW() which will make the change atomically if
it is possible or place the thread into the wait list if not.
The buffer free list least recently used queue has both
head and tail pointers. Use the proper versions of the queue
mgmt functions and do not handle edge cases as special cases.
The windows cache manager tracks volumes by volume group.
Up to this point all volume location updates have been performed
by the volume name. What if the volume name was altered? In this
case the volume location information for the in use volume ids will
fail until a mount point to the new name is queried. Before
marking the volume group as non-existent attempt to perform a
lookup using either the volume id for the readwrite or readonly
volume.
Jeffrey Altman [Mon, 14 Nov 2011 15:23:53 +0000 (10:23 -0500)]
Windows: netidmgr krb5_cc_get_principal can fail
Do not dereference a NULL pointer if krb5_cc_get_principal fails.
On master this bug is fixed by e55d1774b1b5b27a3617467b5e2a24ee2be3a38c
but that change is after the conversion to the Kerberos Compatibility
SDK and cannot be applied to openafs-stable-1_6_x.
smb_ReceiveNTCreateX() calls cm_CheckNTOpen() which now
requires the smb_fid_t allocated fid value for use in share
mode locking. Move the allocation of the smb_fid earlier
in the function and apply necessary cleanup in error paths.
Jeffrey Altman [Sat, 12 Nov 2011 22:32:06 +0000 (17:32 -0500)]
Windows: cm_GetSCache avoid holding cm_scacheLock
cm_GetSCache used to hold cm_scacheLock write-locked from
start to finish except that it didn't. There were several
places where cm_scacheLock was dropped and reacquired due
to lock ordering requirements. Unfortunately, this has
two problems. First, the function isn't very fast in the
most common case since cm_scacheLock is write-locked for
the search for an existing FID. Second, there is a race
that results when cm_GetNewSCache() drops the cm_scacheLock.
To make things faster, use a read-lock for the common case.
To avoid the race, if the FID cannot be located, call
cm_GetNewSCache() first and then obtain the cell and volume
information. Then perform a second lookup for the FID while
holding cm_scacheLock write-locked. If we lost the race or
there was an error obtaining the cell and volume info, put
the new cm_scache_t back onto the end of the LRU queue.
Jeffrey Altman [Sat, 12 Nov 2011 18:41:30 +0000 (13:41 -0500)]
Windows: fix locking hierarchy in service
The smb username lock and the daemon global lock can be requested
while the scache dirlock is held if there are no free buffers
and the service is forced to claw back extents from the redirector.
Adjust the locking hierarchy accordingly.
Andrew Deason [Wed, 2 Nov 2011 21:55:49 +0000 (16:55 -0500)]
afs: Do not use separate array for srvAddrs
The array of srvAddr structs we use in afs_LoopServers have indices
unrelated to the indices of conns, rxconns, etc. Several places were
assuming that addr[i] corresponded to conn[i], which is not
necessarily true. So instead, do not use the separate addr array
(except when populating the conn and rxconn arrays), and just get the
srvAddr structure by going through the relevant conn[i].
Simon Wilkinson [Sat, 22 Oct 2011 15:37:04 +0000 (16:37 +0100)]
rx: Turn the rxevent_Cancel macro into a function
Turn rxevent_Cancel into a function rather than a macro which modifies
its argument as a side effect. rxevent_Cancel now checks whether the
event being cancelled is already NULL, as well as NULLifying the event
when it is actually cancelled.
Update all of the callers to reflect this new API, and so they no
longer do unnecessary work.
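A minimal sketch of the function form described above. The event type and the function name are hypothetical; the real rxevent structures and the actual rxevent_Cancel signature are not shown in this log.

```c
#include <stddef.h>

/* Hypothetical event type used only for this example. */
struct example_event {
    int cancelled;
};

/* Cancel the event referenced by *evp and clear the caller's pointer.
 * Taking the address of the pointer makes the update explicit, rather
 * than a hidden side effect of a macro. Returns 1 if an event was
 * cancelled, 0 if there was nothing to cancel. */
static int
example_event_cancel(struct example_event **evp)
{
    if (evp == NULL || *evp == NULL)
        return 0;
    (*evp)->cancelled = 1;
    *evp = NULL;
    return 1;
}
```

Callers no longer need their own NULL check before cancelling, since a second call on an already cleared pointer simply returns 0.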
Simon Wilkinson [Sat, 22 Oct 2011 15:22:36 +0000 (16:22 +0100)]
rx: New signature for rx event functions
For a while now, we've had both new and old-style rx event callback
functions. Modify all of our event handlers, and the functions that
install them, to use only new style functions, and get rid of the
old-style function prototypes.
Simon Wilkinson [Sat, 22 Oct 2011 10:22:51 +0000 (11:22 +0100)]
opr: Add a red/black tree implementation
Add an implementation of red/black trees to our runtime library.
This is originally derived from the FreeBSD macro-based rbtree
implementation, but is heavily reworked to not use macros, to improve
legibility, and to favour speed over structure compactness.
Simon Wilkinson [Sat, 22 Oct 2011 08:45:10 +0000 (09:45 +0100)]
opr: Add opr_containerof
Add the opr_containerof macro, which can be used to find the base
address of a structure which contains a member whose location is known.
This formulation is heavily used throughout OpenAFS to determine the
base address of structures containing queue pointers - this provides
a central definition, rather than coding it from scratch each time.
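For illustration, a container-of macro is commonly written with offsetof as below. my_containerof and the example structures are hypothetical stand-ins; the actual opr_containerof definition may differ in detail.

```c
#include <stddef.h>

/* Hypothetical stand-in for the macro described above: step back from
 * a member pointer to the start of the enclosing structure. */
#define my_containerof(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Illustrative queue link and a structure that embeds it. */
struct example_queue {
    struct example_queue *next;
    struct example_queue *prev;
};

struct example_item {
    int id;
    struct example_queue link;   /* embedded queue pointers */
};

/* Recover the enclosing item from a pointer to its embedded link. */
static struct example_item *
item_from_link(struct example_queue *q)
{
    return my_containerof(q, struct example_item, link);
}
```

Code that walks a queue of link members can call item_from_link() on each entry to reach the owning structure.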
Jeffrey Altman [Wed, 5 Oct 2011 07:36:48 +0000 (03:36 -0400)]
Windows: Enforce Share Access
Use file server locks to enforce file share access modes
via the afs redirector interface. The approach taken
integrates share mode enforcement with the file server
lock tracking code in the service. The share mode
enforcement mimics that of the SMB Server interface.
This patchset includes two functional changes to
the previous locking and share mode processing:
1. The cm_scache_t fsLockCount field is used to
determine if the desired lock can be granted
by the file server. If not, the RXAFS_SetLock()
request is skipped and the request is failed
locally.
2. cm_CheckNTOpen() now accepts the desired and
share access modes. The share access mode
is used to determine if a test lock should be
obtained at all. If the share mode is FILE_SHARE_WRITE
then no lock is requested. This change permits
Microsoft Office applications to offer the user
the ability to open the file in read-only mode
and notify the user when the document can be
opened in read-write mode.
Developed with Peter Scott <pscott@kerneldrivers.com>
Andrew Deason [Thu, 10 Nov 2011 21:18:41 +0000 (15:18 -0600)]
SOLARIS: Do not build x86 kernel module on 5.11
Oracle Solaris 11 no longer supports 32-bit x86 (amd64 is required). If we
try to build the x86 module, /usr/include/sys/kobj.h complains that
the ISA is unsupported, and refuses to go on. So, just remove
MODLOAD32 from the libafs directories to build on sunx86_511.
Andrew Deason [Fri, 4 Nov 2011 22:19:28 +0000 (17:19 -0500)]
volser: Remove debugging log messages
While the -log option to volserver is supposed to print additional log
information, it shouldn't spam the log with useless data. Remove some
of the log lines that are really more "debug" information, so we log
the same amount of information as in the 1.4 series.
Andrew Deason [Thu, 10 Nov 2011 17:05:28 +0000 (11:05 -0600)]
vol: Remove O_EXCL|O_TRUNC combinations
A few places were specifying both O_EXCL and O_TRUNC to open().
O_TRUNC does not make any sense with O_EXCL, and doesn't do anything,
so remove O_TRUNC from these instances to make the code more clear.
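A small sketch of why O_TRUNC adds nothing here: with O_CREAT|O_EXCL, open() fails with EEXIST when the file already exists, so a successful open always yields a newly created, empty file. create_empty_file() is a hypothetical helper, not a function from the vol code.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a file that must not already exist. Returns 0 on success or
 * a negated errno value on failure (-EEXIST if the path is present). */
static int
create_empty_file(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

    if (fd < 0)
        return -errno;
    close(fd);
    return 0;
}
```

Calling it twice on the same path succeeds the first time and returns -EEXIST the second time; the existing file is never truncated.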
Jeffrey Altman [Thu, 10 Nov 2011 03:47:55 +0000 (22:47 -0500)]
Windows: FSCTL_SET_REPARSE_POINT error
According to MS-FSCC 2.3.54 if the input buffer length is less than the size
of a REPARSE_DATA_BUFFER structure, or the input buffer length is greater
than 16,384, or a REPARSE_DATA_BUFFER structure has been specified for a
third party reparse tag, or the GUID specified for a third party reparse tag
does not match the GUID known by the operating system for this reparse
point, or the reparse tag is 0 or 1, then the return status shall be
STATUS_IO_REPARSE_DATA_INVALID.
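Part of the validation listed above can be sketched as follows. This is illustrative only: the enum, the constant name, and check_reparse_input() are hypothetical, the minimum length is passed in rather than taken from the real REPARSE_DATA_BUFFER definition, and the third party tag and GUID checks are omitted.

```c
#include <stddef.h>

/* Upper bound on the input buffer length quoted above. */
#define EXAMPLE_MAX_REPARSE_BYTES 16384u

/* Hypothetical status codes standing in for NTSTATUS values. */
enum example_status {
    EXAMPLE_OK = 0,
    EXAMPLE_REPARSE_DATA_INVALID = 1
};

/* Check the buffer length and the reserved tag values 0 and 1.
 * min_len stands in for the size of a REPARSE_DATA_BUFFER. */
static enum example_status
check_reparse_input(size_t input_len, size_t min_len, unsigned long tag)
{
    if (input_len < min_len || input_len > EXAMPLE_MAX_REPARSE_BYTES)
        return EXAMPLE_REPARSE_DATA_INVALID;
    if (tag == 0 || tag == 1)
        return EXAMPLE_REPARSE_DATA_INVALID;
    return EXAMPLE_OK;
}
```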
Jeffrey Altman [Thu, 10 Nov 2011 03:45:07 +0000 (22:45 -0500)]
Windows: FSCTL_IS_PATHNAME_VALID return success
Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista,
Windows Server 2008, Windows 7, and Windows Server 2008 R2 support the
FSCTL_IS_PATHNAME_VALID Request (section 2.3.21) and return STATUS_SUCCESS
whenever this request is invoked. We will do the same.
Jeffrey Altman [Mon, 31 Oct 2011 03:52:00 +0000 (23:52 -0400)]
Windows: improve store data parallelism
The file server will set the rx call status bit (0x1)
when the rpc is in process and all of the locks are held.
At this point it is not possible for another store data rpc
to begin on the vnode prior to the completion of the current
rpc. Once this status bit is detected as set, the exclusive
store data synchronization on the cm_scache_t can be dropped.
This permits the next store data rpc to perform its biod
construction.
Simon Wilkinson [Wed, 12 Oct 2011 13:47:14 +0000 (09:47 -0400)]
rx: Don't clear the receive queue when out of packets
We can end up discarding a receive queue that's been soft acked,
effectively taking back soft acks we sent. Whilst the RX
documentation says that a client can drop soft acked packets at
will, our RX implementation assumes that if the final packet in
a call has been soft acked, we won't clear the queue. If a client
clears the queue in this situation, the call will hang.
What *should* happen is that we should take necessary locks,
confirm that we have not soft-acked all of the packets in a flow,
and then discard, or, if we're just going to discard, error the
call.
Andrew Deason [Wed, 13 Apr 2011 18:15:57 +0000 (13:15 -0500)]
Add "pretty" build option
Add the capability to do a "pretty" build, where we output something
like " CC /path/to/foo.o" to build foo.o, instead of the entire
compiler invocation, similarly to how the Linux kernel build appears.
Add the "pretty" building for CC and LD rules.
This also prints out some helpful information when a command fails,
which can sometimes otherwise be annoying to figure out post-mortem.
To enable the pretty building, make with V=0. To output everything
that is actually run with V=0, make with 'V=0 Q='.
Note that this does not work with all makes, since not all makes will
propagate command-line-specified variables to sub-makes without -e.
Non-working makes include /usr/ccs/bin/make on HP-UX and Solaris.
However, GNU make will work, as will /usr/xpg4/bin/make on Solaris.
Andrew Deason [Tue, 8 Nov 2011 18:29:39 +0000 (12:29 -0600)]
Specify pattern rules in addition to suffix rules
A few makefiles specify an old-style suffix rule, such as:
    .c.o:
            $(AFS_CCRULE) $<
Not all makes seem to interpret these rules correctly (such as Solaris
/usr/xpg4/bin/make). Since it is easy to do so, specify pattern-based
rules along with these, like so:
Jeffrey Altman [Fri, 4 Nov 2011 13:34:53 +0000 (09:34 -0400)]
Windows: NPCancelConnection set correct length
The RemoteNameLength passed in the IOCTL_AFS_CANCEL_CONNECTION call
must not include the trailing NUL. NPGetConnection() returns the
size of the buffer used which does include the trailing NUL.
Jeffrey Altman [Thu, 3 Nov 2011 18:14:52 +0000 (14:14 -0400)]
Windows: Simplify KFW_AFS_klog
Reduce the complexity of KFW_AFS_klog. Introduce
KFW_AFS_continue_aklog_processing_after_krb5_error() and
combine the input realm and realm_of_cell cases making
use of the RealmName variable.
Ken Dreyer [Mon, 31 Oct 2011 14:27:16 +0000 (08:27 -0600)]
doc: limitations of addsite on different partitions
A user on the openafs-info mailing list noted that the Admin Guide is
unclear about creating read-only replicas on different partitions on
the same fileserver. Clarify the rules here.