Andrew Deason [Wed, 21 Dec 2011 21:05:40 +0000 (16:05 -0500)]
afs: Cope with afs_GetValidDSlot errors
Make callers of afs_GetValidDSlot deal with getting a NULL dcache,
which can occur if an error is encountered. Some of these just panic
at least for now, since a code path for recovery is complex, but this
is at least better than dereferencing a NULL pointer.
Andrew Deason [Wed, 21 Dec 2011 20:04:32 +0000 (15:04 -0500)]
afs: Do not always ignore errors in afs_GetDSlot
Currently afs_UFSGetDSlot will silently swallow any error in reading
the specified dslot from disk, and will return a "blank" dcache to the
caller. However, many callers of afs_GetDSlot will be asking for a
dcache that we know exists, and more importantly, we know is on the
global hash table. If a disk error is encountered and we're given a
"blank" dcache, we will erroneously believe the dcache entry is not on
the hash table, causing corruption of the hash table later on.
So instead, modify all callers of afs_GetDSlot to use either
afs_GetValidDSlot or afs_GetNewDSlot. Calling afs_GetValidDSlot
indicates that the given dentry index is known to be valid, and any
error encountered while reading the entry from disk should result in
an error (for disk I/O errors we have no control over, this results in
a NULL dentry returned; for internal consistency errors we panic).
Calling afs_GetNewDSlot indicates that the specified index may not
exist or may not be valid, and so returning a "blank" dentry in that
case is fine.
For memcache, the situation is the same, except any time we go to
"disk" it is an (internal) error, since there is no disk.
Andrew Deason [Wed, 21 Dec 2011 22:25:29 +0000 (17:25 -0500)]
afs: Remove second argument to afs_GetDSlot
All callers of afs_GetDSlot were passing NULL as the second argument
to afs_GetDSlot. So, remove the argument, and behave as if tmpdc was
NULL unconditionally.
Andrew Deason [Fri, 6 Apr 2012 19:56:07 +0000 (14:56 -0500)]
LINUX: Do not lookup immediately recursive mtpts
On Linux, having a mountpoint in a volume root that points to the same
volume can cause serious problems. By 'immediately recursive', I mean
a situation like the following:
fs mkm mtpt vol
fs mkm mtpt/mtpt vol
If there are multiple dentry aliases for the directory (which is
possible if the directory is a mountpoint), an 'rmdir' on the
recursive mountpoint can cause the client to deadlock. Since the
'rmdir' code path in Linux locks the parent directory inode to perform
the rmdir, and locks the child directory inode after performing a
couple of sanity checks. For an immediately recursive mountpoint,
these two inodes are the same, and so we will deadlock.
Andrew Deason [Fri, 6 Jul 2012 21:37:39 +0000 (16:37 -0500)]
Linux: Make dir dentry aliases act like symlinks
Currently, we try to invalidate other dentries that exist for a
particular dir inode when we look up a dentry. This is so we try to
avoid duplicate dentries for a directory, which Linux does not like
(you cannot have hardlinks to a dir).
If we cannot invalidate the other aliases (because they are being
used), right now we just return the alias. This can make it very easy
to panic the client, due to the sanity checks Linux performs when dong
things like 'rmdir'. If we do something like this:
For the 'rmdir', we will lookup 'mtpt2'. Since 'mtpt' and 'mtpt2'
are mountpoints for the same volume, their dentries point to the same
directory inode. So when we lookup 'mtpt2', we will try to invalidate
the other dentry, but we cannot do that since it is the cwd. So we
return the alias dentry (for 'mtpt'). The Linux VFS layer then does a
sanity check for the rmdir operation, checking that the child dentry's
parent inode is the same as the inode we're performing the rmdir for.
Since the dentry we returned was for 'mtpt', whose parent is 'dir1',
and the actual dir we're performing the rmdir for is 'dir2', this
sanity check fails and we BUG.
To avoid this, make the dentry alias act like a symlink when we
encounter an uninvalidateable dentry alias. That is, we allow multiple
dentry aliases for a directory, however, when the dentry aliases are
actually used, we redirect to a common dentry (via d_automount where
possible, and follow_link elsewhere).
This means that such mountpoints will behave similarly to symlinks, in
that we 'point' to a specific mountpoint dentry. This means that if we
have multiple different ways to get to the same volume, and all are
accessed at the same time, all but one of those mountpoints will
behave like symlinks, pointing to the same mountpoint. So, the '..'
entries for each path will all point to the parent dir of one
mountpoint, meaning that the '..' entry will be "wrong", but for most
cases it will still be correct.
In order to try to make the 'target', pointed-to directory consistent,
we add a new field to struct vcache: target_link. This points to the
dentry we should redirect to, whenever that vcache is referenced. To
avoid (possibly not-feasibly-solvable) problems with refcounting, this
pointer is not actually a reference to the target dentry, but just
serves as a pointer to compare to.
FIXES 130273
Reviewed-on: http://gerrit.openafs.org/7741 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit de381aa0d39e88a1ca0c8ccbb2471c5cad5a964c)
Andrew Deason [Thu, 22 Dec 2011 20:01:52 +0000 (15:01 -0500)]
afs: Indicate error from afs_osi_Read/Write better
Currently afs_osi_Read and afs_osi_Write just return -1 on any I/O
error, even though they know the error code given from the OS VFS.
Just return that code instead so the caller can see what the error
was; but negate it, so it's clear that it is an error.
Andrew Deason [Thu, 22 Dec 2011 19:50:09 +0000 (14:50 -0500)]
afs: afs_osi_Read/Write returns negative on error
afs_osi_Read and afs_osi_Write need to return negative values on
error. EIO is not negative; return -EIO so we don't accidentally
return "success" if someone requested to read or write EIO bytes.
Andrew Deason [Mon, 9 Apr 2012 22:16:42 +0000 (17:16 -0500)]
vos: Do not try to remove backup volume id 0
Currently we always try to delete a BK volume if we're deleting the
RW. If the BK volume id is 0, this is never going to do anything, so
don't try to delete it.
Andrew Deason [Thu, 26 Jul 2012 21:40:03 +0000 (16:40 -0500)]
LINUX: Hold GLOCK for proc traversal
The functions that traverse unixuser structures for display via /proc
(uu_start et al) call various libafs functions hold and release locks,
etc. To do any of that, we need GLOCK. Amongst other issues, we can
panic if we try to acquire a contested lock without GLOCK, since we
assert glock is held when we sleep for the lock or try to wake other
waiters. The same goes for the legacy CellServDB proc file.
So, hold and release GLOCK as appropriate.
Reviewed-on: http://gerrit.openafs.org/7885 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 5237d3d232f22aaf4f67f3a8727a293f4058a7ae)
Niklas Jonsson [Wed, 20 Jun 2012 14:03:54 +0000 (10:03 -0400)]
Auth: increase size of DNS resolver answer buffer
This patchset increases the size of the res_search() answer
buffer from 1024 octets to 4096 octets. This is not a proper
long term solution but will permit sites with longer response
lists to make use of SRV and AFSDB records.
This patchset only impacts UNIX systems. Windows uses the
Win32 DNS resolver which dynamically allocates memory based
upon the size of the response.
Andrew Deason [Mon, 18 Jun 2012 20:06:49 +0000 (15:06 -0500)]
doc: Consolidate NetRestrict format docmentation
We were specifying exactly the same format in two different places;
consolidate them into one place. In addition, explicitly say there are
is no way to specify a range of addresses, in case some people are
confused by the previous versions of this man page that erroneously
said you could use 255 as a wildcard.
Andrew Deason [Mon, 18 Jun 2012 20:06:49 +0000 (15:06 -0500)]
doc: Consolidate NetRestrict format docmentation
We were specifying exactly the same format in two different places;
consolidate them into one place. In addition, explicitly say there are
is no way to specify a range of addresses, in case some people are
confused by the previous versions of this man page that erroneously
said you could use 255 as a wildcard.
Marc Dionne [Sun, 3 Jun 2012 01:35:53 +0000 (21:35 -0400)]
Linux 3.5: encode_fh API change
The encode_fh export operation now expects two inode arguments
instead of a dentry and a "connectable" flag. Use the inode of
the dentry we're interested in, and NULL as the parent inode which
is the same as passing a 0 flag in the previous API.
Andrew Deason [Fri, 18 May 2012 21:40:38 +0000 (17:40 -0400)]
afs: Do not QueueVCB before osi_dnlc_purge*
In afs_FlushVCache, when we QueueVCB, we might drop the afs_xvcache
lock (as of 76158df491f47de56d1febe1d1d2d17d316c9a74). The vcache may
still be on the DNLC, so a lookup while xvcache is dropped can cause
someone else to grab a reference to the vcache while it is being
flushed. This can cause panics and failed assertions, since someone
will have a reference to the flushed vcache, which is effectively
freed and many of the structure fields are no longer valid.
So instead, do not call QueueVCB until we have purged the vcache from
the DNLC.
Andrew Deason [Mon, 21 May 2012 22:11:29 +0000 (17:11 -0500)]
afsd: Avoid dir interpolation for memcache
memcache doesn't make use of fullpn_DCacheFile, fullpn_VolInfoFile,
etc. Do not even try to generate these strings for memcache, since
cacheBaseDir will be NULL, and so this can cause a segfault on some
platforms including Solaris.
Mark Vitale [Tue, 8 May 2012 14:01:12 +0000 (10:01 -0400)]
vos: convertROtoRW may create two RW volumes
If the RW volume is listed after the RO convert target in the VLDB,
the code failed to detect that an RW is already present and would
create a second RW volume.
Reviewed-on: http://gerrit.openafs.org/7385 Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 9a728fd86c7add13e15dfa0d3062fa94cc77c53f)
modify afshelper to just run what it's told instead
of offering fixed operations. this avoids having a setuid
tool around. in spite of apple's suggestion this is correct,
it's actually more dangerous. instead, elevate privilege only
to run a small set of commands, then drop. allow
unlocking of the prefs pane, but make the menu extra prompt
for authentication when needed.
deactivate controls in the prefs pane when locked.
Andrew Deason [Thu, 3 May 2012 21:36:03 +0000 (16:36 -0500)]
vol: Free vol header on attach_volume_header error
In attach_volume_header, make sure we free the volume's header if
we're returning an error. We take care of the locks and i/o handles in
the immediately preceding block, but for an actual error we don't get
rid of the header. Do so.
Noticed by Tom Keiser.
Reviewed-on: http://gerrit.openafs.org/7325 Reviewed-by: Tom Keiser <tkeiser@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit ecfd9549fc29cdad8042e830c656caee1363c6cf)
Andrew Deason [Wed, 2 May 2012 17:07:49 +0000 (12:07 -0500)]
vol: Pay attention to specialStatus after VAVByVp
attach2/VAttachVolumeByVp_r do not alter the yielded error code
according to specialStatus. So, pay attention to specialStatus after
receiving an error from VAttachVolumeByVp_r, to ensure we respond with
the correct error code.
Reviewed-on: http://gerrit.openafs.org/7303 Reviewed-by: Tom Keiser <tkeiser@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit a81f9237bfa7b2e3a0567a930f3c49234b9a4376)
Andrew Deason [Wed, 2 May 2012 16:38:57 +0000 (11:38 -0500)]
vol: Avoid VBUSY/VRESTARTING trick for offline vop
Currently, if GetVolume() finds that the volume we're trying to attach
has a vol op that leaves the volume offline, we do the
VBUSY/VRESTARTING trick as described in CheckVnode(). This doesn't
make any sense for a couple of reasons.
For one, VBUSY/VRESTARTING is not the correct error code to return to
the client when an offline vol op is in progress and vp->specialStatus
is not set everywhere else we yield VOFFLINE.
Additionally, this block of code is only hit once for a particular vol
op. Once we reach this section, the volume is in UNATTACHED state, and
so on the next iteration of GetVolume we will immediately return
VOFFLINE (or specialStatus). So the CheckVnode-like situation is not
applicable, since we are not returning VBUSY to the same client for 15
minutes; we would return VBUSY once and then return VOFFLINE.
Reviewed-on: http://gerrit.openafs.org/7302 Reviewed-by: Tom Keiser <tkeiser@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 21ed79aeaee2d3b2b47436db0491943829ac44a6)
Andrew Deason [Thu, 3 May 2012 17:40:40 +0000 (12:40 -0500)]
vos setaddrs: notice unexpected errors
Currently 'vos setaddrs' only prints a message and errors out if the
VL_RegisterAddrs call fails with certain error codes (VL_MULTIPADDR
and RXGEN_OPCODE). But if we get something else like an access error,
we should of course print that out, instead of reporting success.
Reviewed-on: http://gerrit.openafs.org/7322 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Tom Keiser <tkeiser@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 663185d62df501fb9d7a18e6ef329e4f53aa8854)
Andrew Deason [Fri, 27 Apr 2012 17:59:25 +0000 (12:59 -0500)]
vol: A GOING_OFFLINE volume should yield VOFFLINE
Currently, GetVolume treats a volume in the VOL_STATE_GOING_OFFLINE
state the same as VOL_STATE_SHUTTING_DOWN, and so returns VNOVOL for a
GOING_OFFLINE volume, but these states are very different.
GOING_OFFLINE indicates that a volume should soon be in the UNATTACHED
state, so we should treat GOING_OFFLINE the same as UNATTACHED for
returning errors to the user. For UNATTACHED, we return specialStatus
if it's set, or VOFFLINE otherwise; so, just do the same for
GOING_OFFLINE.
Edward Z. Yang [Sun, 20 Nov 2011 20:48:33 +0000 (15:48 -0500)]
Add OpenAFS to the dependencies of remote-fs.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Reviewed-on: http://gerrit.openafs.org/6093 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 3328770612b7205abb92df5b5f4737eb3349c910)
Michael Meffie [Fri, 30 Sep 2011 16:22:27 +0000 (12:22 -0400)]
bozo: preserve all options over restart
On unix, save all the bosserver command-line options and reuse
them on bosserver restarts. On Windows, the SCM integrator saves
the argument list, just use them.
Andrew Deason [Thu, 5 Apr 2012 22:55:17 +0000 (17:55 -0500)]
viced: Do not offline volume on successful IH_DEC
If we fail to CoW a file due to ENOSPC, we try to IH_DEC the new file
copy, and if IH_DEC fails, we take the volume offline for salvaging.
But IH_DEC returns 0 on success, not on error. So take the salvaging
path when we get non-zero.
Andrew Deason [Mon, 7 May 2012 20:49:34 +0000 (15:49 -0500)]
fs: Report default storebehind when errors exist
After 904c9fbe, we no longer print out the default store asynchrony
when any of the supplied paths results in a pioctl error. However, if
just one (or a few) of the paths supplied results in an error (such
as, the path does not exist), this does not prevent us from reporting
the default value.
Instead, keep track of whether or not we have a valid value, and try
to determine the default if we haven't already by the end of
StoreBehindCmd, and print it out.
Reviewed-on: http://gerrit.openafs.org/7376 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 427f53eea7f9c05e7b1913c91d57777d72bc30b2)
Andrew Deason [Wed, 22 Jun 2011 18:44:38 +0000 (13:44 -0500)]
afs: Ensure afs_calc_inum yields nonzero ino
afs_calc_inum can currently yield an inode of 0 if MD5-based inode
numbers are turned on. Some userspace applications (and for some
platforms, maybe even the kernel) make certain assumptions about the
inode number for a file; in particular for example, 'ls' will not
display a file with inode 0 in a normal directory listing.
So, read the md5 digest until we get a non-zero result. Fall back to
the non-md5 calculation if we still somehow end up with a 0.
While this case may at first glance seem to be extremely rare, in
practice it can occur, as the current calculation for volume 538313506, vnode 26178 does actually yield a 0.
Andrew Deason [Mon, 2 Apr 2012 21:16:37 +0000 (16:16 -0500)]
SOLARIS: Correct misplaced osi_machdep.h #endif
Commit 64778fd7bece52360482f9a51f19b34dac1d2678 removed some '#ifdef
KERNEL' blocks, but for one block in SOLARIS/osi_machdep.h, the wrong
trailing #endif was removed. This effectively makes the last part of
the file Solaris 10+ only, and bypasses the header guard. On systems
before Solaris 10, this causes us to lose the osi_procname definition,
which eventually shows up as an undefined symbol.
So, reinstate the original #endif, and remove the correct #endif
instead.
The AFSDisk and AFSFetchVolumeStatus structures use signed
32-bit integers for representation partition size and
available blocks. RoundInt64ToInt31() should be used instead
of RoundInt64ToInt32() when assigning their values.
Andrew Deason [Fri, 2 Mar 2012 20:55:04 +0000 (14:55 -0600)]
viced: Do not ignore all InlineBulkStatus errors
InlineBulkStatus currently returns 0 unconditionally, no matter what
errors are encountered. If we encounter an error early enough, from
CallPreamble for example, we do not fill in the OutStats nor CallBacks
structures at all. Since we return success anyway, this results in the
client getting AFSFetchStatus structures full of zeroes (or garbage,
before commit 726e1e13ff93e2cc1ac21964dc8d906869e64406).
Since current OpenAFS clients do not perform any sanity checks on the
information received, this can result in cache corruption of files
being seen incorrectly as empty, and, before commit 726e1e, more
arbitrary corruption.
So instead, return an error if we encounter an error before we iterate
over the given FIDs. We still of course do not return an error for any
errors encountered during the actual metadata retrieval, as those are
reflected in the individual per-fid status structures.
OpenAFS changed the behavior of implicit administrator permission
for directory ownership. In OpenAFS only the volume root directory
owner has implicit administrator permissions and they apply to all
directories in the volume not just those with matching ownership.
Jeffrey Altman [Thu, 22 Mar 2012 19:55:47 +0000 (15:55 -0400)]
Windows: Client handling of VNOSERVICE
VNOSERVICE should not be grouped together with the volume status
error codes. It is used to indicate that the RPC was not serviced.
The file server issues it when its idle dead timeout period is reached
while receiving rx call data. The client's existing status information
is still valid and the client can retry the call.
Andrew Deason [Wed, 22 Feb 2012 21:40:20 +0000 (15:40 -0600)]
LINUX: Use afs_convert_code in afs_notify_change
afs_notify_change currently just returns "-code". This can cause a
panic if the error code is negative, since we will return a positive
error code, which may get interpreted as a valid pointer value in
higher levels.
Specifically, if we hit afs_notify_change via something like this code
path:
Andrew Deason [Wed, 22 Feb 2012 21:36:37 +0000 (15:36 -0600)]
LINUX: move afs_notify_change to osi_vnodeops.c
afs_notify_change is almost always used solely in inode_operations
structs, and is more similar to the other per-vnode functions. Put it
with the other per-vnode functions for better organization, and so
they can use the same static functions.
Move the helper functions iattr2vattr and vattr2inode along with it.
Andrew Deason [Wed, 7 Mar 2012 22:36:57 +0000 (16:36 -0600)]
afs: Never #define away afsd_dynamic_vcaches
Some versions of the Solaris Studio compiler on SPARC (at least 12.2
and possibly others, but not 12.3) get a little confused by code like
this:
extern int foo;
int
somefunc(void) {
if (0) return foo;
return 0;
}
When optimization is turned off, this results in an undefined symbol
reference to 'foo' (which is normal), but the resulting object file
lacks a relocation entry for the symbol 'foo', so the symbol remains
undefined after linking. In the OpenAFS tree, this occurs in
afs_daemons.c which references afs_vcount and afs_cacheStats in this
manner due to afsd_dynamic_vcaches being defined as '0' on Solaris.
The end result is that the libafs kernel module is not loadable, since
it complains about afs_vcount and afs_cacheStats being undefined, even
though the symbol definitions are also in the module.
While this is a bug in Solaris Studio and has since been fixed, it is
simple to work around this so we are usable with more compilers. If we
just always declare afsd_dynamic_vcaches as a regular variable, it
works around this issue and keeps the code a tiny bit simpler. So, do
that.
Reviewed-on: http://gerrit.openafs.org/6888 Tested-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit e5821239cde138f74f73bec1bd9a3880d08ac3df)
Simon Wilkinson [Wed, 14 Mar 2012 14:56:06 +0000 (10:56 -0400)]
rx: hold call->lock across RXS_PreparePacket
RX Security Classes have a right to assume that when RXS_PreparePacket
is called that they have exclusive access to the rx_call structure.
Commit e445faa68c5ec6e47d3fd9d7318ade71d98703a9 unintentionally
failed to acquire the call->lock prior to RXS_PreparePacket being
called.
When using an unadorned %config, it's possible that these files will
be replaced by the packaged version during a package update. Changing
%config to %config(noreplace) means that the packaged file will be
installed with the extension .rpmnew if there is already a modified
(from the existing package's version) file with the same name on the
installed machine.
The concern here is that updating an existing system could potentially
change the configuration if the person installing doesn't pay close
attention. The Rule of Least Surprise indicates that we should
try to preserve existing configuration changes whenever possible.
Dave Botsch [Thu, 1 Mar 2012 17:43:36 +0000 (12:43 -0500)]
Fixes dkms.conf for Redhat Enterprise
commit 8e0aaae076f4cccfd2d6ed81ede4e355235b578e , while fixing dkms.conf for
Fedora, broke dkms.conf for RHEL. In RHEL, you get a dkms.conf with too
many backslashes in the "mv" line. The dkms.conf should have the mv line
reading:
mv src/libafs/MODLOAD-*/\$KMODNAME \$DSTKMOD"
for Fedora.
This change checks if we are building on Fedora, and if so, maintains
the extra backslashes. Otherwise, not.
modified: src/packaging/RedHat/openafs.spec.in
Uses the dist tags as specified at
http://fedoraproject.org/wiki/Packaging:DistTag
Reviewed-on: http://gerrit.openafs.org/6851 Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 81a9a33e0bc5455841ba105dab52735c64c7096b)
Jeffrey Altman [Thu, 1 Mar 2012 20:49:12 +0000 (15:49 -0500)]
unix: always retry RX_CALL_BUSY
RX_CALL_BUSY is an indication that the call channel is busy not
that the server is down or otherwise cannot respond. Unconditionally
retry the RPC and do not alter state. We just want to force the use
of a different call channel.
Jeffrey Altman [Wed, 29 Feb 2012 18:07:47 +0000 (13:07 -0500)]
Windows: Workaround Win7 SMB Reconnect Bug
The SMB specification permits the server to save a round trip
in the GSS negotiation by sending an initial security blob.
Unfortunately, doing so trips a bug in Windows 7 and Server 2008 R2
whereby the SMB 1.x redirector drops the blob on the floor after
the first connection to the server and simply attempts to reuse
the previous authentication context. This bug can be avoided by
the server sending no security blob in the SMB_COM_NEGOTIATE
response. This forces the client to send an initial GSS init_sec_context
blob under all circumstances which works around the bug in Microsoft's
code.
Do not call smb_NegotiateExtendedSecurity(&secBlob, &secBlobLength);
As a result of the SMB 1.x bug, all attempts to reconnect fail due to
SMB connection resets. The SMB 1.x redirector will retry indefinitely
but all processes with outstanding requests to \\AFS will block until
the machine is rebooted.
Jeffrey Altman [Sun, 26 Feb 2012 19:45:43 +0000 (14:45 -0500)]
Windows: disable Adv ICF support if not supported
OpenAFS 1.6.x does not require the use of SDK 6.0 or above.
Therefore the Advanced Internet Connection Firewall support
may not be available. In particular, the 32-bit distribution
for 1.6.x does not rely on SDK 6.0 or higher.
Jeffrey Altman [Wed, 18 Jan 2012 00:46:30 +0000 (19:46 -0500)]
Windows: failover and retry for VBUSY
When a file server returns the VBUSY error for an RPC the
cache manager records the 'srv_busy' state in the cm_serverRef_t
structure binding that file server to the active cm_volume_t
object. The 'srv_busy' was never cleared which prevents the
volume from being accessed.
Clear the 'srv_busy' flag whenever cm_Analyze() receives a
CM_ERROR_ALLBUSY error which means that all replicas have
been tried or whenever the error is not VBUSY or VRESTARTING.
Jeffrey Altman [Fri, 25 Nov 2011 14:28:18 +0000 (09:28 -0500)]
Windows: improved idle dead time handling
RX_CALL_IDLE has been treated the same as RX_CALL_DEAD which is
a fatal error that results in the server being marked down. This
is not the appropriate behavior for an idle dead timeout error
which should not result in servers being marked down.
Idle dead timeouts are locally generated and are an indication
that the server:
a. is severely overloaded and cannot process all
incoming requests in a timely fashion.
b. has a partition whose underlying disk (or iSCSI, etc) is
failing and all I/O requests on that device are blocking.
c. has a large number of threads blocking on a single vnode
and cannot process requests for other vnodes as a result.
d. is malicious.
RX_CALL_IDLE is distinct from RX_DEAD_CALL in that idle dead timeout
handling should permit failover to replicas when they exist in a
timely fashion but in the non-replica case should not be triggered
until the hard dead timeout. If the request cannot be retried, it
should fail with an I/O error. The client should not retry a request
to the same server as a result of an idle dead timeout.
In addition, RX_CALL_IDLE indicates that the client has abandoned
the call but the server has not. Therefore, the client cannot determine
whether or not the RPC will eventually succeed and it must discard
any status information it has about the object of the RPC if the
RPC could have altered the object state upon success.
This patchset splits the RX_CALL_DEAD processing in cm_Analyze() to
clarify that only RX_CALL_DEAD errors result in the server being marked
down. Since Rx idle dead timeout processing is per connection and
idle dead timeouts must differ depending upon whether or not replica
sites exist, cm_ConnBy*() are extended to select a connection based
upon whether or not replica sites exist. A separate connection object
is used for RPCs to replicated objects as compared to RPCs to non-replicated
objects (volumes or vldb).
For non-replica connections the idle dead timeout is set to the hard
dead timeout. For replica connections the idle dead timeout is set
to the configured idle dead timeout.
Idle dead timeout events and whether or not a retry was triggered
are logged to the Windows Event Log.
cm_Analyze() is given a new 'storeOp' parameter which is non-zero
when the execute RPC could modify the data on the file server.
Jeffrey Altman [Fri, 3 Feb 2012 16:21:45 +0000 (11:21 -0500)]
Windows: fix cm_DirOpDelBuffer assert
In cm_DirOpDelBuffer() the data version field for a buffer
in cm_dirOp_t.buffers[] can be CM_BUF_VERSION_BAD if the buffer
was added to the buffer list but was never fetched from the file
server. If the buffer was recycled by buf_Get() an attempt to
remove an entry from the directory will be failed as opposed to
fetching the buffer from the file server and performing the local
removal.
Jeffrey Altman [Fri, 3 Feb 2012 16:17:40 +0000 (11:17 -0500)]
Windows: buffer DV ranges do not work for directories
In cm_MergeStatus, always set cm_scache_t.bufDataVersionLow
to the new data version because the cm_dir package does not
support version ranges. All modified dir buffers have their
dataVersion field set to the current data version value.
Failure to update the bufDataVersionLow field can result in
B+ Trees being constructed from out of date directory information.
Jeffrey Altman [Sun, 22 Jan 2012 23:33:43 +0000 (18:33 -0500)]
Windows; release BIOD after status merge
Releasing the BIOD permits the accumulated buffers to be accessed.
Releasing the BIOD before the cm_MergeStatus() call creates a
window where the buffer data version is larger than the cm_scache
data version. Release the BIOD after the status merge.
Jeffrey Altman [Thu, 19 Jan 2012 20:25:44 +0000 (15:25 -0500)]
Windows: cm_buf refcnt must hold buf_globalLock
An assertion in buf_Recycle() was being triggered when a cm_buf_t
object was supposed to be in the free buffer list but wasn't.
buf_Recycle() was racing with another thread. The test for
refCount == 0 was performed while holding the buf_globalLock
exclusively but the InterlockedDecrement(refCount) in buf_Release()
was performed without holding buf_globalLock at all. buf_globalLOck
must be held at least as a read lock. Otherwise, the refCount can
reach 0 prior to the thread blocking for exclusive access to the
buf_globalLock. This provides buf_Recycle() which is holding
buf_globalLock the opportunity to race.
The solution is to make sure that buf_Release() always holds
buf_globalLock as a read lock and then use buf_ReleaseLocked()
to perform the actual decrement and test.
Jeffrey Altman [Sat, 14 Jan 2012 15:31:01 +0000 (10:31 -0500)]
Windows: restrict service to 2 cpus by default
Performance drops off considerably when the number of processors
increases due to lock contention and the cm_SyncOp wait processing.
If the MaxCPUs registry value is not set, limit ourselves to two.
Setting MaxCPUs to zero permits use of all CPUs.
Jeffrey Altman [Sat, 24 Dec 2011 08:11:04 +0000 (03:11 -0500)]
Windows: cm_BufWrite() must wait in cm_SyncOp()
Now that it is permissible for more than one store data operation
to construct BIOD lists in parallel, cm_BufWrite() must be willing
to wait in cm_SyncOp(). Otherwise, the daemon threads will spin.
Jeffrey Altman [Sat, 3 Dec 2011 22:49:47 +0000 (17:49 -0500)]
Windows: apply Nat Pings only to cm_rootUser connections
Use CM_UCELLFLAG_ROOTUSER flag to identify the cm_rootUser
connections and only apply Nat pings to those connections
instead of examining the security state of the connection.
Jeffrey Altman [Fri, 2 Dec 2011 16:14:11 +0000 (11:14 -0500)]
Windows: buf_CleanAsync is not async; rename it
buf_CleanAsync() calls cm_BufWrite() which stores the dirty
buffers synchronously. There is nothing asynchronous about
buf_CleanAsync() so rename it to buf_Clean() and buf_CleanAsyncLocked()
to buf_CleanLocked(). Update the comments to remove the references
to the asynchronous processing which doesn't exist.
That is not to say that the call to buf_Clean() in buf_GetNewLocked()
should not be asynchronous; it should. There is no such functionality
at the moment. One approach would be to modify buf_IncrSyncer to
trigger on an event set by buf_GetNewLocked() instead of the call
to buf_Clean(). Another approach would be registering a background
store event. In any case, that is for another patchset.