When using an unadorned %config, it's possible that these files will
be replaced by the packaged version during a package update. Changing
%config to %config(noreplace) means that the packaged file will be
installed with the extension .rpmnew if the installed machine already
has a file of the same name that was modified from the existing
package's version.
The concern here is that updating an existing system could potentially
change the configuration if the person installing doesn't pay close
attention. The Rule of Least Surprise indicates that we should
try to preserve existing configuration changes whenever possible.
Dave Botsch [Thu, 1 Mar 2012 17:43:36 +0000 (12:43 -0500)]
Fixes dkms.conf for Redhat Enterprise
commit 8e0aaae076f4cccfd2d6ed81ede4e355235b578e , while fixing dkms.conf for
Fedora, broke dkms.conf for RHEL. In RHEL, you get a dkms.conf with too
many backslashes in the "mv" line. The dkms.conf should have the mv line
reading:
mv src/libafs/MODLOAD-*/\$KMODNAME \$DSTKMOD"
for Fedora.
This change checks if we are building on Fedora, and if so, maintains
the extra backslashes. Otherwise, not.
modified: src/packaging/RedHat/openafs.spec.in
Uses the dist tags as specified at
http://fedoraproject.org/wiki/Packaging:DistTag
Reviewed-on: http://gerrit.openafs.org/6851 Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 81a9a33e0bc5455841ba105dab52735c64c7096b)
Jeffrey Altman [Thu, 1 Mar 2012 20:49:12 +0000 (15:49 -0500)]
unix: always retry RX_CALL_BUSY
RX_CALL_BUSY is an indication that the call channel is busy not
that the server is down or otherwise cannot respond. Unconditionally
retry the RPC and do not alter state. We just want to force the use
of a different call channel.
Jeffrey Altman [Wed, 29 Feb 2012 18:07:47 +0000 (13:07 -0500)]
Windows: Workaround Win7 SMB Reconnect Bug
The SMB specification permits the server to save a round trip
in the GSS negotiation by sending an initial security blob.
Unfortunately, doing so trips a bug in Windows 7 and Server 2008 R2
whereby the SMB 1.x redirector drops the blob on the floor after
the first connection to the server and simply attempts to reuse
the previous authentication context. This bug can be avoided by
the server sending no security blob in the SMB_COM_NEGOTIATE
response. This forces the client to send an initial GSS init_sec_context
blob under all circumstances which works around the bug in Microsoft's
code.
Do not call smb_NegotiateExtendedSecurity(&secBlob, &secBlobLength);
As a result of the SMB 1.x bug, all attempts to reconnect fail due to
SMB connection resets. The SMB 1.x redirector will retry indefinitely
but all processes with outstanding requests to \\AFS will block until
the machine is rebooted.
Jeffrey Altman [Sun, 26 Feb 2012 19:45:43 +0000 (14:45 -0500)]
Windows: disable Adv ICF support if not supported
OpenAFS 1.6.x does not require the use of SDK 6.0 or above.
Therefore the Advanced Internet Connection Firewall support
may not be available. In particular, the 32-bit distribution
for 1.6.x does not rely on SDK 6.0 or higher.
Jeffrey Altman [Wed, 18 Jan 2012 00:46:30 +0000 (19:46 -0500)]
Windows: failover and retry for VBUSY
When a file server returns the VBUSY error for an RPC the
cache manager records the 'srv_busy' state in the cm_serverRef_t
structure binding that file server to the active cm_volume_t
object. The 'srv_busy' was never cleared which prevents the
volume from being accessed.
Clear the 'srv_busy' flag whenever cm_Analyze() receives a
CM_ERROR_ALLBUSY error which means that all replicas have
been tried or whenever the error is not VBUSY or VRESTARTING.
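The clearing rule described above can be sketched in C roughly as follows. This is only an illustration: the enum values, the struct and the function names are hypothetical stand-ins, not the actual cm_Analyze() code or the real OpenAFS error values.

```c
#include <stdbool.h>

/* Hypothetical error codes for this sketch; the real values live in the
 * OpenAFS headers and are not reproduced here. */
enum sketch_error {
    SK_OK = 0,
    SK_VBUSY,
    SK_VRESTARTING,
    SK_ALLBUSY,       /* stands in for CM_ERROR_ALLBUSY */
    SK_OTHER
};

/* Stand-in for the per-volume server reference (cm_serverRef_t). */
struct sketch_server_ref {
    bool srv_busy;
};

/* Return true when the busy flag should be cleared for this error. */
static bool should_clear_busy(enum sketch_error err)
{
    if (err == SK_ALLBUSY)
        return true;                    /* all replicas have been tried */
    return err != SK_VBUSY && err != SK_VRESTARTING;
}

/* Record or clear the busy state after an RPC result. */
static void analyze_busy(struct sketch_server_ref *ref, enum sketch_error err)
{
    if (should_clear_busy(err))
        ref->srv_busy = false;
    else if (err == SK_VBUSY)
        ref->srv_busy = true;           /* VRESTARTING leaves the flag as is */
}
```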
Jeffrey Altman [Fri, 25 Nov 2011 14:28:18 +0000 (09:28 -0500)]
Windows: improved idle dead time handling
RX_CALL_IDLE has been treated the same as RX_CALL_DEAD which is
a fatal error that results in the server being marked down. This
is not the appropriate behavior for an idle dead timeout error
which should not result in servers being marked down.
Idle dead timeouts are locally generated and are an indication
that the server:
a. is severely overloaded and cannot process all
incoming requests in a timely fashion.
b. has a partition whose underlying disk (or iSCSI, etc) is
failing and all I/O requests on that device are blocking.
c. has a large number of threads blocking on a single vnode
and cannot process requests for other vnodes as a result.
d. is malicious.
RX_CALL_IDLE is distinct from RX_CALL_DEAD in that idle dead timeout
handling should permit failover to replicas when they exist in a
timely fashion but in the non-replica case should not be triggered
until the hard dead timeout. If the request cannot be retried, it
should fail with an I/O error. The client should not retry a request
to the same server as a result of an idle dead timeout.
In addition, RX_CALL_IDLE indicates that the client has abandoned
the call but the server has not. Therefore, the client cannot determine
whether or not the RPC will eventually succeed and it must discard
any status information it has about the object of the RPC if the
RPC could have altered the object state upon success.
This patchset splits the RX_CALL_DEAD processing in cm_Analyze() to
clarify that only RX_CALL_DEAD errors result in the server being marked
down. Since Rx idle dead timeout processing is per connection and
idle dead timeouts must differ depending upon whether or not replica
sites exist, cm_ConnBy*() are extended to select a connection based
upon whether or not replica sites exist. A separate connection object
is used for RPCs to replicated objects as compared to RPCs to non-replicated
objects (volumes or vldb).
For non-replica connections the idle dead timeout is set to the hard
dead timeout. For replica connections the idle dead timeout is set
to the configured idle dead timeout.
Idle dead timeout events and whether or not a retry was triggered
are logged to the Windows Event Log.
cm_Analyze() is given a new 'storeOp' parameter which is non-zero
when the executed RPC could modify the data on the file server.
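As a rough illustration of the rules in this message, the following C sketch shows the timeout selection, the "marked down" decision and the status discard decision. All type and function names are hypothetical; the real logic lives in cm_Analyze() and cm_ConnBy*() and is more involved.

```c
/* Hypothetical per-connection settings for this sketch. */
struct sketch_timeouts {
    unsigned hard_dead_secs;    /* hard dead timeout */
    unsigned idle_dead_secs;    /* configured idle dead timeout */
};

/* Simplified stand-ins for RX_CALL_DEAD and RX_CALL_IDLE. */
enum sketch_rx_error { SK_DEAD, SK_IDLE };

/* Non-replica connections use the hard dead timeout as their idle dead
 * timeout; replica connections use the configured idle dead timeout. */
static unsigned idle_dead_for_conn(const struct sketch_timeouts *t,
                                   int replicated)
{
    return replicated ? t->idle_dead_secs : t->hard_dead_secs;
}

/* Only a dead call marks the server down. */
static int marks_server_down(enum sketch_rx_error e)
{
    return e == SK_DEAD;
}

/* After an idle timeout the server may still complete the RPC, so cached
 * status is discarded if the RPC could have modified the object. */
static int must_discard_status(enum sketch_rx_error e, int storeOp)
{
    return e == SK_IDLE && storeOp != 0;
}
```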
Jeffrey Altman [Fri, 3 Feb 2012 16:21:45 +0000 (11:21 -0500)]
Windows: fix cm_DirOpDelBuffer assert
In cm_DirOpDelBuffer() the data version field for a buffer
in cm_dirOp_t.buffers[] can be CM_BUF_VERSION_BAD if the buffer
was added to the buffer list but was never fetched from the file
server. If the buffer was recycled by buf_Get() an attempt to
remove an entry from the directory will be failed as opposed to
fetching the buffer from the file server and performing the local
removal.
Jeffrey Altman [Fri, 3 Feb 2012 16:17:40 +0000 (11:17 -0500)]
Windows: buffer DV ranges do not work for directories
In cm_MergeStatus, always set cm_scache_t.bufDataVersionLow
to the new data version because the cm_dir package does not
support version ranges. All modified dir buffers have their
dataVersion field set to the current data version value.
Failure to update the bufDataVersionLow field can result in
B+ Trees being constructed from out of date directory information.
Jeffrey Altman [Sun, 22 Jan 2012 23:33:43 +0000 (18:33 -0500)]
Windows: release BIOD after status merge
Releasing the BIOD permits the accumulated buffers to be accessed.
Releasing the BIOD before the cm_MergeStatus() call creates a
window where the buffer data version is larger than the cm_scache
data version. Release the BIOD after the status merge.
Jeffrey Altman [Thu, 19 Jan 2012 20:25:44 +0000 (15:25 -0500)]
Windows: cm_buf refcnt must hold buf_globalLock
An assertion in buf_Recycle() was being triggered when a cm_buf_t
object was supposed to be in the free buffer list but wasn't.
buf_Recycle() was racing with another thread. The test for
refCount == 0 was performed while holding the buf_globalLock
exclusively but the InterlockedDecrement(refCount) in buf_Release()
was performed without holding buf_globalLock at all. buf_globalLock
must be held at least as a read lock. Otherwise, the refCount can
reach 0 prior to the thread blocking for exclusive access to the
buf_globalLock. This provides buf_Recycle() which is holding
buf_globalLock the opportunity to race.
The solution is to make sure that buf_Release() always holds
buf_globalLock as a read lock and then use buf_ReleaseLocked()
to perform the actual decrement and test.
Jeffrey Altman [Sat, 14 Jan 2012 15:31:01 +0000 (10:31 -0500)]
Windows: restrict service to 2 cpus by default
Performance drops off considerably when the number of processors
increases due to lock contention and the cm_SyncOp wait processing.
If the MaxCPUs registry value is not set, limit ourselves to two.
Setting MaxCPUs to zero permits use of all CPUs.
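The default described above amounts to something like the following C sketch. The function name and parameters are hypothetical, and clamping the result to the number of available processors is an assumption of this sketch, not something stated in the commit.

```c
/* registry_set: non-zero if the MaxCPUs registry value is present.
 * max_cpus:     the value read from the registry (ignored if not set).
 * available:    number of processors in the machine.
 * All names here are illustrative only. */
static unsigned effective_cpu_limit(int registry_set, unsigned max_cpus,
                                    unsigned available)
{
    unsigned limit;

    if (!registry_set)
        limit = 2;              /* default: restrict the service to two */
    else if (max_cpus == 0)
        limit = available;      /* zero permits use of all CPUs */
    else
        limit = max_cpus;

    /* Assumption: never report more CPUs than the machine has. */
    return limit < available ? limit : available;
}
```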
Jeffrey Altman [Sat, 24 Dec 2011 08:11:04 +0000 (03:11 -0500)]
Windows: cm_BufWrite() must wait in cm_SyncOp()
Now that it is permissible for more than one store data operation
to construct BIOD lists in parallel, cm_BufWrite() must be willing
to wait in cm_SyncOp(). Otherwise, the daemon threads will spin.
Jeffrey Altman [Sat, 3 Dec 2011 22:49:47 +0000 (17:49 -0500)]
Windows: apply Nat Pings only to cm_rootUser connections
Use CM_UCELLFLAG_ROOTUSER flag to identify the cm_rootUser
connections and only apply Nat pings to those connections
instead of examining the security state of the connection.
Jeffrey Altman [Fri, 2 Dec 2011 16:14:11 +0000 (11:14 -0500)]
Windows: buf_CleanAsync is not async; rename it
buf_CleanAsync() calls cm_BufWrite() which stores the dirty
buffers synchronously. There is nothing asynchronous about
buf_CleanAsync() so rename it to buf_Clean() and buf_CleanAsyncLocked()
to buf_CleanLocked(). Update the comments to remove the references
to the asynchronous processing which doesn't exist.
That is not to say that the call to buf_Clean() in buf_GetNewLocked()
should not be asynchronous; it should. There is no such functionality
at the moment. One approach would be to modify buf_IncrSyncer to
trigger on an event set by buf_GetNewLocked() instead of the call
to buf_Clean(). Another approach would be registering a background
store event. In any case, that is for another patchset.
Jeffrey Altman [Mon, 21 Nov 2011 18:14:40 +0000 (13:14 -0500)]
Windows: cm_GetSCache do not release unheld lock
If cm_GetNewSCache() fails, an attempt would be made to
release cm_scacheLock which is not held. However, it should
be noted that cm_GetNewSCache() cannot fail without itself
triggering a panic.
Jeffrey Altman [Tue, 15 Nov 2011 23:35:26 +0000 (18:35 -0500)]
Windows: buf_CleanAsyncLocked dirty range only
buf_CleanAsyncLocked() should not instruct cm_BufWrite() to
write a full chunk if the current buffer is the only one that
is dirty. cm_BufWrite() will determine if it is appropriate
to fill a full chunk when storing. Instructing it to check
a full chunk forces it to do more work than necessary.
Jeffrey Altman [Wed, 16 Nov 2011 00:00:05 +0000 (19:00 -0500)]
Windows: cm_SetupStoreBIOD use firstModOffset chunk
When cm_SetupStoreBIOD attempts to store a chunk to the file
server it should not use *inOffsetp as the start of the range.
There is no guarantee that the buffer at *inOffsetp is dirty.
Instead use firstModOffset which refers to the first known
dirty buffer in the range specified by the caller. Attempt
to fill a chunk of consecutive dirty buffers from that point.
smb_ReceiveNTCreateX() calls cm_CheckNTOpen() which now
requires the smb_fid_t allocated fid value for use in share
mode locking. Move the allocation of the smb_fid earlier
in the function and apply necessary cleanup in error paths.
Jeffrey Altman [Sat, 12 Nov 2011 18:41:30 +0000 (13:41 -0500)]
Windows: fix locking hierarchy in service
The smb username lock and the daemon global lock can be requested
while the scache dirlock is held if there are no free buffers
and the service is forced to claw back extents from the redirector.
Adjust the locking hierarchy accordingly.
Jeffrey Altman [Sun, 28 Aug 2011 16:03:53 +0000 (12:03 -0400)]
Windows: afslogon network provider debug registry value
Create a new TransarcAFSDaemon\NetworkProvider "Debug" value
to be used for activating the network provider debugging.
The overlapping use of TransarcAFSDaemon\Parameters "TraceOption"
is just too confusing.
Jeffrey Altman [Fri, 26 Aug 2011 17:57:15 +0000 (13:57 -0400)]
Windows: afslogon.dll is not a file system interface
Do not return a file system network type that corresponds
to a real file system interface since afslogon is in fact not
associated with a file system interface. We can't return
WNNC_NET_NONE (0) because that prevents NPLogonNotify()
from being executed. However, if we return an in use
file system value that can confuse the system when the
actual file system's network provider is also installed.
Jeffrey Altman [Fri, 26 Aug 2011 13:36:04 +0000 (09:36 -0400)]
Windows: torture error reporting
When LeaveThread() is called and GetLastError() has already
been called, pass the last error value to LeaveThread(). Otherwise,
the GetLastError() call in LeaveThread() may return an inaccurate
result.
Jeffrey Altman [Tue, 23 Aug 2011 20:02:28 +0000 (16:02 -0400)]
Windows: change buf_Find*() signature to accept cm_fid_t
The buf_Find*() functions require a cm_fid_t to match with the
cm_buf_t objects not a cm_scache_t. Change the signature so
that the cm_scache_t is not required. It should be possible to
search for a buffer even if the cm_scache_t is not present in
the cache.
Jeffrey Altman [Fri, 19 Aug 2011 01:57:12 +0000 (21:57 -0400)]
Windows: be explicit when mapping sharing violation
Only one lock acquisition failure should map to
CM_ERROR_SHARING_VIOLATION. That is CM_ERROR_LOCK_NOT_GRANTED.
Make it clear that is what we are doing.
Jeffrey Altman [Tue, 9 Aug 2011 18:26:33 +0000 (14:26 -0400)]
Windows: avoid duplicate volume update queries
If multiple volume update queries have stacked up in
cm_UpdateVolumeLocation() and the active query failed,
do not re-issue the blocked queries. Instead, prevent new
queries for 60 seconds and fail those that blocked during
the active query.
Andrew Deason [Fri, 24 Feb 2012 00:28:21 +0000 (18:28 -0600)]
Rewrite make_h_tree.pl in shell script
The current usage of make_h_tree.pl adds a build requirement of
/usr/bin/perl that we did not have prior to commit 1d6593e952ce82c778b1cd6e40c6e22ec756daf1. Do the same thing in a
bourne shell script instead, so we don't need perl.
Note that this is not as generalized as make_h_tree.pl, but it doesn't
need to be. Specifically, this does not strip a leading ../ from found
include directives (nothing in the tree that includes h/* files uses
this), and header filenames containing whitespace almost certainly do
not work correctly.
The h => sys mapping is also much more hardcoded, but that's all we
were using this for anyway.
Derrick Brashear [Wed, 22 Feb 2012 20:57:46 +0000 (15:57 -0500)]
libafs: retry retriable RPCs instead of abandoning
if we get e.g. an idle dead error we should retry
retriable actions, namely data stores. in order
for writing files to work correctly given how
the writeback code is structured it's important that
this not interfere with analyze's shouldRetry decision
on those RPCs
Andrew Deason [Fri, 17 Feb 2012 23:12:46 +0000 (17:12 -0600)]
viced: Relax "h_TossStuff_r failed" warnings
Currently, h_TossStuff_r bails out and logs a message if we detect
that somebody grabbed a reference or locked the host while we tried to
h_NBLock_r. The reasoning for this is that it is not legal for anyone
to h_Hold_r a host that has HOSTDELETED set (but the error is
detectable and recoverable); callers are supposed to check for
HOSTDELETED and not hold a host in that case.
However, HOSTDELETED may not be set when h_TossStuff_r is called,
since we call it if either HOSTDELETED _or_ CLIENTDELETED is set. If
CLIENTDELETED is set and HOSTDELETED is not, it's perfectly fine (and
necessary) for callers to grab a reference to the host. So, if that's
what is going on, don't log a message, since that's normal behavior.
Check for HOSTDELETED before we h_NBLock_r, since it is technically
possible (and legal) for someone to grab a reference to the host and
somehow set HOSTDELETED while we wait for h_NBLock_r to return. Also
log the flags when we see this message.
Andrew Deason [Fri, 17 Feb 2012 22:24:16 +0000 (16:24 -0600)]
viced: Remove extraneous h_AHTAHT_r in h_GetHost_r
We added this address to the host with an addInterfaceAddr_r call just
a few lines before, which adds the host to the address hash table.
Another call to h_AddHostToAddrHashTable_r is pure overhead and
confusing.
Andrew Deason [Fri, 17 Feb 2012 21:46:50 +0000 (15:46 -0600)]
viced: Set h_GetHost_r probefail if MPAA_r fails
Currently, in h_GetHost_r, if we get a connection whose address does
not match an extant host, but the reported uuid does, we ProbeUuid the
old host. If it fails, we call MultiProbeAlternateAddress_r and set
'probefail'. Later on, if 'probefail' is set, we always add the
connection address to the host, and remove the host->host,host->port
address from the host.
However, this is not always correct. Consider the following situation.
We have an existing host that has primary address 1.1.1.1, and also
has addresses 1.1.1.2 and 1.1.1.3 on the interface list but not on the
hash table. Say that host A stops responding on 1.1.1.1, and a
connection comes in from 1.1.1.2. We ProbeUuid 1.1.1.1 and get a
failure, so we call MultiProbeAlternateAddress_r.
MultiProbeAlternateAddress_r probes via rx_Multi the addresses 1.1.1.2
and 1.1.1.3. Say that 1.1.1.3 responds first, and responds
successfully, so MultiProbeAlternateAddress_r sets 1.1.1.3 to be the
primary address for the host.
After MultiProbeAlternateAddress_r returns, 'probefail' is set. A few
lines down, we see that oldHost->host does not match haddr, and
'probefail' is set, so we add 1.1.1.2 to the interface list, and
remove 1.1.1.3, and set 1.1.1.2 to be the primary address, even though
1.1.1.3 is the address we most recently 'know' is correct.
To fix this, only set 'probefail' if MultiProbeAlternateAddress_r also
fails after the failed ProbeUuid call. Conceptually this makes sense,
since if MultiProbeAlternateAddress_r succeeds, it found an address
that responds successfully to ProbeUuid, and it sets that address to
be the primary address. Therefore, after MultiProbeAlternateAddress_r
returns success, the situation is the same as if the 'good' address
was already the primary address, and the ProbeUuid call succeeded, so
'probefail' should be cleared.
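Reduced to its essentials, the new rule for 'probefail' can be sketched in C as below. The function name and the boolean inputs are hypothetical simplifications of the ProbeUuid and MultiProbeAlternateAddress_r results, not the code in h_GetHost_r.

```c
#include <stdbool.h>

/* probe_uuid_ok:  the ProbeUuid of the old primary address succeeded.
 * multi_probe_ok: the alternate address probe found a responding address
 *                 (only meaningful when probe_uuid_ok is false).
 * Returns the value 'probefail' should have afterwards. */
static bool compute_probefail(bool probe_uuid_ok, bool multi_probe_ok)
{
    if (probe_uuid_ok)
        return false;       /* primary address is still good */

    /* The alternate probe already chose a new primary address on success,
     * so that case is treated like a successful ProbeUuid. */
    return !multi_probe_ok;
}
```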
Andrew Deason [Fri, 17 Feb 2012 19:14:31 +0000 (13:14 -0600)]
viced: Correctly update addrs on alt addr probe
The functions MultiBreakCallBackAlternateAddress_r and
MultiProbeAlternateAddress_r try to find a valid address in a host's
interface list of addrs. If they find one, they update host->host and
host->port. However, they do so just by changing those fields directly
and by calling h_DeleteHostFromAddrHashTable_r and
h_AddHostToAddrHashTable_r. This leaves the old host->host, host->port
on the interface list, and leaves it marked as 'valid'. Similarly, the
new host and port may still be marked as not 'valid'.
This can result in the host being on the addr hash table via an
address that is not on the host's interface list. After the above
situation occurs, we may call
and then update host->host and host->port, which happens in a variety
of places. Since host->host, host->port is not marked as valid in the
interface list, it is not removed from the addr hash table, but it is
removed from the interface list. Eventually, this can cause the host
to be referenced from the addr hash table even after it has been
freed.
Since this can result in hash table entries pointing to the 'wrong'
host, this can result in FileLog messages such as:
Sun Feb 5 03:16:35 2012 Removing address that does not belong to host 0xdeadbeefdead (1.2.3.4:7001).
To fix this, make MultiBreakCallBackAlternateAddress_r and
MultiProbeAlternateAddress_r update the address list the same way as
all of the code in host.c does: by adding the new address with
addInterfaceAddr_r, removing the old address with removeInterfaceAddr_r, and
updating host->host and host->port.
Andrew Deason [Thu, 16 Feb 2012 22:20:16 +0000 (16:20 -0600)]
viced: Delete dup host before probing old host
Currently, when the fileserver gets a new connection from an address
not on the addr hash table, we allocate a new host structure and add
that host to the addr hash table. If we then find that that host's
uuid matches the uuid of an extant host, we do the following:
- probe the old host with the uuid, and MultiProbeAlternateAddress_r
if the probe fails
- mark the duplicate host as HOSTDELETED
- manipulate the interface lists
Consider, for example, that we have an extant host ('oldHost') with
address 1.2.3.4:7001, but with 5.6.7.8:7001 on its alternate interface
list. At some point, the 1.2.3.4:7001 interface goes away or becomes
unreachable. A new connection comes in from that same host on
5.6.7.8:7001.
What will happen is we create a new host for address 5.6.7.8:7001, and
then detect the uuid collision. When we try to probe the old address
of 1.2.3.4:7001, it will fail, and we will try to
MultiProbeAlternateAddress_r. MultiProbeAlternateAddress_r will
determine that the alternate address 5.6.7.8:7001 responds
successfully to the probe, and it tries to set 5.6.7.8:7001 to be the
primary address of 'oldHost', and add 'oldHost' to the addr hash table
under 5.6.7.8:7001.
But the "new" host from the incoming connection is already hashed on
the address hash table under 5.6.7.8:7001, so the
h_AddHostToAddrHashTable_r call in MultiProbeAlternateAddress_r fails.
Since we later delete the new duplicate host, this results in
5.6.7.8:7001 being the primary address for the host, but that address
is not anywhere in the address hash table.
This behavior can be seen by the following pair of FileLog messages:
Wed Feb 1 11:02:38 2012 CB: ProbeUuid for 0xdeadbeefdead (1.2.3.4:7001) failed -01
Wed Feb 1 11:02:38 2012 h_AddHostToAddrHashTable_r: refusing to hash host beefdead, baadcafe (5.6.7.8:7001) already hashed
While those messages do not necessarily indicate this problem, this
problem will result in those messages.
To fix this, mark the duplicate host as HOSTDELETED before we do any
probing on 'oldHost'. This way, if MultiProbeAlternateAddress_r tries
to add 'oldHost' to the addr hash table under 5.6.7.8:7001, it will be
able to do so successfully, since the old duplicate host is deleted.
Andrew Deason [Mon, 13 Feb 2012 20:11:36 +0000 (14:11 -0600)]
Rx: Avoid lastBusy/PEER_BUSY discrepancy
If an rx call has the RX_CALL_PEER_BUSY flag set, but the call's
conn->lastBusy is not set, we can easily cause an rx caller to loop
infinitely. rx_NewCall will see that lastBusy for a call channel is
not set, and will use that call channel, but rxi_CheckBusy will note
that the call appears busy and that there are non-busy call channels
on the same conn, and so will return RX_CALL_BUSY.
This can currently happen in rxi_ResetCall, since we set
RX_CALL_PEER_BUSY on the call again if the call had that flag set when
rxi_ResetCall was called. If we are calling rxi_ResetCall with
'newcall' set, the passed in call is unrelated to the new call, since
it was obtained from the free list. Thus, the busy-ness of the call
should be ignored. Fix this by only paying attention to the incoming
RX_CALL_PEER_BUSY flag if 'newcall' is not set.
Also prevent this from happening by clearing RX_CALL_PEER_BUSY in
rx_NewCall when we select a call and clear lastBusy for that call.
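The rxi_ResetCall part of this fix can be illustrated with a small C sketch. The flag bit and the function name are hypothetical; the sketch only models which flags survive a reset, not the real call structure.

```c
/* Hypothetical flag bit standing in for RX_CALL_PEER_BUSY. */
#define SK_CALL_PEER_BUSY 0x1u

/* Compute the flags a call should carry after being reset.
 * old_flags: flags on the call passed to the reset routine.
 * newcall:   non-zero when the call came from the free list and is being
 *            set up as a new, unrelated call. */
static unsigned flags_after_reset(unsigned old_flags, int newcall)
{
    /* Busy-ness is only carried over when the same call is being reset;
     * a recycled call from the free list starts out non-busy. */
    if (!newcall && (old_flags & SK_CALL_PEER_BUSY))
        return SK_CALL_PEER_BUSY;
    return 0;
}
```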
Derrick Brashear [Tue, 13 Dec 2011 16:24:16 +0000 (11:24 -0500)]
volser: allow clonevol purge id to be new id
effectively the same functionality that reclone already uses, but
for some reason we artificially limit it out of clone despite
the interface being there for it. it used to be there. put it back.
Andrew Deason [Wed, 8 Feb 2012 22:03:29 +0000 (16:03 -0600)]
RedHat: Fail openafs-client 'stop' on rmmod error
Currently, the openafs-client RPM init script ignores any error
reported by rmmod. If 'umount /afs' succeeds but rmmod does not, the
client may panic the machine if the client is started again (from e.g.
running the 'restart' init script method), since afsd will try to
initialize AFS with a libafs that has been shut down.
So, do not ignore errors from 'rmmod', and instead fail the 'stop'
method from the init script if we get an error.
Andrew Deason [Tue, 20 Dec 2011 22:44:42 +0000 (17:44 -0500)]
viced: Keep H_LOCK while locking host in h_Alloc_r
Currently in h_Alloc_r, we h_Lock_r the host, so we have it locked on
return. However, h_Lock_r drops the host glock, which is bad in this
situation since we have already added the host to the global hash
table, so other threads may see it. This can mean that by the time
h_Alloc_r returns, the returned host may have HOSTDELETED set, and/or
the addresses associated with the host may be completely different.
h_Alloc_r's caller, h_GetHost_r, seems to assume that the host is
still associated with the address of the passed-in connection. When
this is not true, this can result in the host structure getting into a
strange state; for example, the primary addr/port may not be hashed. The
host may also have HOSTDELETED set, in which case we're not supposed
to be dealing with it at all.
To avoid these problems, lock host->lock directly in h_Alloc_r,
without going through h_Lock_r and dropping H_LOCK. Also do it as one
of the first things we do to initialize the host, just to make sure
that if anybody else happens to see the host, it is locked by us when
they do.
Tom Keiser [Wed, 1 Feb 2012 08:31:23 +0000 (03:31 -0500)]
com_err: correctly deal with lack of libintl
On machines lacking a libintl, _intlize() currently fails to initialize
the output error string--leading to tools (e.g., translate_et) returning
a null string; make afs_com_err fall back to returning the en/US canonical
error text when we don't have any i18n support...
Christof Hanke [Sun, 29 Jan 2012 17:08:57 +0000 (18:08 +0100)]
linux: fix probing for noop_fsync
Commit 267934d0e6910c8d8166a6e78f93c1bab40857b8 introduced
probing code to deal with the renaming of simple_fsync
inside the linux-kernel.
This test does not take into account the different parameter lists
of noop_fsync and simple_fsync, respectively.
Fix this.
Reviewed-on: http://gerrit.openafs.org/6628 Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Derrick Brashear <shadow@dementix.org> Tested-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 20e82cecd9008f9b3467c9a323c5c3abf27f3021)
Andrew Deason [Mon, 6 Feb 2012 19:23:41 +0000 (13:23 -0600)]
Disable kernel opt by default on Solaris 10 and 11
With newer Solaris Studio (sometime in the 12.* series), cc started
adding SSE instructions to optimized x86 code, which is invalid for
kernel code and can generate panics. There appears to be no way to
turn this off currently (-xvector=%none is non-functional), so default
to not optimizing kernel code.
Andrew Deason [Thu, 2 Feb 2012 23:35:52 +0000 (17:35 -0600)]
SOLARIS: Use kcred instead of afs_osi_cred
For many vfs ops to the cache, we currently pass &afs_osi_cred for our
credentials, which is a mostly zeroed-out credential structure. In
some modern versions of Solaris (Solaris 11), at least some parts of
this structure need to not be NULL (cr_zone), or we will panic.
The Solaris kernel provides a 'kcred' credentials structure for the
purpose of using "kernel" credentials for i/o. So just use that
instead for Solaris 8 and beyond, since kcred has existed at least
since Solaris 8.
Andrew Deason [Thu, 22 Dec 2011 20:48:49 +0000 (15:48 -0500)]
afs: Panic on afs_conn refcount imbalance
An undercounted afs_conn can easily cause a panic and/or memory
corruption later on, since we put an rx_connection reference with each
afs_conn reference. Panic as soon as we detect this, as this indicates
a serious bug.
Michael Meffie [Wed, 14 Dec 2011 17:52:51 +0000 (12:52 -0500)]
Unix CM: reset blacklist on hard-mount retry
Reset black-listed servers on a request when retrying due to a
hard-mount retry. When hard-mounts are in effect, a request may
retry indefinitely. If all the servers have been black-listed
due to a transient error, the request may never complete.
Reviewed-on: http://gerrit.openafs.org/6330 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit faa58c9f60a158481bdfee27e175a37c5fcd64aa)
Andrew Deason [Thu, 10 Nov 2011 21:18:41 +0000 (15:18 -0600)]
SOLARIS: Do not build x86 kernel module on 5.11
Oracle Solaris 11 no longer supports x86 (amd64 is required). If we
try to build the x86 module, /usr/include/sys/kobj.h complains that
the ISA is unsupported, and refuses to go on. So, just remove
MODLOAD32 from the libafs directories to build on sunx86_511.
when we are going to hit the backend storage, disable keepalives.
the net effect of this is that no idle dead time is needed; instead,
the normal dead time will result in a connection with no activity
simply dying naturally if i/o blocks forever.
it's important that keepalives be enabled during callback breaks,
so that is done.
Jeffrey Altman [Mon, 28 Nov 2011 17:58:02 +0000 (12:58 -0500)]
rx: RX_CALL_IDLE and RX_CALL_BUSY
Allocate new Rx error codes for Idle and Busy calls but do not
send these errors on the wire. They are only intended for local
use.
RX_CALL_IDLE is an indication to an application that requests it
that the rx peer is maintaining an open call channel but has not
sent any actual data for the length of the registered idle dead
timeout.
RX_CALL_BUSY is an indication to an application that requests it
that the rx peer believes the selected call channel is in use by
a pre-existing call.
When either RX_CALL_IDLE or RX_CALL_BUSY are assigned as the call
error and an abort must be sent to the rx peer, the errors are
translated to RX_CALL_TIMEOUT. This is necessary because it is
not possible to add new Rx error values in a method that is safe
for peers that are not expecting them.
This patchset also documents which Rx errors defined in rx.h are
used on the wire and which are not.
The Unix and Windows cache managers are updated to build with
these new error codes.
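The translation step described above might look like the following C sketch. The constant names and numeric values are placeholders chosen for this illustration and are not the values defined in rx.h.

```c
/* Placeholder error values for this sketch only; see rx.h for the real
 * definitions. */
enum {
    WIRE_SK_TIMEOUT = -3,       /* stands in for RX_CALL_TIMEOUT */
    WIRE_SK_IDLE    = -1001,    /* stands in for RX_CALL_IDLE (local only) */
    WIRE_SK_BUSY    = -1002     /* stands in for RX_CALL_BUSY (local only) */
};

/* Map a call error to the value that may be sent in an abort packet.
 * Local-only errors are translated to a timeout; everything else is
 * passed through unchanged. */
static int wire_error(int call_error)
{
    switch (call_error) {
    case WIRE_SK_IDLE:
    case WIRE_SK_BUSY:
        return WIRE_SK_TIMEOUT;
    default:
        return call_error;
    }
}
```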
Andrew Deason [Mon, 7 Mar 2011 17:08:26 +0000 (11:08 -0600)]
RX: Avoid timing out non-kernel busy channels
When we encounter a "busy" call channel (indicated by receiving
RX_PACKET_TYPE_BUSY packets), we can error out a call with
RX_CALL_TIMEOUT to try and get the application code to retry the call.
However, many RX applications are not aware of this, and will just
fail with an error upon receiving a single busy packet.
So instead, make this behavior optional, and only do it if the
application tells us what specific error it expects to receive when a
busy call channel is detected. Enable this behavior for the Unix cache
manager, as it can cope with receiving an RX_CALL_TIMEOUT error in
this scenario.
Andrew Deason [Fri, 13 Jan 2012 18:43:16 +0000 (13:43 -0500)]
vol: remove SYNC fatal_error processing
Currently SYNC clients will "disable" themselves on certain error
patterns. For example, if the server end closes its file descriptor
too many times, or takes too long and then closes the fd, the SYNC
client will return an error and set fatal_error. On any subsequent
SYNC requests, the request will immediately fail without contacting
the server, often making SYNC client programs effectively useless
until they are restarted.
There isn't really any reason to cause future requests to fail.
Transient problems in the fileserver can easily make this situation
possible (e.g. a fileserver can crash but still take several minutes
to close the SYNC fd while the core is written to disk), and so while
we may return an error for a specific problematic request, future
requests may be fine.
So, just remove everything related to fatal_error, so future SYNC
requests can continue to be attempted. Adjust some log messages to
reflect the new behavior.
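The behavior after this change can be sketched roughly as follows. This is an illustration only: demo_sync_client, demo_sync_ask, and the flaky transport are hypothetical and do not mirror the real SYNC client structures. The point is that a failed request returns an error but leaves no sticky state behind, so the next request still reaches the server.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client: just a transport callback and its context. */
struct demo_sync_client {
    int (*send)(void *ctx, int request);  /* 0 on success, -1 on failure */
    void *ctx;
};

/* Every request is attempted; there is no "fatal_error" short-circuit. */
static int demo_sync_ask(struct demo_sync_client *c, int request)
{
    assert(c->send != NULL);
    return c->send(c->ctx, request);
}

/* Hypothetical transport that fails a fixed number of times, then works. */
struct demo_flaky {
    int failures_left;
    int attempts;
};

static int demo_flaky_send(void *ctx, int request)
{
    struct demo_flaky *f = ctx;
    (void)request;
    f->attempts++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return -1;      /* transient failure, e.g. server closed the fd */
    }
    return 0;
}
```

A caller that sees the first request fail can simply issue the next one; in this sketch the second attempt contacts the transport again and succeeds.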
Marc Dionne [Wed, 18 Jan 2012 01:19:54 +0000 (20:19 -0500)]
rx: Correctly test for end of call queue
The intention of this condition is to check if the current call
being considered is the last one on the queue, but the test is
incorrect. A null next pointer indicates a removed item, not
the end of the queue.
Use the queue_IsLast macro instead to correctly determine that
this is the last item in the queue and that a call has to be
selected, either the current one or a previously seen good choice.
With the incorrect test, calls can get permanently stuck in the call
queue and never get assigned to a thread, even when all threads are
idle.
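The difference between the two tests can be illustrated with a small sketch. It assumes a circular doubly linked queue with a sentinel head, where removing an item nulls its pointers; the demo_* names are hypothetical and are not the Rx queue macros themselves.

```c
#include <assert.h>
#include <stddef.h>

struct demo_queue {
    struct demo_queue *prev, *next;
};

/* An empty queue is a head that points at itself. */
static void demo_queue_init(struct demo_queue *head)
{
    head->prev = head->next = head;
}

static void demo_queue_append(struct demo_queue *head, struct demo_queue *item)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Removal nulls the item's pointers, so NULL means "not on a queue". */
static void demo_queue_remove(struct demo_queue *item)
{
    assert(item->next != NULL);     /* item must currently be queued */
    item->prev->next = item->next;
    item->next->prev = item->prev;
    item->prev = item->next = NULL;
}

/* Correct test: the last item's next pointer is the queue head. */
static int demo_is_last(const struct demo_queue *head,
                        const struct demo_queue *item)
{
    return item->next == head;
}

/* Incorrect test: a NULL next pointer only identifies a removed item. */
static int demo_is_last_buggy(const struct demo_queue *item)
{
    return item->next == NULL;
}
```

In a queue like this the buggy test never fires for an item that is still queued, so a loop relying on it to make its final selection would fall off the end of the queue without choosing a call.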
Andrew Deason [Wed, 11 Jan 2012 15:00:35 +0000 (10:00 -0500)]
vol: Fix VCreateVolume special inode cleanup
In order to decrement (IH_DEC) the relevant special inodes, we need to
know the parent volume ID in addition to the volume ID itself. Use the
appropriate volume IDs when IH_DEC'ing special inodes after we fail to
create the volume, so we don't leave behind special inodes.
Marc Dionne [Fri, 6 Jan 2012 22:22:35 +0000 (17:22 -0500)]
libuafs: only rebuild h directory when needed
A few changes to allow a "make all ; sudo make install ; make all..."
workflow to work without manually removing files in between.
Make the rebuilding of the h directory dependent on the source
files scanned to build it. This prevents it from being rebuilt
for every "make install".
While we're here, use -f when removing linktest for the clean target.
This allows "make clean" to remove it without prompting when the user
doesn't have write access to the file, as is the case when make install
rebuilds it as root.
afs: discard cached state when we are unsure of validity
In the event we got a network error, we don't know if the server
completed (or will complete) our operation. We can assume nothing.
A more complicated version of this could attempt to verify that the
state is what we expect it to be, but in an extended callbacks universe
this is potentially easier to solve anyway. For now, return the
error to the caller, and mark the vcache unstat'd.
Andrew Deason [Fri, 18 Nov 2011 16:25:08 +0000 (10:25 -0600)]
DAFS: Atomically re-hash vnode in VGetFreeVnode_r
VGetFreeVnode_r pulls a vnode off of the vnode LRU, and removes the
vnode from the vnode hash table. In DAFS, we may drop the volume glock
immediately afterwards in order to close the ihandle for the old vnode
structure.
While we have the glock dropped, another thread may try to
VLookupVnode for the new vnode we are creating, find that it is not
hashed, and call VGetFreeVnode_r itself. This can result in two
threads having two separate copies of the same vnode, which bypasses
any mutual exclusion ensured by per-vnode locks, since they will lock
their own version of the vnode. This can result in a variety of
different problems where two threads try to write to the same vnode at
the same time. One example is calling CopyOnWrite on the same file in
parallel, which can cause link undercounts, writes to the wrong vnode
tag, and other CoW-related errors.
To prevent all this, make VGetFreeVnode_r atomically remove the old
vnode structure from the relevant hashes, and add it to the new hashes
before dropping the glock. This ensures that any other thread trying
to load the same vnode will see the new vnode in the hash table,
though it will not yet be valid until the vnode is loaded.
Note that this only solves this race for DAFS. For non-DAFS, the vol
glock is held over the ihandle close, so this race does not exist.
The comments around the callers of VGetFreeVnode_r indicate that
similar extant races exist here for non-DAFS, but they are unsolvable
without significant DAFS-like changes to the vnode package.
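The ordering described above can be sketched in a single-threaded form. This is an illustration under stated assumptions, not the volume package code: the demo_* names and the tiny hash table are hypothetical, the caller is assumed to hold the lock on entry, and the unlocked_work callback stands in for the window where the lock is dropped and another thread may perform a lookup.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUCKETS 8

struct demo_vnode {
    unsigned int number;
    int valid;                      /* 0 until the vnode is loaded */
    struct demo_vnode *hash_next;
};

static struct demo_vnode *demo_hash[DEMO_BUCKETS];

static void demo_hash_insert(struct demo_vnode *vn)
{
    struct demo_vnode **bucket = &demo_hash[vn->number % DEMO_BUCKETS];
    vn->hash_next = *bucket;
    *bucket = vn;
}

static void demo_unhash(struct demo_vnode *vn)
{
    struct demo_vnode **pp = &demo_hash[vn->number % DEMO_BUCKETS];
    while (*pp != NULL) {
        if (*pp == vn) {
            *pp = vn->hash_next;
            break;
        }
        pp = &(*pp)->hash_next;
    }
    vn->hash_next = NULL;
}

static struct demo_vnode *demo_lookup(unsigned int number)
{
    struct demo_vnode *vn = demo_hash[number % DEMO_BUCKETS];
    while (vn != NULL && vn->number != number)
        vn = vn->hash_next;
    return vn;
}

/* Probe used to simulate another thread's lookup during the window. */
static unsigned int demo_probe_number;
static struct demo_vnode *demo_probe_result;

static void demo_probe(void)
{
    demo_probe_result = demo_lookup(demo_probe_number);
}

/* Recycle vn for new_number: re-hash first, then drop the lock. */
static void demo_recycle(struct demo_vnode *vn, unsigned int new_number,
                         void (*unlocked_work)(void))
{
    assert(vn != NULL);
    demo_unhash(vn);                /* remove from the old hash chain */
    vn->number = new_number;
    vn->valid = 0;                  /* hashed, but not yet loaded */
    demo_hash_insert(vn);           /* visible before the lock is dropped */
    if (unlocked_work != NULL)
        unlocked_work();            /* lock dropped here (old handle closed) */
    vn->valid = 1;                  /* load finished */
}
```

A lookup that runs during the window finds the re-hashed entry (still marked not valid) rather than concluding the vnode is absent and allocating a second copy.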
Andrew Deason [Tue, 27 Dec 2011 02:22:08 +0000 (21:22 -0500)]
afs: Grab a reference to setp in afs_icl_Event4
We can drop GLOCK in several places in afs_icl_Event4 and the
afs_icl_AppendRecord callee. To ensure that the given afs_icl_set does
not get freed while we have GLOCK dropped, grab a reference to the
set.
Thanks to Ryan C. Underwood for reporting an issue triggered by this.
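The pattern can be sketched as below. The names (demo_set, demo_hold, demo_release, demo_log_event) are hypothetical, and a "freed" flag replaces a real free() so the outcome is observable; the callback stands in for whatever another thread does while the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>

struct demo_set {
    int refcount;
    int freed;      /* set when the last reference goes away */
    int events;     /* number of events recorded */
};

static void demo_hold(struct demo_set *s)
{
    s->refcount++;
}

static void demo_release(struct demo_set *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0)
        s->freed = 1;               /* a real implementation would free here */
}

/* Simulates another thread dropping its reference during the window. */
static void demo_other_thread_release(struct demo_set *s)
{
    demo_release(s);
}

/* Record an event; the set must survive the unlocked window. */
static int demo_log_event(struct demo_set *s,
                          void (*unlocked_work)(struct demo_set *))
{
    int alive;

    demo_hold(s);                   /* our own reference pins the set */
    if (unlocked_work != NULL)
        unlocked_work(s);           /* lock dropped; others may release */
    alive = !s->freed;
    if (alive)
        s->events++;
    demo_release(s);                /* may be the final release */
    return alive;
}
```

Because the function holds its own reference, the other thread's release cannot be the last one, and the set is only torn down after the event has been recorded.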
Michael Meffie [Fri, 12 Aug 2011 18:29:48 +0000 (14:29 -0400)]
xstat: cm xstat time values are 32 bit
The kernel space cm xstat time structures are implemented as 32
bit values in memory and on the wire. Define the client side
xstat userspace structures as 32 bit time values as well to avoid
size mismatches on systems with native 64 bit time values.
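As an illustration of the size issue, a structure built from fixed-width fields has the same layout regardless of how wide the platform's native time type is. The struct and helper below are hypothetical and are not the actual xstat definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userspace mirror of a 32-bit wire/in-memory time value. */
struct demo_xstat_time32 {
    int32_t tv_sec;
    int32_t tv_usec;
};

/* Compile-time check: the structure is 8 bytes on every platform. */
typedef char demo_time32_size_check[
    (sizeof(struct demo_xstat_time32) == 8) ? 1 : -1];

/* Convert native values; seconds beyond the 32-bit range do not fit. */
static struct demo_xstat_time32 demo_to_time32(time_t sec, long usec)
{
    struct demo_xstat_time32 t;

    assert(usec >= 0 && usec < 1000000);
    t.tv_sec = (int32_t)sec;
    t.tv_usec = (int32_t)usec;
    return t;
}
```

Had the fields been declared with a native struct timeval, the structure would be 16 bytes on typical 64-bit systems and would no longer match the 32-bit values produced by the kernel.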