Andrew Deason [Wed, 29 Aug 2012 19:14:39 +0000 (14:14 -0500)]
LINUX: Detect non-vectorized aio functions
In kernels before 027445c3, the functions generic_file_aio_read and
generic_file_aio_write, as well as the fs operations aio_read and
aio_write, do not deal with iovecs but rather just use a single
buffer. Detect this, so our aio_read and aio_write implementations
have the correct signatures.
Michael Meffie [Fri, 17 Aug 2012 17:25:17 +0000 (13:25 -0400)]
LINUX: make d_automount work properly on rhel5 kernels
Recent centos/rhel 5 kernels (2.6.18-308.*) started providing the
d_automount operation, but renamed the DCACHE_NEED_AUTOMOUNT flag to
DMANAGED_AUTOMOUNT.
Andrew Deason [Wed, 29 Aug 2012 16:39:01 +0000 (11:39 -0500)]
LINUX: Use struct vfs_path on RHEL5
Some revisions of the kernel from RHEL5 (2.6.18-308.* and possibly
others) renamed 'struct path' to 'struct vfs_path'. So, use
'struct vfs_path' when it exists.
This introduces the afs_linux_path_t typedef, which is defined as
either a struct path, or struct vfs_path.
Michael Meffie [Wed, 15 Aug 2012 21:19:07 +0000 (17:19 -0400)]
vldb_check: fix cross-linked mh entries
When run with -fix, consolidate server numbers in vl entries which
point to the same multi-homed entry. Use the lowest server number
from the set of server numbers which point to the same multi-homed
entry.
Remove unreferenced address entries which are duplicate multi-homed
indexes.
Two passes of vldb_check -fix may be required; first to fix the
vl entry server numbers; then to remove the duplicate address
entries.
Michael Meffie [Thu, 2 Aug 2012 21:24:02 +0000 (17:24 -0400)]
libafs: revert init req to use the real uid
The commit to use wrappers for creditial structure access
inadvertently changed the user id to be the effective uid instead of
the real uid, when no PAG is present, on linux. Revert this so
setuid programs continue to work.
Andrew Deason [Thu, 22 Mar 2012 15:42:38 +0000 (10:42 -0500)]
afs: Set DWriting when truncating a dcache entry
When we truncate a file, we truncate the contents of the relevant
dcache entry chunks, and prevent future FetchData operations from
fetching data beyond the truncation offset. If we never write anything
to that chunk, we never set DWriting, and so on disk it looks like
that dcache entry has valid data for the specified DV. However, since
the data is truncated, this is not true.
If a process holds a file open, truncates it without writing to it,
and then the client crashes (or we have trouble contacting the
fileserver when we close the file), the dcache entry will appear valid
on disk. So the next time we read the dcache entry, we will use the
incorrect cache contents as if they were accurate for the specified
DV.
To avoid this, set DWriting when we truncate a chunk. Normally we only
clear DWriting when we actually send data to the fileserver, so to
clear DWriting in this case, add an additional line to clear it in
afs_StoreAllSegments, after the StoreMini has completed.
Andrew Deason [Fri, 2 Mar 2012 23:22:12 +0000 (17:22 -0600)]
afs: Do not limit fetches based on vcache length
Currently, when we go to the fileserver to fetch some data, we try to
make sure that we do not ask for data beyond the end of the file. For
example, if our chunk size is 1M, and we need to get the first chunk
for a file that is 4 bytes long, we will only ask the fileserver for 4
bytes.
This can cause issues when the file is being extended at the same time
as when we are trying to read the file. Consider the following
example. There is a file named X that has contents "abcd" at dv 1, and
we issue a FetchData64 request for X, only requesting 4 bytes. Right
before the fileserver gets the FetchData64 request, another client
writes the contents "12345" to file X.
The client will then fetch the contents "1234" for that file, at dv 2,
and store that as the contents of the first chunk for file X. On
subsequent reads for file X, applications will now get "1234<NUL>" as
the contents, since the size of the file will be updated to 5, but the
cache manager thinks that "1234" is the correct contents for the first
chunk of X at dv 2. The cache manager will continue to think so until
the cache entry is evicted or invalidated for whatever reason.
To avoid this scenario, always request a full chunk of data if we have
any data to fetch and the file has not been locally truncated. We can
still avoid the fetch at all if it looks like we're fetching beyond
end-of-file, since we know that at least at some point that was
correct information about the file. If this results in us trying to
fetch beyond end-of-file, the fileserver will respond with the correct
length anyway.
We still need to restrict the fetch request length based on
avc->f.truncPos, since the dcache data after avc->f.truncPos needs to
stay empty, since we don't track truncated data any other way. If we
also avoided this restriction, extending a file via truncation after
reducing a file's length via truncation could cause the old file data
to appear again, instead of filling the new file range with NULs.
Note that on at least Linux, with this fix an application can still
read the contents "1234" on the first read in the above example, and
"12345" on subsequent reads. This is just due to when we give the VFS
updates about file metadata, and could be remedied by updating file
metadata immediately from the FetchStatus information from the
FetchData64 call. However, just reading the contents "1234" in the
above example seems like a somewhat plausible outcome; at the very
least, it is an improvement.
Andrew Deason [Fri, 2 Mar 2012 23:18:25 +0000 (17:18 -0600)]
afs: Log a message on invalid FetchStatus receipt
If we get an invalid AFSFetchStatus structure from a server, log a
message to indicate as such. This serves as a warning to urge people
to fix their fileservers, and to explain what is doing.
Andrew Deason [Fri, 2 Mar 2012 23:06:48 +0000 (17:06 -0600)]
afs: Sanity-check some AFSFetchStatus structures
We currently do not do any sanity checking on the AFSFetchStatus
structures returned from fileservers. Add some sanity checking for
BulkStatus and FetchStatus calls, so we do not screw up our cache if a
fileserver gives us bogus data.
If we do get an invalid AFSFetchStatus structure, act as if the server
gave us a VBUSY error code, so we will retry the request. For OpenAFS
fileservers prior to 1.6.1 that yield this situation, VBUSY is likely
the error code the fileserver should have responded anyway.
Andrew Deason [Mon, 27 Aug 2012 18:33:47 +0000 (14:33 -0400)]
rx: dec rx_nWaiting on clearing RX_CALL_WAIT_PROC
Currently, a couple of callers (rxi_ResetCall, and
rxi_AttachServerProc) will decrement rx_nWaiting only if
RX_CALL_WAIT_PROC is set for a call, and the call is on a queue
(presumably rx_incomingCallQueue). This can cause an imbalance in
rx_nWaiting if these code paths are reached when, in another thread,
rx_GetCall has removed the call from its queue, but it has not yet
cleared RX_CALL_WAIT_PROC (this can happen while it is waiting for
call->lock). In this situation, rx_GetCall will remove the call from
its queue, wait, and e.g. rxi_ResetCall will clear RX_CALL_WAIT_PROC;
neither will decrement rx_nWaiting.
This is possible if a new call is started on a call channel with an
extant call that is waiting for a thread; we will rxi_ResetCall in
rxi_ReceivePacket, but rx_GetCall may be running at the same time.
This race may also be possible via rxi_AttachServerProc via
rxi_UpdatePeerReach -> TryAttach -> rxi_AttachServerProc while
rx_GetCall is running, but I'm not sure.
To avoid this, decrement rx_nWaiting based on RX_CALL_WAIT_PROC alone,
regardless of whether or not the call is on a queue. This mirrors the
incrementing rx_nWaiting behavior, where rx_nWaiting is only
incremented if RX_CALL_WAIT_PROC is unset for a call, so this should
guarantee that rx_nWaiting does not become unbalanced.
Mark Vitale [Mon, 20 Aug 2012 21:39:06 +0000 (17:39 -0400)]
vos: convertROtoRW susceptible to VLDB changes during override prompt
vos convertROtoRW obtains a VLDB entry, then peforms some setup logic
(including a possible user prompt) before obtaining a volume lock.
This exposes the code to possible time-of-check/time-of-use issues.
After obtaining the volume lock, get a second copy of the VLDB entry
and compare it to the first copy; if it has changed, fail the conversion
with an error message asking the user to re-issue the vos convertROtoRW
command.
Marc Dionne [Tue, 14 Aug 2012 22:08:51 +0000 (18:08 -0400)]
Linux 3.6: revalidate dentry op API change
The nameidata argument is dropped, replaced by an unsigned flags
value. The configure test is very specific; kernels with the
older API with a signed int flags value should fall through.
Marc Dionne [Tue, 14 Aug 2012 01:55:25 +0000 (21:55 -0400)]
Linux 3.6: d_alias and i_dentry are now hlists
The d_alias pointer is now the head of an hlist. This means the
iterator is a different macro and has no "reverse" version since
hlists have no direct pointer to the list tail.
inode->i_dentry gets the same treatment. Adjust where we use it.
Marc Dionne [Tue, 14 Aug 2012 01:36:15 +0000 (21:36 -0400)]
Linux 3.6: dentry_open API change
dentry_open now takes a path argument that combines the dentry and
the vfsmount pointers.
Add a configure test and a new compat inline function to keep things
cleaner in the main source file.
Jeffrey Altman [Wed, 15 Aug 2012 04:53:21 +0000 (00:53 -0400)]
Windows: disable short names on Win7 and 2008 R2
After listening to a presentation from Microsoft's file system
team and speaking with anti-virus vendors, it is not only safe
to disable ShortNames on non-boot volumes in Win7 and 2008 R2
but it is a definite win for performance, stability and security
of the system.
afs_server: afs_SetServerPrefs() can never be called with null
The one and only call site never calls afs_SetServerPrefs() with a
null pointer, and some but not all of the paths through the #ifdefs
assume this. Remove code that checks for this; it confuses both
humans and the static analyzer.
Andrew Deason [Wed, 21 Dec 2011 22:01:16 +0000 (17:01 -0500)]
afs: Add afs_WriteDCache sanity checks
Writing a non-free non-discarded dcache entry with a zero volume id
can easily cause hash table corruption later on, so make sure we don't
do that. Also log something if the write itself fails, as this usually
indicates an unusual situation involving I/O errors or something.
Andrew Deason [Wed, 21 Dec 2011 21:05:40 +0000 (16:05 -0500)]
afs: Cope with afs_GetValidDSlot errors
Make callers of afs_GetValidDSlot deal with getting a NULL dcache,
which can occur if an error is encountered. Some of these just panic
at least for now, since a code path for recovery is complex, but this
is at least better than dereferencing a NULL pointer.
Andrew Deason [Wed, 21 Dec 2011 20:04:32 +0000 (15:04 -0500)]
afs: Do not always ignore errors in afs_GetDSlot
Currently afs_UFSGetDSlot will silently swallow any error in reading
the specified dslot from disk, and will return a "blank" dcache to the
caller. However, many callers of afs_GetDSlot will be asking for a
dcache that we know exists, and more importantly, we know is on the
global hash table. If a disk error is encountered and we're given a
"blank" dcache, we will erroneously believe the dcache entry is not on
the hash table, causing corruption of the hash table later on.
So instead, modify all callers of afs_GetDSlot to use either
afs_GetValidDSlot or afs_GetNewDSlot. Calling afs_GetValidDSlot
indicates that the given dentry index is known to be valid, and any
error encountered while reading the entry from disk should result in
an error (for disk I/O errors we have no control over, this results in
a NULL dentry returned; for internal consistency errors we panic).
Calling afs_GetNewDSlot indicates that the specified index may not
exist or may not be valid, and so returning a "blank" dentry in that
case is fine.
For memcache, the situation is the same, except any time we go to
"disk" it is an (internal) error, since there is no disk.
Andrew Deason [Wed, 21 Dec 2011 22:25:29 +0000 (17:25 -0500)]
afs: Remove second argument to afs_GetDSlot
All callers of afs_GetDSlot were passing NULL as the second argument
to afs_GetDSlot. So, remove the argument, and behave as if tmpdc was
NULL unconditionally.
Simon Wilkinson [Tue, 17 Jul 2012 15:50:59 +0000 (16:50 +0100)]
opr: Add UUID handling functions
Add a set of functions to the opr library to handle creating and
manipulating UUIDs.
The opr_uuid_t type is held as a 16 octet character string, which
comprises the UUID encoded into network byte order. This is the
primary form for manipulating UUIDs with this library, as it avoids
any nbo/hbo problems.
For applications which require raw access to the UUID components,
the opr_uuid_unpacked structure is provided, and
opr_uuid_pack/opr_uuid_unpack can be used to convert to and from
this format.
Finally, functions to print the UUID as a string, and parse a UUID
from a string, are provided. When printing, we use the standard UUID
format of 000000-0000-0000-0000-00000000. However, the afsUUID library
used to print UUIDs as 000000-0000-0000-00-00-00000000, so we also
accept this format.
Tom Keiser [Tue, 10 Apr 2012 20:26:42 +0000 (16:26 -0400)]
libafs: use kthread_run when available
Use the kthread_run interface on linux to create kernel
threads. This interface allows all the cpus to schedule
afsd threads, instead of just inheriting the cpu affinity of
the main afsd thread.
Michael Meffie [Sat, 28 Jul 2012 15:37:59 +0000 (11:37 -0400)]
vlserver: fix vldb header initialization
Avoid creating new vldb files with zeroed header data.
The code path is as follows; The call to Init_VLdbase makes several
passes. On the first pass, the header is found to be empty, and so a
write lock is obtained on the second pass. On this second pass,
UpdateCache creates a newly initialized header and writes it to the
db. The rd_cheader is set to the newly created header data, and the
wr_cheader is still cleared at this point.
When the transaction on the second pass ended in Init_VLdbase, the
data is committed and vlsynccache() is called. In this call to
vlsynccache(), the cleared write header buffer (wr_cheader) is
copied over the newly initialized rd_cheader buffer. Init_VLdbase
then returns to the caller, and if the caller writes to the db, the
header on disk is then cleared.
Instead of initializing the read header buffer when rebuilding the
db header, initialize the write header buffer. When the ubik
transaction is ended, the call to vlsynccache() updates the contents
of the read header buffer with contents of the new/rebuilt header.
Andrew Deason [Fri, 6 Apr 2012 19:56:07 +0000 (14:56 -0500)]
LINUX: Do not lookup immediately recursive mtpts
On Linux, having a mountpoint in a volume root that points to the same
volume can cause serious problems. By 'immediately recursive', I mean
a situation like the following:
fs mkm mtpt vol
fs mkm mtpt/mtpt vol
If there are multiple dentry aliases for the directory (which is
possible if the directory is a mountpoint), an 'rmdir' on the
recursive mountpoint can cause the client to deadlock. Since the
'rmdir' code path in Linux locks the parent directory inode to perform
the rmdir, and locks the child directory inode after performing a
couple of sanity checks. For an immediately recursive mountpoint,
these two inodes are the same, and so we will deadlock.
Andrew Deason [Fri, 6 Jul 2012 21:37:39 +0000 (16:37 -0500)]
Linux: Make dir dentry aliases act like symlinks
Currently, we try to invalidate other dentries that exist for a
particular dir inode when we look up a dentry. This is so we try to
avoid duplicate dentries for a directory, which Linux does not like
(you cannot have hardlinks to a dir).
If we cannot invalidate the other aliases (because they are being
used), right now we just return the alias. This can make it very easy
to panic the client, due to the sanity checks Linux performs when dong
things like 'rmdir'. If we do something like this:
For the 'rmdir', we will lookup 'mtpt2'. Since 'mtpt' and 'mtpt2'
are mountpoints for the same volume, their dentries point to the same
directory inode. So when we lookup 'mtpt2', we will try to invalidate
the other dentry, but we cannot do that since it is the cwd. So we
return the alias dentry (for 'mtpt'). The Linux VFS layer then does a
sanity check for the rmdir operation, checking that the child dentry's
parent inode is the same as the inode we're performing the rmdir for.
Since the dentry we returned was for 'mtpt', whose parent is 'dir1',
and the actual dir we're performing the rmdir for is 'dir2', this
sanity check fails and we BUG.
To avoid this, make the dentry alias act like a symlink when we
encounter an uninvalidateable dentry alias. That is, we allow multiple
dentry aliases for a directory, however, when the dentry aliases are
actually used, we redirect to a common dentry (via d_automount where
possible, and follow_link elsewhere).
This means that such mountpoints will behave similarly to symlinks, in
that we 'point' to a specific mountpoint dentry. This means that if we
have multiple different ways to get to the same volume, and all are
accessed at the same time, all but one of those mountpoints will
behave like symlinks, pointing to the same mountpoint. So, the '..'
entries for each path will all point to the parent dir of one
mountpoint, meaning that the '..' entry will be "wrong", but for most
cases it will still be correct.
In order to try to make the 'target', pointed-to directory consistent,
we add a new field to struct vcache: target_link. This points to the
dentry we should redirect to, whenever that vcache is referenced. To
avoid (possibly not-feasibly-solvable) problems with refcounting, this
pointer is not actually a reference to the target dentry, but just
serves as a pointer to compare to.
FIXES 130273
Reviewed-on: http://gerrit.openafs.org/7741 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit de381aa0d39e88a1ca0c8ccbb2471c5cad5a964c)
Michael Meffie [Thu, 28 Jun 2012 21:12:24 +0000 (17:12 -0400)]
vldb_check: remove unreferenced mh entries with -fix
When running vldb_check with -fix, clear any mh extent entries which
are set but are not referenced by an address entry in the
IpMappedAddr table. These unreferenced entries already generated a
warning. This commit adds the feature to clear the unreferenced mh
entries using vldb_check -fix option.
Michael Meffie [Fri, 29 Jun 2012 22:10:31 +0000 (18:10 -0400)]
vldb_check: warn about cross-linked mh entries
Warn if an mh extent entry is referenced by more than one server
number in the IpMappedAddr table.
The serveraddr table is used to determine which server numbers have
IP addresses. If, for some reason, multiple server numbers
reference the same mh entry, currently, the correct serveraddr value
is calculated only for the lowest server number in the set of server
numbers which reference the same mh entry. Handle this case, and
warn about the duplicated values in the IpMappedAddr table.
Warn about IpMappedAddr entries which reference non-existent mh
blocks.
Jeffrey Altman [Mon, 13 Aug 2012 21:56:02 +0000 (17:56 -0400)]
Windows: AFSProcessUserFsRequest NULL dereference
Protect against an Irp with a NULL FsContext2 field.
These represent Irps that are not intended for our device
since they do not have an AFSCcb associated with it.
Jeffrey Altman [Mon, 13 Aug 2012 02:51:54 +0000 (22:51 -0400)]
Windows: reset volume NOEXIST flag
In response to fs checkvolumes the NOEXIST flag should be reset.
It should also be reset if the volume location update fails
because of a commumicaton (or other) error with the VLDB server.
The volume's lastUpdateTime is refreshed on error.
Jeffrey Altman [Wed, 8 Aug 2012 20:42:47 +0000 (16:42 -0400)]
Windows: Freelance Discovery configuration
Add new "FreelanceDiscovery" configuration option to permit
Freelance dynroot mode to be used without the automatic discovery
of cells and generation of mount points.
Jeffrey Altman [Wed, 8 Aug 2012 17:54:48 +0000 (13:54 -0400)]
Windows: disable short names on Windows 8
Add "ShortNames" option to control whether 8.3 compatible short
names are generated for objects stored in AFS. Set the default
to on for all operating systems prior to Windows 8 and Server 2012.
Peter Scott [Tue, 7 Aug 2012 13:07:41 +0000 (09:07 -0400)]
Windows: FileNormalizedNameInformation take one
Add a response to FileNormalizedNameInformation requests.
Respond with the long file name. As yet there is no translation
from short name to long name for full paths.
Peter Scott [Mon, 6 Aug 2012 19:15:57 +0000 (15:15 -0400)]
Windows: FSCTL_SET_PURGE_FAILURE_MODE
Windows 8 adds FSCTL_SET_PURGE_FAILURE_MODE. Failure to respond
with success prevents anti-virus filters from scanning the file
system. For now just return success.
Peter Scott [Mon, 6 Aug 2012 19:12:12 +0000 (15:12 -0400)]
Windows: disable short names in redirector option
If requested during redirector initialization, disable short
name processing. Future versions of Windows (8, Server 2012,
and beyond) will no longer require short names.
Derrick Brashear [Mon, 21 Nov 2011 17:06:59 +0000 (12:06 -0500)]
ihandle: don't keep reallyclosing future fds
given that we can mark something invalid for future use, ever,
once we have done so for all fds, we ih_reallyclose is done.
don't persist the setting to the detriment of new fds
Michael Meffie [Thu, 2 Aug 2012 21:24:02 +0000 (17:24 -0400)]
libafs: revert init req to use the real uid
The commit to use wrappers for creditial structure access
inadvertently changed the user id to be the effective uid instead of
the real uid, when no PAG is present, on linux. Revert this so
setuid programs continue to work.
Mark Vitale [Thu, 2 Aug 2012 22:37:05 +0000 (18:37 -0400)]
vos: convertROtoRW may create 2nd RW on a fileserver
If an RW is already present on disk on the target server (any partition),
'vos convertROtoRW' will still convert the RO, creating a second RW on the server.
Detect this and refuse to convert the RO by returning EXDEV (invalid cross-device link).
Andrew Deason [Thu, 2 Aug 2012 15:58:12 +0000 (11:58 -0400)]
rx: Process ICMP unreachable errors
When a machine receives ICMP errors, we can detect them in
AFS_RXERRQ_ENV environments. Many of these errors indicate that a
machine is not reachable, so we are guaranteed to not get a response
from them. When we get such an error for a particular peer, mark all
relevant calls with an RX_CALL_DEAD error, since we know we won't get
a response from them. This allows some calls to dead/unreachable hosts
to fail much more quickly.
Do not immediately kill new calls, since obviously the host may have
come back up since then (or the routing/firewall/etc was fixed), but
only calls that were started before the current error was received.
Note that a call doesn't actually notice until the next rxi_CheckCall,
since directly killing each of the relevant calls would be rather
slow. So, we don't notice a dead peer immediately, though we notice
much more quickly than we used to.
Reorganize the error queue processing a little bit to make this easier
to do.
Andrew Deason [Wed, 1 Aug 2012 20:31:09 +0000 (16:31 -0400)]
LINUX: Fix error queue processing
Receiving error queues in the Linux kernel is a little different from
userspace. When we encounter a cmsg that is not CMSG_OK, we need to
break out of the loop, and not just continue, since we can keep trying
to process the same cmsg over and over. In addition, on successful
return, the msg_control buffer has been modified to point to the next
available buffer space, and msg_controllen contains how many bytes are
remaining. So, we need to adjust the msg_control and msg_controllen
values to get something more familiar.
Andrew Deason [Wed, 1 Aug 2012 19:56:27 +0000 (15:56 -0400)]
LINUX: Avoid SO_ERROR for RXERRQ_ENV
SO_ERROR is for receiving errors from some nonblocking operations; it
has little relevance to our network operations. For Linux, use a
similar structure as userspace error detection, instead of SO_ERROR.
Andrew Deason [Wed, 1 Aug 2012 19:19:02 +0000 (15:19 -0400)]
rx: Create AFS_ADAPT_PMTU and AFS_RXERRQ_ENV
Currently we have the ADAPT_PMTU define, which turns on functionality
in Linux to detect PMTU-related ICMP errors for Rx. However, this is
really turning on two separate pieces of functionality: the PMTU
processing, and the processing for ICMP errors in general.
So split this out into two defines: AFS_ADAPT_PMTU, and
AFS_RXERRQ_ENV. The former is for processing PMTU discovery, and the
latter is for processing ICMP errors. Both of these are left disabled
due to issues in the error processing. Although PMTU discovery is the
only functionality which makes use of ICMP errors, this will change in
the future.
Andrew Deason [Thu, 22 Dec 2011 20:01:52 +0000 (15:01 -0500)]
afs: Indicate error from afs_osi_Read/Write better
Currently afs_osi_Read and afs_osi_Write just return -1 on any I/O
error, even though they know the error code given from the OS VFS.
Just return that code instead so the caller can see what the error
was; but negate it, so it's clear that it is an error.
Andrew Deason [Thu, 22 Dec 2011 19:50:09 +0000 (14:50 -0500)]
afs: afs_osi_Read/Write returns negative on error
afs_osi_Read and afs_osi_Write need to return negative values on
error. EIO is not negative; return -EIO so we don't accidentally
return "success" if someone requested to read or write EIO bytes.
Andrew Deason [Mon, 9 Apr 2012 22:16:42 +0000 (17:16 -0500)]
vos: Do not try to remove backup volume id 0
Currently we always try to delete a BK volume if we're deleting the
RW. If the BK volume id is 0, this is never going to do anything, so
don't try to delete it.
Andrew Deason [Wed, 1 Aug 2012 18:57:06 +0000 (14:57 -0400)]
rx: Remove ADAPT_MTU and MISCMTU
Ever since 5bcf626ddaf92e199c4b46c11ad276013a47db52, ADAPT_MTU has
been unconditionally defined. MISCMTU has always been unconditionally
defined, and not used anywhere. Remove both of these, assuming they
are always defined.
Michael Meffie [Wed, 1 Aug 2012 15:42:34 +0000 (11:42 -0400)]
bozo: avoid canceling the sigkill timer for hung processes
A sigkill signal is sent to fileserver processes when a timeout is
exceeded for shutting down processes for the fs/dafs bnode.
(Currently 30 minutes for the fileserver, 1 minute for the other
server processes.)
If the bnode goal is set to run before this timeout expires, the
timer is incorrectly stopped, and a wedged process is never killed.
Fix this by not canceling the timer when a fs/dafs process has been
signaled to shutdown, regardless of the current goal.
Andrew Deason [Fri, 30 Mar 2012 19:56:52 +0000 (14:56 -0500)]
libafscp: Add afscp_LocalAuthAs
Add the function afscp_LocalAuthAs to libafscp. This allows the caller
to generate credentials based on the KeyFile on local disk, in order
to appear as an arbitrary user.
Andrew Deason [Tue, 31 Jul 2012 18:40:41 +0000 (14:40 -0400)]
LINUX: Always hold afs_xuser for unixuser read
We were failing to hold the afs_xuser lock when we entered our
unixuser traversal for the first time (when the given position is 0).
This means we can release the lock without acquiring it, causing all
kinds of weird behavior.
Just always grab afs_xuser on entry. We could possibly do some tricks
to avoid grabbing this lock until after we've printed the column
headers, but it does not seem worth it.
Andrew Deason [Thu, 26 Jul 2012 21:40:03 +0000 (16:40 -0500)]
LINUX: Hold GLOCK for proc traversal
The functions that traverse unixuser structures for display via /proc
(uu_start et al) call various libafs functions hold and release locks,
etc. To do any of that, we need GLOCK. Amongst other issues, we can
panic if we try to acquire a contested lock without GLOCK, since we
assert glock is held when we sleep for the lock or try to wake other
waiters. The same goes for the legacy CellServDB proc file.
So, hold and release GLOCK as appropriate.
Reviewed-on: http://gerrit.openafs.org/7885 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 5237d3d232f22aaf4f67f3a8727a293f4058a7ae)
Andrew Deason [Fri, 6 Apr 2012 19:56:07 +0000 (14:56 -0500)]
LINUX: Do not lookup immediately recursive mtpts
On Linux, having a mountpoint in a volume root that points to the same
volume can cause serious problems. By 'immediately recursive', I mean
a situation like the following:
fs mkm mtpt vol
fs mkm mtpt/mtpt vol
If there are multiple dentry aliases for the directory (which is
possible if the directory is a mountpoint), an 'rmdir' on the
recursive mountpoint can cause the client to deadlock. Since the
'rmdir' code path in Linux locks the parent directory inode to perform
the rmdir, and locks the child directory inode after performing a
couple of sanity checks. For an immediately recursive mountpoint,
these two inodes are the same, and so we will deadlock.
Andrew Deason [Fri, 6 Jul 2012 21:37:39 +0000 (16:37 -0500)]
Linux: Make dir dentry aliases act like symlinks
Currently, we try to invalidate other dentries that exist for a
particular dir inode when we look up a dentry. This is so we try to
avoid duplicate dentries for a directory, which Linux does not like
(you cannot have hardlinks to a dir).
If we cannot invalidate the other aliases (because they are being
used), right now we just return the alias. This can make it very easy
to panic the client, due to the sanity checks Linux performs when dong
things like 'rmdir'. If we do something like this:
For the 'rmdir', we will lookup 'mtpt2'. Since 'mtpt' and 'mtpt2'
are mountpoints for the same volume, their dentries point to the same
directory inode. So when we lookup 'mtpt2', we will try to invalidate
the other dentry, but we cannot do that since it is the cwd. So we
return the alias dentry (for 'mtpt'). The Linux VFS layer then does a
sanity check for the rmdir operation, checking that the child dentry's
parent inode is the same as the inode we're performing the rmdir for.
Since the dentry we returned was for 'mtpt', whose parent is 'dir1',
and the actual dir we're performing the rmdir for is 'dir2', this
sanity check fails and we BUG.
To avoid this, make the dentry alias act like a symlink when we
encounter an uninvalidateable dentry alias. That is, we allow multiple
dentry aliases for a directory, however, when the dentry aliases are
actually used, we redirect to a common dentry (via d_automount where
possible, and follow_link elsewhere).
This means that such mountpoints will behave similarly to symlinks, in
that we 'point' to a specific mountpoint dentry. This means that if we
have multiple different ways to get to the same volume, and all are
accessed at the same time, all but one of those mountpoints will
behave like symlinks, pointing to the same mountpoint. So, the '..'
entries for each path will all point to the parent dir of one
mountpoint, meaning that the '..' entry will be "wrong", but for most
cases it will still be correct.
In order to try to make the 'target', pointed-to directory consistent,
we add a new field to struct vcache: target_link. This points to the
dentry we should redirect to, whenever that vcache is referenced. To
avoid (possibly not-feasibly-solvable) problems with refcounting, this
pointer is not actually a reference to the target dentry, but just
serves as a pointer to compare to.
afs_server: delete code that has been ifdef'ed out for years
The comments in afs_SetServerPrefs() said "clean up, delete this".
The oldest one is a decade old. Removing these #ifdefs will make
following the rest of the spaghetti #ifdefs a bit easier.
Garrett Wollman [Tue, 9 Aug 2011 04:28:27 +0000 (00:28 -0400)]
libafs: afs_CacheFetchProc can't be called without a dcache pointer
An inspection of the only call site suggests that afs_CacheFetchProc()
can't be called with a null dcache pointer, and code further down
in this function dereferences adc unconditionally (assuming
rxfs_fetchInit() doesn't crash first) so remove the conditional
here.
Probably more of these parameters can and should be included in the
AFS_NONNULL.
OpenAFS does not have separate distributions for the United States
and the rest of the world. Nor are there any restrictions on the
capabilities of the Update Server.
volser: restructure GetNextVol and clients to remove duplicate code
There are several odd-looking but stylized loops involving GetNextVol()
which can be radically simplified if only GetNextVol() would return
a meaningful value. Move all of the code that skips non-volume-header
files in the directory into GetNextVol and have it return a truth value
(instead of always returning zero) that indicates whether it saw
something that looks like a volume header. Then all the odd while
loops and strcmps just collapse into while(GetNextVol(...)).
GetNextVol() had external scope, but there are no callers in the
tree that use it outside of volprocs.c, and it's not part of a
public library interface, so make it static.
While here, don't strcmp() past the end of a filename that begins with
'V' but is too short to be a valid volume name.
afscp: avoid null dereference in _GetSecurityObject error case
Handle the possible error return from krb5_get_host_realm in the
same way as the other error cases (using an anonymous security
object); otherwise "realm" would be left null.