Andrew Deason [Wed, 14 Nov 2012 00:27:11 +0000 (18:27 -0600)]
afs: Handle VNOSERVICE as a timeout
For whatever reason, the fileserver uses VNOSERVICE to indicate that
an Rx call was killed due to an idledead timeout. It is not used for
any volume errors, so treat it like the idle dead error codes.
Andrew Deason [Wed, 14 Nov 2012 00:15:21 +0000 (18:15 -0600)]
afs: Slight restructuring in afs_Analyze
We test for acode < 0 && acode != VRESTARTING, but then immediately
test for specific values for acode. Move this conditional down, and
remove a level of indentation for the next couple of acode checks.
This commit should introduce no functional change.
Andrew Deason [Tue, 31 Jul 2012 18:40:41 +0000 (14:40 -0400)]
LINUX: Always hold afs_xuser for unixuser read
We were failing to hold the afs_xuser lock when we entered our
unixuser traversal for the first time (when the given position is 0).
This means we can release the lock without acquiring it, causing all
kinds of weird behavior.
Just always grab afs_xuser on entry. We could possibly do some tricks
to avoid grabbing this lock until after we've printed the column
headers, but it does not seem worth it.
Andrew Deason [Tue, 23 Oct 2012 20:47:06 +0000 (15:47 -0500)]
ptserver: Avoid inet_ntoa
The ptserver uses inet_ntoa in a few places, such as for calculating
host CPS. This isn't safe in pthreaded environments, so use
afs_inet_ntoa_r instead.
Andrew Deason [Wed, 31 Oct 2012 20:55:35 +0000 (15:55 -0500)]
afs: Never use GetNewDSlot after init
Currently there are two ways to get a dcache via a slot number:
afs_GetNewDSot and afs_GetValidDSlot. afs_GetValidDSlot assumes that
the given slot number refers to a dcache entry that is valid on disk;
with afs_GetNewDSlot, the given slot may not be valid, and if it is
not, an empty 'template' dcache is returned.
afs_GetNewDSlot is useful for initializing cache files, since if a
given dcache slot exists on disk and contains valid data, we use the
dcache like normal. If it does not already exist or does not contain
valid data, we fill in the missing data after afs_GetNewDSlot returns.
However, for all other uses, afs_GetNewDSlot is incorrect, and causes
various serious problems. After we have initialized our dcache
entries, any attempt to read a dcache by slot number should succeed,
since the number of dcache entries never changes after we are started,
and we initialized all of them during client startup.
Some code outside of afs_InitCacheFile was still using
afs_GetNewDSlot; code that reads in a dslot from the free or discard
list. In these cases, if there is any error reading the dcache slot
from disk, we will be given a dcache that has some of its fields not
filled in properly. Notably, we assume that the entry is not on the
global hash table (we set tdc->f.fid.Fid.Volume to 0), and the
tdc->f.inode field is not initialized at all, leaving it set to
whatever was in memory for that tdc before we tried to read the slot
from disk.
This can cause cache corruption, since tdc->f.inode can point to the
inoder for a different existing cache file, so writing to that dcache
modifies the data for another cached file.
To avoid this, modify the non-afs_InitCacheFile callers of
afs_GetNewDSlot to avoid afs_GetNewDSlot. Since these callers read
from the free/discard list, the contents of the dcache entries are not
valid (the cell, volume, dv, etc are not valid), though they must
exist on disk (we have a valid inode number for them). So, create a
new function, afs_GetUnusedDSlot, to get a dcache that must exist on
disk, but does not represent any valid data. Use this for callers that
must get a dslot from the free/discard list.
Add some comments to try and help explain what is going on.
Andrew Deason [Wed, 21 Nov 2012 16:39:51 +0000 (10:39 -0600)]
LINUX: Dir follow_link should set LAST_BIND
For our faux-symlink directory follow_link operation, we leave the
given nameidata struct with an invalid 'last' component. That is,
nd->last is not changed or set to anything meaningful.
Usually the callers of our follow_link op do not care about the last
component of the nameidata. However, at least one caller does: the
caller near the do_link label in open_namei(). This is called during
processing for O_CREAT operations on symlinks, and since our
directories look like symlinks, it gets called. It tries to use
nd->last to look up the last component of the dereferenced path (so it
can try to create it, as necessary), but since our nd->last is not
set, this will not work.
Specifically, our nd->last.name is not pointing into the names cache,
so the subsequent putname/__putname on it will corrupt the names
cache. However, even if this were not a problem, the actual contents
of the last component do not seem meaningful so this would probably
result in incorrect behavior anyway.
To avoid all of this, set nd->last_type to LAST_BIND, so any callers
know that the last component of the given nd is not valid, and we are
pointing directly to the target component with a dentry.
Reviewed-on: http://gerrit.openafs.org/8489 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Marc Dionne <marc.c.dionne@gmail.com> Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
(cherry picked from commit bd57c7d64844ca26d80f2b29db470dacd134fc56)
Change-Id: I4defb55064a4452e437b8a6c3e600887b4749fff
Reviewed-on: http://gerrit.openafs.org/8543 Reviewed-by: Ken Dreyer <ktdreyer@ktdreyer.com> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: Stephan Wiesand <stephan.wiesand@desy.de>
Andrew Deason [Thu, 15 Nov 2012 00:29:35 +0000 (18:29 -0600)]
afs: Do not skip flushing pages for dv-0 files
If the dv for a file is 0, we know the file is empty. Currently we
skip flushing pages for such files, presumably the idea being there is
no data in the file, so there should be no pages to flush.
However, Linux seems to keep empty pages around for empty files. So, a
future read can result in the application reading a page full of
zeroes, unless we flush the page here. While this has only been found
to happen on Linux 2.6.22 and later (and distribution-specific
backports, like RHEL 2.6.18-128), other platforms could in theory also
choose to do this. It would be difficult to find out when another
platform started to behave like this, so just remove this skip for
everyone so we never have to deal with this again.
Replace this code with a comment with a quick explanation, in case
anyone tries to add a similar optimization here in the future.
Jeffrey Altman [Tue, 27 Nov 2012 05:16:58 +0000 (00:16 -0500)]
Windows: cm_LookupInternal obtain type of target
cm_LookupInternal needs to return the target of a mount point
if the matching directory entry is a mount point. Therefore, if
the target type is unknown the status information must be queried.
Michael Meffie [Tue, 30 Oct 2012 14:22:40 +0000 (10:22 -0400)]
vol: allow non-dafs volume utils to attach with V_READONLY again
Allow non-fileserver, non-dafs, programs to attach volumes with the
V_READONLY mode again. This was lost during the code changes for
dafs.
The caller sends a fssync request to the fileserver, which updates the
on-disk contents of the volume headers, before the caller reads the
volume headers, allowing the caller to have the most recent info about
the volume. The fileserver still has the volume in use.
Later in the attachment process, the inUse check is skipped for the case
of a non-fileserver process which is attaching the volume using the
V_READONLY mode, otherwise the attachment would incorrectly mark the
volume as needing to be salvaged.
Note: The mode checks in VMustCheckOutVolume() are correct. We must
checkout the volume when attaching with the V_READONLY mode. This
fix updates the VShouldCheckInUse(), in which an additional
exception was added to cover the case for V_READONLY mode from a non-
fileserver process.
Note: A check is added in the dafs version of attach to avoid overwriting the
inUse field when a volume utility is attaching a volume in V_READONLY mode.
Currently, V_READONLY is not used by dafs, but this was done to avoid future
errors.
Reviewed-on: http://gerrit.openafs.org/8339 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 0eaa0d1baa8b8fe115301f188ce32176acc7b065)
Michael Meffie [Tue, 30 Oct 2012 14:41:12 +0000 (10:41 -0400)]
fix stale volume info from vos examine (non-dafs)
A volume examine on a non-dafs volume server/fileserver can show old
information, including old volume update time, for up to about 20
minutes. The non-dafs volume server reads the volume information
from the volume headers, which are updated by the fileserver only
periodically to avoid excessive i/o.
Before dafs, when the volume server performed a volume examine, the
volume server would send a fssync command to the fileserver with the
request FSYNC_NEEDVOLUME and mode V_READONLY. The fileserver writes
the current memory contents to disk on this fssync command. The
volume server would then attach the volume, reading the current
volume data.
The dafs volume/fileserver avoids this extra i/o by using a new set
of fssync commands to retrieve the volume information from the
fileserver. However, the non-dafs volume server does not use the new
fssync commands and reads the volume headers from disk.
Revert the volume attachment processing for the non-dafs volume
server to request the volume with the V_READONLY mode. This causes
the fileserver to update the volume headers, allowing the volume
server to read the up to date volume header data.
Sadly, this adds another dafs ifdef to the already twisty maze of
passages that all look alike.
This changes the volserver to use the V_READONLY attachment mode
only for the case of getting a single volume, as that what was
done in 1.4.x.
Reviewed-on: http://gerrit.openafs.org/8327 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
(cherry picked from commit bcb077a00fb575e7beb92739646054ea67ca0b79)
Ben Kaduk [Wed, 7 Nov 2012 15:08:33 +0000 (10:08 -0500)]
Catch up to FreeBSD non-MPSAFE deorbit
All filesystems must have their own locking now.
We have been MPSAFE for quite some time, but the preprocessor macro
"MPSAFE" has been removed and we must catch up in order to compile.
The MNTK_MPSAFE macro has not yet been removed, but it is toothless
now, so we can preemptively stop using it.
Ben Kaduk [Thu, 28 Jun 2012 02:04:24 +0000 (22:04 -0400)]
Patch up FreeBSD-10 support
The auto-guessing code for sysnames produces *_fbsd_100, so we can't
just claim that we'll be *_fbsd_1000 for kicks.
Revert back to the old behavior so as to be less disruptive.
Ben Kaduk [Sat, 23 Jun 2012 01:33:50 +0000 (21:33 -0400)]
Catch up on fbsd releases
Pull in the changes needed to even have a chance at supporting
FreeBSD 8.3, 8.4, 9-stable, and 10-current.
Conditionals for changed interfaces in a follow-up commit.
Andrew Deason [Thu, 5 May 2011 16:37:12 +0000 (11:37 -0500)]
libafs: Correct afs_LoopServers flags
AFS_LS_DOWN was actually checking up servers, and AFS_LS_UP was
checking down servers. Fix the handling of the 'adown' flag so we do
the right thing.
1.6-only: Note that this does not contain the change to afs_FlushVCBs,
since 1.6 does not contain change cee2c677, which introduced the
relevant afs_FlushVCBs afs_LoopServers call.
Marc Dionne [Fri, 12 Oct 2012 20:31:24 +0000 (16:31 -0400)]
libafs: Fix second pass in ShakeLooseVCaches
Commit 3105c7ff introduced a two phase process for reclaiming
vcache entries. First go through the list and do what's possible
without sleeping (skipping aliased dentries on Linux), then do
a second pass only if necessary, allowing sleeping.
Unfortunately the test for the end of the VLRU scan is incorrect
and can never trigger, so this second pass was effectively disabled
and any code that is conditional on defersleep=1 was never
exercised. The code to start the second scan also has issues.
Fix the end of VLRU test, and also correctly set the variables
needed to restart the scan.
Michael Meffie [Wed, 4 Jul 2012 21:54:02 +0000 (17:54 -0400)]
vlserver: fix logging of ip addresses
Remove the spurious dates surrounding IP addresess in the VLLog.
Instead of multiple calls to the logging function for a given log
line, format a string containing the addresses and call the log
function once.
Changes the log output from,
... The following fileserver is being registered in the VLDB:
... [Tue Jul 4 14:11:43 2012 192.168.10.128Tue Jul 4 14:11:43 2012 ]
... It will create a new entry in the VLDB.
to,
... The following fileserver is being registered in the VLDB:
... [192.168.10.128]
... It will create a new entry in the VLDB.
Michael Meffie [Wed, 15 Aug 2012 21:19:07 +0000 (17:19 -0400)]
vldb_check: fix cross-linked mh entries
When run with -fix, consolidate server numbers in vl entries which
point to the same multi-homed entry. Use the lowest server number
from the set of server numbers which point to the same multi-homed
entry.
Remove unreferenced address entries which are duplicate multi-homed
indexes.
Two passes of vldb_check -fix may be required; first to fix the
vl entry server numbers; then to remove the duplicate address
entries.
Michael Meffie [Thu, 28 Jun 2012 21:12:24 +0000 (17:12 -0400)]
vldb_check: remove unreferenced mh entries with -fix
When running vldb_check with -fix, clear any mh extent entries which
are set but are not referenced by an address entry in the
IpMappedAddr table. These unreferenced entries already generated a
warning. This commit adds the feature to clear the unreferenced mh
entries using vldb_check -fix option.
Michael Meffie [Fri, 29 Jun 2012 22:10:31 +0000 (18:10 -0400)]
vldb_check: warn about cross-linked mh entries
Warn if an mh extent entry is referenced by more than one server
number in the IpMappedAddr table.
The serveraddr table is used to determine which server numbers have
IP addresses. If, for some reason, multiple server numbers
reference the same mh entry, currently, the correct serveraddr value
is calculated only for the lowest server number in the set of server
numbers which reference the same mh entry. Handle this case, and
warn about the duplicated values in the IpMappedAddr table.
Warn about IpMappedAddr entries which reference non-existent mh
blocks.
Mark Vitale [Mon, 27 Aug 2012 19:11:32 +0000 (15:11 -0400)]
vos: convertROtoRW - prevent VLDB corruption
vos convertROtoRW incorrectly marks the first VLDB entry as the
new RW if the converted RO is not in the VLDB. Correct this
by creating a new valid RW site in the VLDB entry.
Mark Vitale [Tue, 4 Sep 2012 13:06:44 +0000 (09:06 -0400)]
vos: convertROtoRW incorrect warning when RO not in VLDB
vos convertROtoRW will issue an incorrect warning about a partition
mismatch if the RO to convert is not in the VLDB. Only check the
partition if the RO is in the VLDB.
Mark Vitale [Mon, 20 Aug 2012 21:39:06 +0000 (17:39 -0400)]
vos: convertROtoRW susceptible to VLDB changes during override prompt
vos convertROtoRW obtains a VLDB entry, then peforms some setup logic
(including a possible user prompt) before obtaining a volume lock.
This exposes the code to possible time-of-check/time-of-use issues.
After obtaining the volume lock, get a second copy of the VLDB entry
and compare it to the first copy; if it has changed, fail the conversion
with an error message asking the user to re-issue the vos convertROtoRW
command.
Mark Vitale [Thu, 2 Aug 2012 22:37:05 +0000 (18:37 -0400)]
vos: convertROtoRW may create 2nd RW on a fileserver
If an RW is already present on disk on the target server (any partition),
'vos convertROtoRW' will still convert the RO, creating a second RW on the server.
Detect this and refuse to convert the RO by returning EXDEV (invalid cross-device link).
Andrew Deason [Tue, 25 Sep 2012 16:16:35 +0000 (11:16 -0500)]
RedHat: Avoid the DKMS escaping silliness
Depending on the version of DKMS, the current MAKE[0] variable in the
dkms.conf needs different numbers of backslashes. Commit 81a9a33e
tried to address this by changing the contents of dkms.conf depending
on whether or not we were on Fedora. However, the change occurred in
DKMS 2.2, so if someone running RHEL tries to use a newer DKMS, this
will fail.
So instead of trying to guess at the level of escaping we need, just
avoid needing to escape anything with backslashes. We can quote the
heredoc marker to avoid variable expansion inside the heredoc, we can
use a case statement instead of using backticks and local variables
and such, and we can use single quotes for the outer MAKE assignment.
With this, we should not need any backslashes when writing dkms.conf,
so we should work with any DKMS version.
Reviewed-on: http://gerrit.openafs.org/8156 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
(cherry picked from commit 72f1f345ece09b1fbd113af17c9e8e25ec9dffa5)
Andrew Deason [Tue, 21 Aug 2012 22:03:30 +0000 (17:03 -0500)]
LINUX: Avoid symlink-y resolution limits
Implementing the d_automount or follow_link function pointers for our
directories means that we can hit symlink resolution limits during
lookup, since we look like a "symlink". We can hit these limits pretty
easily if there are just too many directories in the lookup path.
Our pseudo-symlink directories cannot contribute to an infinite
resolution loop, since our destination is always an actual directory,
not a symlink that will result in more redirection. So, decrement the
total_link_count counter when our d_automount or follow_link code is
reached, so we do not contribute to hitting the max resolution limit.
Note that this is not related to recursive symlink lookup (link_count)
but only to the iterative symlink limit (total_link_count). Our
lookups are not recursive here, and we are not causing more recursive
lookups like a normal text-based symlink would do.
Marc Dionne [Tue, 14 Aug 2012 22:08:51 +0000 (18:08 -0400)]
Linux 3.6: revalidate dentry op API change
The nameidata argument is dropped, replaced by an unsigned flags
value. The configure test is very specific; kernels with the
older API with a signed int flags value should fall through.
Marc Dionne [Tue, 14 Aug 2012 01:55:25 +0000 (21:55 -0400)]
Linux 3.6: d_alias and i_dentry are now hlists
The d_alias pointer is now the head of an hlist. This means the
iterator is a different macro and has no "reverse" version since
hlists have no direct pointer to the list tail.
inode->i_dentry gets the same treatment. Adjust where we use it.
Marc Dionne [Tue, 14 Aug 2012 01:36:15 +0000 (21:36 -0400)]
Linux 3.6: dentry_open API change
dentry_open now takes a path argument that combines the dentry and
the vfsmount pointers.
Add a configure test and a new compat inline function to keep things
cleaner in the main source file.
Marc Dionne [Sun, 3 Jun 2012 00:45:08 +0000 (20:45 -0400)]
afsd: include sys/resource.h in afsd_kernel.c
With a recent glibc update, sys/wait.h no longer includes
sys/resource.h unless __USE_SVID, __USE_XOPEN or __USE_XOPEN2K8
are set.
Don't rely on the indirect inclusion to get the bits we need;
include it directly in afsd_kernel.c. This include used to be
there but was dropped when afsd_kernel.c was split off.
Michael Meffie [Thu, 2 Aug 2012 21:24:02 +0000 (17:24 -0400)]
libafs: revert init req to use the real uid
The commit to use wrappers for creditial structure access
inadvertently changed the user id to be the effective uid instead of
the real uid, when no PAG is present, on linux. Revert this so
setuid programs continue to work.
Andrew Deason [Thu, 22 Mar 2012 15:42:38 +0000 (10:42 -0500)]
afs: Set DWriting when truncating a dcache entry
When we truncate a file, we truncate the contents of the relevant
dcache entry chunks, and prevent future FetchData operations from
fetching data beyond the truncation offset. If we never write anything
to that chunk, we never set DWriting, and so on disk it looks like
that dcache entry has valid data for the specified DV. However, since
the data is truncated, this is not true.
If a process holds a file open, truncates it without writing to it,
and then the client crashes (or we have trouble contacting the
fileserver when we close the file), the dcache entry will appear valid
on disk. So the next time we read the dcache entry, we will use the
incorrect cache contents as if they were accurate for the specified
DV.
To avoid this, set DWriting when we truncate a chunk. Normally we only
clear DWriting when we actually send data to the fileserver, so to
clear DWriting in this case, add an additional line to clear it in
afs_StoreAllSegments, after the StoreMini has completed.
Andrew Deason [Fri, 2 Mar 2012 23:22:12 +0000 (17:22 -0600)]
afs: Do not limit fetches based on vcache length
Currently, when we go to the fileserver to fetch some data, we try to
make sure that we do not ask for data beyond the end of the file. For
example, if our chunk size is 1M, and we need to get the first chunk
for a file that is 4 bytes long, we will only ask the fileserver for 4
bytes.
This can cause issues when the file is being extended at the same time
as when we are trying to read the file. Consider the following
example. There is a file named X that has contents "abcd" at dv 1, and
we issue a FetchData64 request for X, only requesting 4 bytes. Right
before the fileserver gets the FetchData64 request, another client
writes the contents "12345" to file X.
The client will then fetch the contents "1234" for that file, at dv 2,
and store that as the contents of the first chunk for file X. On
subsequent reads for file X, applications will now get "1234<NUL>" as
the contents, since the size of the file will be updated to 5, but the
cache manager thinks that "1234" is the correct contents for the first
chunk of X at dv 2. The cache manager will continue to think so until
the cache entry is evicted or invalidated for whatever reason.
To avoid this scenario, always request a full chunk of data if we have
any data to fetch and the file has not been locally truncated. We can
still avoid the fetch at all if it looks like we're fetching beyond
end-of-file, since we know that at least at some point that was
correct information about the file. If this results in us trying to
fetch beyond end-of-file, the fileserver will respond with the correct
length anyway.
We still need to restrict the fetch request length based on
avc->f.truncPos, since the dcache data after avc->f.truncPos needs to
stay empty, since we don't track truncated data any other way. If we
also avoided this restriction, extending a file via truncation after
reducing a file's length via truncation could cause the old file data
to appear again, instead of filling the new file range with NULs.
Note that on at least Linux, with this fix an application can still
read the contents "1234" on the first read in the above example, and
"12345" on subsequent reads. This is just due to when we give the VFS
updates about file metadata, and could be remedied by updating file
metadata immediately from the FetchStatus information from the
FetchData64 call. However, just reading the contents "1234" in the
above example seems like a somewhat plausible outcome; at the very
least, it is an improvement.
Andrew Deason [Fri, 2 Mar 2012 23:18:25 +0000 (17:18 -0600)]
afs: Log a message on invalid FetchStatus receipt
If we get an invalid AFSFetchStatus structure from a server, log a
message to indicate as such. This serves as a warning to urge people
to fix their fileservers, and to explain what is doing.
Andrew Deason [Fri, 2 Mar 2012 23:06:48 +0000 (17:06 -0600)]
afs: Sanity-check some AFSFetchStatus structures
We currently do not do any sanity checking on the AFSFetchStatus
structures returned from fileservers. Add some sanity checking for
BulkStatus and FetchStatus calls, so we do not screw up our cache if a
fileserver gives us bogus data.
If we do get an invalid AFSFetchStatus structure, act as if the server
gave us a VBUSY error code, so we will retry the request. For OpenAFS
fileservers prior to 1.6.1 that yield this situation, VBUSY is likely
the error code the fileserver should have responded anyway.
Andrew Deason [Mon, 27 Aug 2012 18:33:47 +0000 (14:33 -0400)]
rx: dec rx_nWaiting on clearing RX_CALL_WAIT_PROC
Currently, a couple of callers (rxi_ResetCall, and
rxi_AttachServerProc) will decrement rx_nWaiting only if
RX_CALL_WAIT_PROC is set for a call, and the call is on a queue
(presumably rx_incomingCallQueue). This can cause an imbalance in
rx_nWaiting if these code paths are reached when, in another thread,
rx_GetCall has removed the call from its queue, but it has not yet
cleared RX_CALL_WAIT_PROC (this can happen while it is waiting for
call->lock). In this situation, rx_GetCall will remove the call from
its queue, wait, and e.g. rxi_ResetCall will clear RX_CALL_WAIT_PROC;
neither will decrement rx_nWaiting.
This is possible if a new call is started on a call channel with an
extant call that is waiting for a thread; we will rxi_ResetCall in
rxi_ReceivePacket, but rx_GetCall may be running at the same time.
This race may also be possible via rxi_AttachServerProc via
rxi_UpdatePeerReach -> TryAttach -> rxi_AttachServerProc while
rx_GetCall is running, but I'm not sure.
To avoid this, decrement rx_nWaiting based on RX_CALL_WAIT_PROC alone,
regardless of whether or not the call is on a queue. This mirrors the
incrementing rx_nWaiting behavior, where rx_nWaiting is only
incremented if RX_CALL_WAIT_PROC is unset for a call, so this should
guarantee that rx_nWaiting does not become unbalanced.
Andrew Deason [Wed, 21 Dec 2011 22:01:16 +0000 (17:01 -0500)]
afs: Add afs_WriteDCache sanity checks
Writing a non-free non-discarded dcache entry with a zero volume id
can easily cause hash table corruption later on, so make sure we don't
do that. Also log something if the write itself fails, as this usually
indicates an unusual situation involving I/O errors or something.
Andrew Deason [Wed, 21 Dec 2011 21:05:40 +0000 (16:05 -0500)]
afs: Cope with afs_GetValidDSlot errors
Make callers of afs_GetValidDSlot deal with getting a NULL dcache,
which can occur if an error is encountered. Some of these just panic
at least for now, since a code path for recovery is complex, but this
is at least better than dereferencing a NULL pointer.
Andrew Deason [Wed, 21 Dec 2011 20:04:32 +0000 (15:04 -0500)]
afs: Do not always ignore errors in afs_GetDSlot
Currently afs_UFSGetDSlot will silently swallow any error in reading
the specified dslot from disk, and will return a "blank" dcache to the
caller. However, many callers of afs_GetDSlot will be asking for a
dcache that we know exists, and more importantly, we know is on the
global hash table. If a disk error is encountered and we're given a
"blank" dcache, we will erroneously believe the dcache entry is not on
the hash table, causing corruption of the hash table later on.
So instead, modify all callers of afs_GetDSlot to use either
afs_GetValidDSlot or afs_GetNewDSlot. Calling afs_GetValidDSlot
indicates that the given dentry index is known to be valid, and any
error encountered while reading the entry from disk should result in
an error (for disk I/O errors we have no control over, this results in
a NULL dentry returned; for internal consistency errors we panic).
Calling afs_GetNewDSlot indicates that the specified index may not
exist or may not be valid, and so returning a "blank" dentry in that
case is fine.
For memcache, the situation is the same, except any time we go to
"disk" it is an (internal) error, since there is no disk.
Andrew Deason [Wed, 21 Dec 2011 22:25:29 +0000 (17:25 -0500)]
afs: Remove second argument to afs_GetDSlot
All callers of afs_GetDSlot were passing NULL as the second argument
to afs_GetDSlot. So, remove the argument, and behave as if tmpdc was
NULL unconditionally.
Andrew Deason [Fri, 6 Apr 2012 19:56:07 +0000 (14:56 -0500)]
LINUX: Do not lookup immediately recursive mtpts
On Linux, having a mountpoint in a volume root that points to the same
volume can cause serious problems. By 'immediately recursive', I mean
a situation like the following:
fs mkm mtpt vol
fs mkm mtpt/mtpt vol
If there are multiple dentry aliases for the directory (which is
possible if the directory is a mountpoint), an 'rmdir' on the
recursive mountpoint can cause the client to deadlock. Since the
'rmdir' code path in Linux locks the parent directory inode to perform
the rmdir, and locks the child directory inode after performing a
couple of sanity checks. For an immediately recursive mountpoint,
these two inodes are the same, and so we will deadlock.
Andrew Deason [Fri, 6 Jul 2012 21:37:39 +0000 (16:37 -0500)]
Linux: Make dir dentry aliases act like symlinks
Currently, we try to invalidate other dentries that exist for a
particular dir inode when we look up a dentry. This is so we try to
avoid duplicate dentries for a directory, which Linux does not like
(you cannot have hardlinks to a dir).
If we cannot invalidate the other aliases (because they are being
used), right now we just return the alias. This can make it very easy
to panic the client, due to the sanity checks Linux performs when dong
things like 'rmdir'. If we do something like this:
For the 'rmdir', we will lookup 'mtpt2'. Since 'mtpt' and 'mtpt2'
are mountpoints for the same volume, their dentries point to the same
directory inode. So when we lookup 'mtpt2', we will try to invalidate
the other dentry, but we cannot do that since it is the cwd. So we
return the alias dentry (for 'mtpt'). The Linux VFS layer then does a
sanity check for the rmdir operation, checking that the child dentry's
parent inode is the same as the inode we're performing the rmdir for.
Since the dentry we returned was for 'mtpt', whose parent is 'dir1',
and the actual dir we're performing the rmdir for is 'dir2', this
sanity check fails and we BUG.
To avoid this, make the dentry alias act like a symlink when we
encounter an uninvalidateable dentry alias. That is, we allow multiple
dentry aliases for a directory, however, when the dentry aliases are
actually used, we redirect to a common dentry (via d_automount where
possible, and follow_link elsewhere).
This means that such mountpoints will behave similarly to symlinks, in
that we 'point' to a specific mountpoint dentry. This means that if we
have multiple different ways to get to the same volume, and all are
accessed at the same time, all but one of those mountpoints will
behave like symlinks, pointing to the same mountpoint. So, the '..'
entries for each path will all point to the parent dir of one
mountpoint, meaning that the '..' entry will be "wrong", but for most
cases it will still be correct.
In order to try to make the 'target', pointed-to directory consistent,
we add a new field to struct vcache: target_link. This points to the
dentry we should redirect to, whenever that vcache is referenced. To
avoid (possibly not-feasibly-solvable) problems with refcounting, this
pointer is not actually a reference to the target dentry, but just
serves as a pointer to compare to.
FIXES 130273
Reviewed-on: http://gerrit.openafs.org/7741 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit de381aa0d39e88a1ca0c8ccbb2471c5cad5a964c)
Andrew Deason [Thu, 22 Dec 2011 20:01:52 +0000 (15:01 -0500)]
afs: Indicate error from afs_osi_Read/Write better
Currently afs_osi_Read and afs_osi_Write just return -1 on any I/O
error, even though they know the error code given from the OS VFS.
Just return that code instead so the caller can see what the error
was; but negate it, so it's clear that it is an error.
Andrew Deason [Thu, 22 Dec 2011 19:50:09 +0000 (14:50 -0500)]
afs: afs_osi_Read/Write returns negative on error
afs_osi_Read and afs_osi_Write need to return negative values on
error. EIO is not negative; return -EIO so we don't accidentally
return "success" if someone requested to read or write EIO bytes.
Andrew Deason [Mon, 9 Apr 2012 22:16:42 +0000 (17:16 -0500)]
vos: Do not try to remove backup volume id 0
Currently we always try to delete a BK volume if we're deleting the
RW. If the BK volume id is 0, this is never going to do anything, so
don't try to delete it.
Andrew Deason [Thu, 26 Jul 2012 21:40:03 +0000 (16:40 -0500)]
LINUX: Hold GLOCK for proc traversal
The functions that traverse unixuser structures for display via /proc
(uu_start et al) call various libafs functions hold and release locks,
etc. To do any of that, we need GLOCK. Amongst other issues, we can
panic if we try to acquire a contested lock without GLOCK, since we
assert glock is held when we sleep for the lock or try to wake other
waiters. The same goes for the legacy CellServDB proc file.
So, hold and release GLOCK as appropriate.
Reviewed-on: http://gerrit.openafs.org/7885 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 5237d3d232f22aaf4f67f3a8727a293f4058a7ae)
Niklas Jonsson [Wed, 20 Jun 2012 14:03:54 +0000 (10:03 -0400)]
Auth: increase size of DNS resolver answer buffer
This patchset increases the size of the res_search() answer
buffer from 1024 octets to 4096 octets. This is not a proper
long term solution but will permit sites with longer response
lists to make use of SRV and AFSDB records.
This patchset only impacts UNIX systems. Windows uses the
Win32 DNS resolver which dynamically allocates memory based
upon the size of the response.
Andrew Deason [Mon, 18 Jun 2012 20:06:49 +0000 (15:06 -0500)]
doc: Consolidate NetRestrict format docmentation
We were specifying exactly the same format in two different places;
consolidate them into one place. In addition, explicitly say there are
is no way to specify a range of addresses, in case some people are
confused by the previous versions of this man page that erroneously
said you could use 255 as a wildcard.
Andrew Deason [Mon, 18 Jun 2012 20:06:49 +0000 (15:06 -0500)]
doc: Consolidate NetRestrict format docmentation
We were specifying exactly the same format in two different places;
consolidate them into one place. In addition, explicitly say there are
is no way to specify a range of addresses, in case some people are
confused by the previous versions of this man page that erroneously
said you could use 255 as a wildcard.
Marc Dionne [Sun, 3 Jun 2012 01:35:53 +0000 (21:35 -0400)]
Linux 3.5: encode_fh API change
The encode_fh export operation now expects two inode arguments
instead of a dentry and a "connectable" flag. Use the inode of
the dentry we're interested in, and NULL as the parent inode which
is the same as passing a 0 flag in the previous API.
Andrew Deason [Fri, 18 May 2012 21:40:38 +0000 (17:40 -0400)]
afs: Do not QueueVCB before osi_dnlc_purge*
In afs_FlushVCache, when we QueueVCB, we might drop the afs_xvcache
lock (as of 76158df491f47de56d1febe1d1d2d17d316c9a74). The vcache may
still be on the DNLC, so a lookup while xvcache is dropped can cause
someone else to grab a reference to the vcache while it is being
flushed. This can cause panics and failed assertions, since someone
will have a reference to the flushed vcache, which is effectively
freed and many of the structure fields are no longer valid.
So instead, do not call QueueVCB until we have purged the vcache from
the DNLC.
Andrew Deason [Mon, 21 May 2012 22:11:29 +0000 (17:11 -0500)]
afsd: Avoid dir interpolation for memcache
memcache doesn't make use of fullpn_DCacheFile, fullpn_VolInfoFile,
etc. Do not even try to generate these strings for memcache, since
cacheBaseDir will be NULL, and so this can cause a segfault on some
platforms including Solaris.
Mark Vitale [Tue, 8 May 2012 14:01:12 +0000 (10:01 -0400)]
vos: convertROtoRW may create two RW volumes
If the RW volume is listed after the RO convert target in the VLDB,
the code failed to detect that an RW is already present and would
create a second RW volume.
Reviewed-on: http://gerrit.openafs.org/7385 Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@dementix.org>
(cherry picked from commit 9a728fd86c7add13e15dfa0d3062fa94cc77c53f)
modify afshelper to just run what it's told instead
of offering fixed operations. this avoids having a setuid
tool around. in spite of apple's suggestion this is correct,
it's actually more dangerous. instead, elevate privilege only
to run a small set of commands, then drop. allow
unlocking of the prefs pane, but make the menu extra prompt
for authentication when needed.
deactivate controls in the prefs pane when locked.