From 8c78a44cf5197ceee6907e947074973138c442f0 Mon Sep 17 00:00:00 2001
From: Andrew Deason
Date: Mon, 27 Oct 2014 16:39:34 -0500
Subject: [PATCH] rx: Reset lastSendData when resetting call

Currently we use call->lastSendData to attempt to detect a stalled
call, if it's been too long since the last time the call sent any
data. However, we never initialize lastSendData to anything when
creating a new call. This means that when rx_NewCall (or rxi_NewCall)
returns, lastSendData can be nonzero. This can happen if we reuse a
DALLY call, or if we pull a call off of rx_freeCallQueue. This can be
a time very far in the past, since the lastSendData time has not
changed since the last time the call was used; it will remain
unchanged until a user of the new call writes something to the call
stream.

This can be a problem between the time when a caller creates a new
call with rx_NewCall and when the caller actually writes something to
the stream. Between those two times, if lastSendData happens to be set
to a time in the past, we may call rxi_CheckCall on that call, and
abort the call for being idle. The call will thus be aborted before it
even sent any data on the wire.

This is of particular concern for multi_Rx calls, since those can
create a large number of call structures, possibly introducing a delay
between calling rx_NewCall and writing anything to the stream (if one
of the later rx_NewCall invocations blocks waiting for an open call
channel, for instance, all of the previously allocated calls will
stick around unused for potentially a long time).

One such multi_Rx call is done by the cache manager, where it
periodically uses multi_Rx to call RXAFS_GetCapabilities to probe
fileservers for reachability. If this issue occurs during that
operation you can see a large number of servers get marked down for
code -9 (RX_CALL_IDLE), and then get marked as coming back up.

To fix this, set lastSendData to 0 when resetting a call, along with
most of the other fields in a call, to indicate that the call has
never sent any data. As long as lastSendData is 0, the call will never
get aborted with RX_CALL_IDLE, and this situation will be avoided.
This ensures that this issue cannot happen, since rxi_ResetCall is
guaranteed to be called at some point whenever we reuse a call
structure for any reason.

Change-Id: Ie96d1c640616fd5a234c635f60dfef4ad7c19d28
Reviewed-on: http://gerrit.openafs.org/11557
Tested-by: BuildBot
Reviewed-by: Benjamin Kaduk
Reviewed-by: Jeffrey Altman
---
 src/rx/rx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/rx/rx.c b/src/rx/rx.c
index 416ce074c..a333c39a8 100644
--- a/src/rx/rx.c
+++ b/src/rx/rx.c
@@ -5406,6 +5406,7 @@ rxi_ResetCall(struct rx_call *call, int newcall)
     call->rprev = 0;
     call->lastAcked = 0;
     call->localStatus = call->remoteStatus = 0;
+    call->lastSendData = 0;
 
     if (flags & RX_CALL_READER_WAIT) {
 #ifdef RX_ENABLE_LOCKS
-- 
2.39.5
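
The commit message describes the failure only in prose, so here is a minimal sketch of the mechanism. It is not the actual rxi_CheckCall or rxi_ResetCall code: struct sketch_call, sketch_call_is_idle, sketch_reset_call, and sketch_self_check are hypothetical names, and the idle test is simplified to the one comparison the message describes. It assumes an idle timeout of 0 means the check is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-in for struct rx_call; not the real layout. */
struct sketch_call {
    time_t lastSendData;    /* 0 means "no data sent yet on this use of the call" */
};

/*
 * Simplified idle test (assumption: the real check has more conditions).
 * Returns true if the call would be aborted as idle at time 'now'.
 */
static bool
sketch_call_is_idle(const struct sketch_call *call, time_t now,
                    time_t idleDeadTime)
{
    /* A zero timeout disables the check; a zero timestamp means "never sent". */
    if (idleDeadTime == 0 || call->lastSendData == 0)
        return false;
    return call->lastSendData + idleDeadTime < now;
}

/* Mirrors the one-line fix: clear the timestamp whenever the call is reset. */
static void
sketch_reset_call(struct sketch_call *call)
{
    call->lastSendData = 0;
}

/* Walks through the scenario from the commit message. */
static void
sketch_self_check(void)
{
    /* A reused call still carries a timestamp from its previous use. */
    struct sketch_call call = { .lastSendData = 1000 };
    time_t now = 5000;
    time_t idleDeadTime = 60;

    /* Without the reset, the stale timestamp makes a brand-new call look idle. */
    assert(sketch_call_is_idle(&call, now, idleDeadTime));

    /* With the reset, the call cannot be judged idle before it sends data. */
    sketch_reset_call(&call);
    assert(!sketch_call_is_idle(&call, now, idleDeadTime));
}
```

Using 0 as a "never sent" sentinel keeps the fix to a single assignment in the reset path and does not require changing the idle check itself.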