From: Marcio Barbosa Date: Thu, 22 Feb 2018 22:53:23 +0000 (-0500) Subject: ubik: don't set database epoch to 0 if not needed X-Git-Tag: upstream/1.8.1_pre2^2~85 X-Git-Url: https://git.michaelhowe.org/gitweb/?a=commitdiff_plain;h=b41065f2b8877580a7e1858b8e2857973ddf6503;p=packages%2Fo%2Fopenafs.git ubik: don't set database epoch to 0 if not needed If our attempt to receive a fresh database from a peer fails, we will overwrite the version.epoch field of our current local copy of the database with an invalid value, "0". The idea behind this approach is to make sure that this database will not be seen as a legit copy if the transfer is not completed properly. Although it is questionable if this approach is still necessary (since the current version writes the data into a temporary file), it is undisputed that the database version does not have to be invalidated if the transfer fails in a early stage where no data has been written and we could safely continue to reuse the local copy for read-only queries. Early failures may happen if: 1. The peer sending the database to us is not the peer we believe to be the sync site; 2. The sender is not authorized to call DISK_SendFile; In both cases, the database epoch is invalidated. As a result of that, we may have the following consequences: 1. Reads may not be allowed Once the on disk epoch is invalidated, if the server in question is rebooted, the invalid on disk epoch will be used to initialize the in memory epoch. At this point, reads may not be allowed since urecovery_AllBetter checks if the in memory epoch is greater than 1. Reads should not be blocked forever since the sync-site will send a new database to this remote and, as a result of that, the invalid version will be corrected. 2. Data can be lost If the site with the invalid epoch is the one with the most recent database, the database can be rolled back to an earlier version during a new quorum establishment. Consider the following scenario where we have three sites: Site A (up - database up to date) (sync-site) Site B (up - database up to date) Site C (down - old database) The epoch of B is invalidated due to the problem fixed by this patch. Then, A is turned off and C is turned on. In this scenario, the new sync-site will distribute the old database held by C since its epoch is greater than 0. To fix the problem in question, do not set the database epoch to 0 if the local database was not modified. Acknowledgements: Hartmut Reuter - found the problem; - suggested a possible solution; Benjamin Kaduk - submitted the first version; Andrew Deason - suggested changes; Reviewed-on: https://gerrit.openafs.org/12924 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie (cherry picked from commit bd6a2484011dad6298c4ce97dd0cd68e0834baa5) Change-Id: I64808d4adf6a5925083a671308a60f93ca427180 Reviewed-on: https://gerrit.openafs.org/12937 Reviewed-by: Hartmut Reuter Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot Reviewed-by: Benjamin Kaduk --- diff --git a/src/ubik/remote.c b/src/ubik/remote.c index d3d0f138a..9f88aaff0 100644 --- a/src/ubik/remote.c +++ b/src/ubik/remote.c @@ -463,8 +463,7 @@ SDISK_SendFile(struct rx_call *rxcall, afs_int32 file, pbuffer[0] = '\0'; if ((code = ubik_CheckAuth(rxcall))) { - DBHOLD(dbase); - goto failed; + return code; } /* next, we do a sanity check to see if the guy sending us the database is @@ -483,9 +482,10 @@ SDISK_SendFile(struct rx_call *rxcall, afs_int32 file, otherHost = ubikGetPrimaryInterfaceAddr(rx_HostOf(tpeer)); if (offset && offset != otherHost) { /* we *know* this is the wrong guy */ - code = USYNC; - DBHOLD(dbase); - goto failed; + ubik_print + ("Ubik: Refusing synchronization with server %s since it is not the sync-site.\n", + afs_inet_ntoa_r(otherHost, hoststr)); + return USYNC; } DBHOLD(dbase);