Copied from parent
From the code analysis above, it is clear that FetchForBucket() should not be using a RetryHelper object. That object creates the middle of three nested retry loops, and it is the one that iterates 301 times:
- checkForTokens() outer retry loop executes once every 30 seconds and will terminate the first time the keyspace is found to be invalid.
- FetchForBucket() middle retry loop executes 301 times and is not needed. (This routine also has its own inner retry loop, but that loop is triggered only by an invalid ClusterInfoCache object, not by an invalid keyspace.)
- RefreshBucket() (and also RefreshManifest(), called just after it) contain the inner retry loops. These are low-level routines that retry 5x in tight loops. The retries are needed for the general robustness of these routines, including for other callers, but they will not flood the log once the middle retry loop is removed.
Removing the middle retry loop will result in this scenario producing only 5 copies of the log message instead of 1,505 (301 x 5), and it will make the code match the original intent.
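To make the arithmetic concrete, here is a minimal sketch of the nested loops. It is not the actual indexing code: refreshBucket, fetchForBucket, countLogMessages and errInvalidKeyspace are hypothetical stand-ins, and the sketch assumes every attempt fails on the invalid keyspace and emits exactly one log message. The outer checkForTokens() loop is left out because it terminates on the first invalid-keyspace result.

```go
package main

import (
	"errors"
	"fmt"
)

// Retry counts taken from the analysis above.
const (
	innerAttempts  = 5   // tight-loop retries in the low-level refresh routine
	middleAttempts = 301 // iterations of the RetryHelper loop in FetchForBucket()
)

// errInvalidKeyspace is a hypothetical error used only for this sketch.
var errInvalidKeyspace = errors.New("invalid keyspace")

// refreshBucket stands in for RefreshBucket(): under the assumption that the
// keyspace is invalid, every attempt fails and counts one log message.
func refreshBucket(logCount *int) error {
	for i := 0; i < innerAttempts; i++ {
		*logCount++ // one log message per failed attempt
	}
	return errInvalidKeyspace
}

// fetchForBucket stands in for FetchForBucket(). withMiddleLoop selects
// between the current behavior (301 iterations) and the proposed fix
// (a single call with no RetryHelper-style loop).
func fetchForBucket(withMiddleLoop bool, logCount *int) error {
	attempts := 1
	if withMiddleLoop {
		attempts = middleAttempts
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = refreshBucket(logCount); err == nil {
			return nil
		}
	}
	return err
}

// countLogMessages returns how many log messages one pass of the outer
// loop produces in this model.
func countLogMessages(withMiddleLoop bool) int {
	n := 0
	_ = fetchForBucket(withMiddleLoop, &n) // error is expected in this scenario
	return n
}

func main() {
	fmt.Println("with middle loop:   ", countLogMessages(true))
	fmt.Println("without middle loop:", countLogMessages(false))
}
```

In this model the middle loop multiplies the 5 inner-loop messages by 301, and removing it leaves only the 5 messages from the low-level routine.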
FetchForBucket() (cluster_info.go) was created in 7.1.0 via MB-46245 change set https://review.couchbase.org/c/indexing/+/155866, which introduced the middle 301x retry loop. It is therefore a regression and the root cause of the current MB.
This change was then backported to 7.0.2 via MB-47635 change set https://review.couchbase.org/c/indexing/+/159091, so the bug now exists there too.
checkValidKeyspace() (metadata_provider.go) and the every-30-seconds check were added in 7.0.0 via MB-46058 change set https://review.couchbase.org/c/indexing/+/153284. These are not causes of the bug.