Description
This seems to be quite an old bug, but it's still a bug and quite an embarrassing one. Our change to update the config during rebalance seems to have made it much more likely.
What happens is that ns_config clear clears the config, waits for the saver, and then reloads the config. The problem is that waiting for the saver only waits for the currently running save, and a new saver can be spawned if changes were made since the saver was started. This is exactly what happens when the config is cleared while a saver is running, which leaves the config reload racing with the saver. I've seemingly observed this a few times myself already.
UPDATE: When I filed this bug I was thinking about the particular race between ns_config:clear and asynchronous config saving. But the bug was actually filed because people (including me) were seeing a weird condition where an ejected node couldn't be added back and thought it was still part of the cluster. As can be seen in the comments below, we traced this down to a race between shutting down the config merger process and clearing the config.
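To make the ordering problem concrete, here is a minimal single-threaded sketch in Python of the race as described above. It is a toy model, not the ns_config code: `ConfigStore`, `wait_current_save`, `wait_all_saves`, and the `nodes` key are all hypothetical names invented for illustration, and the "saver" is simulated by explicit steps rather than a real process.

```python
class ConfigStore:
    """Toy model of an in-memory config with an asynchronous saver (hypothetical)."""

    def __init__(self, initial):
        self.memory = dict(initial)  # in-memory config
        self.disk = dict(initial)    # last config written to disk
        self.in_flight = None        # snapshot the running saver will write
        self.dirty = False           # changed since the running saver started?

    def update(self, new_config):
        self.memory = dict(new_config)
        if self.in_flight is None:
            self._start_saver()
        else:
            self.dirty = True        # the running saver holds an older snapshot

    def _start_saver(self):
        self.in_flight = dict(self.memory)
        self.dirty = False

    def _finish_saver(self):
        self.disk = self.in_flight
        self.in_flight = None
        if self.dirty:
            self._start_saver()      # a new saver is spawned for the newer changes

    def wait_current_save(self):
        # Behaviour described in the report: wait only for the running save.
        if self.in_flight is not None:
            self._finish_saver()

    def wait_all_saves(self):
        # One possible alternative (an assumption): wait until no save is pending.
        while self.in_flight is not None:
            self._finish_saver()

    def clear(self, wait):
        self.update({})                # clear the config
        wait()                         # wait for the saver
        self.memory = dict(self.disk)  # reload the config from disk


def run(wait_name):
    store = ConfigStore({"nodes": ["a"]})
    store.update({"nodes": ["a", "b"]})  # a saver is now running
    store.clear(getattr(store, wait_name))
    return store.memory
```

In this model, `run("wait_current_save")` reloads the config that the older saver wrote, so the cleared node ends up with its pre-clear config in memory, while `run("wait_all_saves")` reloads the empty config. Waiting for all pending saves is only one way to close the window in this toy model and is not necessarily how the problem was addressed in ns_server.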
For Gerrit Dashboard: MB-5110

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
15006,1 | Unexport confusing and unused ns_config:reload. MB-5110 | branch-181 | ns_server | MERGED | +2 | +1 |
15007,1 | fixed ns_config saver race in reload and terminate. MB-5110 | branch-181 | ns_server | ABANDONED | 0 | +1 |
15022,1 | Merge remote-tracking branch 'membase/branch-181' into merge | branch-18 | ns_server | MERGED | +2 | +1 |
15049,1 | shutdown config merger when shutting down disco_sup. MB-5110 | branch-181 | ns_server | ABANDONED | -2 | +1 |
15050,2 | made config saver failures more visible. | branch-181 | ns_server | MERGED | +2 | +1 |
15066,1 | Merge remote-tracking branch 'couchbase/branch-18' | master | ns_server | MERGED | +2 | +1 |
17576,1 | MB-5110: shutdown config merger when shutting down disco_sup | master | ns_server | MERGED | +2 | +1 |