NRE when rebalancing and cluster map is missing an alternate address

Description

Scenario: mixed-version node upgrade (Enterprise Edition 6.6.3 build 9808 and Enterprise Edition 7.0.3 build 7032). First bootstrap to the 6.6.3 node and run a load. Then add the 7.0.3 node and rebalance; the server will not return the AlternateAddress for the 7.0.3 node.

During a rebalance of a cluster using alternate addresses, it's possible for the server to not return an alternate address:

If the NRE is handled in the SDK (by checking for a null AlternateAddress), this in turn leads to NMVB (Not My VBucket) responses being returned after the rebalance completes:

If the application is then stopped and restarted, bootstrapping fails with the following socket exception (because there is no alternate address to use for the second node "cb2.lan"):
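
The null check described above could look roughly like the following. This is a minimal Python sketch for illustration only, not the SDK's actual code: the function name `resolve_kv_endpoint` is hypothetical, and the entry layout (`hostname`, `services`, `alternateAddresses.external`) is an assumption about how a node entry in the cluster map is shaped. The host "cb1.lan" and port 11211 are made-up sample values.

```python
from typing import Optional, Tuple


def resolve_kv_endpoint(node_ext: dict, network: str = "external") -> Tuple[str, Optional[int], bool]:
    """Return (host, kv_port, used_alternate) for one node entry.

    Hypothetical helper: falls back to the node's default address instead of
    dereferencing a missing alternate address.
    """
    default_host = node_ext.get("hostname", "")
    default_port = node_ext.get("services", {}).get("kv")

    # "alternateAddresses" may be absent or null while the node is in a
    # freshly reset state, so treat both cases as "no alternate address".
    alternates = node_ext.get("alternateAddresses") or {}
    alt = alternates.get(network)
    if not alt or not alt.get("hostname"):
        return default_host, default_port, False

    # Use the alternate KV port if one is mapped, else keep the default port.
    port = (alt.get("ports") or {}).get("kv", default_port)
    return alt["hostname"], port, True


# Sample entries (assumed layout): one node with an external address, one without.
with_alt = {
    "hostname": "cb1.lan",
    "services": {"kv": 11210},
    "alternateAddresses": {"external": {"hostname": "mbp.local", "ports": {"kv": 11211}}},
}
without_alt = {"hostname": "cb2.lan", "services": {"kv": 11210}}

print(resolve_kv_endpoint(with_alt))
print(resolve_kv_endpoint(without_alt))
```

Note that falling back to the default address only avoids the NRE. As described above, a client outside the cluster network may still be unable to reach "cb2.lan", so the third tuple element is returned to let the caller log or surface the missing alternate address.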

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Attachments

4

Activity


Jeffry Morris April 5, 2022 at 5:30 PM

Reopening to fix the NRE that is thrown within the SDK. However, fixing it will result in NMVBs, because the cluster state is bad at this point. The resolution is to set the alternate addresses again on the node that was swapped out, as its state will have been "refreshed". This is really a server configuration issue at this point. Once the alternate addresses have been set on the server, the SDK will resolve itself when an updated cluster map is returned.
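
To see which nodes need their alternate addresses set again, one option is to scan a cluster map for entries that lack an external address. The sketch below is an illustration in Python under assumptions: `nodes_missing_external` is a hypothetical helper, and the `nodesExt` / `alternateAddresses.external` layout of the sample map is assumed rather than taken from this ticket.

```python
def nodes_missing_external(cluster_map: dict) -> list:
    """Return hostnames of nodes that have no external alternate address."""
    missing = []
    for node in cluster_map.get("nodesExt", []):
        # Tolerate a missing or null "alternateAddresses" / "external" entry.
        external = (node.get("alternateAddresses") or {}).get("external") or {}
        if not external.get("hostname"):
            missing.append(node.get("hostname", "<unknown>"))
    return missing


# Assumed sample map: cb2.lan was re-added and has lost its alternate address.
sample_map = {
    "nodesExt": [
        {
            "hostname": "cb1.lan",
            "alternateAddresses": {"external": {"hostname": "mbp.local"}},
        },
        {"hostname": "cb2.lan"},
    ]
}

print(nodes_missing_external(sample_map))
```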

Jeffry Morris April 1, 2022 at 7:50 PM

Closing as I have been convinced this is expected behavior: when a node is removed from a cluster it resets itself back to a "fresh state".

Jeffry Morris April 1, 2022 at 5:49 PM

To recreate:

  • Provision two nodes (6.6.3 and 7.0.3) and enable alternate addresses on each (couchbase-cli setting-alternate-address...)

  • Have an app put a load on the 6.6.3 node, then add the 7.0.3 node and rebalance.

  • Remove the 7.0.3 node and rebalance.

  • Add the same 7.0.3 node and rebalance; the server will return NMVB and not recover.

  • Go back to the 7.0.3 node and re-enable alternate addresses: couchbase-cli setting-alternate-address -c localhost:8091 --username Administrator --password password --set --node cb2.lan --hostname mbp.local --ports mgmt=9092,kv=11212

  • Restart the app and it will then work correctly.
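
The re-enable step above can be scripted so that it is easy to repeat after each remove/add cycle. This is a minimal Python sketch: the helper `alternate_address_command` is hypothetical, and it only assembles the same couchbase-cli invocation quoted in the steps, with the same flags and sample values. It does not run the command.

```python
import shlex


def alternate_address_command(node, hostname, ports,
                              cluster="localhost:8091",
                              username="Administrator",
                              password="password"):
    """Build the argv list for couchbase-cli setting-alternate-address --set."""
    # ports is a mapping such as {"mgmt": 9092, "kv": 11212}.
    port_spec = ",".join(f"{name}={port}" for name, port in ports.items())
    return [
        "couchbase-cli", "setting-alternate-address",
        "-c", cluster,
        "--username", username,
        "--password", password,
        "--set",
        "--node", node,
        "--hostname", hostname,
        "--ports", port_spec,
    ]


cmd = alternate_address_command("cb2.lan", "mbp.local", {"mgmt": 9092, "kv": 11212})
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a host where couchbase-cli is installed.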

Fixed
Created March 31, 2022 at 11:09 PM
Updated April 5, 2022 at 5:30 PM
Resolved April 5, 2022 at 5:30 PM