Details
Description
centos-6, version 3.0.0-603, 64 vbuckets
Cluster used: 10.6.2.144, 10.6.2.145, 10.6.2.147, 10.6.2.148, 10.6.2.149, 10.6.2.150
1. Add 6 nodes to cluster
2. Add 1 default bucket
3. Add 1000 items
4. Wait till all replication is done and disk queues are empty
5. Failover node 10.6.2.145
6. Rebalance
7. Compare active vs replica failover logs
Please note that no extra items were added during the failover and rebalance.
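The comparison in step 7 can be sketched as below. This is a minimal illustration, not the actual test code: it assumes failover logs were collected per node in a `cbstats failovers`-style format (`vb_N:M:id` / `vb_N:M:seq` lines), and the helper names `parse_failover_stats` and `compare_failover_logs` are made up for this sketch.

```python
import re

def parse_failover_stats(raw):
    """Parse cbstats-failovers-style text into {vbucket: (uuid, seq)}.

    Only entry 0 (the newest failover-log entry) is kept.
    Assumed line format (illustrative): 'vb_14:0:id: 220495252186831'.
    """
    logs = {}
    for line in raw.splitlines():
        m = re.match(r"\s*vb_(\d+):0:(id|seq):\s*(\d+)", line)
        if not m:
            continue
        vb, field, value = int(m.group(1)), m.group(2), int(m.group(3))
        uuid, seq = logs.get(vb, (None, None))
        logs[vb] = (value, seq) if field == "id" else (uuid, value)
    return logs

def compare_failover_logs(active, replica):
    """Return vbuckets whose newest (uuid, seq) differ between the two copies."""
    mismatches = []
    for vb in sorted(set(active) & set(replica)):
        if active[vb] != replica[vb]:
            mismatches.append((vb, active[vb], replica[vb]))
    return mismatches

# Example using the vb_14 UUIDs from the report above:
active = parse_failover_stats("vb_14:0:id: 220495252186831\nvb_14:0:seq: 0")
replica = parse_failover_stats("vb_14:0:id: 142057680169654\nvb_14:0:seq: 0")
print(compare_failover_logs(active, replica))
```

An empty result means the active and replica failover logs agree for every common vbucket; any returned tuple is a mismatch of the kind listed below.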
Expected: Failover logs to be in sync
Observed: Failover logs are not in sync for some vbuckets:
bucket default, vbucket vb_14 :: Original node 10.6.2.147 active :: UUID 220495252186831, Change node 10.6.2.144 replica UUID 142057680169654
bucket default, vbucket vb_20 :: Original node 10.6.2.144 replica :: UUID 41362337452486, Change node 10.6.2.150 active UUID 65310417409592
bucket default, vbucket vb_12 :: Original node 10.6.2.144 active :: seq 0, Change node 10.6.2.150 replica :: seq 7
bucket default, vbucket vb_17 :: Original node 10.6.2.148 active :: UUID 123013237419052, Change node 10.6.2.149 replica UUID 139842677528282
Note that we observed the same mismatches when failing over a node that had been stopped:
bucket default, vbucket vb_14 :: Original node 10.6.2.147 active :: UUID 60821065035648, Change node 10.6.2.144 replica UUID 121182545756523
bucket default, vbucket vb_20 :: Original node 10.6.2.144 replica :: UUID 260849933196696, Change node 10.6.2.150 active UUID 186457677132903
bucket default, vbucket vb_12 :: Original node 10.6.2.144 active :: seq 0, Change node 10.6.2.150 replica :: seq 7
bucket default, vbucket vb_17 :: Original node 10.6.2.148 active :: UUID 139028362900196, Change node 10.6.2.149 replica UUID 1165172246300
One more observation: we did not see this issue with graceful failover.
I have attached logs for both cases: normal failover and failover of a stopped node.