Details
Bug
Resolution: Fixed
Critical
Cheshire-Cat
Untriaged
1
Unknown
Description
Build : 7.0.0-5169
Test : -test tests/2i/cheshirecat/test_idx_clusterops_cheshire_cat_recovery.yml -scope tests/2i/cheshirecat/scope_idx_cheshire_cat_dgm.yml
Scale : 2
Iteration : 1st
This is the new GSI component test with more recovery steps. After the steady-state phase, a rebalance is started to add a new indexer node, 172.23.96.31, to the cluster. A few minutes into this rebalance, the indexer process on 172.23.97.77 is killed, and the rebalance fails as expected. The rebalance is then retried automatically within a couple of minutes. The retried rebalance has now been hung for about 22 hours because one index is stuck in the Moving state.
Details of the index stuck in moving state :
{
  "bucket" : "bucket2",
  "collection" : "coll_9",
  "completion" : 100,
  "definition" : "CREATE INDEX `idx1_YXvO` ON `bucket2`.`scope_1`.`coll_9`(`country`,(distinct (array ((`r`.`ratings`).`Check in / front desk`) for `r` in `reviews` end)),array_count(`public_likes`),array_count(`reviews`) DESC,`type`,`phone`,`price`,`email`,`address`,`name`,`url`) WITH { \"defer_build\":true, \"nodes\":[ \"172.23.96.30:8091\",\"172.23.97.77:8091\",\"172.23.97.82:8091\",\"172.23.97.83:8091\" ], \"num_replica\":2 }",
  "defnId" : 11843842764277554498,
  "hosts" : [
    "172.23.96.30:8091",
    "172.23.97.82:8091"
  ],
  "indexName" : "idx1_YXvO",
  "indexType" : "plasma",
  "instId" : 12561991181710981895,
  "lastScanTime" : "Sun May 16 13:06:50 PDT 2021",
  "name" : "idx1_YXvO",
  "numPartition" : 2,
  "numReplica" : 2,
  "partitionMap" : {
    "172.23.96.30:8091" : [
      0
    ],
    "172.23.97.82:8091" : [
      0
    ]
  },
  "partitioned" : false,
  "progress" : 100,
  "replicaId" : 0,
  "scheduled" : false,
  "scope" : "scope_1",
  "secExprs" : [
    "`country`",
    "(distinct (array ((`r`.`ratings`).`Check in / front desk`) for `r` in `reviews` end))",
    "array_count(`public_likes`)",
    "array_count(`reviews`)",
    "`type`",
    "`phone`",
    "`price`",
    "`email`",
    "`address`",
    "`name`",
    "`url`"
  ],
  "stale" : false,
  "status" : "Moving"
}
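For anyone triaging from the attached getIndexStatus output, a small filter like the one below can list the entries that are still in the Moving state. This is only a sketch: it assumes the per-index entries have already been extracted into a list of dicts shaped like the entry above, and the function name `find_moving_indexes` and the second sample entry (`idx_hypothetical`) are made up for illustration.

```python
def find_moving_indexes(index_entries):
    """Return a short summary of every index entry whose status is "Moving".

    `index_entries` is assumed to be a list of dicts with the same keys as
    the entry shown in this report (name, bucket, scope, collection, hosts,
    replicaId, status).
    """
    stuck = []
    for entry in index_entries:
        if entry.get("status") == "Moving":
            stuck.append({
                "name": entry.get("name"),
                "keyspace": "{}.{}.{}".format(
                    entry.get("bucket"), entry.get("scope"), entry.get("collection")
                ),
                "replicaId": entry.get("replicaId"),
                "hosts": entry.get("hosts", []),
            })
    return stuck


# First entry uses values from this report; the second one is hypothetical.
sample_entries = [
    {
        "name": "idx1_YXvO",
        "bucket": "bucket2",
        "scope": "scope_1",
        "collection": "coll_9",
        "replicaId": 0,
        "hosts": ["172.23.96.30:8091", "172.23.97.82:8091"],
        "status": "Moving",
    },
    {
        "name": "idx_hypothetical",
        "bucket": "bucket2",
        "scope": "scope_1",
        "collection": "coll_9",
        "replicaId": 0,
        "hosts": ["172.23.97.83:8091"],
        "status": "Ready",
    },
]

for item in find_moving_indexes(sample_entries):
    print(item["name"], item["keyspace"], item["hosts"])
```

Running it against the sample prints only `idx1_YXvO`, since that is the only entry whose status is "Moving".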
The rebalance was initiated at 2021-05-15T17:36:26. The following is from the test console:
[2021-05-15T17:36:26-07:00, sequoiatools/couchbase-cli:7.0:68fafa] server-add -c 172.23.104.16:8091 --server-add https://172.23.96.31 -u Administrator -p password --server-add-username Administrator --server-add-password password --services index
[2021-05-15T17:36:36-07:00, sequoiatools/couchbase-cli:7.0:6951cf] rebalance -c 172.23.104.16:8091 -u Administrator -p password
[2021-05-15T17:36:41-07:00, sequoiatools/cmd:e19b37] 60
[2021-05-15T17:37:47-07:00, sequoiatools/cmd:622ca9] 300
[pull] vijayviji/sshpass
[2021-05-15T17:43:21-07:00, vijayviji/sshpass:fbd7e7] sshpass -p couchbase ssh -o StrictHostKeyChecking=no root@172.23.97.77 kill -SIGKILL $(pgrep indexer)

Error occurred on container - sequoiatools/couchbase-cli:7.0:[rebalance -c 172.23.104.16:8091 -u Administrator -p password]

docker logs 6951cf
docker start 6951cf

Unable to display progress bar on this os
ERROR: Rebalance failed. See logs for detailed reason. You can try again.
[2021-05-15T17:43:26-07:00, sequoiatools/cmd:cb2101] 420
[2021-05-15T17:50:32-07:00, appropriate/curl:e82955] -s -u Administrator:password 172.23.104.16:8091/pools/default/rebalanceProgress
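To line the console output up with the server logs, it can help to turn the timestamped lines into a timeline. The sketch below assumes every timestamped line has the form `[<ISO-8601 timestamp>, <container>] <command>`, as in the excerpt above. The helper names `parse_console` and `seconds_between` are illustrative and are not part of sequoia.

```python
import re
from datetime import datetime

# Matches "[<timestamp>, <container>] <command>"; lines such as
# "[pull] vijayviji/sshpass" have no comma in the brackets and are skipped.
LINE_RE = re.compile(r"^\[(?P<ts>[^,\]]+), (?P<container>[^\]]+)\] (?P<cmd>.*)$")


def parse_console(lines):
    """Return (timestamp, container, command) tuples for timestamped lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        try:
            ts = datetime.fromisoformat(match.group("ts"))
        except ValueError:
            continue  # bracketed prefix was not a timestamp
        events.append((ts, match.group("container"), match.group("cmd")))
    return events


def seconds_between(events, first_prefix, second_substring):
    """Seconds from the first command starting with `first_prefix` to the
    first command containing `second_substring`; None if either is missing."""
    start = next((ts for ts, _, cmd in events if cmd.startswith(first_prefix)), None)
    end = next((ts for ts, _, cmd in events if second_substring in cmd), None)
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


# Lines copied from the console output in this report.
console = [
    "[2021-05-15T17:36:36-07:00, sequoiatools/couchbase-cli:7.0:6951cf] rebalance -c 172.23.104.16:8091 -u Administrator -p password",
    "[pull] vijayviji/sshpass",
    "[2021-05-15T17:43:21-07:00, vijayviji/sshpass:fbd7e7] sshpass -p couchbase ssh -o StrictHostKeyChecking=no root@172.23.97.77 kill -SIGKILL $(pgrep indexer)",
]

events = parse_console(console)
elapsed = seconds_between(events, "rebalance", "kill -SIGKILL")
print(elapsed)
```

For the lines above this gives 405 seconds, i.e. the indexer on 172.23.97.77 was killed 6 minutes 45 seconds after the rebalance command was issued.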
This issue could be similar to MB-46319, but the builds are different, and so are the tests.
Indexer nodes in the cluster : 172.23.121.165, 172.23.96.30, 172.23.96.31, 172.23.97.77, 172.23.97.82, 172.23.97.83
The latest getIndexStatus output is attached. The attached logs were collected at around 2 AM on 5/16. Let me know if you need logs from before or after this time.