Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-51330

[System Test] Rebalance stuck in index phase due to 2 indexes in moving state

    XMLWordPrintable

Details

    Description

      Build : 7.1.0-2416
      Test : -test tests/integration/neo/test_neo_couchstore_milestone4.yml -scope tests/integration/neo/scope_couchstore.yml
      Scale : 3
      Iteration : 3rd

      A step in the longevity test is to perform a multi node fail over for KV nodes, and then rebalance them out.

      [2022-03-04T21:14:49-08:00, sequoiatools/cbutil:bbd5d9] /cbinit.py 172.23.106.100 root couchbase stop
      [2022-03-04T21:15:14-08:00, sequoiatools/cmd:bc0cc6] 10
      [2022-03-04T21:15:30-08:00, sequoiatools/cbutil:c62b83] /cbinit.py 172.23.120.107 root couchbase stop
      [2022-03-04T21:15:40-08:00, sequoiatools/cmd:fad5eb] 10
      [2022-03-04T21:15:56-08:00, sequoiatools/cbutil:85b1b7] /cbinit.py 172.23.99.11 root couchbase stop
      [2022-03-04T21:16:15-08:00, sequoiatools/cmd:102979] 10
      [2022-03-04T21:16:31-08:00, sequoiatools/couchbase-cli:7.1:1336ff] rebalance -c 172.23.108.103:8091 -u Administrator -p password
      

      Rebalance is stuck for ~60 hrs at the indexer phase as 2 indexes are stuck in Moving state.

            {
               "bucket" : "bucket7",
               "collection" : "coll_3",
               "completion" : 100,
               "definition" : "CREATE INDEX `idx10_xXgf` ON `bucket7`.`scope_2`.`coll_3`((all (array (all (array flatten_keys(`n`, `v`) for `n` : `v` in (`r`.`ratings`) end)) for `r` in `reviews` end))) PARTITION BY hash((meta().`id`)) WITH {  \"defer_build\":true, \"nodes\":[ \"172.23.104.67:8091\",\"172.23.120.245:8091\",\"172.23.96.251:8091\",\"172.23.96.252:8091\" ], \"num_replica\":1, \"num_partition\":2 }",
               "defnId" : 5451174261260198682,
               "hosts" : [
                  "172.23.104.67:8091",
                  "172.23.96.251:8091",
                  "172.23.96.252:8091"
               ],
               "indexName" : "idx10_xXgf",
               "indexType" : "plasma",
               "instId" : 10518299366001305340,
               "lastScanTime" : "Sat Mar  5 05:37:44 PST 2022",
               "name" : "idx10_xXgf",
               "numPartition" : 3,
               "numReplica" : 1,
               "partitionMap" : {
                  "172.23.104.67:8091" : [
                     1
                  ],
                  "172.23.96.251:8091" : [
                     2
                  ],
                  "172.23.96.252:8091" : [
                     1
                  ]
               },
               "partitioned" : true,
               "progress" : 100,
               "replicaId" : 0,
               "scheduled" : false,
               "scope" : "scope_2",
               "secExprs" : [
                  "(all (array (all (array flatten_keys(`n`, `v`) for `n` : `v` in (`r`.`ratings`) end)) for `r` in `reviews` end))"
               ],
               "stale" : false,
               "status" : "Moving"
            }

            {
               "bucket" : "bucket7",
               "collection" : "coll_3",
               "completion" : 100,
               "definition" : "CREATE INDEX `idx1_eLiKR` ON `bucket7`.`scope_2`.`coll_3`(`country`,(distinct (array ((`r`.`ratings`).`Check in / front desk`) for `r` in `reviews` end)),array_count(`public_likes`),array_count(`reviews`) DESC,`type`,`phone`,`price`,`email`,`address`,`name`,`url`) PARTITION BY hash((meta().`id`)) WITH {  \"defer_build\":true, \"nodes\":[ \"172.23.104.155:8091\",\"172.23.104.67:8091\",\"172.23.120.245:8091\",\"172.23.96.251:8091\",\"172.23.96.252:8091\" ], \"num_replica\":2, \"num_partition\":3 }",
               "defnId" : 12484504482944935375,
               "hosts" : [
                  "172.23.104.155:8091",
                  "172.23.120.245:8091",
                  "172.23.96.252:8091"
               ],
               "indexName" : "idx1_eLiKR",
               "indexType" : "plasma",
               "instId" : 8445560715018320344,
               "lastScanTime" : "Sat Mar  5 05:37:43 PST 2022",
               "name" : "idx1_eLiKR",
               "numPartition" : 4,
               "numReplica" : 2,
               "partitionMap" : {
                  "172.23.104.155:8091" : [
                     3,
                     2
                  ],
                  "172.23.120.245:8091" : [
                     1
                  ],
                  "172.23.96.252:8091" : [
                     1
                  ]
               },
               "partitioned" : true,
               "progress" : 100,
               "replicaId" : 0,
               "scheduled" : false,
               "scope" : "scope_2",
               "secExprs" : [
                  "`country`",
                  "(distinct (array ((`r`.`ratings`).`Check in / front desk`) for `r` in `reviews` end))",
                  "array_count(`public_likes`)",
                  "array_count(`reviews`)",
                  "`type`",
                  "`phone`",
                  "`price`",
                  "`email`",
                  "`address`",
                  "`name`",
                  "`url`"
               ],
               "stale" : false,
               "status" : "Moving"
            }

      On indexer node 172.23.96.252, following can be seen in the logs after the start of the rebalance:

      root@k8s-worker-2 logs]# zgrep "bucket7:scope_2:coll_3" indexer.log* | grep "2022-03-04T21" | grep -e "\[Warn" -e "\[Erro"
      indexer.log.4.gz:2022-03-04T21:40:20.444-08:00 [Warn] StreamState::updateHWT Received Partial Last Snapshot in HWT KeyspaceId bucket7:scope_2:coll_3 StreamId INIT_STREAM vbucket 546 Snapshot 0-154757 Seqno 154757 Vbuuid 130479846032254 lastSnap 1-1 lastSnapSeqno 154756
      indexer.log.4.gz:2022-03-04T21:40:22.950-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.104.70:9999 Topic INIT_STREAM_TOPIC_11e8efedc8e58fadd12f9682c371f3ab KeyspaceIds [bucket7:scope_2:coll_3]. Err genServer.closed
      indexer.log.4.gz:2022-03-04T21:40:22.950-08:00 [Error] KVSender::deleteKeyspacesFromStream INIT_STREAM [bucket7:scope_2:coll_3] Error Received genServer.closed from 172.23.104.70:9999
      indexer.log.4.gz:2022-03-04T21:40:22.960-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.105.107:9999 Topic INIT_STREAM_TOPIC_11e8efedc8e58fadd12f9682c371f3ab KeyspaceIds [bucket7:scope_2:coll_3]. Err genServer.closed
      indexer.log.4.gz:2022-03-04T21:40:22.960-08:00 [Error] KVSender::deleteKeyspacesFromStream INIT_STREAM [bucket7:scope_2:coll_3] Error Received genServer.closed from 172.23.105.107:9999
      indexer.log.4.gz:2022-03-04T21:40:22.986-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.108.103:9999 Topic INIT_STREAM_TOPIC_11e8efedc8e58fadd12f9682c371f3ab KeyspaceIds [bucket7:scope_2:coll_3]. Err genServer.closed
      indexer.log.4.gz:2022-03-04T21:40:22.986-08:00 [Error] KVSender::deleteKeyspacesFromStream INIT_STREAM [bucket7:scope_2:coll_3] Error Received genServer.closed from 172.23.108.103:9999
      indexer.log.4.gz:2022-03-04T21:40:23.000-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.121.117:9999 Topic INIT_STREAM_TOPIC_11e8efedc8e58fadd12f9682c371f3ab KeyspaceIds [bucket7:scope_2:coll_3]. Err genServer.closed
      indexer.log.4.gz:2022-03-04T21:40:23.000-08:00 [Error] KVSender::deleteKeyspacesFromStream INIT_STREAM [bucket7:scope_2:coll_3] Error Received genServer.closed from 172.23.121.117:9999
      indexer.log.4.gz:2022-03-04T21:40:23.006-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.96.253:9999 Topic INIT_STREAM_TOPIC_11e8efedc8e58fadd12f9682c371f3ab KeyspaceIds [bucket7:scope_2:coll_3]. Err genServer.closed
      indexer.log.4.gz:2022-03-04T21:40:23.006-08:00 [Error] KVSender::deleteKeyspacesFromStream INIT_STREAM [bucket7:scope_2:coll_3] Error Received genServer.closed from 172.23.96.253:9999
      indexer.log.4.gz:2022-03-04T21:40:23.012-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.97.121:9999 Topic INIT_STREAM_TOPIC_11e8efedc8e58fadd12f9682c371f3ab KeyspaceIds [bucket7:scope_2:coll_3]. Err genServer.closed
      indexer.log.4.gz:2022-03-04T21:40:23.012-08:00 [Error] KVSender::deleteKeyspacesFromStream INIT_STREAM [bucket7:scope_2:coll_3] Error Received genServer.closed from 172.23.97.121:9999
      indexer.log.4.gz:2022-03-04T21:40:23.019-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.97.122:9999 Topic INIT_STREAM_TOPIC_11e8efedc8e58fadd12f9682c371f3ab KeyspaceIds [bucket7:scope_2:coll_3]. Err genServer.closed
      indexer.log.4.gz:2022-03-04T21:40:23.019-08:00 [Error] KVSender::deleteKeyspacesFromStream INIT_STREAM [bucket7:scope_2:coll_3] Error Received genServer.closed from 172.23.97.122:9999
      indexer.log.4.gz:2022-03-04T21:40:23.026-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.99.21:9999 Topic INIT_STREAM_TOPIC_11e8efedc8e58fadd12f9682c371f3ab KeyspaceIds [bucket7:scope_2:coll_3]. Err genServer.closed
      indexer.log.4.gz:2022-03-04T21:40:23.026-08:00 [Error] KVSender::deleteKeyspacesFromStream INIT_STREAM [bucket7:scope_2:coll_3] Error Received genServer.closed from 172.23.99.21:9999
      indexer.log.4.gz:2022-03-04T21:40:23.040-08:00 [Error] KVSender::sendDelKeyspacesRequest Unexpected Error During Del Buckets Request Projector 172.23.99.25:9999 Topic INIT_STREAM_TOPIC_11e8efedc8e58fadd12f9682c371f3ab KeyspaceIds [bucket7:scope_2:coll_3]. Err genServer.closed
      indexer.log.4.gz:2022-03-04T21:40:23.040-08:00 [Error] KVSender::deleteKeyspacesFromStream INIT_STREAM [bucket7:scope_2:coll_3] Error Received genServer.closed from 172.23.99.25:9999
      

      Similar errors and warnings can be seen on other indexer nodes as well - 172.23.104.155, 172.23.120.245, 172.23.104.67, 172.23.96.251

      Logs attached are from ~3/4/22 11 PM. Let me know if you need logs from a later time stamp.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            mihir.kamdar Mihir Kamdar (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty