Couchbase Server / MB-49051

[System Test] Index rebalance stuck due to 1 index in Moving state


Details

    Description

      Build : 7.1.0-1524
      Test : -test tests/integration/neo/test_neo_couchstore_milestone2.yml -scope tests/integration/neo/scope_couchstore.yml
      Scale : 3
      Iteration : 2nd

      In the 2nd iteration of the longevity test, a swap rebalance operation for index nodes started at 2021-10-19T18:19:34

      [2021-10-19T18:19:18-07:00, sequoiatools/couchbase-cli:7.0:231375] server-add -c 172.23.108.103:8091 --server-add https://172.23.104.67 -u Administrator -p password --server-add-username Administrator --server-add-password password --services index
      [2021-10-19T18:19:34-07:00, sequoiatools/couchbase-cli:7.0:49dd8c] rebalance -c 172.23.108.103:8091 --server-remove 172.23.104.69 -u Administrator -p password
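
      For reference, rebalance progress can be polled from ns_server's task list over REST. A minimal Python sketch, using the orchestrator address and credentials from the commands above; the monitoring approach itself is just one way to watch the rebalance:

      import requests

      # Poll ns_server's task list and print the rebalance task's status/progress.
      # Orchestrator address and credentials are from the commands above.
      AUTH = ("Administrator", "password")
      tasks = requests.get("http://172.23.108.103:8091/pools/default/tasks", auth=AUTH).json()
      for task in tasks:
          if task.get("type") == "rebalance":
              print(task.get("status"), task.get("progress"))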
      

      The rebalance has now been stuck in the index phase for 20 hours, due to one index in the Moving state.

       {
               "bucket" : "bucket7",
               "collection" : "coll_1",
               "completion" : 100,
               "definition" : "CREATE INDEX `idx3_b31z` ON `bucket7`.`scope_3`.`coll_1`(`free_breakfast`,`free_parking`,`country`,`city`) PARTITION BY hash((meta().`id`)) WITH {  \"defer_build\":true, \"nodes\":[ \"172.23.104.67:8091\",\"172.23.104.69:8091\",\"172.23.105.111:8091\",\"172.23.120.245:8091\",\"172.23.121.117:8091\",\"172.23.96.252:8091\",\"172.23.96.253:8091\" ], \"num_replica\":3, \"num_partition\":5 }",
               "defnId" : 15684706469183385686,
               "hosts" : [
                  "172.23.104.67:8091",
                  "172.23.104.69:8091",
                  "172.23.105.111:8091",
                  "172.23.120.245:8091",
                  "172.23.96.252:8091",
                  "172.23.96.253:8091"
               ],
               "indexName" : "idx3_b31z",
               "indexType" : "plasma",
               "instId" : 8555298990214263616,
               "lastScanTime" : "NA",
               "name" : "idx3_b31z",
               "numPartition" : 6,
               "numReplica" : 3,
               "partitionMap" : {
                  "172.23.104.67:8091" : [
                     4
                  ],
                  "172.23.104.69:8091" : [
                     4
                  ],
                  "172.23.105.111:8091" : [
                     3
                  ],
                  "172.23.120.245:8091" : [
                     1
                  ],
                  "172.23.96.252:8091" : [
                     2
                  ],
                  "172.23.96.253:8091" : [
                     5
                  ]
               },
               "partitioned" : true,
               "progress" : 100,
               "replicaId" : 0,
               "scheduled" : false,
               "scope" : "scope_3",
               "secExprs" : [
                  "`free_breakfast`",
                  "`free_parking`",
                  "`country`",
                  "`city`"
               ],
               "stale" : false,
               "status" : "Moving"
            }
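
      The status above comes from the index status output. One way to re-check which indexes are still not Ready is the indexer's getIndexStatus REST endpoint; a sketch, assuming the default indexer HTTP port 9102 and placeholder credentials:

      import requests

      # List indexes whose status is not Ready (e.g. stuck in Moving).
      # 9102 is the default indexer HTTP port; credentials are placeholders.
      AUTH = ("Administrator", "password")
      resp = requests.get("http://172.23.104.67:9102/getIndexStatus", auth=AUTH).json()
      for idx in resp.get("status", []):
          if idx.get("status") != "Ready":
              print(idx.get("name"), idx.get("status"), idx.get("completion"))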
      

      From the stats, 318 mutations have been pending for this index on 172.23.104.67 for a long time. From the indexer logs, the index has been in CATCHUP state since 2021-10-19T20:08:48:

      2021-10-19T20:08:48.386-07:00 [Info] Rebalancer::waitForIndexBuild: Index: bucket7:scope_3:coll_1:idx3_b31z State: INDEX_STATE_CATCHUP Pending: 316 EstTime: 0 Partitions: [4] Destination: 127.0.0.1:9102
      2021-10-19T20:08:51.404-07:00 [Info] Rebalancer::waitForIndexBuild: Index: bucket7:scope_3:coll_1:idx3_b31z State: INDEX_STATE_CATCHUP Pending: 316 EstTime: 0 Partitions: [4] Destination: 127.0.0.1:9102
      2021-10-19T20:08:52.159-07:00 [Info] Rebalancer::waitForIndexBuild: Index: bucket7:scope_3:coll_1:idx3_b31z State: INDEX_STATE_CATCHUP Pending: 316 EstTime: 0 Partitions: [4] Destination: 127.0.0.1:9102
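
      The pending-mutation count can be confirmed from the indexer's /stats endpoint. A rough sketch; the flat stats key layout (bucket:scope:collection:index:stat) assumed below is an assumption:

      import requests

      # Look up num_docs_pending for idx3_b31z in the indexer stats.
      # The flat stats key layout matched below is an assumption.
      AUTH = ("Administrator", "password")
      stats = requests.get("http://172.23.104.67:9102/stats", auth=AUTH).json()
      for key, val in stats.items():
          if key.endswith(":num_docs_pending") and "idx3_b31z" in key:
              print(key, val)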
      

      This does not appear to be related to MB-49031, as I couldn't find the keyword "committed harakiri" in the indexer logs on any of the indexer nodes.
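
      The keyword check was a plain text search over the collected indexer logs; roughly as below, where the log path is a placeholder for wherever the cbcollect output was unpacked:

      import glob

      # Scan collected indexer logs for the "committed harakiri" marker from MB-49031.
      # The glob pattern is a placeholder for the local cbcollect layout.
      for path in glob.glob("cbcollect_*/ns_server.indexer.log*"):
          with open(path, errors="replace") as f:
              for lineno, line in enumerate(f, 1):
                  if "committed harakiri" in line:
                      print(f"{path}:{lineno}: {line.rstrip()}")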


People

  Assignee: Mihir Kamdar (Inactive)
  Reporter: Mihir Kamdar (Inactive)
