Couchbase Server / MB-51000

[System Test] Rebalance after multi-node failover failed with error : prepare_rebalance_failed, failed_nodes


Details

    Description

      Build : 7.1.0-2298
      Test : -test tests/integration/neo/test_neo_magma_milestone4.yml -scope tests/integration/neo/scope_neo_magma.yml
      Scale : 3
      Iteration : 1st

      The Magma longevity test includes a step that enables auto-failover (up to 3 failovers), stops the couchbase-server service on three data nodes one after another, and then starts a rebalance.

      [2022-02-14T18:05:28-08:00, sequoiatools/couchbase-cli:7.1:d57d47] setting-autofailover -c 172.23.108.139:8091 -u Administrator -p password --enable-auto-failover=1 --auto-failover-timeout=5 --max-failovers=3
      [2022-02-14T18:05:34-08:00, sequoiatools/cmd:46963f] 10
      [2022-02-14T18:05:49-08:00, sequoiatools/cbutil:7df5a8] /cbinit.py 172.23.108.141 root couchbase stop
      [2022-02-14T18:06:21-08:00, sequoiatools/cmd:b875be] 10
      [2022-02-14T18:06:38-08:00, sequoiatools/cbutil:a65e13] /cbinit.py 172.23.108.132 root couchbase stop
      [2022-02-14T18:06:50-08:00, sequoiatools/cmd:aa7407] 10
      [2022-02-14T18:07:07-08:00, sequoiatools/cbutil:fe3130] /cbinit.py 172.23.108.146 root couchbase stop
      [2022-02-14T18:07:19-08:00, sequoiatools/cmd:e5dc8e] 10
      [2022-02-14T18:07:35-08:00, sequoiatools/couchbase-cli:7.1:ff576c] rebalance -c 172.23.108.139:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.108.139:8091 -u Administrator -p password]
       
      docker logs ff576c
      docker start ff576c
       
      Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2022-02-14T18:08:12-08:00, sequoiatools/cmd:72814a] 60
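The sequence above can be replayed as a dry run. The following Python sketch only rebuilds the logged commands as argument lists and prints them; it executes nothing. The couchbase-cli subcommands, flags, addresses, and credentials are copied from the log, and `/cbinit.py` is the sequoia helper the log invokes (not a standard tool). Treating the `sequoiatools/cmd ... 10` entries as 10-second pauses is an assumption, and `reproduction_steps` is a hypothetical name.

```python
import shlex

# Values copied from the test log above.
ORCHESTRATOR = "172.23.108.139:8091"
USER, PASSWORD = "Administrator", "password"
NODES_TO_STOP = ["172.23.108.141", "172.23.108.132", "172.23.108.146"]
PAUSE_SECONDS = 10  # assumption: "sequoiatools/cmd ... 10" is a 10 s pause


def reproduction_steps() -> list[list[str]]:
    """Rebuild the test sequence as argv lists (dry run: nothing is executed)."""
    auth = ["-c", ORCHESTRATOR, "-u", USER, "-p", PASSWORD]
    steps = [
        # Enable auto-failover: 5 s timeout, at most 3 automatic failovers.
        ["couchbase-cli", "setting-autofailover", *auth,
         "--enable-auto-failover=1", "--auto-failover-timeout=5",
         "--max-failovers=3"],
        ["sleep", str(PAUSE_SECONDS)],
    ]
    for node in NODES_TO_STOP:
        # Sequoia helper from the log that stops couchbase-server on the node.
        steps.append(["/cbinit.py", node, "root", "couchbase", "stop"])
        steps.append(["sleep", str(PAUSE_SECONDS)])
    # Rebalance, which failed with prepare_rebalance_failed in this run.
    steps.append(["couchbase-cli", "rebalance", *auth])
    return steps


if __name__ == "__main__":
    for argv in reproduction_steps():
        print(shlex.join(argv))
```

Printing with `shlex.join` keeps each line copy-pasteable; actually running the steps should only be done against a disposable test cluster.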
      

      According to error.log on the orchestrator node (172.23.108.139), the rebalance failed with the following error:

      [user:error,2022-02-14T18:08:05.982-08:00,ns_1@172.23.108.139:<0.26787.0>:ns_orchestrator:log_rebalance_completion:1428]Rebalance exited with reason {prepare_rebalance_failed,
                                    {error,
                                     {failed_nodes,
                                      [{'ns_1@172.23.108.132',{error,timeout}}]}}}.
      Rebalance Operation Id = 118bda2ba468d6e4a4b55e007201fea1
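When triaging similar failures across many runs, the nodes listed in the rebalance exit reason can be extracted mechanically. This is a minimal regex sketch that assumes the `failed_nodes` list always holds `{'ns_1@host',{error,Reason}}` tuples, as in the excerpt above; it is not a general Erlang term parser, and `failed_nodes()` is a hypothetical helper.

```python
import re

# Exit reason copied from error.log on the orchestrator node.
REASON = """{prepare_rebalance_failed,
    {error,
     {failed_nodes,
      [{'ns_1@172.23.108.132',{error,timeout}}]}}}"""

# Matches one {'ns_1@host',{error,Reason}} tuple inside the failed_nodes list.
FAILED_NODE = re.compile(r"\{'(ns_1@[^']+)',\s*\{error,\s*(\w+)\}\}")


def failed_nodes(reason: str) -> dict[str, str]:
    """Map each node named in a failed_nodes list to its error reason."""
    return dict(FAILED_NODE.findall(reason))


if __name__ == "__main__":
    print(failed_nodes(REASON))  # {'ns_1@172.23.108.132': 'timeout'}
```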
      

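      The couchbase-server service on 172.23.108.132 was stopped at about 2022-02-14T18:06:38, almost a minute before the rebalance was started at 18:07:35. With auto-failover enabled (5-second timeout, up to 3 failovers), this node was expected to have been failed over by then, so a prepare_rebalance timeout against it is unexpected.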
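The timing can be checked directly from the log timestamps. This Python sketch uses the sequoia container launch times as a proxy for when each stop and the rebalance were issued (an assumption; the service stop itself may complete a few seconds later) and compares each node's downtime before the rebalance with the configured 5-second auto-failover timeout. It only shows that the timeout window had elapsed, not whether auto-failover actually fired; that has to be confirmed from the cluster logs.

```python
from datetime import datetime

# Container launch times copied from the test log, used as a proxy for when
# each "couchbase stop" and the rebalance were issued.
STOPS = {
    "172.23.108.141": "2022-02-14T18:05:49-08:00",
    "172.23.108.132": "2022-02-14T18:06:38-08:00",
    "172.23.108.146": "2022-02-14T18:07:07-08:00",
}
REBALANCE_START = "2022-02-14T18:07:35-08:00"
AUTO_FAILOVER_TIMEOUT_S = 5  # from --auto-failover-timeout=5


def seconds_down_before_rebalance() -> dict[str, float]:
    """Seconds between each node's stop and the start of the rebalance."""
    start = datetime.fromisoformat(REBALANCE_START)
    return {node: (start - datetime.fromisoformat(ts)).total_seconds()
            for node, ts in STOPS.items()}


if __name__ == "__main__":
    for node, secs in seconds_down_before_rebalance().items():
        status = "exceeds" if secs > AUTO_FAILOVER_TIMEOUT_S else "within"
        print(f"{node}: down {secs:.0f} s before rebalance "
              f"({status} the {AUTO_FAILOVER_TIMEOUT_S} s timeout)")
```

For 172.23.108.132 the gap is 57 s; the other two nodes were down for 106 s and 28 s, so all three exceed the 5 s timeout.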


      People

        mihir.kamdar Mihir Kamdar (Inactive)
        Votes: 0
        Watchers: 4
