  1. Couchbase Server
  2. MB-56810

[System test upgrade]: Swap rebalance after system test upgrade from 7.1.4-3601 -> 7.2.0-5323 keeps failing continuously


Details

    • Untriaged
    • Centos 64-bit
    • 0
    • Yes

    Description

      Steps to Repro
      1. Run neo longevity test for 3 days on 7.1.4-3601.

      ./sequoia -client 172.23.104.254:2375 -provider file:centos_third_cluster.yml -test tests/integration/neo/test_neo.yml -scope tests/integration/neo/scope_neo_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.1.4-3601 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
      

      2. Upgrade this cluster to 7.2.0-5323 using an online upgrade with the failover/recovery strategy.
      3. After the upgrade, update the bucket properties of all the buckets using the following REST API call.

      [root@s20507w12r2 ~]# curl localhost:8091/pools/default/buckets/default  -u Administrator:password -X POST -d historyRetentionBytes=2147483648
      

      4. Performed a rebalance-out on the upgraded 7.2.0 cluster, which worked fine.
      5. Tried a swap rebalance, which keeps failing continuously.

      172.23.120.74 3:39:19 AM 9 May, 2023

      Starting rebalance, KeepNodes = ['ns_1@172.23.120.58','ns_1@172.23.120.73',
      'ns_1@172.23.120.74','ns_1@172.23.120.75',
      'ns_1@172.23.120.77','ns_1@172.23.120.81',
      'ns_1@172.23.120.86','ns_1@172.23.121.77',
      'ns_1@172.23.123.25','ns_1@172.23.123.26',
      'ns_1@172.23.123.31','ns_1@172.23.123.32',
      'ns_1@172.23.123.33','ns_1@172.23.96.122',
      'ns_1@172.23.96.243','ns_1@172.23.96.254',
      'ns_1@172.23.96.48','ns_1@172.23.97.105',
      'ns_1@172.23.97.110','ns_1@172.23.97.112',
      'ns_1@172.23.97.148','ns_1@172.23.97.241',
      'ns_1@172.23.97.74'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 15d6b8453948fc4adc52bbe192c48030
      

      172.23.120.74 3:39:42 AM 9 May, 2023

      Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.259.332>,
      {{wait_seqno_persisted_failed,"default",990,
      701227,
      [{'ns_1@172.23.97.74',
      {'EXIT',
      {socket_closed,
      {gen_server,call,
      [{'janitor_agent-default',
      'ns_1@172.23.97.74'},
      {if_rebalance,<0.30380.331>,
      {wait_seqno_persisted,990,701227}},
      infinity]}}}}]},
      [{ns_single_vbucket_mover,
      '-wait_seqno_persisted_many/5-fun-2-',5,
      [{file,"src/ns_single_vbucket_mover.erl"},
      {line,474}]},
      {proc_lib,init_p,3,
      [{file,"proc_lib.erl"},{line,211}]}]}}}}.
      Rebalance Operation Id = 15d6b8453948fc4adc52bbe192c48030
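
      For triage, the failing bucket, vBucket, sequence number and node can be pulled out of the exit reason above. The following is a minimal sketch, not an ns_server tool: the helper name `parse_wait_seqno_failure` is hypothetical, and reading 990 as the vBucket and 701227 as the seqno is an assumption based on the argument order of the `wait_seqno_persisted` call shown in the log.

```python
import re

# Hypothetical helper for triage; not part of ns_server or any Couchbase CLI.
# Assumes the tuple layout seen in the log above:
#   {wait_seqno_persisted_failed, "<bucket>", <vbucket>, <seqno>, [{'<node>', ...
PATTERN = re.compile(
    r'wait_seqno_persisted_failed,\s*"(?P<bucket>[^"]+)",'
    r"\s*(?P<vbucket>\d+),\s*(?P<seqno>\d+),"
    r"\s*\[\{'(?P<node>[^']+)'"
)


def parse_wait_seqno_failure(message):
    """Return bucket, vbucket, seqno and node from a rebalance exit reason,
    or None if the message does not contain a wait_seqno_persisted_failed tuple."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return {
        "bucket": match.group("bucket"),
        "vbucket": int(match.group("vbucket")),
        "seqno": int(match.group("seqno")),
        "node": match.group("node"),
    }


# Fragment copied from the rebalance failure reported above.
sample = """{{wait_seqno_persisted_failed,"default",990,
701227,
[{'ns_1@172.23.97.74',
{'EXIT',
{socket_closed,"""

print(parse_wait_seqno_failure(sample))
```

      Running the same helper over every failed rebalance attempt would show whether the failure always involves the same node or vBucket.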
      

      cbcollect_info attached.
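
      Step 3 shows the call for the `default` bucket only. A sketch of how the same setting could be applied to every bucket is given below; it assumes the standard `GET /pools/default/buckets` listing endpoint, the same port and credentials as the curl command, and hypothetical function names. It is not the script that was used in the test.

```python
import base64
import json
import urllib.parse
import urllib.request


def _basic_auth(user, password):
    # Build the HTTP Basic Authorization header value.
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    return "Basic " + token


def build_history_request(host, bucket, retention_bytes, user, password):
    """Build (but do not send) the POST used in step 3 for one bucket."""
    url = "http://%s:8091/pools/default/buckets/%s" % (
        host, urllib.parse.quote(bucket, safe=""))
    body = urllib.parse.urlencode(
        {"historyRetentionBytes": retention_bytes}).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", _basic_auth(user, password))
    return req


def list_buckets(host, user, password):
    """Return bucket names; assumes the endpoint returns a JSON array of
    objects that each carry a "name" field."""
    req = urllib.request.Request("http://%s:8091/pools/default/buckets" % host)
    req.add_header("Authorization", _basic_auth(user, password))
    with urllib.request.urlopen(req) as resp:
        return [bucket["name"] for bucket in json.load(resp)]


def set_history_on_all_buckets(host, retention_bytes, user, password):
    """Send the step 3 request once per bucket and return the names updated."""
    updated = []
    for name in list_buckets(host, user, password):
        req = build_history_request(host, name, retention_bytes, user, password)
        with urllib.request.urlopen(req):
            updated.append(name)
    return updated


# Example call against a live cluster (values taken from step 3):
# set_history_on_all_buckets("localhost", 2147483648, "Administrator", "password")
```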


          People

            Balakumaran.Gopal Balakumaran Gopal
            Balakumaran.Gopal Balakumaran Gopal
            Votes: 0
            Watchers: 8
