Couchbase Server
MB-43290

Rebalance failure observed in build sanity after backup service is added


Details


    Description

      Issue observed in: 7.0.0-4025

      Test:
      ./testrunner -i node_conf.ini -p get-cbcollect-info=True,get-couch-dbinfo=True,skip_cleanup=False,skip_log_scan=False -t ent_backup_restore.enterprise_backup_restore_test.EnterpriseBackupRestoreTest.test_backup_restore_sanity,items=1000
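
      For reference, a minimal sketch of how the test presumably drives the failing rebalance over the ns_server REST API (the log below shows Python-httplib2 as the User-Agent; the host, credentials, and exact helper are assumptions, while the otpNode names are taken from the log):

      import requests

      BASE = "http://172.23.105.151:8091"
      AUTH = ("Administrator", "password")  # assumed credentials

      def rebalance(known_nodes, ejected_nodes=()):
          # POST /controller/rebalance takes comma-separated otpNode names.
          resp = requests.post(
              BASE + "/controller/rebalance",
              auth=AUTH,
              data={
                  "knownNodes": ",".join(known_nodes),
                  "ejectedNodes": ",".join(ejected_nodes),
              },
          )
          resp.raise_for_status()

      # First (failing) rebalance: keep both nodes, eject none.
      rebalance(["ns_1@172.23.105.151", "ns_1@172.23.105.153"])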

      From diag.log:

      2020-12-14T11:05:49.498-08:00, memcached_config_mgr:0:info:message(ns_1@172.23.105.153) - Hot-reloaded memcached.json for config change of the following keys: [<<"scramsha_fallback_salt">>]
      2020-12-14T11:05:50.109-08:00, ns_orchestrator:0:info:message(ns_1@172.23.105.151) - Starting rebalance, KeepNodes = ['ns_1@172.23.105.151','ns_1@172.23.105.153'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 4f9a3be20d968903fc7ea27ccb5b3b56
      2020-12-14T11:05:52.360-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.105.151) - Rebalance exited with reason {{badmatch,failed},
                                    [{ns_rebalancer,rebalance_body,5,
                                         [{file,"src/ns_rebalancer.erl"},
                                          {line,532}]},
                                     {async,'-async_init/4-fun-1-',3,
                                         [{file,"src/async.erl"},{line,197}]}]}.
      Rebalance Operation Id = 4f9a3be20d968903fc7ea27ccb5b3b56
      2020-12-14T11:06:00.305-08:00, menelaus_web:102:warning:client-side error report(ns_1@172.23.105.151) - Client-side error-report for user "<ud>Administrator</ud>" on node 'ns_1@172.23.105.151':
      User-Agent:Python-httplib2/0.13.1 (gzip)
      Starting rebalance from test, ejected nodes ['ns_1@172.23.105.153']
      2020-12-14T11:06:00.313-08:00, ns_orchestrator:0:info:message(ns_1@172.23.105.151) - Starting rebalance, KeepNodes = ['ns_1@172.23.105.151'], EjectNodes = ['ns_1@172.23.105.153'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 0045d716e47be11e253d0725577b86cf
      2020-12-14T11:06:10.439-08:00, ns_cluster:1:info:message(ns_1@172.23.105.153) - Node 'ns_1@172.23.105.153' is leaving cluster.
      2020-12-14T11:06:10.447-08:00, ns_orchestrator:0:info:message(ns_1@172.23.105.151) - Rebalance completed successfully.
      Rebalance Operation Id = 0045d716e47be11e253d0725577b86cf
      2020-12-14T11:06:10.644-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.105.151) - Node 'ns_1@172.23.105.151' saw that node 'ns_1@172.23.105.153' went down. Details: [{nodedown_reason,
                                                                                           connection_closed}]
      2020-12-14T11:07:01.831-08:00, ns_cookie_manager:3:info:cookie update(ns_1@172.23.105.151) - Initial otp cookie generated: {sanitized,
                                        <<"VOL7MTlDuCj/QIAJDPiYpZNWoVQkVkznD/h9HETT13E=">>}
      2020-12-14T11:07:01.957-08:00, menelaus_sup:1:info:web start ok(ns_1@172.23.105.151) - Couchbase Server has started on web port 8091 on node 'ns_1@172.23.105.151'. Version: "7.0.0-4025-enterprise".
      2020-12-14T11:07:02.094-08:00, mb_master:0:info:message(ns_1@172.23.105.151) - I'm the only node, so I'm the master.
      2020-12-14T11:07:02.170-08:00, compat_mode_manager:0:warning:message(ns_1@172.23.105.151) - Changed cluster compat mode from undefined to [7,0]
      2020-12-14T11:07:02.203-08:00, auto_failover:0:info:message(ns_1@172.23.105.151) - Enabled auto-failover with timeout 120 and max count 1
      2020-12-14T11:07:08.878-08:00, menelaus_web:102:warning:client-side error report(ns_1@172.23.105.151) - Client-side error-report for user "<ud>Administrator</ud>" on node 'ns_1@172.23.105.151':
      User-Agent:Python-httplib2/0.13.1 (gzip)
      2020-12-14 11:07:08.856707 : test_backup_restore_sanity finished 
      -------------------------------
       
       
      per_node_processes('ns_1@172.23.105.151') =
           {<0.5656.0>,
            [{backtrace,
                 [<<"Program counter: 0x00007f261dcf6ff0 (diag_handler:'-collect_diag_per_node/1-fun-1-'/2 + 112)">>,
                  <<"CP: 0x0000000000000000 (invalid)">>,<<>>,
                  <<"0x00007f25d7f7a470 Return addr 0x00007f26653d6390 (proc_lib:init_p/3 + 200)">>,
                  <<"y(0)     <0.5655.0>">>,<<>>,
                  <<"0x00007f25d7f7a480 Return addr 0x0000000000986fa8 (<terminate process normally>)">>,
                  <<"y(0)     []">>,<<"y(1)     []">>,
                  <<"y(2)     Catch 0x00007f26653d63a0 (proc_lib:init_p/3 + 216)">>,
                  <<>>]},
             {messages,[]},
             {dictionary,
                 [{'$ancestors',[<0.5655.0>]},
                  {'$initial_call',
                      {diag_handler,'-collect_diag_per_node/1-fun-1-',0}}]},
             {registered_name,[]},
             {status,waiting},
             {initial_call,{proc_lib,init_p,3}},
             {error_handler,error_handler},
             {garbage_collection,
                 [{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                  {min_bin_vheap_size,46422},
                  {min_heap_size,233},
                  {fullsweep_after,512},
                  {minor_gcs,0}]},
             {garbage_collection_info,
                 [{old_heap_block_size,0},
                  {heap_block_size,233},
                  {mbuf_size,0},
                  {recent_size,0},
                  {stack_size,6},
                  {old_heap_size,0},
                  {heap_size,32},
                  {bin_vheap_size,0},
                  {bin_vheap_block_size,46422},
                  {bin_old_vheap_size,0},
                  {bin_old_vheap_block_size,46422}]},
             {links,[<0.5655.0>]},
             {monitors,[{process,<0.339.0>},{process,<0.5655.0>}]},
             {monitored_by,[]},
             {memory,2860},
             {message_queue_len,0},
             {reductions,13},
             {trap_exit,false},
             {current_location,
                 {diag_handler,'-collect_diag_per_node/1-fun-1-',2,
                     [{file,"src/diag_handler.erl"},{line,228}]}}]}
      

      We added the backup service to build sanity and started seeing this failure: the first rebalance (keeping both nodes) exits with {badmatch,failed} in ns_rebalancer:rebalance_body/5, while the follow-up rebalance that ejects 'ns_1@172.23.105.153' completes successfully. Attaching logs.
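
      For context, a minimal sketch of how a sanity setup might enable the Backup Service when provisioning a node; the "backup" service identifier matches 7.0, but the credentials and the exact flow in our sanity scripts are assumptions:

      import requests

      resp = requests.post(
          "http://172.23.105.151:8091/node/controller/setupServices",
          auth=("Administrator", "password"),  # assumed credentials
          data={"services": "kv,backup"},  # assumed service list for this node
      )
      resp.raise_for_status()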

      Attachments



          People

            Assignee: Arunkumar Senthilnathan (Inactive)
            Reporter: Arunkumar Senthilnathan (Inactive)
            Votes: 0
            Watchers: 7


              Gerrit Reviews

                There are no open Gerrit changes
