Couchbase Server · MB-61017

Intermittent test failure - failover_tests:auto_failover_test (Auto failover)


    Description

      Seen in https://cv.jenkins.couchbase.com/job/ns-server-ns-test/9367

      Summary error:

      *failed*
      in function failover_tests:auto_failover_t/2 (test/failover_tests.erl, line 481)
      in call from eunit_test:run_testfun/1 (eunit_test.erl, line 71)
      in call from eunit_proc:run_test/1 (eunit_proc.erl, line 531)
      in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 356)
      in call from eunit_proc:handle_test/2 (eunit_proc.erl, line 514)
      in call from eunit_proc:tests_inorder/3 (eunit_proc.erl, line 456)
      in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 346)
      in call from eunit_proc:run_group/2 (eunit_proc.erl, line 570)
      **error:{assertNotEqual,[{module,failover_tests},
                       {line,481},
                       {expression,"proplists : get_value ( failover_complete , Counters )"},
                       {value,undefined}]}
        output:<<"">>
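The `{value,undefined}` in the assertNotEqual means `proplists:get_value/2` found no `failover_complete` key in `Counters` at the moment the test read it. A minimal shell illustration of that stdlib behavior (not the actual test code from failover_tests.erl):

```erlang
%% Not the actual test code -- just the stdlib behavior behind {value,undefined}:
%% proplists:get_value/2 returns the atom 'undefined' when the key is absent,
%% which is exactly the value that trips ?assertNotEqual at line 481.
1> proplists:get_value(failover_complete, []).
undefined
2> proplists:get_value(failover_complete, [{failover_complete, 1}]).
1
```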
      

      Detailed logs:

          failover_tests:365: -auto_failover_test_/0-fun-6- (Auto failover)...[ns_server:debug,2024-03-01T22:18:59.098Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:init_logic_state:249]Using auto-failover logic state {state,[],
                                          [{service_state,kv,nil,false},
                                           {service_state,n1ql,nil,false},
                                           {service_state,index,nil,false},
                                           {service_state,fts,nil,false}],
                                          -1}
      [user:info,2024-03-01T22:18:59.098Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:enable_auto_failover:448]Enabled auto-failover with timeout 1 and max count 5
      [ns_server:debug,2024-03-01T22:18:59.098Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:update_and_save_auto_failover_state:467]No change in timeout 1
      [ns_server:debug,2024-03-01T22:18:59.098Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:update_and_save_auto_failover_state:475]No change in max count 5
      [ns_server:debug,2024-03-01T22:18:59.108Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:log_down_nodes_reason:435]Node q is considered down. Reason:"The cluster manager did not respond. "
      [ns_server:debug,2024-03-01T22:18:59.108Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:log_down_nodes_reason:435]Node b is considered down. Reason:"The cluster manager did not respond. "
      [ns_server:debug,2024-03-01T22:18:59.109Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:log_down_nodes_reason:435]Node a is considered down. Reason:"The cluster manager did not respond. "
      [ns_server:debug,2024-03-01T22:18:59.109Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:130]Transitioned node {c,undefined} state new -> up
      [ns_server:debug,2024-03-01T22:18:59.109Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,c,new,
                                            up,0}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.110Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:130]Transitioned node {d,undefined} state new -> up
      [ns_server:debug,2024-03-01T22:18:59.110Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,d,new,
                                            up,0}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.110Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{a,undefined},0,new,undefined}
      ->{node_state,{a,undefined},0,half_down,undefined}
      [ns_server:debug,2024-03-01T22:18:59.110Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,a,new,
                                            half_down,0}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.110Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{b,undefined},0,new,undefined}
      ->{node_state,{b,undefined},0,half_down,undefined}
      [ns_server:debug,2024-03-01T22:18:59.110Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,b,new,
                                            half_down,0}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.110Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{q,undefined},0,new,undefined}
      ->{node_state,{q,undefined},0,half_down,undefined}
      [ns_server:debug,2024-03-01T22:18:59.110Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,q,new,
                                            half_down,0}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.112Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:process_frame:205]List of candidates changed from [] to [{a,undefined},
                                             {b,undefined},
                                             {q,undefined}]. Resetting counter
      [ns_server:debug,2024-03-01T22:18:59.112Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{a,undefined},0,half_down,undefined}
      ->{node_state,{a,undefined},0,nearly_down,undefined}
      [ns_server:debug,2024-03-01T22:18:59.112Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,a,
                                            half_down,nearly_down,0}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.112Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{b,undefined},0,half_down,undefined}
      ->{node_state,{b,undefined},0,nearly_down,undefined}
      [ns_server:debug,2024-03-01T22:18:59.112Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,b,
                                            half_down,nearly_down,0}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.112Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{q,undefined},0,half_down,undefined}
      ->{node_state,{q,undefined},0,nearly_down,undefined}
      [ns_server:debug,2024-03-01T22:18:59.112Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,q,
                                            half_down,nearly_down,0}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.114Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{a,undefined},0,nearly_down,undefined}
      ->{node_state,{a,undefined},1,nearly_down,undefined}
      [ns_server:debug,2024-03-01T22:18:59.114Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,a,
                                            nearly_down,nearly_down,1}: {error,
                                                                         badarg}
      [ns_server:debug,2024-03-01T22:18:59.114Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{b,undefined},0,nearly_down,undefined}
      ->{node_state,{b,undefined},1,nearly_down,undefined}
      [ns_server:debug,2024-03-01T22:18:59.114Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,b,
                                            nearly_down,nearly_down,1}: {error,
                                                                         badarg}
      [ns_server:debug,2024-03-01T22:18:59.114Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{q,undefined},0,nearly_down,undefined}
      ->{node_state,{q,undefined},1,nearly_down,undefined}
      [ns_server:debug,2024-03-01T22:18:59.114Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,q,
                                            nearly_down,nearly_down,1}: {error,
                                                                         badarg}
      [ns_server:debug,2024-03-01T22:18:59.115Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{a,undefined},1,nearly_down,undefined}
      ->{node_state,{a,undefined},1,failover,undefined}
      [ns_server:debug,2024-03-01T22:18:59.115Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,a,
                                            nearly_down,failover,1}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.115Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{b,undefined},1,nearly_down,undefined}
      ->{node_state,{b,undefined},1,failover,undefined}
      [ns_server:debug,2024-03-01T22:18:59.116Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,b,
                                            nearly_down,failover,1}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.116Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:log_master_activity:132]Incremented down state:
      {node_state,{q,undefined},1,nearly_down,undefined}
      ->{node_state,{q,undefined},1,failover,undefined}
      [ns_server:debug,2024-03-01T22:18:59.116Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_node_state_change,q,
                                            nearly_down,failover,1}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.116Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover_logic:process_frame:233]Decided on following actions: [{failover,
                                         [{a,undefined},
                                          {b,undefined},
                                          {q,undefined}]}]
      [ns_server:debug,2024-03-01T22:18:59.116Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2366.0>:auto_failover:trigger_autofailover:636]Initiating failover request against orchestrator for [a,b,q]
      [ns_server:debug,2024-03-01T22:18:59.193Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2355.0>:failover:start:54]Starting failover with Nodes = [a,b,q], Options = #{allow_unsafe => false,
                                                          auto => true,
                                                          down_nodes => [q,b,a],
                                                          failover_reasons =>
                                                           [{a,
                                                             "The cluster manager did not respond. "},
                                                            {b,
                                                             "The cluster manager did not respond. "},
                                                            {q,
                                                             "The cluster manager did not respond. "}]}
      [user:info,2024-03-01T22:18:59.194Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:orchestrate:172]Starting failing over [a,b,q]
      [ns_server:debug,2024-03-01T22:18:59.194Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2355.0>:failover:start:78]Failover started. Pid = <0.2370.0>
      [ns_server:debug,2024-03-01T22:18:59.194Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:master_activity_events:submit_cast:75]Failed to send master activity event {failover,[a,b,q]}: {error,badarg}
      [user:info,2024-03-01T22:18:59.194Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2355.0>:ns_orchestrator:handle_start_failover:1861]Starting failover of nodes [a,b,q] AllowUnsafe = false Operation Id = b5db1694a3ab7b3a7b1a52f45889adfb
      [ns_server:info,2024-03-01T22:18:59.194Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:pre_failover_config_sync:219]Going to sync with chronicle quorum
      [ns_server:debug,2024-03-01T22:18:59.197Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:master_activity_events:submit_cast:75]Failed to send master activity event {bucket_failover_started,"default",
                                            [a,b],
                                            <0.2370.0>}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.198Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:promote_max_replicas:842]Retrieved the following vbuckets information: [{1,[]}]
      [ns_server:debug,2024-03-01T22:18:59.198Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:failover_bucket_prep:611]Original vbucket map: [[a,b,c,d]]
      VBucket map with failover applied: [[c,d,undefined,undefined]]
      [ns_server:debug,2024-03-01T22:18:59.199Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:master_activity_events:submit_cast:75]Failed to send master activity event {set_ff_map,"default",undefined}: {error,
                                                                              badarg}
      [ns_server:debug,2024-03-01T22:18:59.201Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:ns_janitor:check_states_match:399]Found states mismatch in bucket "default":
      [{0,
        [c,d,undefined,undefined],
        [{c,active,[{high_prepared_seqno,1},{high_seqno,1}]},
         {d,replica,[{high_prepared_seqno,1},{high_seqno,1}]},
         {undefined,replica,[{high_prepared_seqno,1},{high_seqno,1}]},
         {undefined,replica,[{high_prepared_seqno,1},{high_seqno,1}]}]}]
      [ns_server:debug,2024-03-01T22:18:59.201Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:ns_janitor:pull_config:418]Going to pull config
      [ns_server:debug,2024-03-01T22:18:59.202Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:master_activity_events:submit_cast:75]Failed to send master activity event {bucket_failover_ended,"default",
                                            [a,b],
                                            <0.2370.0>}: {error,badarg}
      [ns_server:debug,2024-03-01T22:18:59.202Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:update_failover_vbuckets_sets:498]Updating failover_vbuckets for a with [{"default",[0]}]
      Existing vbuckets: []
      New vbuckets: [{"default",[0]}]
      [ns_server:debug,2024-03-01T22:18:59.203Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:update_failover_vbuckets_sets:498]Updating failover_vbuckets for b with [{"default",[0]}]
      Existing vbuckets: []
      New vbuckets: [{"default",[0]}]
      [ns_server:debug,2024-03-01T22:18:59.203Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:complete_services_failover:644]Failover nodes [a,b,q] from services [query]
      [ns_server:debug,2024-03-01T22:18:59.209Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:ns_cluster_membership:service_clear_pending_failover:630]Clear pending failover for service query
      [ns_server:debug,2024-03-01T22:18:59.210Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:complete_failover_service:654]Failed over service query on nodes [a,b,q] successfully
      [user:info,2024-03-01T22:18:59.210Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:orchestrate:182]Failed over [a,b,q]: ok
      [user:info,2024-03-01T22:18:59.210Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:failover:deactivate_nodes:237]Deactivating failed over nodes [a,b,q]
      [ns_server:debug,2024-03-01T22:18:59.210Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:ns_cluster_membership:deactivate:227]Deactivate nodes [a,b,q]
      [ns_server:debug,2024-03-01T22:18:59.210Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2370.0>:master_activity_events:submit_cast:75]Failed to send master activity event {failover_ended}: {error,badarg}
      [user:info,2024-03-01T22:18:59.210Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2355.0>:ns_orchestrator:log_rebalance_completion:1661]Failover completed successfully.
      Rebalance Operation Id = b5db1694a3ab7b3a7b1a52f45889adfb
      *failed*
      in function failover_tests:auto_failover_t/2 (test/failover_tests.erl, line 481)
      in call from eunit_test:run_testfun/1 (eunit_test.erl, line 71)
      in call from eunit_proc:run_test/1 (eunit_proc.erl, line 531)
      in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 356)
      in call from eunit_proc:handle_test/2 (eunit_proc.erl, line 514)
      in call from eunit_proc:tests_inorder/3 (eunit_proc.erl, line 456)
      in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 346)
      in call from eunit_proc:run_group/2 (eunit_proc.erl, line 570)
      **error:{assertNotEqual,[{module,failover_tests},
                       {line,481},
                       {expression,"proplists : get_value ( failover_complete , Counters )"},
                       {value,undefined}]}
        output:<<"">>
       
      [ns_server:info,2024-03-01T22:18:59.217Z,test-yRgSvKGDx0djLt4A@b2c327656839:leader_registry<0.2319.0>:leader_registry:handle_down:286]Process <0.2368.0> registered as 'ns_rebalance_observer' terminated.
      [ns_server:debug,2024-03-01T22:18:59.217Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_done,a,
                                            "cluster_manager_down"}: {error,badarg}
      [user:info,2024-03-01T22:18:59.218Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:log_failover_success:659]Node (a) was automatically failed over. Reason: The cluster manager did not respond. 
      [ns_server:debug,2024-03-01T22:18:59.224Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_done,b,
                                            "cluster_manager_down"}: {error,badarg}
      [user:info,2024-03-01T22:18:59.224Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:log_failover_success:659]Node (b) was automatically failed over. Reason: The cluster manager did not respond. 
      [ns_server:debug,2024-03-01T22:18:59.224Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:master_activity_events:submit_cast:75]Failed to send master activity event {autofailover_done,q,
                                            "cluster_manager_down"}: {error,badarg}
      [user:info,2024-03-01T22:18:59.224Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2356.0>:auto_failover:log_failover_success:659]Node (q) was automatically failed over. Reason: The cluster manager did not respond. 
      [ns_server:info,2024-03-01T22:18:59.229Z,test-yRgSvKGDx0djLt4A@b2c327656839:leader_registry<0.2319.0>:leader_registry:handle_down:286]Process <0.2356.0> registered as 'auto_failover' terminated.
      [ns_server:debug,2024-03-01T22:19:00.563Z,test-yRgSvKGDx0djLt4A@b2c327656839:leader_registry<0.2464.0>:leader_registry:handle_new_leader:275]New leader is 'test-yRgSvKGDx0djLt4A@b2c327656839'. Invalidating name cache.
      [ns_server:info,2024-03-01T22:19:00.885Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.1614.0>:misc:start_singleton:913]start_singleton(gen_server, start_link, [{via,leader_registry,
                                                auto_reprovision},
                                               auto_reprovision,[],[]]): started as <0.2478.0> on 'test-yRgSvKGDx0djLt4A@b2c327656839'
       
      [ns_server:info,2024-03-01T22:19:01.227Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.1614.0>:misc:start_singleton:913]start_singleton(gen_statem, start_link, [{via,leader_registry,ns_orchestrator},
                                               ns_orchestrator,[],[]]): started as <0.2500.0> on 'test-yRgSvKGDx0djLt4A@b2c327656839'
       
      [ns_server:debug,2024-03-01T22:19:01.234Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.2501.0>:auto_failover:init:223]init auto_failover.
      [ns_server:info,2024-03-01T22:19:01.235Z,test-yRgSvKGDx0djLt4A@b2c327656839:<0.1614.0>:misc:start_singleton:913]start_singleton(gen_server, start_link, [{via,leader_registry,auto_failover},
                                               auto_failover,[],[]]): started as <0.2501.0> on 'test-yRgSvKGDx0djLt4A@b2c327656839'
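The log ordering above suggests a race: "Failover completed successfully." is logged at 22:18:59.210 with the eunit failure interleaved immediately after it, before the `autofailover_done` bookkeeping at 22:18:59.217, so the test may be reading the counters a moment before `failover_complete` is bumped. A hedged sketch of a retry-based read that would tolerate this window (`get_counters/0`, the sleep, and the retry count are illustrative placeholders, not names from the test suite):

```erlang
%% Hypothetical sketch only: poll for the counter instead of reading it once.
%% get_counters/0 stands in for however the test actually fetches Counters.
wait_for_counter(Key, Retries) when Retries > 0 ->
    case proplists:get_value(Key, get_counters()) of
        undefined ->
            timer:sleep(100),                     % give the orchestrator time to
            wait_for_counter(Key, Retries - 1);   % finish its post-failover work
        Value ->
            {ok, Value}
    end;
wait_for_counter(_Key, 0) ->
    timed_out.
```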
      

      People

        peter.searby Peter Searby

      Gerrit Reviews

        There are no open Gerrit changes