Couchbase Server / MB-45788

[Volume Test] [MOI 10K test] Rebalance to add back failed over node failed due to error - no_connection,index-service_api

Details

    Description

      Build : 7.0.0-4960
      Test : -test tests/2i/cheshirecat/test_idx_cc_vol_10K_moi_tmp.yml -scope tests/2i/cheshirecat/scope_idx_cc_vol_10K_moi.yml (MOI 10K indexes test)
      Scale : 5

      At 2021-04-19T12:16, indexer node 172.23.120.74 was hard failed over and then fully recovered. The rebalance to add it back to the cluster failed after about 1 minute with the error:

      Rebalance exited with reason {service_rebalance_failed,index,
      {agent_died,<30213.17198.987>,
      {linked_process_died,<30213.18073.987>,
      {no_connection,"index-service_api"}}}}.
      Rebalance Operation Id = afea1b05f48eadbcba049622f8e2034f
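
The failure reason above is a chain of nested Erlang tuples, with the root cause innermost. As a rough aid for triaging many such failures, the sketch below pulls out the chain of reason atoms and the connection label with regular expressions. It is not a real Erlang term parser, and the helper names `reason_chain` and `connection_label` are hypothetical, introduced only for this illustration.

```python
import re

def reason_chain(reason):
    """Return the atom that opens each nested tuple, outermost first."""
    return re.findall(r"\{\s*([a-z][A-Za-z0-9_@]*)\s*,", reason)

def connection_label(reason):
    """Return the label inside {no_connection,"<label>"}, or None if absent."""
    match = re.search(r'\{\s*no_connection\s*,\s*"([^"]+)"\s*\}', reason)
    return match.group(1) if match else None

# Reason term copied from the rebalance failure message in this report.
REASON = """{service_rebalance_failed,index,
{agent_died,<30213.17198.987>,
{linked_process_died,<30213.18073.987>,
{no_connection,"index-service_api"}}}}"""

if __name__ == "__main__":
    print(" -> ".join(reason_chain(REASON)))
    print("missing connection label:", connection_label(REASON))
```

Read outermost to innermost, the chain says the index service rebalance failed because the service agent died, which in turn died because a linked process exited with `no_connection` for the label `index-service_api`.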
      

      Test console:

      [2021-04-19T12:16:11-07:00, sequoiatools/couchbase-cli:7.0:9c2da4] failover -c 172.23.97.74:8091 --server-failover 172.23.120.74:8091 -u Administrator -p password --hard
      [2021-04-19T12:16:23-07:00, sequoiatools/couchbase-cli:7.0:129e13] recovery -c 172.23.97.74:8091 --server-recovery 172.23.120.74:8091 --recovery-type full -u Administrator -p password
      [2021-04-19T12:16:34-07:00, sequoiatools/couchbase-cli:7.0:e842aa] rebalance -c 172.23.97.74:8091 -u Administrator -p password
      →  
       
      Error occurred on container - sequoiatools/couchbase-cli:7.0:[rebalance -c 172.23.97.74:8091 -u Administrator -p password]
       
      docker logs e842aa
      docker start e842aa
       
      Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2021-04-19T12:17:43-07:00, sequoiatools/cmd:5a6ef4] 60
      

      The debug logs on 172.23.120.74 show the following errors:

      [ns_server:error,2021-04-19T12:17:36.143-07:00,ns_1@172.23.120.74:<0.18073.987>:service_agent:wait_for_connection_loop:345]No connection with label "index-service_api" after 60000ms. Exiting.
      [error_logger:error,2021-04-19T12:17:36.144-07:00,ns_1@172.23.120.74:<0.18073.987>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: service_agent:'-spawn_connection_waiter/2-fun-0-'/0
          pid: <0.18073.987>
          registered_name: []
          exception exit: {no_connection,"index-service_api"}
            in function  service_agent:wait_for_connection_loop/3 (src/service_agent.erl, line 347)
          ancestors: ['service_agent-index',service_agent_children_sup,
                        service_agent_sup,ns_server_sup,ns_server_nodes_sup,
                        <0.9311.0>,ns_server_cluster_sup,root_sup,<0.138.0>]
          message_queue_len: 0
          messages: []
          links: [<0.17198.987>,<0.18586.987>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 1598
          stack_size: 27
          reductions: 2562
        neighbours:
       
      [ns_server:debug,2021-04-19T12:17:36.144-07:00,ns_1@172.23.120.74:<0.18586.987>:ns_pubsub:do_subscribe_link_continue:152]Parent process of subscription {json_rpc_events,<0.18073.987>} exited with reason {no_connection,
                                                                                         "index-service_api"}
      [ns_server:error,2021-04-19T12:17:36.144-07:00,ns_1@172.23.120.74:service_agent-index<0.17198.987>:service_agent:handle_info:277]Linked process <0.18073.987> died with reason {no_connection,
                                                     "index-service_api"}. Terminating
      [ns_server:error,2021-04-19T12:17:36.145-07:00,ns_1@172.23.120.74:service_agent-index<0.17198.987>:service_agent:terminate:306]Terminating abnormally
      [error_logger:error,2021-04-19T12:17:36.145-07:00,ns_1@172.23.120.74:service_agent-index<0.17198.987>:ale_error_logger_handler:do_log:101]
      =========================ERROR REPORT=========================
      ** Generic server 'service_agent-index' terminating
      ** Last message in was {'EXIT',<0.18073.987>,
                                     {no_connection,"index-service_api"}}
      ** When Server state == {state,index,
                               {dict,60,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[[{uuid,<<"f024244e895bf88d497f9962abf63562">>}|
                                    'ns_1@172.23.106.136']],
                                  [[{uuid,<<"49fb081b12e2b0e57401b211cedc2d60">>}|
                                    'ns_1@172.23.120.75']],
                                  [[{node,'ns_1@172.23.120.81'}|
                                    <<"0ca6f6ade5b881dbcf323697e57bd50f">>],
                                   [{uuid,<<"885ec982e10f9d55dff146df140de45e">>}|
                                    'ns_1@172.23.121.77'],
                                   [{node,'ns_1@172.23.123.31'}|
                                    <<"c164986a722efb761366ec55b0aa171c">>],
                                   [{uuid,<<"18e3b4afa855e31a47c08208541be350">>}|
                                    'ns_1@172.23.97.110'],
                                   [{uuid,<<"4103403bdfb6ae0d67022776f80424f6">>}|
                                    'ns_1@172.23.97.151'],
                                   [{node,'ns_1@172.23.97.151'}|
                                    <<"4103403bdfb6ae0d67022776f80424f6">>],
                                   [{node,'ns_1@172.23.97.241'}|
                                    <<"c450c30318b67ac88e34797e84c79330">>],
                                   [{uuid,<<"fc2b1bc898c9760ba2341313a1e4960a">>}|
                                    'ns_1@172.23.97.74']],
                                  [[{node,'ns_1@172.23.106.134'}|
                                    <<"c98960b82ceef50322fc8c5d5f97321a">>],
                                   [{node,'ns_1@172.23.120.74'}|
                                    <<"83a3dc4da11c0ec7214e65d58232cb23">>],
                                   [{node,'ns_1@172.23.123.24'}|
                                    <<"7a726f4aeea4dec7c7cb850fbe19f307">>],
                                   [{node,'ns_1@172.23.96.14'}|
                                    <<"f26e54ddeca8d521499b4932e4984f56">>],
                                   [{uuid,<<"386741b266e0e4b91924363e45302334">>}|
                                    'ns_1@172.23.96.254'],
                                   [{node,'ns_1@172.23.96.254'}|
      ...
      ...
      ** Reason for termination ==
      ** {linked_process_died,<0.18073.987>,{no_connection,"index-service_api"}}
       
      [error_logger:error,2021-04-19T12:17:36.147-07:00,ns_1@172.23.120.74:service_agent-index<0.17198.987>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: service_agent:init/1
          pid: <0.17198.987>
          registered_name: 'service_agent-index'
          exception exit: {linked_process_died,<0.18073.987>,
                              {no_connection,"index-service_api"}}
            in function  gen_server:handle_common_reply/8 (gen_server.erl, line 751)
          ancestors: [service_agent_children_sup,service_agent_sup,ns_server_sup,
                        ns_server_nodes_sup,<0.9311.0>,ns_server_cluster_sup,
                        root_sup,<0.138.0>]
          message_queue_len: 1
          messages: [{'EXIT',<0.17459.987>,
                            {linked_process_died,<0.18073.987>,
                                {no_connection,"index-service_api"}}}]
          links: [<0.18950.987>,<0.9683.0>]
          dictionary: []
          trap_exit: true
          status: running
          heap_size: 999631
          stack_size: 27
          reductions: 159456
        neighbours:
       
      [ns_server:debug,2021-04-19T12:17:36.147-07:00,ns_1@172.23.120.74:<0.18950.987>:ns_pubsub:do_subscribe_link_continue:152]Parent process of subscription {ns_config_events,<0.17198.987>} exited with reason {linked_process_died,
                                                                                          <0.18073.987>,
                                                                                          {no_connection,
                                                                                           "index-service_api"}}
      [error_logger:error,2021-04-19T12:17:36.148-07:00,ns_1@172.23.120.74:service_agent_children_sup<0.9683.0>:ale_error_logger_handler:do_log:101]
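
To find the same timeout on other nodes or in other runs, the debug log can be scanned for the `wait_for_connection_loop` message. The sketch below is a minimal example: the header pattern is inferred only from the log lines quoted in this report, not from any ns_server documentation, and the function name `connection_timeouts` is hypothetical.

```python
import re

# Header shape inferred from the quoted lines:
# [logger:level,timestamp,node:process:module:function:line]message
HEADER = re.compile(
    r"^\[(?P<logger>\w+):(?P<level>\w+),(?P<ts>[^,]+),"
    r"(?P<node>[^:]+):(?P<proc>.*?):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\](?P<msg>.*)$"
)
TIMEOUT = re.compile(
    r'No connection with label "(?P<label>[^"]+)" after (?P<ms>\d+)ms'
)

def connection_timeouts(lines):
    """Yield (timestamp, node, label, waited_ms) for each timeout line."""
    for raw in lines:
        header = HEADER.match(raw.strip())
        if header is None:
            continue  # continuation lines and crash-report bodies have no header
        hit = TIMEOUT.search(header["msg"])
        if hit:
            yield header["ts"], header["node"], hit["label"], int(hit["ms"])

# Two lines copied from the debug log excerpt in this report.
SAMPLE = [
    '[ns_server:error,2021-04-19T12:17:36.143-07:00,ns_1@172.23.120.74:'
    '<0.18073.987>:service_agent:wait_for_connection_loop:345]'
    'No connection with label "index-service_api" after 60000ms. Exiting.',
    '[ns_server:error,2021-04-19T12:17:36.145-07:00,ns_1@172.23.120.74:'
    'service_agent-index<0.17198.987>:service_agent:terminate:306]'
    'Terminating abnormally',
]

if __name__ == "__main__":
    for ts, node, label, waited_ms in connection_timeouts(SAMPLE):
        print(f"{ts} {node}: no '{label}' connection after {waited_ms} ms")
```

To scan a real file, pass an open file object instead of `SAMPLE`; the function only iterates over lines.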
      

      The indexer process started at 2021-04-19T12:16:36, after the recovery.
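
The timestamps in this report can be differenced to show how the 60000 ms wait lines up with the indexer start. This is a small arithmetic sketch: it assumes the indexer start time is in the same -07:00 zone as the log lines and only has one-second precision, and the variable names are illustrative.

```python
from datetime import datetime

TZ = "-07:00"  # assumption: same UTC offset as the console and debug log timestamps

# Times copied from the test console, the indexer start note, and the debug log.
rebalance_issued = datetime.fromisoformat("2021-04-19T12:16:34" + TZ)
indexer_started = datetime.fromisoformat("2021-04-19T12:16:36" + TZ)
wait_timed_out = datetime.fromisoformat("2021-04-19T12:17:36.143" + TZ)

def seconds_between(start, end):
    """Elapsed wall-clock time from start to end, in seconds."""
    return (end - start).total_seconds()

if __name__ == "__main__":
    print("rebalance issued -> timeout:", seconds_between(rebalance_issued, wait_timed_out), "s")
    print("indexer started  -> timeout:", seconds_between(indexer_started, wait_timed_out), "s")
```

The timeout was logged roughly 62 s after the rebalance command and roughly 60 s after the indexer process started. That suggests the 60000 ms wait began at about the same moment the indexer came up, though this is an inference from the timestamps rather than something the logs state directly.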

      Indexer nodes : 172.23.106.136, 172.23.120.58, 172.23.120.74, 172.23.120.75, 172.23.120.77, 172.23.120.81, 172.23.120.86, 172.23.123.31, 172.23.123.32, 172.23.123.33, 172.23.96.243, 172.23.96.254, 172.23.97.105, 172.23.97.110, 172.23.97.112

          People

            mihir.kamdar Mihir Kamdar (Inactive)