Couchbase Server / MB-19708

[system tests windows] Rebalance exited with reason {noproc, {gen_server,call + erlang crash


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.6.0
    • 4.5.0
    • ns_server
    • 4.5.0-2585
    • Untriaged
    • Windows 64-bit
    • Unknown

    Description

      Same test and step as in MB-19687:
      rebalance in (3+1 nodes) on the src cluster after more than 1 day of data load.

      I see 3 issues here, but most likely they are interrelated:

      1

      Rebalance exited with reason {noproc,
                                    {gen_server,call,
                                     [{'janitor_agent-RevAB','ns_1@172.23.107.85'},
                                      {get_dcp_docs_estimate,811,
                                       ['ns_1@172.23.107.48','ns_1@172.23.105.87']},
                                      infinity]}}
      ns_orchestrator 002	ns_1@172.23.105.87	12:46:18 PM Fri May 20, 2016
      

      2

      Service 'ns_server' exited with status 1. Restarting. Messages: [os_mon] win32 supervisor port (win32sysinfo): Erlang has closed
      {"Kernel pid terminated",application_controller,"{application_terminated,os_mon,shutdown}"}
       
      Crash dump was written to: erl_crash.dump
      Kernel pid terminated (application_controller) ({application_terminated,os_mon,shutdown})	ns_log 000	ns_1@172.23.107.85	12:47:00 PM Fri May 20, 2016
      

      3

      Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2016-05-20T12:46:50.581-07:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connectex: No connection could be made because the target machine actively refused it., num_of_retry=3
      MetadataService 2016-05-20T12:46:50.581-07:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connectex: No connection could be made because the target machine actively refused it., num_of_retry=4
      Metadata service not available after 30 retries. 
       
      [goport] 2016/05/20 12:46:50 c:/Program Files/Couchbase/Server/bin/goxdcr.exe terminated: exit status 1	ns_log 000	ns_1@172.23.107.85	12:47:00 PM Fri May 20, 2016
      

      All console logs (newest first):

      Bucket "AbRegNums" loaded on node 'ns_1@172.23.107.85' in 0 seconds.	ns_memcached 000	ns_1@172.23.107.85	12:47:08 PM Fri May 20, 2016
      Bucket "MsgsCalls" loaded on node 'ns_1@172.23.107.85' in 0 seconds.	ns_memcached 000	ns_1@172.23.107.85	12:47:08 PM Fri May 20, 2016
      Bucket "RevAB" loaded on node 'ns_1@172.23.107.85' in 0 seconds.	ns_memcached 000	ns_1@172.23.107.85	12:47:07 PM Fri May 20, 2016
      Bucket "UserInfo" loaded on node 'ns_1@172.23.107.85' in 0 seconds.	ns_memcached 000	ns_1@172.23.107.85	12:47:06 PM Fri May 20, 2016
      Couchbase Server has started on web port 8091 on node 'ns_1@172.23.107.85'. Version: "4.5.0-2585-enterprise".	menelaus_sup 001	ns_1@172.23.107.85	12:47:04 PM Fri May 20, 2016
      Node 'ns_1@172.23.105.94' saw that node 'ns_1@172.23.107.85' came up. Tags: []	ns_node_disco 004	ns_1@172.23.105.94	12:47:02 PM Fri May 20, 2016
      Node 'ns_1@172.23.107.48' saw that node 'ns_1@172.23.107.85' came up. Tags: []	ns_node_disco 004	ns_1@172.23.107.48	12:47:01 PM Fri May 20, 2016
      Node 'ns_1@172.23.107.85' synchronized otp cookie aaqxrnpoboutibbv from cluster	ns_cookie_manager 002	ns_1@172.23.107.85	12:47:00 PM Fri May 20, 2016
      Node 'ns_1@172.23.105.87' saw that node 'ns_1@172.23.107.85' came up. Tags: []	ns_node_disco 004	ns_1@172.23.105.87	12:47:00 PM Fri May 20, 2016
      Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2016-05-20T12:46:50.581-07:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connectex: No connection could be made because the target machine actively refused it., num_of_retry=3
      MetadataService 2016-05-20T12:46:50.581-07:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connectex: No connection could be made because the target machine actively refused it., num_of_retry=4
      Metadata service not available after 30 retries. 
       
      [goport] 2016/05/20 12:46:50 c:/Program Files/Couchbase/Server/bin/goxdcr.exe terminated: exit status 1	ns_log 000	ns_1@172.23.107.85	12:47:00 PM Fri May 20, 2016
      Service 'ns_server' exited with status 1. Restarting. Messages: [os_mon] win32 supervisor port (win32sysinfo): Erlang has closed
      {"Kernel pid terminated",application_controller,"{application_terminated,os_mon,shutdown}"}
       
      Crash dump was written to: erl_crash.dump
      Kernel pid terminated (application_controller) ({application_terminated,os_mon,shutdown})	ns_log 000	ns_1@172.23.107.85	12:47:00 PM Fri May 20, 2016
      Node 'ns_1@172.23.107.48' saw that node 'ns_1@172.23.107.85' went down. Details: [{nodedown_reason,
                                                                                         connection_closed}]	ns_node_disco 005	ns_1@172.23.107.48	12:46:24 PM Fri May 20, 2016
      Node 'ns_1@172.23.105.87' saw that node 'ns_1@172.23.107.85' went down. Details: [{nodedown_reason,
                                                                                         connection_closed}]	ns_node_disco 005	ns_1@172.23.105.87	12:46:24 PM Fri May 20, 2016
      Node 'ns_1@172.23.105.94' saw that node 'ns_1@172.23.107.85' went down. Details: [{nodedown_reason,
                                                                                         connection_closed}]	ns_node_disco 005	ns_1@172.23.105.94	12:46:24 PM Fri May 20, 2016
      Shutting down bucket "AbRegNums" on 'ns_1@172.23.107.85' for server shutdown	ns_memcached 000	ns_1@172.23.107.85	12:46:22 PM Fri May 20, 2016
      Shutting down bucket "MsgsCalls" on 'ns_1@172.23.107.85' for server shutdown	ns_memcached 000	ns_1@172.23.107.85	12:46:19 PM Fri May 20, 2016
      Rebalance exited with reason {noproc,
                                    {gen_server,call,
                                     [{'janitor_agent-RevAB','ns_1@172.23.107.85'},
                                      {get_dcp_docs_estimate,811,
                                       ['ns_1@172.23.107.48','ns_1@172.23.105.87']},
                                      infinity]}}
      ns_orchestrator 002	ns_1@172.23.105.87	12:46:18 PM Fri May 20, 2016
      Shutting down bucket "RevAB" on 'ns_1@172.23.107.85' for server shutdown	ns_memcached 000	ns_1@172.23.107.85	12:46:17 PM Fri May 20, 2016
      Shutting down bucket "UserInfo" on 'ns_1@172.23.107.85' for server shutdown	ns_memcached 000	ns_1@172.23.107.85	12:46:15 PM Fri May 20, 2016
      Bucket "RevAB" rebalance does not seem to be swap rebalance	ns_vbucket_mover 000	ns_1@172.23.105.87	12:36:18 PM Fri May 20, 2016
      Bucket "RevAB" loaded on node 'ns_1@172.23.107.48' in 0 seconds.	ns_memcached 000	ns_1@172.23.107.48	12:36:17 PM Fri May 20, 2016
      Started rebalancing bucket RevAB	ns_rebalancer 000	ns_1@172.23.105.87	12:36:16 PM Fri May 20, 2016
      Bucket "UserInfo" rebalance does not seem to be swap rebalance	ns_vbucket_mover 000	ns_1@172.23.105.87	12:28:55 PM Fri May 20, 2016
      Bucket "UserInfo" loaded on node 'ns_1@172.23.107.48' in 0 seconds.	ns_memcached 000	ns_1@172.23.107.48	12:28:49 PM Fri May 20, 2016
      Started rebalancing bucket UserInfo	ns_rebalancer 000	ns_1@172.23.105.87	12:28:48 PM Fri May 20, 2016
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.87','ns_1@172.23.105.94',
                                       'ns_1@172.23.107.48','ns_1@172.23.107.85'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
      ns_orchestrator 004	ns_1@172.23.105.87	12:28:45 PM Fri May 20, 2016
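
Because the console log above is listed newest first, the order in which the three issues occurred is easier to see once the entries are sorted by timestamp. The following is a minimal Python sketch of that, not an ns_server tool. It assumes each entry ends with tab-separated "module code", node, and timestamp fields, as in the lines above; the names TAIL, parse_entries and timeline are hypothetical, and the date parsing assumes an English/C locale.

```python
import re
from datetime import datetime

# Trailing "<module> <code> <node> <timestamp>" fields that close each entry.
# Assumption: this layout is inferred from the console log in this ticket.
TAIL = re.compile(
    r"(?P<module>\w+) (?P<code>\d{3})\s+(?P<node>\S+@\S+)\s+"
    r"(?P<ts>\d{1,2}:\d{2}:\d{2} [AP]M \w{3} \w{3} \d{1,2}, \d{4})\s*$"
)

# Three entries copied from the log above (tabs written as \t).
SAMPLE = """\
Rebalance exited with reason {noproc,
                              {gen_server,call,
                               [{'janitor_agent-RevAB','ns_1@172.23.107.85'},
                                {get_dcp_docs_estimate,811,
                                 ['ns_1@172.23.107.48','ns_1@172.23.105.87']},
                                infinity]}}
ns_orchestrator 002\tns_1@172.23.105.87\t12:46:18 PM Fri May 20, 2016
Shutting down bucket "RevAB" on 'ns_1@172.23.107.85' for server shutdown\tns_memcached 000\tns_1@172.23.107.85\t12:46:17 PM Fri May 20, 2016
Shutting down bucket "UserInfo" on 'ns_1@172.23.107.85' for server shutdown\tns_memcached 000\tns_1@172.23.107.85\t12:46:15 PM Fri May 20, 2016
"""


def parse_entries(text):
    """Group lines into entries; an entry ends at the line carrying the tail fields."""
    entries, pending = [], []
    for line in text.splitlines():
        m = TAIL.search(line)
        if m is None:
            # Continuation line of a multi-line message.
            if line.strip():
                pending.append(line.strip())
            continue
        head = line[:m.start()].strip()
        if head:
            pending.append(head)
        entries.append({
            "time": datetime.strptime(m["ts"], "%I:%M:%S %p %a %b %d, %Y"),
            "node": m["node"],
            "module": m["module"],
            "message": " ".join(pending),
        })
        pending = []
    return entries


def timeline(text):
    """Return the entries oldest first (the sort is stable for equal timestamps)."""
    return sorted(parse_entries(text), key=lambda e: e["time"])


for e in timeline(SAMPLE):
    print(e["time"].strftime("%H:%M:%S"), e["node"], e["module"], e["message"][:60])
```

On these three entries, the bucket shutdowns on 'ns_1@172.23.107.85' (12:46:15 and 12:46:17) sort before the rebalance exit reported by the orchestrator (12:46:18), which is at least consistent with the issues being interrelated.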
      
      

          People

            Aliaksey Artamonau (Inactive)
            andreibaranouski Andrei Baranouski