Couchbase Server
MB-7113

Windows: constant restarts of mb_master during small-scale performance tests


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version: 2.0
    • Affects Version: 2.0-beta-2
    • Component: ns_server
    • Security Level: Public
    • Environment: VMs, Windows 64-bit, 4 nodes, HDD, 4 cores, 24 GB RAM; build 1940

    Description

      Restarts occur on one or two nodes every time I run the tests, usually with the same error.
      There are no problems with data loading or initial indexing. Why does this happen?
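      For context on the {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}} exits in the logs below: gen_server:call/2 uses a default timeout of 5000 ms, and when the callee does not reply in time the caller exits with exactly this {timeout,{gen_server,call,...}} reason. A minimal sketch (hypothetical module name, not ns_server code) of a server blocked on slow I/O, as a disk-bound Windows node might be:

      ```erlang
      %% Hypothetical illustration: a gen_server whose handle_call blocks
      %% longer than the caller's default 5000 ms call timeout.
      -module(slow_server).
      -behaviour(gen_server).
      -export([start_link/0, ask/0]).
      -export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

      start_link() ->
          gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

      init([]) -> {ok, #{}}.

      %% Simulate a request stuck behind slow disk I/O.
      handle_call(nodes_wanted, _From, State) ->
          timer:sleep(10000),    % longer than the 5000 ms default timeout
          {reply, [], State}.

      handle_cast(_Msg, State) -> {noreply, State}.
      handle_info(_Msg, State) -> {noreply, State}.

      %% The calling process exits with
      %% {timeout,{gen_server,call,[slow_server,nodes_wanted]}},
      %% the same shape of reason that kills mb_master in the logs.
      ask() ->
          gen_server:call(?MODULE, nodes_wanted).
      ```

      Because mb_master traps exits and runs under ns_server_sup with restart_type permanent, each such timeout crashes it and the supervisor restarts it, which matches the repeated restarts reported here.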

      [ns_server:info,2012-11-06T12:43:31.635,ns_1@10.2.3.31:mb_master<0.18558.13>:mb_master:terminate:288]Synchronously shutting down child mb_master_sup
      [error_logger:error,2012-11-06T12:43:32.745,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,mb_master_sup}
      Context: shutdown_error
      Reason: killed
      Offender: [{pid,<0.19815.13>},
                 {name,ns_orchestrator},
                 {mfargs,{ns_orchestrator,start_link,[]}},
                 {restart_type,permanent},
                 {shutdown,20},
                 {child_type,worker}]


      [stats:warn,2012-11-06T12:43:32.651,ns_1@10.2.3.31:system_stats_collector<0.478.0>:system_stats_collector:handle_info:133]lost 7 ticks
      [ns_server:debug,2012-11-06T12:43:33.495,ns_1@10.2.3.31:<0.18559.13>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {ns_config_events,<0.18558.13>} exited with reason {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
      [error_logger:error,2012-11-06T12:43:34.073,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** State machine mb_master terminating
      ** Last message in was send_heartbeat
      ** When State == master
      ** Data == {state,<0.19814.13>,'ns_1@10.2.3.31',
                  ['ns_1@10.2.3.31','ns_1@10.2.3.33','ns_1@10.2.3.34','ns_1@10.2.3.35'],
                  {1352,234605,120106}}
      ** Reason for termination =
      ** {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}

      [ns_server:debug,2012-11-06T12:43:35.166,ns_1@10.2.3.31:ns_server_sup<0.385.0>:mb_master:check_master_takeover_needed:144]Sending master node question to the following nodes: ['ns_1@10.2.3.35','ns_1@10.2.3.34','ns_1@10.2.3.33']
      [error_logger:error,2012-11-06T12:43:35.276,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: mb_master:init/1
      pid: <0.18558.13>
      registered_name: mb_master
      exception exit: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
      in function gen_fsm:terminate/7
      ancestors: [ns_server_sup,ns_server_cluster_sup,<0.66.0>]
      messages: [send_heartbeat,send_heartbeat,send_heartbeat,send_heartbeat,
      {#Ref<0.0.372.79904>, ['ns_1@10.2.3.31','ns_1@10.2.3.33','ns_1@10.2.3.34', 'ns_1@10.2.3.35']}]
      links: [<0.385.0>,<0.18559.13>,<0.63.0>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 377
      stack_size: 24
      reductions: 147300
      neighbours:

      [error_logger:error,2012-11-06T12:43:35.307,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,ns_server_sup}
      Context: child_terminated
      Reason: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
      Offender: [{pid,<0.18558.13>},
                 {name,mb_master},
                 {mfargs,{mb_master,start_link,[]}},
                 {restart_type,permanent},
                 {shutdown,infinity},
                 {child_type,supervisor}]

      [ns_server:error,2012-11-06T12:43:35.323,ns_1@10.2.3.31:<0.788.0>:ns_memcached:verify_report_long_call:297]call {stats,<<>>} took too long: 10203000 us
      [couchdb:error,2012-11-06T12:43:41.588,ns_1@10.2.3.31:<0.24345.2>:couch_log:error:42]Uncaught error in HTTP request: {exit,{timeout,{gen_server,call,[ns_config,get]}}}

      Stacktrace: [{diag_handler,diagnosing_timeouts,1},
                   {menelaus_auth,check_auth,1},
                   {menelaus_auth,bucket_auth_fun,1},
                   {menelaus_auth,is_bucket_accessible,2},
                   {capi_frontend,do_db_req,2},
                   {couch_httpd,handle_request,6},
                   {mochiweb_http,headers,5},
                   {proc_lib,init_p_do_apply,3}]
      [error_logger:error,2012-11-06T12:43:41.604,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server disksup terminating
      ** Last message in was timeout
      ** When Server state == [{data,[{"OS",{win32,nt}},
                                      {"Timeout",60000},
                                      {"Threshold",80},
                                      {"DiskData",[{"C:\\",52324348,51},
                                                   {"E:\\",268432380,14}]}]}]
      ** Reason for termination ==
      ** {timeout,{gen_server,call,[os_mon_sysinfo,get_disk_info]}}


          People

            Assignee: Bin Cui (bcui)
            Reporter: Pavel Paulau (pavelpaulau)
            Votes: 0
            Watchers: 2
