Couchbase Server
MB-7263 (technical task of MB-7272: memcached/ep-engine crashes in flusher or other paths when it receives a shutdown message from ns-server)

Service memcached constantly exited on the destination master node after certain steps in XDCR + rebalance scenarios: Port server memcached on node 'ns_1@10.3.121.63' exited with status 71: failed to listen on TCP port 11210: Address already in use

    Details

    • Type: Technical task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
      None

      Description

      build 1965

steps: 2 clusters with 5 nodes each, 10 buckets * 100K
clusters 10.3.121.63* and 10.3.121.11*
1. swap rebalance 1 node on each cluster
2. during the rebalance, on cluster http://10.3.121.112/ create a replication of the default bucket to the http://10.3.121.63/ cluster

Result: rebalance stopped, as in ticket
MB-7212 (Rebalance stopped by janitor when xdcr replication is created on source cluster: "Resetting rebalance status since it's not really running")

An additional issue is that the master node on the destination cluster is 'down';
there are a lot of errors:
      Port server memcached on node 'ns_1@10.3.121.63' exited with status 71. Restarting. Messages: Mon Nov 26 11:08:10.176462 PST 3: failed to listen on TCP port 11210: Address already in use ns_port_server000 ns_1@10.3.121.63 22:08:10 - Mon Nov 26, 2012
      Service memcached exited on node 'ns_1@10.3.121.63' in 0.08s
      supervisor_cushion001 ns_1@10.3.121.63 22:08:05 - Mon Nov 26, 2012
      Port server memcached on node 'ns_1@10.3.121.63' exited with status 71. Restarting. Messages: Mon Nov 26 11:08:05.090544 PST 3: failed to listen on TCP port 11210: Address already in use ns_port_server000 ns_1@10.3.121.63 22:08:05 - Mon Nov 26, 2012
      Service memcached exited on node 'ns_1@10.3.121.63' in 0.08s
      (repeated 5 times) supervisor_cushion001 ns_1@10.3.121.63 22:08:00 - Mon Nov 26, 2012

      [root@cen-2708 ~]# netstat -tulpn| grep 11210
      tcp 0 0 0.0.0.0:11210 0.0.0.0:* LISTEN 5273/memcached
      udp 0 0 0.0.0.0:11210 0.0.0.0:* 5273/memcached
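
The netstat output shows the old memcached (PID 5273) still holding 11210 while the supervisor keeps starting a new instance. The failure is reproducible at the socket level; a minimal Erlang shell sketch (illustrative only, any free port behaves the same way):

    %% the first listener takes the port, like the surviving memcached
    {ok, LSock} = gen_tcp:listen(11210, []).
    %% a second listen on the same port fails exactly the way the
    %% restarted memcached does before it exits with status 71
    {error, eaddrinuse} = gen_tcp:listen(11210, []).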

[couchdb:error,2012-11-26T8:39:29.816,ns_1@10.3.121.63:<0.22216.7>:couch_log:error:42]Uncaught error in HTTP request: {exit,
{noproc,
{gen_server,call,
['ns_memcached-default',
{stats,<<>>},
180000]}}}

      Stacktrace: [{gen_server,call,3},
      {ns_memcached,do_call,3},
      {capi_frontend,get_db_info,1},
      {couch_httpd_db,db_req,2},
      {couch_db_frontend,do_db_req,2},
      {couch_httpd,handle_request,6},
      {mochiweb_http,headers,5},
      {proc_lib,init_p_do_apply,3}]
      [ns_server:debug,2012-11-26T8:39:30.905,ns_1@10.3.121.63:<0.22520.7>:supervisor_cushion:init:43]starting ns_port_server with delay of 5000
      [error_logger:error,2012-11-26T8:39:30.906,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.22485.7> terminating
      ** Last message in was {die,{abnormal,71}}
      ** When Server state == {state,memcached,5000,
      {1353,947965,773925},
      undefined,60000}
      ** Reason for termination ==
      ** {abnormal,71}

      [error_logger:error,2012-11-26T8:39:30.907,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: supervisor_cushion:init/1
      pid: <0.22485.7>
      registered_name: []
      exception exit: {abnormal,71}
      in function gen_server:terminate/6
      ancestors: [ns_port_sup,ns_server_sup,ns_server_cluster_sup,<0.59.0>]
      messages: []
      links: [<0.17826.4>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 1597
      stack_size: 24
      reductions: 2304
      neighbours:

      [error_logger:error,2012-11-26T8:39:30.907,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,ns_port_sup}
      Context: child_terminated
      Reason: {abnormal,71}
      Offender: [{pid,<0.22485.7>},
      {name,
      {memcached,"/opt/couchbase/bin/memcached",
      ["-X",
      "/opt/couchbase/lib/memcached/stdin_term_handler.so",
      "-X",
      "/opt/couchbase/lib/memcached/file_logger.so,cyclesize=104857600;sleeptime=19;filename=/opt/couchbase/var/lib/couchbase/logs/memcached.log",
      "-l","0.0.0.0:11210,0.0.0.0:11209:1000","-p",
      "11210","-E",
      "/opt/couchbase/lib/memcached/bucket_engine.so",
      "-B","binary","-r","-c","10000","-e",
      "admin=_admin;default_bucket_name=default;auto_create=false",
      []],
      [{env,
      [{"EVENT_NOSELECT","1"},
      {"MEMCACHED_TOP_KEYS","100"},
      {"ISASL_PWFILE", "/opt/couchbase/var/lib/couchbase/data/isasl.pw"},
      {"ISASL_DB_CHECK_TIME","1"}]},
      use_stdio,stderr_to_stdout,exit_status,
      port_server_send_eol,stream]}},
      {mfargs,
      {erlang,apply,
      [#Fun<ns_port_sup.3.119727222>,
      [memcached,"/opt/couchbase/bin/memcached",
      ["-X",
      "/opt/couchbase/lib/memcached/stdin_term_handler.so",
      "-X",
      "/opt/couchbase/lib/memcached/file_logger.so,cyclesize=104857600;sleeptime=19;filename=/opt/couchbase/var/lib/couchbase/logs/memcached.log",
      "-l","0.0.0.0:11210,0.0.0.0:11209:1000","-p",
      "11210","-E",
      "/opt/couchbase/lib/memcached/bucket_engine.so",
      "-B","binary","-r","-c","10000","-e",
      "admin=_admin;default_bucket_name=default;auto_create=false",
      []],
      [{env,
      [{"EVENT_NOSELECT","1"},
      {"MEMCACHED_TOP_KEYS","100"},
      {"ISASL_PWFILE", "/opt/couchbase/var/lib/couchbase/data/isasl.pw"},
      {"ISASL_DB_CHECK_TIME","1"}]},
      use_stdio,stderr_to_stdout,exit_status,
      port_server_send_eol,stream]]]}},
      {restart_type,permanent},
      {shutdown,86400000},
      {child_type,worker}]


      [error_logger:info,2012-11-26T8:39:30.908,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================PROGRESS REPORT=========================
      supervisor: {local,ns_port_sup}
      started: [{pid,<0.22520.7>},
      {name,
      {memcached,"/opt/couchbase/bin/memcached",
      ["-X",
      "/opt/couchbase/lib/memcached/stdin_term_handler.so",
      "-X",
      "/opt/couchbase/lib/memcached/file_logger.so,cyclesize=104857600;sleeptime=19;filename=/opt/couchbase/var/lib/couchbase/logs/memcached.log",
      "-l","0.0.0.0:11210,0.0.0.0:11209:1000","-p",
      "11210","-E",
      "/opt/couchbase/lib/memcached/bucket_engine.so",
      "-B","binary","-r","-c","10000","-e",
      "admin=_admin;default_bucket_name=default;auto_create=false",
      []],
      [{env,
      [{"EVENT_NOSELECT","1"},
      {"MEMCACHED_TOP_KEYS","100"},
      {"ISASL_PWFILE", "/opt/couchbase/var/lib/couchbase/data/isasl.pw"},
      {"ISASL_DB_CHECK_TIME","1"}]},
      use_stdio,stderr_to_stdout,exit_status,
      port_server_send_eol,stream]}},
      {mfargs,
      {erlang,apply,
      [#Fun<ns_port_sup.3.119727222>,
      [memcached,"/opt/couchbase/bin/memcached",
      ["-X",
      "/opt/couchbase/lib/memcached/stdin_term_handler.so",
      "-X",
      "/opt/couchbase/lib/memcached/file_logger.so,cyclesize=104857600;sleeptime=19;filename=/opt/couchbase/var/lib/couchbase/logs/memcached.log",
      "-l","0.0.0.0:11210,0.0.0.0:11209:1000",
      "-p","11210","-E",
      "/opt/couchbase/lib/memcached/bucket_engine.so",
      "-B","binary","-r","-c","10000","-e",
      "admin=_admin;default_bucket_name=default;auto_create=false",
      []],
      [{env,
      [{"EVENT_NOSELECT","1"},
      {"MEMCACHED_TOP_KEYS","100"},
      {"ISASL_PWFILE", "/opt/couchbase/var/lib/couchbase/data/isasl.pw"},
      {"ISASL_DB_CHECK_TIME","1"}]},
      use_stdio,stderr_to_stdout,exit_status,
      port_server_send_eol,stream]]]}},
      {restart_type,permanent},
      {shutdown,86400000},
      {child_type,worker}]

      [user:info,2012-11-26T8:39:30.978,ns_1@10.3.121.63:ns_port_memcached<0.22521.7>:ns_port_server:handle_info:107]Port server memcached on node 'ns_1@10.3.121.63' exited with status 71. Restarting. Messages: Mon Nov 26 08:39:30.973727 PST 3: failed to listen on TCP port 11210: Address already in use
      [ns_server:info,2012-11-26T8:39:30.978,ns_1@10.3.121.63:ns_port_memcached<0.22521.7>:ns_port_server:log:171]memcached<0.22521.7>: Mon Nov 26 08:39:30.973727 PST 3: failed to listen on TCP port 11210: Address already in use

      [ns_server:info,2012-11-26T8:39:30.978,ns_1@10.3.121.63:<0.22520.7>:supervisor_cushion:handle_info:62]Cushion managed supervisor for memcached failed: {abnormal,71}
      [user:warn,2012-11-26T8:39:30.979,ns_1@10.3.121.63:<0.22520.7>:supervisor_cushion:die_slowly:81]Service memcached exited on node 'ns_1@10.3.121.63' in 0.07s

      [ns_server:info,2012-11-26T8:39:30.979,ns_1@10.3.121.63:ns_log<0.17736.4>:ns_log:handle_cast:125]suppressing duplicate log supervisor_cushion:1("Service memcached exited on node 'ns_1@10.3.121.63' in 0.07s\n") because it's been seen 2 times in the past 10.265318 secs (last seen 5.132754 secs ago
      [error_logger:error,2012-11-26T8:39:30.979,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.22521.7> terminating
      ** Last message in was {#Port<0.384970>,{exit_status,71}}
      ** When Server state == {state,#Port<0.384970>,memcached,
      {["Mon Nov 26 08:39:30.973727 PST 3: failed to listen on TCP port 11210: Address already in use", empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty, empty,empty,empty,empty,empty,empty,empty], [empty]},
      {ok,{1353947971173861,#Ref<0.0.79.258928>}},
      ["Mon Nov 26 08:39:30.973727 PST 3: failed to listen on TCP port 11210: Address already in use"],
      0,true}
      ** Reason for termination ==
      ** {abnormal,71}

      [error_logger:error,2012-11-26T8:39:30.980,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_port_server:init/1
      pid: <0.22521.7>
      registered_name: ns_port_memcached
      exception exit: {abnormal,71}
      in function gen_server:terminate/6
      ancestors: [<0.22520.7>,ns_port_sup,ns_server_sup,ns_server_cluster_sup,
      <0.59.0>]
      messages: [{'EXIT',#Port<0.384970>,normal}]
      links: [<0.22520.7>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 2584
      stack_size: 24
      reductions: 2291
      neighbours:

      [stats:error,2012-11-26T8:39:31.858,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:31.878,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:31.900,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:31.903,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:31.950,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:31.951,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:31.983,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:31.986,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.002,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.002,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.017,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.023,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.034,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.040,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.051,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.059,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.070,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.084,ns_1@10.3.121.63:<0.16196.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.093,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [stats:error,2012-11-26T8:39:32.110,ns_1@10.3.121.63:<0.13202.7>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.121.63']
      [error_logger:error,2012-11-26T8:39:32.130,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,menelaus_sup}
      Context: child_terminated
      Reason: {noproc,
      {gen_server,call,
      [{'stats_reader-bucket1','ns_1@10.3.121.63'},
      {latest,minute,1}]}}
      Offender: [{pid,<0.22511.7>},
      {name,menelaus_web_alerts_srv},
      {mfargs,{menelaus_web_alerts_srv,start_link,[]}},
      {restart_type,permanent},
      {shutdown,5000},
      {child_type,worker}]


      [error_logger:info,2012-11-26T8:39:32.131,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================PROGRESS REPORT=========================
      supervisor: {local,menelaus_sup}
      started: [{pid,<0.22543.7>},
      {name,menelaus_web_alerts_srv},
      {mfargs,{menelaus_web_alerts_srv,start_link,[]}},
      {restart_type,permanent},
      {shutdown,5000},
      {child_type,worker}]

[couchdb:error,2012-11-26T8:39:32.294,ns_1@10.3.121.63:<0.22225.7>:couch_log:error:42]Uncaught error in HTTP request: {exit,
{noproc,
{gen_server,call,
['ns_memcached-default',
{stats,<<>>},
180000]}}}

Stacktrace: [{gen_server,call,3},
{ns_memcached,do_call,3},
{capi_frontend,get_db_info,1},
{couch_httpd_db,db_req,2},
{couch_db_frontend,do_db_req,2},
{couch_httpd,handle_request,6},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]
      [couchdb:error,2012-11-26T8:39:32.314,ns_1@10.3.121.63:<0.22226.7>:couch_log:error:42]Uncaught error in HTTP request: {exit,
      {noproc,
      {gen_server,call,
      ['ns_memcached-default',
      {stats,<<>>},
      180000]}}}

Stacktrace: [{gen_server,call,3},
{ns_memcached,do_call,3},
{capi_frontend,get_db_info,1},
{couch_httpd_db,db_req,2},
{couch_db_frontend,do_db_req,2},
{couch_httpd,handle_request,6},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]
      [ns_server:debug,2012-11-26T8:39:32.460,ns_1@10.3.121.63:compaction_daemon<0.17942.4>:compaction_daemon:handle_info:269]Starting compaction for the following buckets:
      [<<"standard_bucket4">>,<<"standard_bucket3">>,<<"standard_bucket2">>,
      <<"standard_bucket1">>,<<"standard_bucket0">>,<<"bucket3">>,<<"bucket2">>,
      <<"bucket1">>,<<"bucket0">>,<<"default">>]
      [ns_server:info,2012-11-26T8:39:32.468,ns_1@10.3.121.63:<0.22554.7>:compaction_daemon:try_to_cleanup_indexes:439]Cleaning up indexes for bucket `standard_bucket4`
      [ns_server:info,2012-11-26T8:39:32.469,ns_1@10.3.121.63:<0.22554.7>:compaction_daemon:spawn_bucket_compactor:404]Compacting bucket standard_bucket4 with config:
      [{database_fragmentation_threshold,{30,undefined}},
      {view_fragmentation_threshold,{30,undefined}}]
      [ns_server:debug,2012-11-26T8:39:32.474,ns_1@10.3.121.63:<0.22557.7>:compaction_daemon:file_needs_compaction:831]Estimated size for `standard_bucket4`: data 4358861, file 5087769
      [ns_server:debug,2012-11-26T8:39:32.474,ns_1@10.3.121.63:<0.22557.7>:compaction_daemon:file_needs_compaction:836]Estimated file size for `standard_bucket4` is less than min_file_size 53870592; skipping
      [ns_server:info,2012-11-26T8:39:32.484,ns_1@10.3.121.63:<0.22558.7>:compaction_daemon:try_to_cleanup_indexes:439]Cleaning up indexes for bucket `standard_bucket3`
      [ns_server:info,2012-11-26T8:39:32.485,ns_1@10.3.121.63:<0.22558.7>:compaction_daemon:spawn_bucket_compactor:404]Compacting bucket standard_bucket3 with config:
      [{database_fragmentation_threshold,{30,undefined}},
      {view_fragmentation_threshold,{30,undefined}}]
      [ns_server:debug,2012-11-26T8:39:32.490,ns_1@10.3.121.63:<0.22561.7>:compaction_daemon:file_needs_compaction:831]Estimated size for `standard_bucket3`: data 4358835, file 48752409
      [ns_server:debug,2012-11-26T8:39:32.490,ns_1@10.3.121.63:<0.22561.7>:compaction_daemon:file_needs_compaction:836]Estimated file size for `standard_bucket3` is less than min_file_size 53870592; skipping
      [ns_server:info,2012-11-26T8:39:32.500,ns_1@10.3.121.63:<0.22562.7>:compaction_daemon:try_to_cleanup_indexes:439]Cleaning up indexes for bucket `standard_bucket2`
      [ns_server:info,2012-11-26T8:39:32.501,ns_1@10.3.121.63:<0.22562.7>:compaction_daemon:spawn_bucket_compactor:404]Compacting bucket standard_bucket2 with config:
      [{database_fragmentation_threshold,{30,undefined}},
      {view_fragmentation_threshold,{30,undefined}}]
      [ns_server:debug,2012-11-26T8:39:32.508,ns_1@10.3.121.63:<0.22565.7>:compaction_daemon:file_needs_compaction:831]Estimated size for `standard_bucket2`: data 4377561, file 49594137
      [ns_server:debug,2012-11-26T8:39:32.508,ns_1@10.3.121.63:<0.22565.7>:compaction_daemon:file_needs_compaction:836]Estimated file size for `standard_bucket2` is less than min_file_size 53870592; skipping
      [ns_server:info,2012-11-26T8:39:32.517,ns_1@10.3.121.63:<0.22566.7>:compaction_daemon:try_to_cleanup_indexes:439]Cleaning up indexes for bucket `standard_bucket1`
      [ns_server:info,2012-11-26T8:39:32.517,ns_1@10.3.121.63:<0.22566.7>:compaction_daemon:spawn_bucket_compactor:404]Compacting bucket standard_bucket1 with config:
      [{database_fragmentation_threshold,{30,undefined}},
      {view_fragmentation_threshold,{30,undefined}}]
      [ns_server:debug,2012-11-26T8:39:32.523,ns_1@10.3.121.63:<0.22569.7>:compaction_daemon:file_needs_compaction:831]Estimated size for `standard_bucket1`: data 4474557, file 5087769
      [ns_server:debug,2012-11-26T8:39:32.523,ns_1@10.3.121.63:<0.22569.7>:compaction_daemon:file_needs_compaction:836]Estimated file size for `standard_bucket1` is less than min_file_size 53870592; skipping
      [ns_server:info,2012-11-26T8:39:32.538,ns_1@10.3.121.63:<0.22570.7>:compaction_daemon:try_to_cleanup_indexes:439]Cleaning up indexes for bucket `standard_bucket0`
      [ns_server:info,2012-11-26T8:39:32.539,ns_1@10.3.121.63:<0.22570.7>:compaction_daemon:spawn_bucket_compactor:404]Compacting bucket standard_bucket0 with config:
      [{database_fragmentation_threshold,{30,undefined}},
      {view_fragmentation_threshold,{30,undefined}}]
[ns_server:error,2012-11-26T8:39:32.540,ns_1@10.3.121.63:compaction_daemon<0.17942.4>:compaction_daemon:handle_info:287]Compactor <0.22570.7> exited unexpectedly: {noproc,
{gen_server,call,
['capi_set_view_manager-standard_bucket0',
{foreach_doc,#Fun<capi_ddoc_replication_srv.1.36030090>},
infinity]}}. Moving to the next bucket.
      [error_logger:error,2012-11-26T8:39:32.540,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: compaction_daemon:spawn_bucket_compactor/3-fun-2/0
      pid: <0.22570.7>
      registered_name: []
      exception exit: {noproc,
      {gen_server,call,
      ['capi_set_view_manager-standard_bucket0',
      {foreach_doc, #Fun<capi_ddoc_replication_srv.1.36030090>},
      infinity]}}
      in function gen_server:call/3
      in call from capi_ddoc_replication_srv:foreach_live_ddoc_id/2
      in call from capi_ddoc_replication_srv:fetch_ddoc_ids/1
      in call from compaction_daemon:'spawn_bucket_compactor/3-fun-2'/4
      ancestors: [compaction_daemon,<0.17941.4>,ns_server_sup,
      ns_server_cluster_sup,<0.59.0>]
      messages: []
      links: [<0.17942.4>]
      dictionary: []
      trap_exit: false
      status: running
      heap_size: 28657
      stack_size: 24
      reductions: 24648
      neighbours:

      [ns_server:info,2012-11-26T8:39:32.548,ns_1@10.3.121.63:<0.22571.7>:compaction_daemon:try_to_cleanup_indexes:439]Cleaning up indexes for bucket `bucket3`
      [ns_server:info,2012-11-26T8:39:32.548,ns_1@10.3.121.63:<0.22571.7>:compaction_daemon:spawn_bucket_compactor:404]Compacting bucket bucket3 with config:
      [{database_fragmentation_threshold,{30,undefined}},
      {view_fragmentation_threshold,{30,undefined}}]
[ns_server:error,2012-11-26T8:39:32.549,ns_1@10.3.121.63:compaction_daemon<0.17942.4>:compaction_daemon:handle_info:287]Compactor <0.22571.7> exited unexpectedly: {noproc,
{gen_server,call,
['capi_set_view_manager-bucket3',
{foreach_doc,#Fun<capi_ddoc_replication_srv.1.36030090>},
infinity]}}. Moving to the next bucket.
      [error_logger:error,2012-11-26T8:39:32.549,ns_1@10.3.121.63:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: compaction_daemon:spawn_bucket_compactor/3-fun-2/0
      pid: <0.22571.7>
      registered_name: []
exception exit: {noproc,
{gen_server,call,
['capi_set_view_manager-bucket3',
{foreach_doc,#Fun<capi_ddoc_replication_srv.1.36030090>},
infinity]}}
      in function gen_server:call/3
      in call from capi_ddoc_replication_srv:foreach_live_ddoc_id/2
      in call from capi_ddoc_replication_srv:fetch_ddoc_ids/1
      in call from compaction_daemon:'spawn_bucket_compactor/3-fun-2'/4
      ancestors: [compaction_daemon,<0.17941.4>,ns_server_sup,
      ns_server_cluster_sup,<0.59.0>]
      messages: []
      links: [<0.17942.4>]
      dictionary: []
      trap_exit: false
      status: running
      heap_size: 28657
      stack_size: 24
      reductions: 24423
      neighbours:

Chiyoung, could you look at this issue? Port 11210 is already held by memcached itself, so it is strange that we keep getting these errors.
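
The {noproc, ...} exits in the logs above are the generic shape of a gen_server:call/3 to a registered process that is no longer alive; once 'ns_memcached-default' is down, every stats request from the CAPI/HTTP layer fails this way. A minimal Erlang shell sketch (illustrative only, not ns_server code):

    %% calling a registered name with no process behind it makes the
    %% caller exit with {noproc, {gen_server, call, [...]}}
    catch gen_server:call('ns_memcached-default', {stats, <<>>}, 180000).
    %% => {'EXIT', {noproc, {gen_server, call,
    %%               ['ns_memcached-default', {stats, <<>>}, 180000]}}}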

Attachments

1. 10.3.121.63-8091-diag.txt.gz
        12.31 MB
        Andrei Baranouski
      2. 10.3.121.64-8091-diag.txt.gz
        8.19 MB
        Andrei Baranouski
      3. 10.3.121.66-8091-diag.txt.gz
        7.82 MB
        Andrei Baranouski
      4. 10.3.121.67-8091-diag.txt.gz
        8.03 MB
        Andrei Baranouski
      5. 10.3.121.68-8091-diag.txt.gz
        8.18 MB
        Andrei Baranouski
      6. 10.3.121.69-8091-diag.txt.gz
        6.50 MB
        Andrei Baranouski

        Activity

Jin Lim (Inactive) added a comment - edited

It sounds like all three bugs, MB-7272, MB-7110, and MB-7263, are intermixed when they actually describe two distinct issues:

1) Rebalance failure due to an ep-engine error (crash) during shutdown. This can be caused by the ep-engine warmup thread still running and disregarding the pending shutdown (delete) request.

2) Rebalance failure due to the TCP port conflict (memcached exits with status 71). This can be caused by a new memcached instance trying to bind a port that is still bound by an existing memcached instance that has not yet exited.

I suggest that we combine MB-7272 & MB-7110 and use them solely to track issue (1), closing them when the fix for the ep-engine crash is merged, and use MB-7263 to track issue (2).

Jin Lim (Inactive) added a comment

Alek and I took a look at the live system (10.3.121.112) where Andrei had reproduced this issue (the TCP port conflict). Per Alek, re-assigning it to the ns_server team for further investigation. Thanks.

Jin Lim (Inactive) added a comment

        Alek, this is the bug we took a look at this afternoon.

ns_server timed out while terminating a process (which appears to be memcached) after 60000ms:
        [ns_server:warn,2012-12-03T2:56:06.019,ns_1@10.3.121.112:<0.432.0>:supervisor_cushion:terminate:97]Cushioned process <0.433.0> failed to terminate within 60000ms. Killing it brutally.

Then it later ran into the TCP port conflict while trying to restart memcached:
        [user:info,2012-12-03T2:56:13.335,ns_1@10.3.121.112:ns_port_memcached<0.11105.3>:ns_port_server:handle_info:107]Port server memcached on node 'ns_1@10.3.121.112' exited with status 71. Restarting. Messages: Mon Dec 3 02:56:13.244378 PST 3: failed to listen on TCP port 11210: Address already in use
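
This timeout-then-kill sequence is the standard OTP shutdown pattern: the supervisor asks the child to stop, waits out the configured shutdown timeout (60000ms here), and then kills it brutally; meanwhile the external OS process the child was driving can survive and keep port 11210 bound. A minimal sketch of such a stubborn worker (hypothetical module, not the actual supervisor_cushion code):

    -module(stubborn_sketch).
    -export([start_link/0]).

    %% A worker that traps exits and then never reads its mailbox: the
    %% supervisor's exit(Pid, shutdown) request becomes a message that
    %% is ignored, so after the shutdown timeout the supervisor falls
    %% back to exit(Pid, kill), the "Killing it brutally" in the log.
    start_link() ->
        Pid = spawn_link(fun() ->
                  process_flag(trap_exit, true),
                  receive after infinity -> ok end
              end),
        {ok, Pid}.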

Aleksey Kondratenko (Inactive) added a comment

We've found the issue behind ns_server allowing memcached to become an 'orphan' (i.e. ns_server losing track of its child memcached process), and we're going to open a separate ticket for that.

This particular ticket is just a duplicate of the numerous timeout-related bugs we have. In this case max_restart_intensity_reached was hit, which caused a restart of everything, including memcached.
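
The mechanism behind max_restart_intensity_reached is the stock OTP one: a supervisor permits at most MaxR child restarts within MaxT seconds, and once that budget is exhausted it terminates itself, so its own parent restarts the entire subtree, memcached included. A minimal sketch (hypothetical module, not the real ns_server supervision tree):

    -module(intensity_sketch).
    -behaviour(supervisor).
    -export([start_link/0, init/1, crasher/0]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    %% A child that dies right after starting, like a memcached that
    %% exits with status 71 as soon as its port bind fails.
    crasher() ->
        {ok, spawn_link(fun() -> exit({abnormal, 71}) end)}.

    init([]) ->
        %% allow at most 3 restarts within 10 seconds; a child that
        %% keeps dying instantly exhausts the budget and the supervisor
        %% itself shuts down, taking every sibling down with it
        {ok, {{one_for_one, 3, 10},
              [{crasher, {?MODULE, crasher, []},
                permanent, 1000, worker, [?MODULE]}]}}.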

Andrei Baranouski added a comment

build 1974: memcached works fine, but rebalance failed. Filed a new bug for it: MB-7365 (Resetting rebalance status since it's not really running when an XDCR replication is added).


          People

• Assignee:
  Aleksey Kondratenko (Inactive)
• Reporter:
  Andrei Baranouski
• Votes: 0
• Watchers: 2


              Gerrit Reviews

              There are no open Gerrit changes