Couchbase Server / MB-11501

[System tests with DGM] Rebalance-in exited with reason wait_seqno_persisted_failed (segmentation fault)


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 3.0
    • 3.0
    • couchbase-bucket
    • Security Level: Public
    • None
    • 3.0.0-848
    • Untriaged
    • Centos 64-bit
    • Unknown

    Description

      System test state before the rebalance:
      4 buckets:
      AbRegNums: 500 MB RAM quota, ~20% resident ratio
      RevAB: 4500 MB RAM quota, ~70% resident ratio
      MsgsCalls: 300 MB RAM quota, ~70% resident ratio
      UserInfo: 300 MB RAM quota, ~100% resident ratio

      3 nodes in the cluster:
      172.23.105.22, 172.23.105.157, 172.23.105.158
      Unidirectional XDCR replication with another cluster: 172.23.105.159

      Starting rebalance, KeepNodes = ['ns_1@172.23.105.22','ns_1@172.23.105.157',
      'ns_1@172.23.105.158','ns_1@172.23.105.156',
      'ns_1@172.23.105.160'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes

      The rebalance was stuck for a long time at ~1% progress, then failed with wait_seqno_persisted_failed.
      I don't see any crash dumps on the VMs.

      Rebalance exited with reason {unexpected_exit,
      {'EXIT',<0.13953.427>,
      {wait_seqno_persisted_failed,"RevAB",849,
      17733,
      [{'ns_1@172.23.105.157',
      {'EXIT',
      badmatch,{error,closed,
      {gen_server,call,
      [{'janitor_agent-RevAB', 'ns_1@172.23.105.157'},
      {if_rebalance,<0.9603.440>,
      {wait_seqno_persisted,849,17733}},
      infinity]}}}}]}}}
      ns_orchestrator002 ns_1@172.23.105.158 12:07:10 - Sat Jun 21, 2014
      <0.9744.443> exited with {unexpected_exit,
      {'EXIT',<0.13953.427>,
      {wait_seqno_persisted_failed,"RevAB",849,17733,
      [{'ns_1@172.23.105.157',
      {'EXIT',
      badmatch,{error,closed,
      {gen_server,call,
      [{'janitor_agent-RevAB', 'ns_1@172.23.105.157'},
      {if_rebalance,<0.9603.440>,
      {wait_seqno_persisted,849,17733}},
      infinity]}}}}]}}} ns_vbucket_mover000 ns_1@172.23.105.158 12:07:10 - Sat Jun 21, 2014
      Bucket "AbRegNums" loaded on node 'ns_1@172.23.105.157' in 27 seconds. ns_memcached000 ns_1@172.23.105.157 12:06:58 - Sat Jun 21, 2014
      Bucket "MsgsCalls" loaded on node 'ns_1@172.23.105.157' in 3 seconds. ns_memcached000 ns_1@172.23.105.157 12:06:35 - Sat Jun 21, 2014
      Bucket "UserInfo" loaded on node 'ns_1@172.23.105.157' in 28 seconds. ns_memcached000 ns_1@172.23.105.157 12:06:31 - Sat Jun 21, 2014
      Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary,
      stats_recv,
      4,
      [{file, "src/mc_client_binary.erl"}, {line, 163}]},
      {mc_client_binary,
      stats,
      4,
      [{file, "src/mc_client_binary.erl"}, {line, 411}]},
      {ns_memcached,
      handle_info,
      2,
      [{file, "src/ns_memcached.erl"}, {line, 725}]},
      {gen_server,
      handle_msg,
      5,
      [{file, "gen_server.erl"}, {line, 604}]},
      {ns_memcached,
      init,
      1,
      [{file, "src/ns_memcached.erl"}, {line, 170}]},
      {gen_server,
      init_it,
      6,
      [{file, "gen_server.erl"}, {line, 304}]},
      {proc_lib,
      init_p_do_apply,
      3,
      [{file, "proc_lib.erl"}, {line, 239}]}]} (repeated 2 times) ns_memcached000 ns_1@172.23.105.157 12:06:13 - Sat Jun 21, 2014
      Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary,
      cmd_vocal_recv,
      5,
      [{file, "src/mc_client_binary.erl"}, {line, 149}]},
      {mc_client_binary,
      select_bucket,
      2,
      [{file, "src/mc_client_binary.erl"}, {line, 344}]},
      {ns_memcached,
      ensure_bucket,
      2,
      [{file, "src/ns_memcached.erl"}, {line, 1280}]},
      {ns_memcached,
      handle_info,
      2,
      [{file, "src/ns_memcached.erl"}, {line, 750}]},
      {gen_server,
      handle_msg,
      5,
      [{file, "gen_server.erl"}, {line, 604}]},
      {ns_memcached,
      init,
      1,
      [{file, "src/ns_memcached.erl"},{line, 170}]},
      {gen_server,
      init_it,
      6,
      [{file, "gen_server.erl"}, {line, 304}]},
      {proc_lib,
      init_p_do_apply,
      3,
      [{file, "proc_lib.erl"}, {line, 239}]}]} (repeated 3 times) ns_memcached000 ns_1@172.23.105.157 12:06:13 - Sat Jun 21, 2014
      Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {badmatch,
      {error,
      closed}} ns_memcached000 ns_1@172.23.105.157 12:05:56 - Sat Jun 21, 2014
      Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary,
      stats_recv,
      4,
      [{file, "src/mc_client_binary.erl"}, {line, 163}]},
      {mc_client_binary,
      stats,
      4,
      [{file, "src/mc_client_binary.erl"}, {line, 411}]},
      {ns_memcached,
      handle_info,
      2,
      [{file, "src/ns_memcached.erl"}, {line, 725}]},
      {gen_server,
      handle_msg,
      5,
      [{file, "gen_server.erl"}, {line, 604}]},
      {ns_memcached,
      init,
      1,
      [{file, "src/ns_memcached.erl"}, {line, 170}]},
      {gen_server,
      init_it,
      6,
      [{file, "gen_server.erl"}, {line, 304}]},
      {proc_lib,
      init_p_do_apply,
      3,
      [{file, "proc_lib.erl"}, {line, 239}]}]} ns_memcached000 ns_1@172.23.105.157 12:05:56 - Sat Jun 21, 2014
      Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 139. Restarting. Messages: Sat Jun 21 12:05:50.696179 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 363) Stream closing, 0 items sent from disk, 0 items sent from memory, 894 was last seqno sent
      Sat Jun 21 12:05:50.696196 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 363) stream created with start seqno 894 and end seqno 894
      Sat Jun 21 12:05:50.698921 PDT 3: (AbRegNums) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:AbRegNums - (vb 363) stream created with start seqno 894 and end seqno 0
      Sat Jun 21 12:05:50.699524 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 424) Stream closing, 0 items sent from disk, 0 items sent from memory, 920 was last seqno sent
      Sat Jun 21 12:05:50.699544 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 424) stream created with start seqno 920 and end seqno 920 ns_log000 ns_1@172.23.105.157 12:05:56 - Sat Jun 21, 2014
      Bucket "AbRegNums" loaded on node 'ns_1@172.23.105.157' in 36 seconds. ns_memcached000 ns_1@172.23.105.157 12:05:46 - Sat Jun 21, 2014
      Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 139. Restarting. Messages: Sat Jun 21 12:04:53.361372 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 162) stream created with start seqno 17580 and end seqno 0
      Sat Jun 21 12:04:53.371897 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 381) stream created with start seqno 17966 and end seqno 0
      Sat Jun 21 12:04:53.401541 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 98) stream created with start seqno 17760 and end seqno 0
      Sat Jun 21 12:04:53.454580 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 166) stream created with start seqno 17704 and end seqno 0
      Sat Jun 21 12:04:53.743529 PDT 3: (RevAB) Notified the timeout on checkpoint persistence for vbucket 921, cookie 0x663d500 ns_log000 ns_1@172.23.105.157 12:05:05 - Sat Jun 21, 2014
      Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary,
      cmd_vocal_recv,
      5,
      [{file, "src/mc_client_binary.erl"}, {line, 149}]},
      {mc_client_binary,
      select_bucket,
      2,
      [{file, "src/mc_client_binary.erl"}, {line, 344}]},
      {ns_memcached,
      ensure_bucket,
      2,
      [{file, "src/ns_memcached.erl"}, {line, 1280}]},
      {ns_memcached,
      handle_info,
      2,
      [{file, "src/ns_memcached.erl"}, {line, 750}]},
      {gen_server,
      handle_msg,
      5,
      [{file, "gen_server.erl"}, {line, 604}]},
      {ns_memcached,
      init,
      1,
      [{file, "src/ns_memcached.erl"}, {line, 170}]},
      {gen_server,
      init_it,
      6,
      [{file, "gen_server.erl"}, {line, 304}]},
      {proc_lib,
      init_p_do_apply,
      3,
      [{file, "proc_lib.erl"}, {line, 239}]}]} ns_memcached000 ns_1@172.23.105.157 12:05:05 - Sat Jun 21, 2014
      Bucket "RevAB" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.158 11:38:27 - Sat Jun 21, 2014
      Started rebalancing bucket RevAB ns_rebalancer000 ns_1@172.23.105.158 11:38:24 - Sat Jun 21, 2014
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.22','ns_1@172.23.105.157',
      'ns_1@172.23.105.158','ns_1@172.23.105.156',
      'ns_1@172.23.105.160'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
      ns_orchestrator004 ns_1@172.23.105.158 11:38:23 - Sat Jun 21, 2014
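
      Note on "exited with status 139" in the memcached port server messages above: assuming the babysitter reports the status using the common shell convention of 128 + signal number, 139 corresponds to signal 11 (SIGSEGV), which matches the segmentation fault named in the title. The helper below is a hypothetical illustration of that arithmetic, not part of Couchbase:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status, assuming the shell's 128 + signal convention."""
    if status > 128:
        signum = status - 128  # e.g. 139 - 128 = 11
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # The number does not map to a signal known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited normally with code {status}"


print(describe_exit_status(139))  # killed by SIGSEGV
```

      If that assumption holds, memcached on ns_1@172.23.105.157 crashed at 12:04:53 and again at 12:05:50, which would explain the closed control connections and the janitor_agent call failing with {error,closed}.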

          People

            andreibaranouski Andrei Baranouski
            andreibaranouski Andrei Baranouski