Details

    • Technical task
    • Resolution: Duplicate
    • Blocker
    • 3.0
    • 2.0
    • couchbase-bucket
    • Security Level: Public
    • centos 6.2 64bit build 2.0.0-1931

    Description

      Cluster information:

      • 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
      • Each server has 32 GB RAM and a 400 GB SSD.
      • 24.8 GB RAM allocated to Couchbase Server on each node
      • SSD formatted as ext4 and mounted on /data
      • Each server has its own SSD drive; no disk is shared with other servers.
      • Created a cluster of 6 nodes running Couchbase Server 2.0.0-1931
      • Link to manifest file: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1931-rel.rpm.manifest.xml
      • The cluster has 2 buckets, default and saslbucket (12 GB each, with 1 replica), set up with 64 vbuckets.
      • Each bucket has one design doc with 2 views (d1 for default, d11 for saslbucket)

      10.6.2.37
      10.6.2.38
      10.6.2.44
      10.6.2.45
      10.6.2.42
      10.6.2.43
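
As a rough sanity check of the setup above, the sketch below spreads the 64 vbuckets over 6 nodes and checks the bucket quotas against the per-node RAM. It assumes an even vbucket spread and treats the 12 GB bucket quota as a per-node value; neither is stated in this report, and the function names are made up for illustration.

```python
def spread_vbuckets(num_vbuckets, num_nodes):
    """Split vbuckets as evenly as possible (assumption: even spread)."""
    base, extra = divmod(num_vbuckets, num_nodes)
    # The first `extra` nodes carry one vbucket more than the rest.
    return [base + 1 if i < extra else base for i in range(num_nodes)]


def quotas_fit(node_quota_gb, bucket_quotas_gb):
    """True if the summed bucket quotas fit into the node's RAM quota."""
    return sum(bucket_quotas_gb) <= node_quota_gb


if __name__ == "__main__":
    active = spread_vbuckets(64, 6)        # active vbuckets per node
    with_replica = spread_vbuckets(64 * 2, 6)  # active + 1 replica copy
    print("active per node:", active)
    print("active + replica per node:", with_replica)
    # Assumption: 12 GB is the per-node quota of each of the 2 buckets.
    print("quotas fit:", quotas_fit(24.8, [12, 12]))
```

With these assumptions, each node carries 10 or 11 active vbuckets per bucket, and the two bucket quotas take up almost all of the 24.8 GB given to Couchbase Server.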

      • Load 20 million items into each bucket. Each key has a size of 1024 bytes.
      • After loading completes, wait for initial indexing to finish.
      • After initial indexing is done, mutate all items with sizes from 1024 to 1512 bytes.
      • Query all 4 views from the 2 design docs.
      • Add node 44 and rebalance. Passed.
      • Add node 45 and rebalance. Passed.
      • Verify that auto-failover is enabled on the cluster.
      • Turn on the firewall on node 40:
        iptables -A INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
        iptables -A OUTPUT -p tcp -o eth0 --sport 1000:60000 -j REJECT
      • Node 40 went down as expected.
      • Auto-failover kicked in after one minute.
      • Disable the firewall on node 40. The cluster saw node 40 come back up.
      • Add node 40 back to the cluster and rebalance. Within a few seconds, the rebalance failed with the error "Failed to wait deletion of some buckets on some nodes." Filed bug MB-7110.
      • Wait about an hour and a half, then rebalance again. The rebalance failed with the error "wait_checkpoint_persisted_failed".
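
The steps above give the commands used to turn the firewall on but not the ones used to turn it off again. A minimal sketch of one way to pair them: build the same two rules with `-A` (append) to block and with `-D` (delete) to unblock. The helper name and its defaults are hypothetical, and the reporter may well have disabled the firewall differently (for example by flushing the chains). The sketch only builds the command lines and does not run them.

```python
def firewall_rules(action, iface="eth0", ports="1000:60000"):
    """Return the two iptables commands from this report.

    action is "-A" to append the REJECT rules or "-D" to delete them.
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    return [
        ["iptables", action, "INPUT", "-p", "tcp", "-i", iface,
         "--dport", ports, "-j", "REJECT"],
        ["iptables", action, "OUTPUT", "-p", "tcp", "-o", iface,
         "--sport", ports, "-j", "REJECT"],
    ]


if __name__ == "__main__":
    for cmd in firewall_rules("-A"):   # block the node
        print(" ".join(cmd))
    for cmd in firewall_rules("-D"):   # undo exactly the same rules
        print(" ".join(cmd))
```

Deleting by the same rule specification leaves any unrelated rules on the node untouched. The commands need root to run on the node itself.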

      [ns_server:info,2012-11-06T5:42:13.901,ns_1@10.6.2.37:janitor_agent-default<0.30140.0>:janitor_agent:handle_info:676]Undoing temporary vbucket states caused by rebalance
      [error_logger:error,2012-11-06T5:42:13.901,ns_1@10.6.2.37:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.11943.2727>
      registered_name: []
      exception exit: {unexpected_exit,
      {'EXIT',<0.12020.2727>,
      {{wait_checkpoint_persisted_failed,"default",50,3131,
      [{'ns_1@10.6.2.40',
      {'EXIT',
      {{badmatch,{error,timeout,
      [{mc_client_binary,cmd_binary_vocal_recv,5},
      {mc_client_binary,select_bucket,2},
      {ns_memcached,ensure_bucket,2},
      {ns_memcached,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['ns_memcached-default',
      {wait_for_checkpoint_persistence,37,2959},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.6.2.40'},
      {if_rebalance,<0.32081.2694>,
      {wait_checkpoint_persisted,50,3131}},
      infinity]}}}}]},
      [{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}
      in function ns_single_vbucket_mover:spawn_and_wait/1
      in call from ns_single_vbucket_mover:mover_inner/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.32081.2694>,<0.18896.2646>]
      messages: [{'EXIT',<0.32081.2694>,
      {unexpected_exit,
      {'EXIT',<0.20985.2736>,
      {{wait_checkpoint_persisted_failed,"default",37,2959,
      [{'ns_1@10.6.2.40',
      {'EXIT',
      {{badmatch,{error,timeout,
      [{mc_client_binary,cmd_binary_vocal_recv,5},
      {mc_client_binary,select_bucket,2},
      {ns_memcached,ensure_bucket,2},
      {ns_memcached,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['ns_memcached-default',
      {wait_for_checkpoint_persistence,37,2959},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.6.2.40'},
      {if_rebalance,<0.32081.2694>,
      {wait_checkpoint_persisted,37,2959}},
      infinity]}}}}]},
      [{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}}]
      links: [<0.32081.2694>,<0.17284.2744>]
      dictionary: [{cleanup_list,[<0.11946.2727>,<0.12020.2727>]}]
      trap_exit: true
      status: running
      heap_size: 6765
      stack_size: 24
      reductions: 12015
      neighbours:
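
When sorting entries like the ones in this report by node or module, the bracketed header of each log line can be split into fields. The sketch below assumes the header layout is `[logger:level,timestamp,node:process<pid>:module:function:line]`, which is inferred only from the lines pasted here; the field names are chosen for illustration.

```python
import re

# Header layout inferred from the log lines in this report (assumption):
# [logger:level,timestamp,node:process<pid>:module:function:line]
HEADER = re.compile(
    r"\[(?P<logger>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):"
    r"(?P<process>[^:<]*<[\d.]+>):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
)


def parse_header(text):
    """Return the header fields of a log line as a dict, or None."""
    match = HEADER.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])  # source line number as an int
    return fields


if __name__ == "__main__":
    sample = ("[error_logger:error,2012-11-06T5:42:13.901,"
              "ns_1@10.6.2.37:error_logger<0.5.0>:"
              "ale_error_logger_handler:log_report:72]")
    print(parse_header(sample))
```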

      [user:info,2012-11-06T5:42:13.903,ns_1@10.6.2.37:<0.14641.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {unexpected_exit,
      {'EXIT',<0.20985.2736>,
      {{wait_checkpoint_persisted_failed,"default",
      37,2959,
      [{'ns_1@10.6.2.40',
      {'EXIT',
      {{badmatch,{error,timeout,
      [{mc_client_binary, cmd_binary_vocal_recv,5},
      {mc_client_binary,select_bucket,2},
      {ns_memcached,ensure_bucket,2},
      {ns_memcached,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['ns_memcached-default',
      {wait_for_checkpoint_persistence,37, 2959},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default', 'ns_1@10.6.2.40'},
      {if_rebalance,<0.32081.2694>,
      {wait_checkpoint_persisted,37,2959}},
      infinity]}}}}]},
      [{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-', 5}]}}}

      [error_logger:error,2012-11-06T5:42:13.902,ns_1@10.6.2.37:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.32081.2694> terminating
      ** Last message in was {'EXIT',<0.20927.2736>,
      {unexpected_exit,
      {'EXIT',<0.20985.2736>,
      {{wait_checkpoint_persisted_failed,"default",37,
      2959,
      [{'ns_1@10.6.2.40',
      {'EXIT',
      {{badmatch,{error,timeout,
      [{mc_client_binary,cmd_binary_vocal_recv,5},
      {mc_client_binary,select_bucket,2},
      {ns_memcached,ensure_bucket,2},
      {ns_memcached,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['ns_memcached-default',
      {wait_for_checkpoint_persistence,37,2959},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.6.2.40'},
      {if_rebalance,<0.32081.2694>,
      {wait_checkpoint_persisted,37,2959}},
      infinity]}}}}]},
      [{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-', 5}]}}}}

      ** When Server state == {state,"default",<0.32082.2694>,
          {dict,8,16,16,8,80,48,
          {[],[],[],[],[],[],[],[],[],[],[],[],[], [],[],[]},
          {{[['ns_1@10.6.2.40'|8]],
          [],
          [['ns_1@10.6.2.42'|3]],
          [['ns_1@10.6.2.43'|3]],
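
The same failure appears several times in the logs above with different line wrapping. A small sketch for pulling the distinct failures out of such a paste follows. It assumes the tuple reads as {wait_checkpoint_persisted_failed, bucket, vbucket, checkpoint id, [failed nodes]}; that reading of the fields is an interpretation of the log, not something this report confirms, and only the first failed node is captured.

```python
import re

# Assumed field order: bucket name, vbucket id, checkpoint id, node list.
# \s* tolerates the line breaks that the log wrapping puts between fields.
FAILURE = re.compile(
    r'\{wait_checkpoint_persisted_failed,\s*"(?P<bucket>[^"]+)",\s*'
    r"(?P<vbucket>\d+),\s*(?P<checkpoint>\d+),\s*"
    r"\[\{'(?P<node>[^']+)'"
)


def failed_checkpoint_waits(log_text):
    """Return distinct (bucket, vbucket, checkpoint, node) tuples in order."""
    seen = []
    for m in FAILURE.finditer(log_text):
        entry = (m["bucket"], int(m["vbucket"]),
                 int(m["checkpoint"]), m["node"])
        if entry not in seen:  # the same failure is logged repeatedly
            seen.append(entry)
    return seen


if __name__ == "__main__":
    sample = """
      {{wait_checkpoint_persisted_failed,"default",50,3131,
      [{'ns_1@10.6.2.40',
      {{wait_checkpoint_persisted_failed,"default",
      37,2959,
      [{'ns_1@10.6.2.40',
      {{wait_checkpoint_persisted_failed,"default",37,
      2959,
      [{'ns_1@10.6.2.40',
    """
    for failure in failed_checkpoint_waits(sample):
        print(failure)
```

On the excerpts in this report, this yields two distinct failures, both for bucket "default" on node ns_1@10.6.2.40: vbucket 50 at checkpoint 3131 and vbucket 37 at checkpoint 2959.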

      I will upload the collect info later.


          People

            ketaki Ketaki Gangal (Inactive)
            thuan Thuan Nguyen