  1. Couchbase Server
  2. MB-5394

Rebalance fails with "{missing_checkpoint_stat,'ns_1@10.3.2.41', 244}}}" on large cluster in DGM

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: 1.8.1
    • Affects Version/s: 1.8.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels: None
    • Environment: 15 Node Ubuntu cluster
      1 Bucket, 1024 vBuckets
      Build - 181-842

    Description

      Setup
      1. Create a 13 node cluster with 1 bucket, 1024 vBuckets
      2. Load 51M items on the cluster [256 bytes - 512 bytes]
      3. Enable auto-failover
      4. Mutate existing items and create new items [200 - 612 bytes] to reach around 60M items.
      5. Each node has high swap usage (20%) [see bug MB-5392]
      6. Add 2 nodes (10.3.2.8, 10.3.2.9) and issue rebalance

      Output
      1. Rebalance fails with error "missing_checkpoint_stat"

      Stats/Resources/Screenshots
      1. Memory stats are attached at https://s3.amazonaws.com/bugdb/jira/MB-rebalanceFail/05-29-rebal.tar
      2. A screenshot from the cluster is attached.
      3. The cluster can be accessed at http://10.3.2.8:8091/index.html#sec=overview

      Some errors that could be related to the rebalance failure:

      delete_vbucket and stats calls are taking too long.

      [ns_server:error] [2012-05-29 10:58:22] [ns_1@10.3.2.42:ns_doctor:ns_doctor:update_status:154] The following buckets became not ready on node 'ns_1@10.3.2.42': ["default"], those of them are active ["default"]
      [ns_server:error] [2012-05-29 10:58:31] [ns_1@10.3.2.42:'ns_memcached-default':ns_memcached:handle_call:135] call {delete_vbucket,874} took too long: 18455998 us
      [ns_server:error] [2012-05-29 11:09:34] [ns_1@10.3.2.42:'ns_memcached-default':ns_memcached:handle_call:135] call {stats,<<>>} took too long: 577806 us
      [ns_server:error] [2012-05-29 11:13:19] [ns_1@10.3.2.42:'ns_memcached-default':ns_memcached:handle_info:277] handle_info(ensure_bucket,..) took too long: 864638 us
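
The durations in these messages are reported in microseconds, which makes them hard to compare at a glance. The following is a minimal sketch, not part of ns_server, that pulls the slow calls out of log lines shaped like the ones above and converts them to seconds. The regular expression and the helper name `slow_calls` are assumptions made for illustration and are based only on the three lines quoted in this report.

```python
import re

# Assumed pattern, derived only from the three log lines quoted in this report:
# a bracketed timestamp, then either "call {...}" or "handle_info(...)",
# followed by "took too long: <N> us".
SLOW_CALL = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*?"
    r"(?P<call>call \{.*?\}|handle_info\(.*?\)) took too long: (?P<us>\d+) us"
)


def slow_calls(lines):
    """Yield (timestamp, call, seconds) for each slow-call log line."""
    for line in lines:
        match = SLOW_CALL.search(line)
        if match:
            seconds = int(match.group("us")) / 1_000_000  # microseconds to seconds
            yield match.group("ts"), match.group("call"), seconds


sample = [
    "[ns_server:error] [2012-05-29 10:58:31] [ns_1@10.3.2.42:'ns_memcached-default':ns_memcached:handle_call:135] call {delete_vbucket,874} took too long: 18455998 us",
    "[ns_server:error] [2012-05-29 11:09:34] [ns_1@10.3.2.42:'ns_memcached-default':ns_memcached:handle_call:135] call {stats,<<>>} took too long: 577806 us",
    "[ns_server:error] [2012-05-29 11:13:19] [ns_1@10.3.2.42:'ns_memcached-default':ns_memcached:handle_info:277] handle_info(ensure_bucket,..) took too long: 864638 us",
]

for ts, call, seconds in slow_calls(sample):
    print(f"{ts}  {call:<32} {seconds:.3f} s")
```

Read this way, the delete_vbucket call took roughly 18.5 seconds, while the stats and ensure_bucket calls each took under one second.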


      ========================CRASH REPORT=========================
      crasher:
      initial call: ns_janitor:cleanup/2
      pid: <0.21807.93>
      registered_name: []
      exception exit: {timeout,
      {gen_server,call,
      [{'ns_memcached-default','ns_1@10.3.2.42'},
      {delete_vbucket,874},
      30000]}}
      in function gen_server:call/3
      in call from ns_memcached:do_call/3
      in call from lists:foreach/2
      in call from ns_janitor:do_sanify_chain/6
      in call from ns_janitor:sanify_chain/6
      in call from ns_janitor:'sanify/5-lc$^1/1-1'/5
      in call from ns_janitor:'sanify/5-lc$^1/1-1'/5
      in call from ns_janitor:do_cleanup/3
      ancestors: [<0.217.0>,mb_master_sup,mb_master,ns_server_sup,
      ns_server_cluster_sup,<0.60.0>]
      messages: []
      links: [<0.217.0>]
      dictionary: []
      trap_exit: false
      status: running
      heap_size: 75025
      stack_size: 24
      reductions: 1835901
      neighbours:

      [error_logger:error] [2012-05-29 10:58:07] [ns_1@10.3.2.42:error_logger:ale_error_logger_handler:log_msg:76] ** Generic server 'ns_memcached-default' terminating
      ** Last message in was {delete_vbucket,874}
      ** When Server state == {state,{interval,#Ref<0.0.82.161858>}, connected, {1338,314257,216912}, "default",#Port<0.226981>}
      ** Reason for termination ==
      ** {{badmatch,{error,timeout}},
          [{mc_client_binary,cmd_binary_vocal_recv,5},
           {mc_client_binary,delete_vbucket,2},
           {ns_memcached,do_handle_call,3},
           {ns_memcached,handle_call,3},
           {gen_server,handle_msg,5},
           {proc_lib,init_p_do_apply,3}]}

          People

            alkondratenko Aleksey Kondratenko (Inactive)
            ketaki Ketaki Gangal (Inactive)