  1. Couchbase Server
  2. MB-6025

rebalance failed due to control connection to memcached disconnected

Details

    • Bug
    • Resolution: Incomplete
    • Major
    • 2.0
    • 2.0-beta
    • ns_server
    • Security Level: Public
    • None
    • centos 6.2 64bit

    Description

      Install Couchbase Server 2.0.0-1492 on 12 CentOS 6.2 64-bit nodes to test a large cluster over a long run (longevity test).
      Load 72 million items into the default bucket.
      Rebalance nodes in and out of the cluster as follows:
      Remove nodes 26 and 28.
      Add back nodes 26 and 28 and remove nodes 24 and 25 (swap rebalance).
      Reboot the CentOS server on node 14.
      Add node 24 and rebalance. After a few minutes, stop the rebalance, add node 25, remove node 13, and rebalance again.
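
      For anyone reproducing the last step without the UI, the following is a minimal sketch of the request body for a rebalance. It assumes the REST endpoint POST /controller/rebalance with the form parameters knownNodes and ejectedNodes. The report does not say how the rebalance was triggered, so the helper name rebalance_form and the shortened node list are illustrative only. The sketch builds the body and sends nothing.

```python
from urllib.parse import urlencode


def rebalance_form(keep_nodes, eject_nodes):
    """Build a form body for POST /controller/rebalance (assumed endpoint).

    knownNodes lists every node currently known to the cluster, including
    the ones being ejected; ejectedNodes lists only the nodes to remove.
    """
    known = list(keep_nodes) + [n for n in eject_nodes if n not in keep_nodes]
    return urlencode({
        "knownNodes": ",".join(known),
        "ejectedNodes": ",".join(eject_nodes),
    })


# Shortened node list for illustration; the real run kept 11 nodes.
keep = ["ns_1@10.3.121.14", "ns_1@10.3.121.25"]
eject = ["ns_1@10.3.121.13"]
body = rebalance_form(keep, eject)
print(body)
```

      The body would be sent with administrator credentials to port 8091 on any cluster node; that part is left out because the credentials and the target node are not given in this report.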

      The rebalance failed with the error "Rebalance exited with reason {exited,
      {'EXIT',<0.2347.73>,
      {missing_checkpoint_stat,'ns_1@10.3.121.14', 0}}}"

      The log shows:

      2012-07-25 19:55:00.344 ns_orchestrator:4:info:message(ns_1@10.3.121.13) - Starting rebalance, KeepNodes = ['ns_1@10.3.121.14','ns_1@10.3.121.15',
      'ns_1@10.3.121.16','ns_1@10.3.121.17',
      'ns_1@10.3.121.20','ns_1@10.3.121.22',
      'ns_1@10.3.121.23','ns_1@10.3.121.24',
      'ns_1@10.3.121.26','ns_1@10.3.121.28',
      'ns_1@10.3.121.25'], EjectNodes = ['ns_1@10.3.121.13']

      2012-07-25 19:55:15.716 ns_storage_conf:0:info:message(ns_1@10.3.121.25) - Deleting old data files of bucket "default"
      2012-07-25 19:55:15.974 ns_rebalancer:0:info:message(ns_1@10.3.121.13) - Started rebalancing bucket default
      2012-07-25 19:55:17.820 ns_memcached:1:info:message(ns_1@10.3.121.25) - Bucket "default" loaded on node 'ns_1@10.3.121.25' in 0 seconds.
      2012-07-25 19:59:17.102 mb_master:0:info:message(ns_1@10.3.121.15) - Haven't heard from a higher priority node or a master, so I'm taking over.
      2012-07-25 20:10:37.935 mb_master:0:info:message(ns_1@10.3.121.14) - Haven't heard from a higher priority node or a master, so I'm taking over.
      2012-07-25 20:11:26.353 menelaus_web:19:warning:server error during request processing(ns_1@10.3.121.14) - Server error during processing: ["web request failed",
      {path,"/pools/default/saslBucketsStreaming"},
      {type,exit},
      {what,
      {timeout,
      {gen_server,call,
      [timeout_diag_logger,
      {diag,
      {timeout,
      {gen_server,call,[ns_config,get]}}}]}}},
      {trace,
      [{gen_server,call,2},
      {diag_handler,diagnosing_timeouts,1},
      {menelaus_web_buckets, '-handle_sasl_buckets_streaming/2-fun-2-', 2},
      {menelaus_web,streaming_inner,3},
      {menelaus_web,handle_streaming,4},
      {menelaus_web,loop,3},
      {mochiweb_http,headers,5},
      {proc_lib,init_p_do_apply,3}]}]
      2012-07-25 20:11:35.791 menelaus_web:19:warning:server error during request processing(ns_1@10.3.121.14) - Server error during processing: ["web request failed",
      {path,"/nodes/self"},
      {type,exit},
      {what,
      timeout,{gen_server,call,[ns_config,get],
      {gen_server,call, [ns_node_disco,nodes_wanted]}}},
      {trace,
      [{gen_server,call,2},
      {menelaus_web,handle_node,3},
      {menelaus_web,loop,3},
      {mochiweb_http,headers,5},
      {proc_lib,init_p_do_apply,3}]}]
      2012-07-25 20:11:37.144 menelaus_web_alerts_srv:1:info:message(ns_1@10.3.121.14) - IP address seems to have changed. Unable to listen on 'ns_1@10.3.121.14'.
      2012-07-25 20:11:43.244 menelaus_web:19:warning:server error during request processing(ns_1@10.3.121.14) - Server error during processing: ["web request failed",
      {path,"/pools/default"},
      {type,exit},
      {what,
      {timeout,
      {gen_server,call, [ns_cookie_manager,cookie_get]}}},
      {trace,
      [{gen_server,call,2},
      {menelaus_web,build_nodes_info_fun,3},
      {menelaus_web,build_pool_info,4},
      {menelaus_web,handle_pool_info,2},
      {menelaus_web,check_and_handle_pool_info,2},
      {menelaus_web,loop,3},
      {mochiweb_http,headers,5},
      {proc_lib,init_p_do_apply,3}]}]
      2012-07-25 20:11:51.869 ns_memcached:4:info:message(ns_1@10.3.121.14) - Control connection to memcached on 'ns_1@10.3.121.14' disconnected: {badmatch,
      {error,
      timeout}}
      2012-07-25 20:12:08.368 menelaus_web:19:warning:server error during request processing(ns_1@10.3.121.14) - Server error during processing: ["web request failed",
      {path,"/pools/default"},
      {type,exit},
      {what,
      {timeout,
      {gen_server,call, [ns_doctor,get_tasks_version]}}},
      {trace,
      [{gen_server,call,2},
      {menelaus_web,build_pool_info,4},
      {menelaus_web,handle_pool_info,2},
      {menelaus_web,check_and_handle_pool_info,2},
      {menelaus_web,loop,3},
      {mochiweb_http,headers,5},
      {proc_lib,init_p_do_apply,3}]}]
      2012-07-25 20:12:15.321 ns_vbucket_mover:0:critical:message(ns_1@10.3.121.13) - <0.2346.73> exited with {exited,
      {'EXIT',<0.2347.73>,
      {missing_checkpoint_stat,'ns_1@10.3.121.14', 0}}}
      2012-07-25 20:12:20.547 ns_memcached:1:info:message(ns_1@10.3.121.14) - Bucket "default" loaded on node 'ns_1@10.3.121.14' in 11 seconds.
      2012-07-25 20:12:40.242 menelaus_web:19:warning:server error during request processing(ns_1@10.3.121.14) - Server error during processing: ["web request failed",
      {path,"/nodes/self"},
      {type,exit},
      {what,
      timeout,{gen_server,call,[ns_config,get],
      {gen_server,call, [ns_node_disco,nodes_wanted]}}},
      {trace,
      [{gen_server,call,2},
      {menelaus_web,handle_node,3},
      {menelaus_web,loop,3},
      {mochiweb_http,headers,5},
      {proc_lib,init_p_do_apply,3}]}] (repeated 1 times)
      2012-07-25 20:12:40.242 menelaus_web:19:warning:server error during request processing(ns_1@10.3.121.14) - Server error during processing: ["web request failed",
      {path,"/pools/default"},
      {type,exit},
      {what,
      {timeout,
      {gen_server,call, [ns_doctor,get_tasks_version]}}},
      {trace,
      [{gen_server,call,2},
      {menelaus_web,build_pool_info,4},
      {menelaus_web,handle_pool_info,2},
      {menelaus_web,check_and_handle_pool_info,2},
      {menelaus_web,loop,3},
      {mochiweb_http,headers,5},
      {proc_lib,init_p_do_apply,3}]}] (repeated 1 times)
      2012-07-25 20:12:55.967 ns_orchestrator:2:info:message(ns_1@10.3.121.13) - Rebalance exited with reason {exited,
      {'EXIT',<0.2347.73>,
      {missing_checkpoint_stat,'ns_1@10.3.121.14', 0}}}
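
      To line up the events above on a timeline, the entry headers can be parsed mechanically. The following is a minimal sketch that assumes every entry starts with the layout seen in this log (timestamp, module:code:level:category(node), then " - " and the message) and treats any other line as a continuation of a multi-line Erlang term. The names HEADER and parse_header are illustrative and are not part of ns_server.

```python
import re
from datetime import datetime

# Header layout assumed from the entries in this report:
# <date> <time> <module>:<code>:<level>:<category>(<node>) - <message>
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<module>\w+):(?P<code>\d+):(?P<level>\w+):"
    r"(?P<category>[^(]+)\((?P<node>[^)]+)\) - (?P<message>.*)$"
)


def parse_header(line):
    """Return the header fields as a dict, or None for a continuation line."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return entry


# First and last ns_orchestrator entries from the log above.
start = parse_header(
    "2012-07-25 19:55:00.344 ns_orchestrator:4:info:message(ns_1@10.3.121.13)"
    " - Starting rebalance, KeepNodes = ['ns_1@10.3.121.14','ns_1@10.3.121.15',"
)
end = parse_header(
    "2012-07-25 20:12:55.967 ns_orchestrator:2:info:message(ns_1@10.3.121.13)"
    " - Rebalance exited with reason {exited,"
)
elapsed = end["ts"] - start["ts"]
print(start["node"], end["node"], elapsed)
```

      Filtering the parsed entries by node makes it easier to see that the timeouts and the memcached disconnect were all reported by 'ns_1@10.3.121.14', the node named in the missing_checkpoint_stat error.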

      Diagnostics are stored at:
      https://s3.amazonaws.com/packages.couchbase/diag-logs/large_cluster_2_0/12-nodes-reb-missing_checkpoint_stat.tgz

          People

            farshid Farshid Ghods (Inactive)
            thuan Thuan Nguyen