Couchbase Server / MB-59829

[Rebalance] : Rebalance fails with reason {pre_rebalance_janitor_run_failed,"bucket1",{error,wait_for_memcached_failed


Details

    Description

      Steps to reproduce

      1. Created a 4-node KV cluster.
      2. Created 10 buckets with different configurations.
      3. Created 5 scopes per bucket and 20 collections per scope.
      4. Loaded data into each collection (around 4000 docs per collection).
      5. Performed multiple operations:
        1. Add node
        2. Remove node
        3. Failover
        4. Failover and recovery
        5. Shuffling nodes between groups
        6. Editing bucket properties
        7. Stop rebalance and restart (rebalance failed once at timestamp 2023-11-26T00:10:10.905-08:00, reported in MB-59828)
      6. Induced a failure in the latest rebalance by stopping Couchbase Server on one of the nodes (rebalance at timestamp 2023-11-26T12:27:20.999-08:00).
      7. Reverted the failure by starting Couchbase Server again.
      8. Retried the rebalance multiple times; it failed each time.
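For a sense of scale, the approximate data set implied by steps 2 to 4 can be worked out as below. This is only a rough sketch: it assumes every bucket has exactly 5 scopes with 20 collections each and exactly 4000 docs per collection (the report says "around 4000"), and it ignores any default scope or collection. The function name is hypothetical.

```python
def dataset_size(buckets, scopes_per_bucket, collections_per_scope, docs_per_collection):
    """Return (total_collections, total_docs) for a uniformly populated cluster."""
    total_collections = buckets * scopes_per_bucket * collections_per_scope
    total_docs = total_collections * docs_per_collection
    return total_collections, total_docs


if __name__ == "__main__":
    # Figures taken from the steps above; 4000 is approximate per the report.
    collections, docs = dataset_size(10, 5, 20, 4000)
    print(f"{collections} collections, roughly {docs} docs in total")
```

Under these assumptions the cluster holds 1000 collections and roughly 4 million documents across the 10 buckets.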

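      Every retry fails in the pre-rebalance janitor run for "bucket1", with wait_for_memcached_failed reported for node ns_1@172.23.97.78: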

      2023-11-26T12:28:11.541-08:00, ns_orchestrator:0:info:message(ns_1@172.23.104.66) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.66','ns_1@172.23.105.179','ns_1@172.23.105.192','ns_1@172.23.121.71','ns_1@172.23.96.168','ns_1@172.23.96.196','ns_1@172.23.96.220','ns_1@172.23.96.221','ns_1@172.23.97.78'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 1122b687523e734d64a07288f16a24f9
      2023-11-26T12:28:21.627-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.66) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket1",{error,wait_for_memcached_failed,['ns_1@172.23.97.78']}}. Rebalance Operation Id = 1122b687523e734d64a07288f16a24f9
      2023-11-26T12:28:46.228-08:00, ns_memcached:0:info:message(ns_1@172.23.97.78) - Bucket "bucket8" loaded on node 'ns_1@172.23.97.78' in 1 seconds.
      2023-11-26T12:28:46.228-08:00, ns_memcached:0:info:message(ns_1@172.23.97.78) - Bucket "bucket10" loaded on node 'ns_1@172.23.97.78' in 40 seconds.
      2023-11-26T12:28:46.228-08:00, ns_memcached:0:info:message(ns_1@172.23.97.78) - Bucket "bucket9" loaded on node 'ns_1@172.23.97.78' in 1 seconds.
      2023-11-26T12:29:00.953-08:00, ns_orchestrator:0:info:message(ns_1@172.23.104.66) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.66','ns_1@172.23.105.179','ns_1@172.23.105.192','ns_1@172.23.121.71','ns_1@172.23.96.168','ns_1@172.23.96.196','ns_1@172.23.96.220','ns_1@172.23.96.221','ns_1@172.23.97.78'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = fdca67823a6a9fcf58980cb30da67193
      2023-11-26T12:29:11.063-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.66) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket1",{error,wait_for_memcached_failed,['ns_1@172.23.97.78']}}. Rebalance Operation Id = fdca67823a6a9fcf58980cb30da67193
      2023-11-26T12:29:15.809-08:00, ns_orchestrator:0:info:message(ns_1@172.23.104.66) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.66','ns_1@172.23.105.179','ns_1@172.23.105.192','ns_1@172.23.121.71','ns_1@172.23.96.168','ns_1@172.23.96.196','ns_1@172.23.96.220','ns_1@172.23.96.221','ns_1@172.23.97.78'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = bdb1583cf32b0d8b3bb0d171033cca93
      2023-11-26T12:29:25.973-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.66) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket1",{error,wait_for_memcached_failed,['ns_1@172.23.97.78']}}. Rebalance Operation Id = bdb1583cf32b0d8b3bb0d171033cca93
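When triaging a longer log, the "Rebalance exited with reason" entries above can be pulled out with a small script. The following is a minimal sketch, not a Couchbase tool: the regular expression is written only against the message shape seen in this ticket (reason, bucket, nested `{error, ..., [nodes]}` tuple, then a 32-character hex Operation Id), and the names `FAILURE_RE`, `summarize_failures` and `SAMPLE` are hypothetical. Other failure shapes would need their own patterns.

```python
import re
from collections import Counter

# Pattern for the failure shape seen in this ticket, e.g.
# {pre_rebalance_janitor_run_failed,"bucket1",{error,wait_for_memcached_failed,['ns_1@...']}}
# followed by "Rebalance Operation Id = <32 hex chars>".
FAILURE_RE = re.compile(
    r'Rebalance exited with reason \{(?P<reason>\w+),\s*"(?P<bucket>[^"]+)",\s*'
    r"\{error,\s*(?P<error>\w+),\s*\[(?P<nodes>[^\]]*)\]\}\}"
    r"\.\s*Rebalance Operation Id = (?P<op_id>[0-9a-f]{32})"
)


def summarize_failures(log_text):
    """Return one dict per failed rebalance found in log_text."""
    failures = []
    for match in FAILURE_RE.finditer(log_text):
        # Node names are single-quoted Erlang atoms inside the list.
        nodes = re.findall(r"'([^']+)'", match.group("nodes"))
        failures.append({
            "op_id": match.group("op_id"),
            "reason": match.group("reason"),
            "bucket": match.group("bucket"),
            "error": match.group("error"),
            "nodes": nodes,
        })
    return failures


# Two of the failure entries from this ticket, copied from the log excerpt.
SAMPLE = """\
2023-11-26T12:28:21.627-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.66) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket1",{error,wait_for_memcached_failed,['ns_1@172.23.97.78']}}. Rebalance Operation Id = 1122b687523e734d64a07288f16a24f9
2023-11-26T12:29:11.063-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.66) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket1",{error,wait_for_memcached_failed,['ns_1@172.23.97.78']}}. Rebalance Operation Id = fdca67823a6a9fcf58980cb30da67193
"""

if __name__ == "__main__":
    found = summarize_failures(SAMPLE)
    # Count how often each node is blamed across all failed attempts.
    per_node = Counter(node for f in found for node in f["nodes"])
    for f in found:
        print(f["op_id"], f["reason"], f["bucket"], f["error"], f["nodes"])
    print(dict(per_node))
```

Because the pattern fixes the Operation Id at 32 hex characters and tolerates arbitrary whitespace between fields, it should also cope with entries where line breaks were lost and the next timestamp is glued to the Id.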


          People

            raghav.sk Raghav S K
            Votes: 0
            Watchers: 8
