  1. Couchbase Server
  2. MB-7590

[system test] rebalance failed due to buckets shutdown in orchestrator node

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.1
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Environment: Windows Server 2008 R2 64-bit

    Description

      Environment:

      • 9 nodes running Windows Server 2008 R2 64-bit.
      • Each server has 4 CPUs, 8 GB RAM, and an SSD.
      • The cluster has 2 buckets, default and sasl, with consistent views enabled.
      • Loaded 26 million items into the default bucket and 16 million items into the sasl bucket. Key sizes range from 128 to 512 bytes.
      • Each bucket has one design doc with 2 views.
      • Rebalance out 2 nodes, 10.3.121.173 and 10.3.121.243.

      Starting rebalance, KeepNodes = ['ns_1@10.3.3.181','ns_1@10.3.121.47',
      'ns_1@10.3.3.214','ns_1@10.3.3.182',
      'ns_1@10.3.3.180','ns_1@10.3.121.171',
      'ns_1@10.3.121.169'], EjectNodes = ['ns_1@10.3.121.173',
      'ns_1@10.3.121.243']
      (ns_orchestrator004, ns_1@10.3.121.169, 23:26:03 - Tue Jan 22, 2013)
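
      For anyone reproducing this step outside the UI: to my knowledge, the rebalance is requested through the ns_server REST endpoint POST /controller/rebalance, which takes knownNodes (every node currently in the cluster, including the ones being ejected) and ejectedNodes as comma-separated form fields. The sketch below only builds that form body from the node lists in the log line above. The helper name rebalance_form is hypothetical, and the HTTP call itself, the host, and the credentials are deliberately left out.

```python
from urllib.parse import parse_qs, urlencode


def rebalance_form(keep_nodes, eject_nodes):
    """Build a form body for POST /controller/rebalance (hypothetical helper).

    Assumption: knownNodes must list all current cluster members,
    i.e. the nodes being kept plus the nodes being ejected.
    """
    known = list(keep_nodes) + list(eject_nodes)
    return urlencode({
        "knownNodes": ",".join(known),
        "ejectedNodes": ",".join(eject_nodes),
    })


# Node names copied from the "Starting rebalance" log line.
keep = [
    "ns_1@10.3.3.181", "ns_1@10.3.121.47", "ns_1@10.3.3.214",
    "ns_1@10.3.3.182", "ns_1@10.3.3.180", "ns_1@10.3.121.171",
    "ns_1@10.3.121.169",
]
eject = ["ns_1@10.3.121.173", "ns_1@10.3.121.243"]

body = rebalance_form(keep, eject)
decoded = parse_qs(body)  # decode again to inspect the fields
print(decoded["ejectedNodes"][0])
```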

      • Rebalance failed because the buckets were shutting down on the orchestrator node.

      [ns_server:debug,2013-01-23T8:29:27.672,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
      rebalance_status ->
      {none,<<"Rebalance stopped by janitor.">>}

      [user:info,2013-01-23T8:29:26.219,ns_1@10.3.121.169:ns_memcached-default<0.968.1>:ns_memcached:terminate:661]Shutting down bucket "default" on 'ns_1@10.3.121.169' for server shutdown
      [ns_server:error,2013-01-23T8:29:26.219,ns_1@10.3.121.169:timeout_diag_logger<0.699.0>:timeout_diag_logger:handle_call:104]
      {<0.12009.70>,
       [{registered_name,[]},
        {status,waiting},
        {initial_call,{proc_lib,init_p,5}},
        {backtrace,[<<"Program counter: 0x04e7e1c8 (couch_file:reader_loop/3 + 116)">>, <<"CP: 0x00000000 (invalid)">>,<<"arity = 0">>,<<>>, <<"0x126e4ce4 Return addr 0x017a2da8 (proc_lib:init_p_do_apply/3 + 28)">>, <<"y(0) 10">>,<<"y(1) \"c:/data/sasl/109.couch.14\"">>, <<"y(2) []">>,<<>>, <<"0x126e4cf4 Return addr 0x00b409b4 (<terminate process normally>)">>, <<"y(0) Catch 0x017a2db8 (proc_lib:init_p_do_apply/3 + 44)">>, <<>>]},
        {error_handler,error_handler},
        {garbage_collection,[{min_bin_vheap_size,46368},
                             {min_heap_size,233},
                             {fullsweep_after,512},
                             {minor_gcs,403}]},
        {heap_size,377},
        {total_heap_size,754},
        {links,[<0.12008.70>]},
        {memory,3496},
        {message_queue_len,0},
        {reductions,216588},
        {trap_exit,true}]}

      [ns_server:debug,2013-01-23T8:29:27.313,ns_1@10.3.121.169:<0.835.0>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {buckets_events,<0.833.0>} exited with reason {shutdown,
      {gen_server,
      call,
      ['ns_vbm_new_sup-sasl',
      which_children,
      infinity]}}
      [ns_server:debug,2013-01-23T8:29:27.313,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
      rebalancer_pid ->
      undefined
      [ns_server:debug,2013-01-23T8:29:27.329,ns_1@10.3.121.169:capi_set_view_manager-sasl<0.8923.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [user:info,2013-01-23T8:29:27.329,ns_1@10.3.121.169:ns_memcached-sasl<0.8955.0>:ns_memcached:terminate:661]Shutting down bucket "sasl" on 'ns_1@10.3.121.169' for server shutdown
      [ns_server:debug,2013-01-23T8:29:27.344,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
      auto_failover_cfg ->
      [{enabled,false},
       {timeout,30},
       {max_nodes,1},
       {count,0}]
      [ns_server:debug,2013-01-23T8:29:27.360,ns_1@10.3.121.169:ns_config_rep<0.31635.76>:ns_config_rep:do_push_keys:317]Replicating some config keys ([auto_failover_cfg,autocompaction,buckets,
      cluster_compat_version,counters,
      dynamic_config_version]..)
      [ns_server:debug,2013-01-23T8:29:27.360,ns_1@10.3.121.169:capi_set_view_manager-sasl<0.8923.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [ns_server:error,2013-01-23T8:29:27.360,ns_1@10.3.121.169:timeout_diag_logger<0.699.0>:timeout_diag_logger:handle_call:104]
      {<0.10831.67>,
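
      To line up the bucket shutdowns with the "Rebalance stopped by janitor" config change, it can help to pull the timestamp and bucket name out of the ns_memcached:terminate entries. A rough sketch, assuming each ns_server entry starts with a header of the form [logger:level,timestamp,node:...] as in the excerpt above; the function name bucket_shutdowns is made up for illustration.

```python
import re

# Header shape assumed from the excerpt: [logger:level,timestamp,node:rest]message
ENTRY_RE = re.compile(
    r"^\[(?P<logger>\w+):(?P<level>\w+),(?P<ts>[^,]+),(?P<node>[^:]+):.*?\](?P<msg>.*)$"
)
SHUTDOWN_RE = re.compile(r'Shutting down bucket "([^"]+)"')


def bucket_shutdowns(lines):
    """Return (timestamp, node, bucket) for every bucket shutdown entry."""
    events = []
    for line in lines:
        entry = ENTRY_RE.match(line.strip())
        if not entry:
            continue  # continuation line of a multi-line term, or not a header
        shutdown = SHUTDOWN_RE.search(entry.group("msg"))
        if shutdown:
            events.append((entry.group("ts"), entry.group("node"), shutdown.group(1)))
    return events


# Two entries copied from the log excerpt above.
sample = [
    "[user:info,2013-01-23T8:29:26.219,ns_1@10.3.121.169:ns_memcached-default<0.968.1>:ns_memcached:terminate:661]"
    "Shutting down bucket \"default\" on 'ns_1@10.3.121.169' for server shutdown",
    "[user:info,2013-01-23T8:29:27.329,ns_1@10.3.121.169:ns_memcached-sasl<0.8955.0>:ns_memcached:terminate:661]"
    "Shutting down bucket \"sasl\" on 'ns_1@10.3.121.169' for server shutdown",
]

events = bucket_shutdowns(sample)
for ts, node, bucket in events:
    print(ts, node, bucket)
```

      On the two entries quoted here, both shutdowns land within about 1.5 seconds before the rebalance_status change at 8:29:27.672.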

      • Memcached logs from around the time the rebalance failed:

      Wed Jan 23 08:29:27.208484 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_18 - disconnected
      Wed Jan 23 08:29:27.286609 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_20 - disconnected
      Wed Jan 23 08:29:28.145984 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_17"
      Wed Jan 23 08:29:28.161609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_18"
      Wed Jan 23 08:29:28.161609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_19"
      Wed Jan 23 08:29:28.161609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_20"
      Wed Jan 23 08:29:28.177234 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_21"
      Wed Jan 23 08:29:28.177234 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_22"
      Wed Jan 23 08:29:28.192859 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_23"
      Wed Jan 23 08:29:28.208484 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_24"
      Wed Jan 23 08:29:29.005359 Pacific Standard Time 3: Shutting down tap connections!
      Wed Jan 23 08:29:29.005359 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.171"
      Wed Jan 23 08:29:29.083484 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.182"
      Wed Jan 23 08:29:29.083484 Pacific Standard Time 3: Failed to notify thread: Unknown error
      Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.47"
      Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.171 - Clear the tap queues by force
      Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.214"
      Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Failed to notify thread: Unknown error
      Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.180"
      Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.182 - Clear the tap queues by force
      Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.181"
      Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: Failed to notify thread: Unknown error
      Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.47 - Clear the tap queues by force
      Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.214 - Clear the tap queues by force
      Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: Failed to notify thread: Unknown error
      Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.180 - Clear the tap queues by force
      Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.181 - Clear the tap queues by force
      Wed Jan 23 08:29:42.130359 Pacific Standard Time 3: Had to wait 12 s for shutdown
      Wed Jan 23 08:30:01.442859 Pacific Standard Time 3: Shutting down tap connections!
      Wed Jan 23 08:30:01.442859 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.47"
      Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.171"
      Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.47 - Clear the tap queues by force
      Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.181"
      Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.171 - Clear the tap queues by force
      Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.214"
      Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.181 - Clear the tap queues by force
      Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.180"
      Wed Jan 23 08:30:01.536609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.182"
      Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.214 - Clear the tap queues by force
      Wed Jan 23 08:30:01.536609 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.180 - Clear the tap queues by force
      Wed Jan 23 08:30:01.536609 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.182 - Clear the tap queues by force
      Wed Jan 23 08:30:16.536609 Pacific Standard Time 3: Had to wait 15 s for shutdown
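
      The two "Had to wait N s for shutdown" lines are the quickest way to see how long each bucket shutdown blocked. A small sketch for pulling them out of a memcached log, assuming the line layout shown above (weekday, date, time with microseconds, time zone name, a numeric field, then the message); the name shutdown_waits is hypothetical.

```python
import re

# Capture the wall-clock time (to the second) and the reported wait.
WAIT_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2})\.\d+ .*: Had to wait (\d+) s for shutdown"
)


def shutdown_waits(lines):
    """Return (time, seconds_waited) for each 'Had to wait' line."""
    waits = []
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            waits.append((match.group(1), int(match.group(2))))
    return waits


# Lines copied from the memcached excerpt above.
sample = [
    "Wed Jan 23 08:29:29.005359 Pacific Standard Time 3: Shutting down tap connections!",
    "Wed Jan 23 08:29:42.130359 Pacific Standard Time 3: Had to wait 12 s for shutdown",
    "Wed Jan 23 08:30:01.442859 Pacific Standard Time 3: Shutting down tap connections!",
    "Wed Jan 23 08:30:16.536609 Pacific Standard Time 3: Had to wait 15 s for shutdown",
]

waits = shutdown_waits(sample)
print(waits)
```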

      Link to the manifest file of this build: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.1-140-rel.setup.exe.manifest.xml

            People

              Assignee: Thuan Nguyen (thuan)
              Reporter: Thuan Nguyen (thuan)
              Votes: 0
              Watchers: 2