Details

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: 2.0.1
Affects Version/s: 2.0.1
Component/s: couchbase-bucket
Security Level: Public
Labels:
- system-test
Environment:
Windows Server 2008 R2 64-bit

Description

Environment:

9 Windows Server 2008 R2 64-bit servers.
Each server has 4 CPUs, 8 GB RAM, and an SSD disk.
The cluster has 2 buckets, default and sasl, with consistent views enabled.
Load 26 million items into the default bucket and 16 million items into the sasl bucket. Each key is 128 to 512 bytes in size.
Each bucket has one doc, and each doc has 2 views.
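As a rough sizing check on the load described above, the sketch below bounds the raw data volume per bucket. It assumes the 128 to 512 byte range applies to each stored item and ignores metadata, replicas, and view indexes, none of which the report specifies. The function and constant names are illustrative only.

```python
# Item counts and size range taken from the environment description above.
ITEMS = {"default": 26_000_000, "sasl": 16_000_000}
MIN_BYTES, MAX_BYTES = 128, 512  # assumed to be per-item sizes

GIB = 1024 ** 3


def raw_size_bounds_gib(item_count, min_bytes=MIN_BYTES, max_bytes=MAX_BYTES):
    """Return (low, high) raw payload size in GiB for item_count items."""
    return item_count * min_bytes / GIB, item_count * max_bytes / GIB


if __name__ == "__main__":
    for bucket, count in ITEMS.items():
        low, high = raw_size_bounds_gib(count)
        print(f"{bucket}: {low:.2f} to {high:.2f} GiB")
    total_low, total_high = raw_size_bounds_gib(sum(ITEMS.values()))
    print(f"total: {total_low:.2f} to {total_high:.2f} GiB")
```

Under these assumptions the two buckets together hold roughly 5 to 20 GiB of raw item data before replication.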

Rebalance out 2 nodes, 10.3.121.173 and 10.3.121.243.

Starting rebalance, KeepNodes = ['ns_1@10.3.3.181','ns_1@10.3.121.47',
'ns_1@10.3.3.214','ns_1@10.3.3.182',
'ns_1@10.3.3.180','ns_1@10.3.121.171',
'ns_1@10.3.121.169'], EjectNodes = ['ns_1@10.3.121.173',
'ns_1@10.3.121.243'] ns_orchestrator004 ns_1@10.3.121.169 23:26:03 - Tue Jan 22, 2013
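To cross-check the orchestrator message against the environment description, the following sketch pulls the quoted node names out of the KeepNodes and EjectNodes lists. The regular expression and the function name are assumptions made for illustration; they are not part of ns_server.

```python
import re

# Matches single-quoted Erlang node names such as 'ns_1@10.3.3.181'.
NODE_RE = re.compile(r"'([^'@]+@[^']+)'")


def parse_rebalance_nodes(message):
    """Split a 'Starting rebalance' message into (keep_nodes, eject_nodes)."""
    keep_part, _, eject_part = message.partition("EjectNodes")
    return NODE_RE.findall(keep_part), NODE_RE.findall(eject_part)


if __name__ == "__main__":
    message = (
        "Starting rebalance, KeepNodes = ['ns_1@10.3.3.181','ns_1@10.3.121.47', "
        "'ns_1@10.3.3.214','ns_1@10.3.3.182', "
        "'ns_1@10.3.3.180','ns_1@10.3.121.171', "
        "'ns_1@10.3.121.169'], EjectNodes = ['ns_1@10.3.121.173', "
        "'ns_1@10.3.121.243'] ns_orchestrator004 ns_1@10.3.121.169"
    )
    keep, eject = parse_rebalance_nodes(message)
    print(len(keep), "kept,", len(eject), "ejected")
```

The unquoted orchestrator name at the end of the message is deliberately not matched, so only list members are counted.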

Rebalance failed because the buckets were shutting down on the orchestrator node.

[ns_server:debug,2013-01-23T8:29:27.672,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
rebalance_status ->
{none,<<"Rebalance stopped by janitor.">>}

[user:info,2013-01-23T8:29:26.219,ns_1@10.3.121.169:ns_memcached-default<0.968.1>:ns_memcached:terminate:661]Shutting down bucket "default" on 'ns_1@10.3.121.169' for server shutdown
[ns_server:error,2013-01-23T8:29:26.219,ns_1@10.3.121.169:timeout_diag_logger<0.699.0>:timeout_diag_logger:handle_call:104]
{<0.12009.70>,
[{registered_name,[]},
{status,waiting},
{initial_call,{proc_lib,init_p,5}},
{backtrace,[<<"Program counter: 0x04e7e1c8 (couch_file:reader_loop/3 + 116)">>, <<"CP: 0x00000000 (invalid)">>,<<"arity = 0">>,<<>>, <<"0x126e4ce4 Return addr 0x017a2da8 (proc_lib:init_p_do_apply/3 + 28)">>, <<"y(0) 10">>,<<"y(1) \"c:/data/sasl/109.couch.14\"">>, <<"y(2) []">>,<<>>, <<"0x126e4cf4 Return addr 0x00b409b4 (<terminate process normally>)">>, <<"y(0) Catch 0x017a2db8 (proc_lib:init_p_do_apply/3 + 44)">>, <<>>]},
{error_handler,error_handler},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,512},
{minor_gcs,403}]},
{heap_size,377},
{total_heap_size,754},
{links,[<0.12008.70>]},
{memory,3496},
{message_queue_len,0},
{reductions,216588},
{trap_exit,true}]}

[ns_server:debug,2013-01-23T8:29:27.313,ns_1@10.3.121.169:<0.835.0>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {buckets_events,<0.833.0>} exited with reason {shutdown,
{gen_server,
call,
['ns_vbm_new_sup-sasl',
which_children,
infinity]}}
[ns_server:debug,2013-01-23T8:29:27.313,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
rebalancer_pid ->
undefined
[ns_server:debug,2013-01-23T8:29:27.329,ns_1@10.3.121.169:capi_set_view_manager-sasl<0.8923.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
[user:info,2013-01-23T8:29:27.329,ns_1@10.3.121.169:ns_memcached-sasl<0.8955.0>:ns_memcached:terminate:661]Shutting down bucket "sasl" on 'ns_1@10.3.121.169' for server shutdown
[ns_server:debug,2013-01-23T8:29:27.344,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
auto_failover_cfg ->
[{enabled,false},
{timeout,30},
{max_nodes,1},
{count,0}]
[ns_server:debug,2013-01-23T8:29:27.360,ns_1@10.3.121.169:ns_config_rep<0.31635.76>:ns_config_rep:do_push_keys:317]Replicating some config keys ([auto_failover_cfg,autocompaction,buckets,
cluster_compat_version,counters,
dynamic_config_version]..)
[ns_server:debug,2013-01-23T8:29:27.360,ns_1@10.3.121.169:capi_set_view_manager-sasl<0.8923.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
[ns_server:error,2013-01-23T8:29:27.360,ns_1@10.3.121.169:timeout_diag_logger<0.699.0>:timeout_diag_logger:handle_call:104]
{<0.10831.67>,
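For lining up the ns_server excerpts above on a timeline, here is a minimal parsing sketch. The header layout it assumes, `[logger:level,timestamp,node:process:module:function:line]message`, is inferred only from the lines quoted in this report, and the leading bracket is treated as optional. `REBALANCE_START` is copied from the orchestrator message and is assumed to be on the same clock and time zone as the log timestamps.

```python
import re
from datetime import datetime

# Header layout inferred from the excerpts in this report (an assumption).
HEADER_RE = re.compile(
    r"^\[?(?P<logger>\w+):(?P<level>\w+),"
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{1,2}:\d{2}:\d{2}\.\d+),"
    r"(?P<node>[^:]+):(?P<rest>[^\]]*)\](?P<message>.*)$"
)

# "23:26:03 - Tue Jan 22, 2013" from the "Starting rebalance" message.
REBALANCE_START = datetime(2013, 1, 22, 23, 26, 3)


def parse_header(line):
    """Return a dict of header fields, or None if the line has no header."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The hour is not zero-padded in these logs; strptime's %H accepts that.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%dT%H:%M:%S.%f")
    return fields


def elapsed_since_start(line, start=REBALANCE_START):
    """Return the time between the rebalance start and a log line, or None."""
    fields = parse_header(line)
    return None if fields is None else fields["ts"] - start
```

Applied to the `rebalance_status` config change logged at 8:29:27.672, `elapsed_since_start` gives about 9 hours 3 minutes, which suggests the rebalance ran overnight before the janitor stopped it.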

Memcached logs from around the time the rebalance failed:

Wed Jan 23 08:29:27.208484 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_18 - disconnected
Wed Jan 23 08:29:27.286609 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_20 - disconnected
Wed Jan 23 08:29:28.145984 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_17"
Wed Jan 23 08:29:28.161609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_18"
Wed Jan 23 08:29:28.161609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_19"
Wed Jan 23 08:29:28.161609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_20"
Wed Jan 23 08:29:28.177234 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_21"
Wed Jan 23 08:29:28.177234 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_22"
Wed Jan 23 08:29:28.192859 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_23"
Wed Jan 23 08:29:28.208484 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_24"
Wed Jan 23 08:29:29.005359 Pacific Standard Time 3: Shutting down tap connections!
Wed Jan 23 08:29:29.005359 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.171"
Wed Jan 23 08:29:29.083484 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.182"
Wed Jan 23 08:29:29.083484 Pacific Standard Time 3: Failed to notify thread: Unknown error
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.47"
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.171 - Clear the tap queues by force
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.214"
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Failed to notify thread: Unknown error
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.180"
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.182 - Clear the tap queues by force
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.181"
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: Failed to notify thread: Unknown error
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.47 - Clear the tap queues by force
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.214 - Clear the tap queues by force
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: Failed to notify thread: Unknown error
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.180 - Clear the tap queues by force
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.181 - Clear the tap queues by force
Wed Jan 23 08:29:42.130359 Pacific Standard Time 3: Had to wait 12 s for shutdown
Wed Jan 23 08:30:01.442859 Pacific Standard Time 3: Shutting down tap connections!
Wed Jan 23 08:30:01.442859 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.47"
Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.171"
Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.47 - Clear the tap queues by force
Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.181"
Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.171 - Clear the tap queues by force
Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.214"
Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.181 - Clear the tap queues by force
Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.180"
Wed Jan 23 08:30:01.536609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.182"
Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.214 - Clear the tap queues by force
Wed Jan 23 08:30:01.536609 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.180 - Clear the tap queues by force
Wed Jan 23 08:30:01.536609 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.182 - Clear the tap queues by force
Wed Jan 23 08:30:16.536609 Pacific Standard Time 3: Had to wait 15 s for shutdown
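To compare the wall-clock duration of each shutdown with the wait that memcached itself reports, the sketch below pairs every "Shutting down tap connections!" line with the next "Had to wait N s for shutdown" line. The line layout is inferred from the excerpt above; the year is not present in these lines, so `ASSUMED_YEAR` is taken from the ns_server timestamps. All names are illustrative.

```python
import re
from datetime import datetime

ASSUMED_YEAR = 2013  # memcached lines carry no year; taken from ns_server logs

# Weekday, then "Mon DD HH:MM:SS.ffffff", a time zone name, a numeric tag
# whose meaning the report does not state, and finally the message.
LINE_RE = re.compile(
    r"^\w{3} (?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d+) "
    r".*? \d+: (?P<msg>.*)$"
)
WAIT_RE = re.compile(r"Had to wait (\d+) s for shutdown")


def parse_line(line):
    """Return (timestamp, message) for a memcached log line, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = " ".join(match.group("stamp").split())  # collapse padded days
    ts = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S.%f")
    return ts, match.group("msg")


def shutdown_waits(lines):
    """Yield (elapsed_seconds, reported_seconds) for each shutdown."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, msg = parsed
        if msg == "Shutting down tap connections!":
            start = ts
            continue
        wait = WAIT_RE.search(msg)
        if wait and start is not None:
            yield (ts - start).total_seconds(), int(wait.group(1))
            start = None
```

Feeding the excerpt above through `shutdown_waits` yields one pair per bucket shutdown, which makes it easy to see whether the reported wait accounts for the whole gap between the two messages.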