Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
2.0-beta-2
-
Security Level: Public
-
centos 6.2 build 2.0.0-1708
Description
Cluster information:
- Load 9 million items to both buckets
- Cluster has 2 buckets, default (3GB) and saslbucket (3GB)
- Each bucket has one doc and 2 views for each doc (default d1 and saslbucket d11)
- Create cluster with 10 nodes installed couchbase server 2.0.0-1708
10.3.121.13
10.3.121.14
10.3.121.15
10.3.121.16
10.3.121.17
10.3.121.20
10.3.121.22
10.3.121.24
10.3.121.25
10.3.121.23
- Data path /data
- View path /data
Doing rebalanc in, out, failover.
Last failed was rebalance add 2 nodes, 23 and 24 into cluster.
During rebalance, node 23 suddenly down. Rebalance failed as expected.
Then later, memcached on node 13 was killed and restart.
[ns_server:debug,2012-09-12T17:06:05.705,ns_1@10.3.121.13:<0.28110.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
{ns_config_events,<0.28109.33>} exited with reason {timeout,{gen_server,
call,
[ns_node_disco,
nodes_wanted]}}
[ns_server:debug,2012-09-12T17:06:05.712,ns_1@10.3.121.13:<0.28166.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_config_events,<0.28165.33>} exited with reason shutdown
[ns_server:error,2012-09-12T17:06:05.718,ns_1@10.3.121.13:<0.29417.33>:ns_memcached:verify_report_long_call:274]call {stats,<<>>} took too long: 5881903 us
[ns_server:debug,2012-09-12T17:06:05.720,ns_1@10.3.121.13:<0.28132.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_node_disco_events,<0.28130.33>} exited with reason shutdown
[ns_server:debug,2012-09-12T17:06:05.732,ns_1@10.3.121.13:<0.28110.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_config_events,<0.28109.33>}
event handler: ok
[ns_server:debug,2012-09-12T17:06:05.733,ns_1@10.3.121.13:<0.28166.33>:ns_pubsub:do_subscribe_link:149]Deleting
event handler: ok
[ns_server:debug,2012-09-12T17:06:05.739,ns_1@10.3.121.13:<0.5306.465>:compaction_daemon:do_chain_compactors:500]Got exit signal from parent:
[ns_server:debug,2012-09-12T17:06:05.740,ns_1@10.3.121.13:<0.28132.33>:ns_pubsub:do_subscribe_link:149]Deleting
{ns_node_disco_events,<0.28130.33>} event handler: ok
[ns_server:debug,2012-09-12T17:06:05.740,ns_1@10.3.121.13:ns_node_disco:ns_node_disco:init:103]Initting ns_node_disco with ['ns_1@10.3.121.15','ns_1@10.3.121.16',
'ns_1@10.3.121.17','ns_1@10.3.121.22',
'ns_1@10.3.121.20','ns_1@10.3.121.14',
'ns_1@10.3.121.25','ns_1@10.3.121.24']
[ns_server:debug,2012-09-12T17:06:05.741,ns_1@10.3.121.13:<0.28129.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
[ns_server:debug,2012-09-12T17:06:05.743,ns_1@10.3.121.13:<0.19269.466>:janitor_agent:new_style_query_vbucket_states_loop:111]Exception from query_vbucket_states of "saslbucket":'ns_1@10.3.121.23'
{'EXIT',{{nodedown,'ns_1@10.3.121.23'},
{gen_server,call,
[{'janitor_agent-saslbucket','ns_1@10.3.121.23'},
query_vbucket_states,infinity]}}}
[ns_server:debug,2012-09-12T17:06:05.745,ns_1@10.3.121.13:<0.5309.465>:compaction_daemon:do_chain_compactors:500]Got exit signal from parent: {'EXIT',<0.5306.465>,shutdown}
[ns_server:debug,2012-09-12T17:06:05.746,ns_1@10.3.121.13:<0.28126.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_config_events,<0.28125.33>} exited with reason shutdown
[ns_server:debug,2012-09-12T17:06:05.761,ns_1@10.3.121.13:<0.19269.466>:janitor_agent:new_style_query_vbucket_states_loop_next_step:116]Waiting for "saslbucket" on 'ns_1@10.3.121.23'
[ns_server:debug,2012-09-12T17:06:05.897,ns_1@10.3.121.13:<0.28129.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_config_events,<0.28128.33>}
event handler: ok
[ns_server:debug,2012-09-12T17:06:05.898,ns_1@10.3.121.13:<0.14047.466>:compaction_daemon:do_spawn_view_index_compactor:742]Got exit signal from parent:
[ns_server:debug,2012-09-12T17:06:05.903,ns_1@10.3.121.13:ns_cookie_manager:ns_cookie_manager:do_cookie_sync:115]ns_cookie_manager do_cookie_sync
[ns_server:debug,2012-09-12T17:06:05.903,ns_1@10.3.121.13:<0.28126.33>:ns_pubsub:do_subscribe_link:149]Deleting
event handler: ok
[couchdb:info,2012-09-12T17:06:05.906,ns_1@10.3.121.13:<0.5719.436>:couch_log:info:39]Set view `default`, replica group `_design/d1`, canceling compaction (pid <0.18999.466>)
[ns_server:debug,2012-09-12T17:06:06.000,ns_1@10.3.121.13:ns_cookie_manager:ns_cookie_manager:do_cookie_save:152]saving cookie to "/opt/couchbase/var/lib/couchbase/couchbase-server.cookie"
[ns_server:debug,2012-09-12T17:06:06.002,ns_1@10.3.121.13:<0.28122.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
[ns_server:debug,2012-09-12T17:06:06.009,ns_1@10.3.121.13:<0.28122.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_stats_event,<0.28121.33>}
event handler: ok
[ns_server:debug,2012-09-12T17:06:06.040,ns_1@10.3.121.13:<0.28120.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
[ns_server:debug,2012-09-12T17:06:06.044,ns_1@10.3.121.13:<0.28648.34>:single_bucket_sup:top_loop:28]Delegating exit {'EXIT',<0.28117.33>,shutdown} to child supervisor: <0.28649.34>
[ns_server:debug,2012-09-12T17:06:06.047,ns_1@10.3.121.13:ns_cookie_manager:ns_cookie_manager:do_cookie_save:154]attempted to save cookie to "/opt/couchbase/var/lib/couchbase/couchbase-server.cookie": ok
[ns_server:debug,2012-09-12T17:06:06.074,ns_1@10.3.121.13:<0.28120.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_tick_event,<0.28119.33>}
event handler: ok
[ns_server:debug,2012-09-12T17:06:06.074,ns_1@10.3.121.13:<0.19276.466>:ns_node_disco:do_nodes_wanted_updated_fun:202]ns_node_disco: nodes_wanted updated: ['ns_1@10.3.121.13','ns_1@10.3.121.14',
'ns_1@10.3.121.15','ns_1@10.3.121.16',
'ns_1@10.3.121.17','ns_1@10.3.121.20',
'ns_1@10.3.121.22','ns_1@10.3.121.23',
'ns_1@10.3.121.24','ns_1@10.3.121.25'], with cookie: oetfisqvqrrxvaat
[ns_server:debug,2012-09-12T17:06:06.079,ns_1@10.3.121.13:<0.28689.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
[ns_server:debug,2012-09-12T17:06:06.080,ns_1@10.3.121.13:<0.28687.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_tick_event,<0.28686.34>} exited with reason shutdown
[ns_server:debug,2012-09-12T17:06:06.083,ns_1@10.3.121.13:<0.28689.34>:ns_pubsub:do_subscribe_link:149]Deleting {ns_stats_event,<0.28688.34>}
event handler: ok
[ns_server:debug,2012-09-12T17:06:06.084,ns_1@10.3.121.13:<0.28687.34>:ns_pubsub:do_subscribe_link:149]Deleting
event handler: ok
[stats:warn,2012-09-12T17:06:06.087,ns_1@10.3.121.13:<0.29425.33>:stats_collector:latest_tick:201]Dropped 7 ticks
[rebalance:info,2012-09-12T17:06:06.091,ns_1@10.3.121.13:<0.25334.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!
[rebalance:info,2012-09-12T17:06:06.120,ns_1@10.3.121.13:<0.23919.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!
[ns_server:debug,2012-09-12T17:06:06.120,ns_1@10.3.121.13:<0.1936.453>:ns_pubsub:do_subscribe_link:149]Deleting
{master_activity_events_ingress,<0.28014.33>}event handler:
{error, module_not_found}[rebalance:info,2012-09-12T17:06:06.128,ns_1@10.3.121.13:<0.23192.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!
[rebalance:info,2012-09-12T17:06:06.147,ns_1@10.3.121.13:<0.22947.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!
[rebalance:info,2012-09-12T17:06:06.201,ns_1@10.3.121.13:<0.21996.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!
[rebalance:info,2012-09-12T17:06:06.218,ns_1@10.3.121.13:<0.11912.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!
[rebalance:info,2012-09-12T17:06:06.223,ns_1@10.3.121.13:<0.10815.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!
[ns_server:debug,2012-09-12T17:06:06.228,ns_1@10.3.121.13:<0.28050.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
{buckets_events,<0.28049.33>} exited with reason {shutdown,{gen_server,
call,
['ns_vbm_new_sup-default',
which_children,
infinity]}}
[ns_server:debug,2012-09-12T17:06:06.239,ns_1@10.3.121.13:<0.28050.33>:ns_pubsub:do_subscribe_link:149]Deleting {buckets_events,<0.28049.33>}
event handler: ok
[user:info,2012-09-12T17:06:06.245,ns_1@10.3.121.13:'ns_memcached-default':ns_memcached:terminate:625]Shutting down bucket "default" on 'ns_1@10.3.121.13' for server shutdown
[ns_server:info,2012-09-12T17:06:06.261,ns_1@10.3.121.13:'ns_memcached-default':ns_memcached:terminate:636]This bucket shutdown is not due to bucket deletion. Doing nothing
[ns_server:debug,2012-09-12T17:06:06.276,ns_1@10.3.121.13:<0.28651.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
[ns_server:debug,2012-09-12T17:06:06.276,ns_1@10.3.121.13:<0.28663.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {mc_couch_events,<0.28650.34>} exited with reason shutdown
[ns_server:debug,2012-09-12T17:06:06.276,ns_1@10.3.121.13:<0.28662.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_node_disco_events,<0.28650.34>} exited with reason shutdown
[ns_server:debug,2012-09-12T17:06:06.276,ns_1@10.3.121.13:<0.28648.34>:single_bucket_sup:top_loop:24]per-bucket supervisor for "default" died with reason shutdown
[ns_server:debug,2012-09-12T17:06:06.279,ns_1@10.3.121.13:<0.28651.34>:ns_pubsub:do_subscribe_link:149]Deleting {ns_config_events,<0.28650.34>}
event handler: ok
[ns_server:debug,2012-09-12T17:06:06.281,ns_1@10.3.121.13:<0.28663.34>:ns_pubsub:do_subscribe_link:149]Deleting
event handler: ok
[ns_server:debug,2012-09-12T17:06:06.283,ns_1@10.3.121.13:<0.28662.34>:ns_pubsub:do_subscribe_link:149]Deleting
event handler: ok
[ns_server:debug,2012-09-12T17:06:06.283,ns_1@10.3.121.13:<0.29393.33>:single_bucket_sup:top_loop:28]Delegating exit
to child supervisor: <0.29394.33>
[ns_server:debug,2012-09-12T17:06:06.299,ns_1@10.3.121.13:<0.29428.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
{ns_stats_event,<0.29427.33>} exited with reason shutdown[ns_server:debug,2012-09-12T17:06:06.300,ns_1@10.3.121.13:<0.29426.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_tick_event,<0.29425.33>} exited with reason shutdown
[ns_server:debug,2012-09-12T17:06:06.305,ns_1@10.3.121.13:<0.29428.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_stats_event,<0.29427.33>}
event handler: ok
[ns_server:debug,2012-09-12T17:06:06.307,ns_1@10.3.121.13:<0.29426.33>:ns_pubsub:do_subscribe_link:149]Deleting
event handler: ok
[rebalance:info,2012-09-12T17:06:06.310,ns_1@10.3.121.13:<0.11978.466>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!
[ns_server:info,2012-09-12T17:06:06.323,ns_1@10.3.121.13:ns_port_memcached:ns_port_server:log:169]memcached<0.28094.33>: Thu Sep 13 00:06:06.117323 3: TAP (Consumer) eq_tapq:anon_509 - disconnected
memcached<0.28094.33>: Thu Sep 13 00:06:06.123981 3: TAP (Consumer) eq_tapq:anon_1652 - disconnected
memcached<0.28094.33>: Thu Sep 13 00:06:06.144392 3: TAP (Consumer) eq_tapq:anon_480 - disconnected
memcached<0.28094.33>: Thu Sep 13 00:06:06.151955 3: TAP (Consumer) eq_tapq:anon_488 - disconnected
memcached<0.28094.33>: Thu Sep 13 00:06:06.203038 3: TAP (Consumer) eq_tapq:anon_504 - disconnected
memcached<0.28094.33>: Thu Sep 13 00:06:06.221683 3: TAP (Consumer) eq_tapq:anon_719 - disconnected
memcached<0.28094.33>: Thu Sep 13 00:06:06.227307 3: TAP (Consumer) eq_tapq:anon_499 - disconnected
memcached log from node 13 at the time it was killed
Thu Sep 13 00:06:07.606021 3: Schedule cleanup of "eq_tapq:anon_20"
Thu Sep 13 00:06:07.606058 3: Schedule cleanup of "eq_tapq:anon_34"
Thu Sep 13 00:06:07.606083 3: Schedule cleanup of "eq_tapq:anon_36"
Thu Sep 13 00:06:07.606115 3: Schedule cleanup of "eq_tapq:anon_40"
Thu Sep 13 00:06:07.606142 3: Schedule cleanup of "eq_tapq:anon_730"
Thu Sep 13 00:06:07.606165 3: Schedule cleanup of "eq_tapq:anon_2425"
Thu Sep 13 00:06:21.096648 3: Had to wait 13 s for shutdown
Thu Sep 13 00:06:31.465158 3: Shutting down tap connections!
Thu Sep 13 00:06:31.465199 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.15"
Thu Sep 13 00:06:31.465278 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.16"
Thu Sep 13 00:06:31.465357 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.22"
Thu Sep 13 00:06:31.465405 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.17"
Thu Sep 13 00:06:31.465414 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.15 - Clear the tap queues by force
Thu Sep 13 00:06:31.465441 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.20"
Thu Sep 13 00:06:31.465471 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.14"
Thu Sep 13 00:06:31.465492 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.25"
Thu Sep 13 00:06:31.465517 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.24"
Thu Sep 13 00:06:31.465604 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.16 - Clear the tap queues by force
Thu Sep 13 00:06:31.467071 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.22 - Clear the tap queues by force
Thu Sep 13 00:06:31.467363 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.20 - Clear the tap queues by force
Thu Sep 13 00:06:31.467606 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.24 - Clear the tap queues by force
Thu Sep 13 00:06:31.467710 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.25 - Clear the tap queues by force
Thu Sep 13 00:06:31.467917 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.17 - Clear the tap queues by force
Thu Sep 13 00:06:31.468102 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.14 - Clear the tap queues by force
Link to diags of all nodes https://s3.amazonaws.com/packages.couchbase/diag-logs/orange/201209/10nodes-1708-memcached-killed-after-reb-failed-20120912.tgz
Link to atop of node 13 https://s3.amazonaws.com/packages.couchbase/atop-files/orange/201209/atop-node13-20120912