Details
-
Bug
-
Resolution: Fixed
-
Critical
-
2.0.1
-
Security Level: Public
-
centos 5.x 64 bit
Description
Environment:
- Both source and destination clusters are on 2.0.0 GA
- 2-node cluster at source with 2 buckets, one doc and 1 view for each doc
- 2-node cluster at destination with 2 buckets, one doc and 1 view for each doc
Load 200K items into both buckets.
Create XDCR from the source to the destination cluster.
Do an offline upgrade of the source and destination clusters from 2.0.0-1976 to 2.0.1-156.
Add node ubu-2509 with build 2.0.1-156 to the source cluster and rebalance with a load of about 1K running on both clusters.
The rebalance failed due to a timeout near the end of rebalancing the first bucket (the sasl bucket).
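For triage, the entries quoted in this ticket can be tallied mechanically. The following is a minimal sketch, assuming every entry starts with a header of the form [logger:level,timestamp,node:process:module:function:line] as seen in the excerpts here; the helper names (parse_entries, unresponsive_nodes) are hypothetical and not part of ns_server or any Couchbase tool.

```python
import re
from collections import Counter

# Header layout inferred from the excerpts in this ticket (an assumption,
# not an official format specification):
# [logger:level,timestamp,node:process:module:function:line]message
HEADER = re.compile(
    r"^\[(?P<logger>\w+):(?P<level>\w+),(?P<ts>[^,\]]+),"
    r"(?P<node>[^:\]]+):(?P<proc>[^:\]]*):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\](?P<msg>.*)$"
)

# Quoted Erlang node names such as 'ns_1@host'; requiring the "@" keeps the
# apostrophe in "didn't" from being mistaken for an opening quote.
NODE = re.compile(r"'(\w+@[^']+)'")


def parse_entries(text):
    """Split raw log text into entries; lines without a header are
    treated as continuations of the previous entry's message."""
    entries = []
    for raw in text.splitlines():
        match = HEADER.match(raw)
        if match:
            entries.append(match.groupdict())
        elif entries:
            entries[-1]["msg"] += "\n" + raw
    return entries


def unresponsive_nodes(entries):
    """Count how often each node is named in stats_reader
    "Some nodes didn't respond" entries."""
    counts = Counter()
    for entry in entries:
        if entry["module"] == "stats_reader" and "didn't respond" in entry["msg"]:
            first_line = entry["msg"].split("\n", 1)[0]
            counts.update(NODE.findall(first_line))
    return counts


# Two entries copied from this ticket, used as sample input.
SAMPLE = """\
[stats:error,2013-02-15T2:48:46.891,ns_1@cen-2501.hq.couchbase.com:<0.26578.6>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@cen-2503.hq.couchbase.com']
[ns_server:error,2013-02-15T2:48:51.270,ns_1@cen-2501.hq.couchbase.com:ns_memcached-sasl<0.12947.0>:ns_memcached:handle_info:630]handle_info(ensure_bucket,..) took too long: 4332674 us
"""

if __name__ == "__main__":
    parsed = parse_entries(SAMPLE)
    errors_by_module = Counter(e["module"] for e in parsed if e["level"] == "error")
    print(errors_by_module)
    print(unresponsive_nodes(parsed))
```

Run against a full log, the per-module error counts and the per-node tally make it easier to see which node stopped answering first and which subsystems timed out around it.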
[stats:error,2013-02-15T2:48:46.891,ns_1@cen-2501.hq.couchbase.com:<0.26578.6>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@cen-2503.hq.couchbase.com']
[ns_server:error,2013-02-15T2:48:46.909,ns_1@cen-2501.hq.couchbase.com:<0.26448.6>:ns_single_vbucket_mover:spawn_and_wait:87]Got unexpected exit signal {'EXIT',<0.20537.5>,
{timeout,
{gen_server,call,
[ns_config,
]}}}
[stats:error,2013-02-15T2:48:46.944,ns_1@cen-2501.hq.couchbase.com:<0.26584.6>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@cen-2503.hq.couchbase.com']
[stats:error,2013-02-15T2:48:46.944,ns_1@cen-2501.hq.couchbase.com:<0.26588.6>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@cen-2503.hq.couchbase.com']
[error_logger:error,2013-02-15T2:48:46.982,ns_1@cen-2501.hq.couchbase.com:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** gen_event handler
crashed.
-
- Was installed in master_activity_events_ingress
- Last event was: {submit_custom_master_event, #Fun<master_activity_events.1.65826123>}
- When handler state == {state,#Fun<master_activity_events.2.6034187>,[]}
- Reason == {timeout,
{gen_fsm,sync_send_all_state_event,
[mb_master,master_node]}}
[ns_server:debug,2013-02-15T2:48:48.887,ns_1@cen-2501.hq.couchbase.com:<0.12931.0>:ns_pubsub:do_subscribe_link:132]Parent process of subscription
{ns_node_disco_events,<0.12894.0>} exited with reason {timeout,
{gen_server,
call,
[ns_config,
]}}
[ns_server:debug,2013-02-15T2:48:48.888,ns_1@cen-2501.hq.couchbase.com:<0.12896.0>:ns_pubsub:do_subscribe_link:132]Parent process of subscription
exited with reason {timeout,
{gen_server,
call,
[ns_config,
]}}
[ns_server:debug,2013-02-15T2:48:49.628,ns_1@cen-2501.hq.couchbase.com:<0.12934.0>:ns_pubsub:do_subscribe_link:132]Parent process of subscription
exited with reason {timeout,
{gen_server,
call,
[ns_config,
]}}
[error_logger:error,2013-02-15T2:48:49.630,ns_1@cen-2501.hq.couchbase.com:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_pubsub:do_subscribe_link/4
pid: <0.12826.0>
registered_name: []
exception exit: {handler_crashed,master_activity_events_ingress,
{'EXIT',
{timeout,
{gen_fsm,sync_send_all_state_event,
[mb_master,master_node]}}}}
in function ns_pubsub:do_subscribe_link/4
ancestors: [ns_server_sup,ns_server_cluster_sup,<0.66.0>]
messages: []
links: [<0.12781.0>,<0.12825.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 233
stack_size: 24
reductions: 115
neighbours:
[ns_server:error,2013-02-15T2:48:49.646,ns_1@cen-2501.hq.couchbase.com:<0.26665.6>:ns_orchestrator:rebalance_progress:176]Couldn't talk to orchestrator: {exit,
{timeout,
{gen_fsm,sync_send_event,
[
,
rebalance_progress,2000]}}}
[ns_server:info,2013-02-15T2:48:50.142,ns_1@cen-2501.hq.couchbase.com:<0.26236.6>:diag_handler:log_all_tap_and_checkpoint_stats:132]end of logging tap & checkpoint stats
[ns_server:error,2013-02-15T2:48:50.248,ns_1@cen-2501.hq.couchbase.com:<0.13095.0>:ns_memcached:verify_report_long_call:297]call
took too long: 44142120 us
[ns_server:info,2013-02-15T2:48:50.439,ns_1@cen-2501.hq.couchbase.com:ns_port_memcached<0.12863.0>:ns_port_server:log:171]memcached<0.12863.0>: Fri Feb 15 02:48:49.788622 PST 3: TAP (Producer) eq_tapq:replication_building_95_'ns_1@10.3.3.29' - disconnected, keep alive for 300 seconds
memcached<0.12863.0>: Fri Feb 15 02:48:49.925897 PST 3: TAP (Producer) eq_tapq:replication_building_95_'ns_1@cen-2503.hq.couchbase.com' - disconnected, keep alive for 300 seconds
[ns_server:debug,2013-02-15T2:48:50.209,ns_1@cen-2501.hq.couchbase.com:capi_set_view_manager-default<0.26696.6>:capi_set_view_manager:init:218]Usable vbuckets:
[933,622,311,0,856,545,490,179,779,724,413,102,958,647,336,25,881,570,259,204,
804,749,438,127,983,672,50,361,906,595,284,229,829,518,463,152,75,697,386,
1008,931,620,309,254,854,543,488,177,777,722,411,100,956,645,334,23,879,568,
257,202,802,747,436,125,981,670,48,359,904,593,282,227,827,516,461,150,73,
695,384,1006,929,618,307,252,852,541,486,175,98,775,720,409,954,643,332,21,
877,566,511,200,800,745,43
[error_logger:error,2013-02-15T2:48:50.834,ns_1@cen-2501.hq.couchbase.com:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.12894.0> terminating
-
- Last message in was replicate_newnodes_docs
- When Server state == {state,"default",'capi_ddoc_replication_srv-default',
['ns_1@cen-2503.hq.couchbase.com'],
[{doc,<<"_design/d3">>,
{1,<<142,152,152,32>>},
{[{<<"views">>,
{[{<<"v1">>,
{[{<<"map">>, <<"function(doc,meta){\nemit(doc.num,null);\n}">>}]}}]}}]},
0,false,[]}],
1024,false,undefined,
[active,active,active,active,active,active,
active,active,active,active,active,active,
active,active,active,active,active,active,
active,active,active,active,active,active,
{[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[],[]}}}}
-
- Reason for termination ==
- {timeout,{gen_server,call,[ns_config,
{eval,#Fun<ns_bucket.0.52407284>}]}}
[ns_server:error,2013-02-15T2:48:51.270,ns_1@cen-2501.hq.couchbase.com:ns_memcached-sasl<0.12947.0>:ns_memcached:handle_info:630]handle_info(ensure_bucket,..) took too long: 4332674 us
[ns_server:info,2013-02-15T2:48:51.328,ns_1@cen-2501.hq.couchbase.com:mb_master<0.12822.0>:mb_master:candidate:365]Changing master from 'ns_1@cen-2501.hq.couchbase.com' to 'ns_1@10.3.3.29'
[ns_server:error,2013-02-15T2:48:51.360,ns_1@cen-2501.hq.couchbase.com:ns_memcached-default<0.12948.0>:ns_memcached:handle_info:630]handle_info(ensure_bucket,..) took too long: 2473280 us
[ns_server:error,2013-02-15T2:48:52.214,ns_1@cen-2501.hq.couchbase.com:<0.13093.0>:ns_memcached:verify_report_long_call:297]call {stats,<<"tapagg _">>} took too long: 561239 us
[stats:error,2013-02-15T2:48:54.830,ns_1@cen-2501.hq.couchbase.com:<0.26666.6>:stats_reader:log_bad_responses:191]Some nodes didn't respond: ['ns_1@10.3.3.29']
[error_logger:error,2013-02-15T2:48:54.781,ns_1@cen-2501.hq.couchbase.com:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: capi_set_view_manager:init/1
pid: <0.12894.0>
registered_name: []
exception exit: {timeout,
{gen_server,call,
[ns_config,{eval,#Fun<ns_bucket.0.52407284>}]}}
{uuid,<<"7bace5ad7988f92d0263e613c872aefd">>}
in function gen_server:init_it/6
ancestors: ['single_bucket_sup-default',<0.12879.0>]
messages: [{#Ref<0.0.173.118623>,
{ok,[,
{sasl_password,[]}
,
{num_replicas,1}
,
{replica_index,false}
,
{ram_quota,1572864000}
,
{auth_type,sasl}
,
{autocompaction,false}
,
{flush_enabled,false}
,
{type,membase}
,
{num_vbuckets,1024}
,
{servers,['ns_1@cen-2501.hq.couchbase.com', 'ns_1@cen-2503.hq.couchbase.com']}
,
{map,[['ns_1@cen-2501.hq.couchbase.com',