Loading...

Details

Type: Technical task
Resolution: Duplicate
Priority: Major
Fix Version/s: 2.0
Affects Version/s: 2.0
Component/s: ns_server
Security Level: Public
Labels:
- system-test
Environment:
centos 6.2 64 bit build 2.0.0-1946

Description

Environment: 8 nodes with 390GB SSD drive, 32GB RAM, 1 default buckets.

Create 6 nodes cluster
10.6.2.37
10.6.2.38
10.6.2.44
10.6.2.45
10.6.2.42
10.6.2.43

Load 40 million items to default bucket that push resident ratio down to around 62% (as of Pavel workload)
Create a design doc with 8 view. Let the initial index completed
Maintain load about 6K ops with 600 queries per second

Add 2 nodes and rebalance.
Rebalance failed (as expected) due to network on node 43 disconnected

Restart network on node 43 and rebalance.
Rebalance failed with error
"Rebalance exited with reason

{badmatch,wrong_rebalancer_pid}

"
Reopen bug ~~MB-7057~~

Rebalance again. Rebalance failed with the same error of bug ~~MB-7057~~
Rebalance again. Rebalance failed with error Resetting rebalance status since it's not really running

ns_server:info,2012-11-10T11:23:23.905,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:handle_call:304]Successfully changed vbucket filter on tap stream `replication_ns_1@10.6.2.37`.
[ns_server:info,2012-11-10T11:23:23.905,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:process_upstream:922]Got vbucket filter change completion message. Silencing upstream sender
[ns_server:info,2012-11-10T11:23:23.912,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:handle_info:348]Got reply from upstream silencing request. Completing state transition to a new ebucketmigrator.
[ns_server:debug,2012-11-10T11:23:23.912,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:complete_native_vb_filter_change:166]Proceeding with reading unread binaries
[ns_server:debug,2012-11-10T11:23:23.913,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:confirm_downstream:193]Going to confirm reception downstream messages
[rebalance:info,2012-11-10T11:23:23.923,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:do_confirm_sent_messages:682]Got close ack!

[ns_server:debug,2012-11-10T11:23:23.923,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:confirm_downstream:197]Confirmed upstream messages are feeded to kernel
[ns_server:debug,2012-11-10T11:23:23.923,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:reply_and_die:206]Passed old state to caller
[ns_server:debug,2012-11-10T11:23:23.924,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:reply_and_die:209]Sent out state. Preparing to die
[ns_server:debug,2012-11-10T11:23:23.924,ns_1@10.6.2.37:<0.15562.1152>:ns_vbm_new_sup:perform_vbucket_filter_change:125]Got old state from previous ebucketmigrator: <0.18812.1091>
[ns_server:debug,2012-11-10T11:23:23.924,ns_1@10.6.2.37:<0.15562.1152>:ns_vbm_new_sup:perform_vbucket_filter_change_loop:166]Sent old state to new instance
[ns_server:info,2012-11-10T11:23:23.925,ns_1@10.6.2.37:<0.15690.1152>:ns_vbm_new_sup:mk_old_state_retriever:83]Got vbucket filter change old state. Proceeding vbucket filter change operation
[ns_server:debug,2012-11-10T11:23:23.925,ns_1@10.6.2.37:<0.15690.1152>:ebucketmigrator_srv:init:450]Got old ebucketmigrator state from <0.18812.1091>:
{state,#Port<0.14094>,#Port<0.14057>,#Port<0.14099>,#Port<0.14060>,
<0.18815.1091>,<<"cut off">>,<<"cut off">>,[],1657808,false,false,0,

{1352,575403,905720}

,
completed,

{<0.15562.1152>,#Ref<0.0.2303.247796>}

,
<<"replication_ns_1@10.6.2.37">>,<0.18812.1091>,

{had_backfill,false,undefined,[]}

,
false}.
[ns_server:debug,2012-11-10T11:23:23.925,ns_1@10.6.2.37:<0.721.0>:ns_process_registry:handle_info:98]Got exit msg: {'EXIT',<0.15562.1152>,{#Ref<0.0.2303.247780>,<0.15690.1152>}}
[error_logger:info,2012-11-10T11:23:23.926,ns_1@10.6.2.37:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================PROGRESS REPORT=========================
supervisor:

{local,'ns_vbm_new_sup-default'}
started: [{pid,<0.15690.1152>},
{name,
{new_child_id,
[513,514,515,516,517,518,519,520,521,522,523,
524,525,526,527,528,529,530,534,535,536,537,
538,539,540,541,542,543,544,545,546],
'ns_1@10.6.2.40'}},
{mfargs,
{ebucketmigrator_srv,start_link,
[{"10.6.2.40",11209},
{"10.6.2.37",11209},
[{old_state_retriever, #Fun<ns_vbm_new_sup.0.113560106>},
{on_not_ready_vbuckets, #Fun<tap_replication_manager.2.88495417>},
{username,"default"},
{password,[]},
{vbuckets, [513,514,515,516,517,518,519,520,521,522, 523,524,525,526,527,528,529,530,534,535, 536,537,538,539,540,541,542,543,544,545, 546]},
{takeover,false},
{suffix,"ns_1@10.6.2.37"}]]}},
{restart_type,temporary},
{shutdown,60000},
{child_type,worker}]

[user:info,2012-11-10T11:23:23.926,ns_1@10.6.2.37:ns_config<0.711.0>:ns_janitor:maybe_stop_rebalance_status:147]Resetting rebalance status since it's not really running
[ns_server:debug,2012-11-10T11:23:23.927,ns_1@10.6.2.37:capi_set_view_manager-default<0.202.981>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
[ns_server:debug,2012-11-10T11:23:23.949,ns_1@10.6.2.37:ns_config_rep<0.735.0>:ns_config_rep:do_push_keys:317]Replicating some config keys ([rebalance_status,rebalancer_pid]..)
[ns_server:debug,2012-11-10T11:23:24.014,ns_1@10.6.2.37:capi_set_view_manager-default<0.202.981>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
[ns_server:debug,2012-11-10T11:23:24.014,ns_1@10.6.2.37:ns_config_log<0.717.0>:ns_config_log:log_common:111]config change:
rebalancer_pid ->
undefined

ns_server:info,2012-11-10T11:23:23.684,ns_1@10.6.2.37:<0.25048.1120>:ebucketmigrator_srv:handle_call:304]Successfully changed vbucket filter on tap stream `replication_ns_1@10.6.2.37`.
[ns_server:info,2012-11-10T11:23:23.685,ns_1@10.6.2.37:<0.25048.1120>:ebucketmigrator_srv:process_upstream:922]Got vbucket filter change completion message. Silencing upstream sender
[ns_server:info,2012-11-10T11:23:23.685,ns_1@10.6.2.37:<0.25048.1120>:ebucketmigrator_srv:handle_info:348]Got reply from upstream silencing request. Completing state transition to a new ebucketmigrator.
[rebalance:info,2012-11-10T11:23:23.686,ns_1@10.6.2.37:<0.25048.1120>:ebucketmigrator_srv:do_confirm_sent_messages:682]Got close ack!

[ns_server:info,2012-11-10T11:23:23.687,ns_1@10.6.2.37:<0.15546.1152>:ns_vbm_new_sup:mk_old_state_retriever:83]Got vbucket filter change old state. Proceeding vbucket filter change operation
[ns_server:info,2012-11-10T11:23:23.688,ns_1@10.6.2.37:tap_replication_manager-default<0.20761.0>:tap_replication_manager:change_vbucket_filter:185]Going to change replication from 'ns_1@10.6.2.40' to have
[513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,534,
535,536,537,538,539,540,541,542,543,544,545,546] ([534], [])
[error_logger:info,2012-11-10T11:23:23.688,ns_1@10.6.2.37:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================PROGRESS REPORT=========================
supervisor: {local,'ns_vbm_new_sup-default'}

started: [

{pid,<0.15546.1152>}

,
{name,
{new_child_id,
[342,343,344,345,346,347,348,349,350,351,352,
353,354,355,356,357,358,359,360,365,366,367,
368,369,370,371,372,373,374,375],
'ns_1@10.6.2.39'}},
{mfargs,
{ebucketmigrator_srv,start_link,
[

{"10.6.2.39",11209}

,

{"10.6.2.37",11209},
[{old_state_retriever, #Fun<ns_vbm_new_sup.0.113560106>},
{on_not_ready_vbuckets, #Fun<tap_replication_manager.2.88495417>},
{username,"default"},
{password,[]},
{vbuckets, [342,343,344,345,346,347,348,349,350,351, 352,353,354,355,356,357,358,359,360,365, 366,367,368,369,370,371,372,373,374, 375]},
{takeover,false},
{suffix,"ns_1@10.6.2.37"}]]}},
{restart_type,temporary},
{shutdown,60000},
{child_type,worker}]

[ns_server:info,2012-11-10T11:23:23.694,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:handle_call:271]Starting new-style vbucket filter change on stream `replication_ns_1@10.6.2.37`
[ns_server:info,2012-11-10T11:23:23.893,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:handle_call:294]Changing vbucket filter on tap stream `replication_ns_1@10.6.2.37`:
[{513,47},
{514,47},
{515,47},
{516,47},
{517,47},
{518,47},
{519,47},
{520,47},
{521,47},
{522,47},
{523,47},
{524,47},
{525,47},
{526,47},
{527,47},
{528,47},
{529,47},
{530,47},
{534,53},
{535,47},
{536,47},
{537,47},
{538,47},
{539,47},
{540,47},
{541,47},
{542,47},
{543,47},
{544,47},
{545,47},
{546,47}]
[ns_server:info,2012-11-10T11:23:23.905,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:handle_call:304]Successfully changed vbucket filter on tap stream `replication_ns_1@10.6.2.37`.
[ns_server:info,2012-11-10T11:23:23.905,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:process_upstream:922]Got vbucket filter change completion message. Silencing upstream sender
[ns_server:info,2012-11-10T11:23:23.912,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:handle_info:348]Got reply from upstream silencing request. Completing state transition to a new ebucketmigrator.
[rebalance:info,2012-11-10T11:23:23.923,ns_1@10.6.2.37:<0.18812.1091>:ebucketmigrator_srv:do_confirm_sent_messages:682]Got close ack!

[ns_server:info,2012-11-10T11:23:23.925,ns_1@10.6.2.37:<0.15690.1152>:ns_vbm_new_sup:mk_old_state_retriever:83]Got vbucket filter change old state. Proceeding vbucket filter change operation
[user:info,2012-11-10T11:23:23.926,ns_1@10.6.2.37:ns_config<0.711.0>:ns_janitor:maybe_stop_rebalance_status:147]Resetting rebalance status since it's not really running
[error_logger:info,2012-11-10T11:23:23.926,ns_1@10.6.2.37:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================PROGRESS REPORT=========================
supervisor: {local,'ns_vbm_new_sup-default'}
started: [{pid,<0.15690.1152>},
{name,
{new_child_id,
[513,514,515,516,517,518,519,520,521,522,523,
524,525,526,527,528,529,530,534,535,536,537,
538,539,540,541,542,543,544,545,546],
'ns_1@10.6.2.40'}},
{mfargs,
{ebucketmigrator_srv,start_link,
[{"10.6.2.40",11209},
{"10.6.2.37",11209}

,
[

{old_state_retriever, #Fun<ns_vbm_new_sup.0.113560106>}

,

{on_not_ready_vbuckets, #Fun<tap_replication_manager.2.88495417>}

,

{username,"default"}

,

{password,[]}

,

{vbuckets, [513,514,515,516,517,518,519,520,521,522, 523,524,525,526,527,528,529,530,534,535, 536,537,538,539,540,541,542,543,544,545, 546]}

,

{takeover,false}

,

{suffix,"ns_1@10.6.2.37"}

]]}},

{restart_type,temporary}

,

{shutdown,60000}

,

{child_type,worker}

]

Link to manifest of build 2.0.0-1946 http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1947-rel.rpm.manifest.xml

Link to collect info of all nodes https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201211/8nodes-ci-1946-reb-failed-Partition-not-in-active-nor-passive-set-20121112-115625.tgz

This bug is similar with bug ~~MB-6573~~ which was closed

Attachments

Gerrit Reviews

- Issue Only
- Show All Reviews
- Show Open Reviews
- Show All Issues
- Show Open Issues

No reviews matched the request. Check your Options in the drop-down menu of this sections header.

[system test] Resetting rebalance status since it's not really running

Details

Description

Attachments

Gerrit Reviews

Activity

People

Dates

Gerrit Reviews

PagerDuty