Details

Type: Task
Resolution: Fixed
Priority: Major
Fix Version/s: 2.1.0
Affects Version/s: 2.0.1
Component/s: ns_server
Security Level: Public
Labels: None

Description

http://qa.hq.northscale.net/job/centos-64-2.0-upgrade/89/consoleFull

./testrunner -i /tmp/upgrade.ini get-logs=False,upgrade_version=2.0.1-169-rel,initial_vbuckets=64 -t newupgradetests.MultiNodesUpgradeTests.online_upgrade_rebalance_in_out,initial_version=2.0.0-1976-rel,items=50000,expire_time=10000,wait_expire=true,GROUP=2_0;ONLINE

Steps:
1. Create a cluster of two 2.0.0-1976 nodes (10.3.3.11, 10.3.3.14).
2. Load 50000 items with expiration=10000.
3. Install 2.0.1-169 on 10.3.3.13 and 10.3.3.16.
4. After installation the test slept for 10000 seconds (about 2 hours 47 minutes) and then tried to add the new nodes to the cluster:
[2013-02-28 10:49:37,885] - [basetestcase:147] INFO - sleep for 10 secs. Installation of new version is done. Wait for rebalance ...
[2013-02-28 10:49:47,896] - [basetestcase:147] INFO - sleep for 10000 secs. ...
[2013-02-28 13:36:28,396] - [task:242] INFO - adding node 10.3.3.13:8091 to cluster
[2013-02-28 13:36:28,401] - [rest_client:721] INFO - adding remote node @10.3.3.13:8091 to this cluster @10.3.3.11:8091
[2013-02-28 13:36:31,520] - [task:242] INFO - adding node 10.3.3.16:8091 to cluster
[2013-02-28 13:36:31,521] - [rest_client:721] INFO - adding remote node @10.3.3.16:8091 to this cluster @10.3.3.11:8091
[2013-02-28 13:36:32,538] - [rest_client:578] ERROR - http://10.3.3.11:8091/controller/addNode error 400 reason: unknown ["Prepare join failed. Could not connect to \"10.3.3.16\" on port 8091. This could be due to an incorrect host/port combination or a firewall in place between the servers."]
[2013-02-28 13:36:32,538] - [rest_client:741] ERROR - add_node error : ["Prepare join failed. Could not connect to \"10.3.3.16\" on port 8091. This could be due to an incorrect host/port combination or a firewall in place between the servers."]
ERROR
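
One way to narrow this down from the test side would be to probe port 8091 on the idle node before calling addNode. The sketch below is not part of testrunner; `wait_for_port` and its parameters are hypothetical names used for illustration. It only shows that the port accepts TCP connections, which does not prove ns_server is ready to handle a join.

```python
import socket
import time


def wait_for_port(host, port=8091, attempts=5, delay=2.0, timeout=3.0,
                  connect=socket.create_connection, sleep=time.sleep):
    """Return True once a TCP connection to host:port succeeds.

    Tries up to `attempts` times and sleeps `delay` seconds between tries.
    `connect` and `sleep` are injectable so the helper can be exercised
    without a real network (hypothetical helper, not a testrunner API).
    """
    for attempt in range(1, attempts + 1):
        try:
            conn = connect((host, port), timeout)
            conn.close()
            return True
        except OSError:
            # Covers refused connections and timeouts alike.
            if attempt < attempts:
                sleep(delay)
    return False
```

In the failing run this could be called as `wait_for_port("10.3.3.16")` right before the addNode request. If it returns False, the node was unreachable at the TCP level; if it returns True and addNode still fails, the problem is more likely inside ns_server on that node.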

10.3.3.16 was not added. Its logs from that time contain:

[ns_server:error,2013-02-28T13:36:13.437,ns_1@127.0.0.1:ns_heart<0.26082.0>:ns_heart:grab_samples_loading_tasks:328]Failed to grab samples loader tasks: {exit,
{noproc,
{gen_server,call,
[samples_loader_tasks,get_tasks,
2000]}},
[{gen_server,call,3},
{ns_heart,grab_samples_loading_tasks,0},
{ns_heart,current_status,0},
{ns_heart,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
[ns_server:warn,2013-02-28T13:36:19.112,ns_1@127.0.0.1:mb_master<0.26095.0>:mb_master:handle_info:232]Skipped 1 heartbeats

[ns_server:warn,2013-02-28T13:36:26.792,ns_1@127.0.0.1:mb_master<0.26095.0>:mb_master:handle_info:232]Skipped 3 heartbeats

[ns_server:warn,2013-02-28T13:36:35.218,ns_1@127.0.0.1:mb_master<0.26095.0>:mb_master:handle_info:232]Skipped 2 heartbeats

[error_logger:info,2013-02-28T13:36:35.295,ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================PROGRESS REPORT=========================
supervisor: {local,ns_server_sup}
started: [{pid,<0.26105.0>},
{name,master_activity_events_keeper},
{mfargs,{master_activity_events_keeper,start_link,[]}},
{restart_type,permanent},
{shutdown,brutal_kill},
{child_type,worker}]

[ns_server:error,2013-02-28T13:36:35.359,ns_1@127.0.0.1:ns_heart<0.26082.0>:ns_heart:grab_samples_loading_tasks:328]Failed to grab samples loader tasks: {exit,
{noproc,
{gen_server,call,
[samples_loader_tasks,get_tasks,
2000]}},
[{gen_server,call,3},
{ns_heart,grab_samples_loading_tasks,0},
{ns_heart,current_status,0},
{ns_heart,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
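
To see how many heartbeats were skipped around the failed addNode, the mb_master warnings can be pulled out of the diag logs with a small script. This is a minimal sketch that assumes the warning format is exactly as in the excerpt above; `skipped_heartbeats` is a hypothetical helper, not something shipped with ns_server or testrunner.

```python
import re
from datetime import datetime

# Matches mb_master warnings of the form shown in this ticket, e.g.
# [ns_server:warn,2013-02-28T13:36:19.112,...:mb_master:handle_info:232]Skipped 1 heartbeats
SKIPPED_RE = re.compile(
    r"\[ns_server:warn,(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}),"
    r"[^\]]*mb_master[^\]]*\]Skipped (?P<count>\d+) heartbeats"
)


def skipped_heartbeats(log_text):
    """Return a list of (timestamp, skipped_count) tuples found in log_text."""
    events = []
    for match in SKIPPED_RE.finditer(log_text):
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
        events.append((ts, int(match.group("count"))))
    return events


# The three warnings quoted in this ticket.
SAMPLE = """\
[ns_server:warn,2013-02-28T13:36:19.112,ns_1@127.0.0.1:mb_master<0.26095.0>:mb_master:handle_info:232]Skipped 1 heartbeats
[ns_server:warn,2013-02-28T13:36:26.792,ns_1@127.0.0.1:mb_master<0.26095.0>:mb_master:handle_info:232]Skipped 3 heartbeats
[ns_server:warn,2013-02-28T13:36:35.218,ns_1@127.0.0.1:mb_master<0.26095.0>:mb_master:handle_info:232]Skipped 2 heartbeats
"""

if __name__ == "__main__":
    for ts, count in skipped_heartbeats(SAMPLE):
        print(ts.isoformat(), count)
```

Running the same function over the attached diag files would show whether the skipped heartbeats are limited to the seconds around 13:36:32 or were happening throughout the 10000-second sleep.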

I also see several crashes on the nodes, even though the nodes are clean. Should we fix this? Should it be a separate bug?

Attachments

- 85f0dbcb-c6d0-4203-bfea-df12cb605451-10.3.3.16-diag.txt.gz (272 kB, 01/Mar/13 3:07 AM)
- 85f0dbcb-c6d0-4203-bfea-df12cb605451-10.3.3.14-diag.txt.gz (259 kB, 01/Mar/13 3:07 AM)
- 85f0dbcb-c6d0-4203-bfea-df12cb605451-10.3.3.13-diag.txt.gz (134 kB, 01/Mar/13 3:07 AM)
- 85f0dbcb-c6d0-4203-bfea-df12cb605451-10.3.3.11-diag.txt.gz (265 kB, 01/Mar/13 3:07 AM)

People

Assignee: Andrei Baranouski

Reporter: Andrei Baranouski

Votes: 0

Watchers: 3

Dates

Created: 01/Mar/13 3:07 AM

Updated: 22/Apr/13 2:49 AM

Resolved: 22/Apr/13 2:49 AM

Gerrit Reviews

There are no open Gerrit changes

online upgrade 2.0.0 -> 2.0.1: addNode of a node left unused for a long time after installation fails with "Prepare join failed..." ("Skipped 1/2/3 heartbeats" logged at that time)
