Details
-
Bug
-
Resolution: Fixed
-
Critical
-
4.5.0
-
4.5.0-2585
-
Untriaged
-
Windows 64-bit
-
Unknown
Description
Same test/step as in MB-19687:
rebalance in 3+1 nodes on the src cluster after a data load of more than 1 day.
I see 3 issues here, but most likely they are interrelated.
1
Rebalance exited with reason {noproc,
{gen_server,call,
[{'janitor_agent-RevAB','ns_1@172.23.107.85'},
{get_dcp_docs_estimate,811,
['ns_1@172.23.107.48','ns_1@172.23.105.87']},
infinity]}}
ns_orchestrator 002 ns_1@172.23.105.87 12:46:18 PM Fri May 20, 2016

2
Service 'ns_server' exited with status 1. Restarting. Messages: [os_mon] win32 supervisor port (win32sysinfo): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_terminated,os_mon,shutdown}"}
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_terminated,os_mon,shutdown}) ns_log 000 ns_1@172.23.107.85 12:47:00 PM Fri May 20, 2016

3
Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2016-05-20T12:46:50.581-07:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connectex: No connection could be made because the target machine actively refused it., num_of_retry=3
MetadataService 2016-05-20T12:46:50.581-07:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connectex: No connection could be made because the target machine actively refused it., num_of_retry=4
Metadata service not available after 30 retries.
[goport] 2016/05/20 12:46:50 c:/Program Files/Couchbase/Server/bin/goxdcr.exe terminated: exit status 1 ns_log 000 ns_1@172.23.107.85 12:47:00 PM Fri May 20, 2016

all console logs:
Bucket "AbRegNums" loaded on node 'ns_1@172.23.107.85' in 0 seconds. ns_memcached 000 ns_1@172.23.107.85 12:47:08 PM Fri May 20, 2016

Bucket "MsgsCalls" loaded on node 'ns_1@172.23.107.85' in 0 seconds. ns_memcached 000 ns_1@172.23.107.85 12:47:08 PM Fri May 20, 2016

Bucket "RevAB" loaded on node 'ns_1@172.23.107.85' in 0 seconds. ns_memcached 000 ns_1@172.23.107.85 12:47:07 PM Fri May 20, 2016

Bucket "UserInfo" loaded on node 'ns_1@172.23.107.85' in 0 seconds. ns_memcached 000 ns_1@172.23.107.85 12:47:06 PM Fri May 20, 2016

Couchbase Server has started on web port 8091 on node 'ns_1@172.23.107.85'. Version: "4.5.0-2585-enterprise". menelaus_sup 001 ns_1@172.23.107.85 12:47:04 PM Fri May 20, 2016

Node 'ns_1@172.23.105.94' saw that node 'ns_1@172.23.107.85' came up. Tags: [] ns_node_disco 004 ns_1@172.23.105.94 12:47:02 PM Fri May 20, 2016

Node 'ns_1@172.23.107.48' saw that node 'ns_1@172.23.107.85' came up. Tags: [] ns_node_disco 004 ns_1@172.23.107.48 12:47:01 PM Fri May 20, 2016

Node 'ns_1@172.23.107.85' synchronized otp cookie aaqxrnpoboutibbv from cluster ns_cookie_manager 002 ns_1@172.23.107.85 12:47:00 PM Fri May 20, 2016

Node 'ns_1@172.23.105.87' saw that node 'ns_1@172.23.107.85' came up. Tags: [] ns_node_disco 004 ns_1@172.23.105.87 12:47:00 PM Fri May 20, 2016

Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2016-05-20T12:46:50.581-07:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connectex: No connection could be made because the target machine actively refused it., num_of_retry=3
MetadataService 2016-05-20T12:46:50.581-07:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connectex: No connection could be made because the target machine actively refused it., num_of_retry=4
Metadata service not available after 30 retries.
[goport] 2016/05/20 12:46:50 c:/Program Files/Couchbase/Server/bin/goxdcr.exe terminated: exit status 1 ns_log 000 ns_1@172.23.107.85 12:47:00 PM Fri May 20, 2016

Service 'ns_server' exited with status 1. Restarting. Messages: [os_mon] win32 supervisor port (win32sysinfo): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_terminated,os_mon,shutdown}"}
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_terminated,os_mon,shutdown}) ns_log 000 ns_1@172.23.107.85 12:47:00 PM Fri May 20, 2016

Node 'ns_1@172.23.107.48' saw that node 'ns_1@172.23.107.85' went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco 005 ns_1@172.23.107.48 12:46:24 PM Fri May 20, 2016

Node 'ns_1@172.23.105.87' saw that node 'ns_1@172.23.107.85' went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco 005 ns_1@172.23.105.87 12:46:24 PM Fri May 20, 2016

Node 'ns_1@172.23.105.94' saw that node 'ns_1@172.23.107.85' went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco 005 ns_1@172.23.105.94 12:46:24 PM Fri May 20, 2016

Shutting down bucket "AbRegNums" on 'ns_1@172.23.107.85' for server shutdown ns_memcached 000 ns_1@172.23.107.85 12:46:22 PM Fri May 20, 2016

Shutting down bucket "MsgsCalls" on 'ns_1@172.23.107.85' for server shutdown ns_memcached 000 ns_1@172.23.107.85 12:46:19 PM Fri May 20, 2016

Rebalance exited with reason {noproc,
{gen_server,call,
[{'janitor_agent-RevAB','ns_1@172.23.107.85'},
{get_dcp_docs_estimate,811,
['ns_1@172.23.107.48','ns_1@172.23.105.87']},
infinity]}}
ns_orchestrator 002 ns_1@172.23.105.87 12:46:18 PM Fri May 20, 2016

Shutting down bucket "RevAB" on 'ns_1@172.23.107.85' for server shutdown ns_memcached 000 ns_1@172.23.107.85 12:46:17 PM Fri May 20, 2016

Shutting down bucket "UserInfo" on 'ns_1@172.23.107.85' for server shutdown ns_memcached 000 ns_1@172.23.107.85 12:46:15 PM Fri May 20, 2016

Bucket "RevAB" rebalance does not seem to be swap rebalance ns_vbucket_mover 000 ns_1@172.23.105.87 12:36:18 PM Fri May 20, 2016

Bucket "RevAB" loaded on node 'ns_1@172.23.107.48' in 0 seconds. ns_memcached 000 ns_1@172.23.107.48 12:36:17 PM Fri May 20, 2016

Started rebalancing bucket RevAB ns_rebalancer 000 ns_1@172.23.105.87 12:36:16 PM Fri May 20, 2016

Bucket "UserInfo" rebalance does not seem to be swap rebalance ns_vbucket_mover 000 ns_1@172.23.105.87 12:28:55 PM Fri May 20, 2016

Bucket "UserInfo" loaded on node 'ns_1@172.23.107.48' in 0 seconds. ns_memcached 000 ns_1@172.23.107.48 12:28:49 PM Fri May 20, 2016

Started rebalancing bucket UserInfo ns_rebalancer 000 ns_1@172.23.105.87 12:28:48 PM Fri May 20, 2016

Starting rebalance, KeepNodes = ['ns_1@172.23.105.87','ns_1@172.23.105.94',
'ns_1@172.23.107.48','ns_1@172.23.107.85'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
ns_orchestrator 004 ns_1@172.23.105.87 12:28:45 PM Fri May 20, 2016

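The console log above is listed newest-first, so the sequence of events on ns_1@172.23.107.85 is easier to follow once the entries are sorted by their trailing timestamp. A minimal Python sketch of that, assuming every entry ends with a timestamp in the format shown above. The helper names `parse_stamp` and `chronological` are hypothetical (not part of any Couchbase tooling), and the sample messages are shortened copies of entries from the log.

```python
import re
from datetime import datetime

# Trailing timestamp as it appears in the console log,
# e.g. "12:46:18 PM Fri May 20, 2016".
STAMP = re.compile(r"(\d{1,2}:\d{2}:\d{2} [AP]M \w{3} \w{3} \d{1,2}, \d{4})\s*$")


def parse_stamp(entry):
    """Return the datetime at the end of a log entry, or None if there is none."""
    match = STAMP.search(entry)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%I:%M:%S %p %a %b %d, %Y")


def chronological(entries):
    """Sort entries oldest-first, dropping any entry without a timestamp."""
    stamped = [(parse_stamp(e), e) for e in entries]
    stamped = [pair for pair in stamped if pair[0] is not None]
    return [e for _, e in sorted(stamped, key=lambda pair: pair[0])]


# Shortened copies of four entries from the log (assumption: message text
# is abbreviated here; only the trailing fields matter for sorting).
sample = [
    "Service 'ns_server' exited with status 1. Restarting. ns_log 000 ns_1@172.23.107.85 12:47:00 PM Fri May 20, 2016",
    "Node 'ns_1@172.23.105.87' saw that node 'ns_1@172.23.107.85' went down. ns_node_disco 005 ns_1@172.23.105.87 12:46:24 PM Fri May 20, 2016",
    "Rebalance exited with reason {noproc, ...} ns_orchestrator 002 ns_1@172.23.105.87 12:46:18 PM Fri May 20, 2016",
    "Shutting down bucket \"UserInfo\" on 'ns_1@172.23.107.85' for server shutdown ns_memcached 000 ns_1@172.23.107.85 12:46:15 PM Fri May 20, 2016",
]

for entry in chronological(sample):
    print(entry)
```

Sorted this way, the bucket shutdown on ns_1@172.23.107.85 (12:46:15) comes before the rebalance failure (12:46:18) and the node-down notifications (12:46:24), which would be consistent with the three issues being related.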
Attachments
For Gerrit Dashboard: MB-19708
# | Subject | Branch | Project | Status | CR | V
---|---|---|---|---|---|---
64662,2 | MB-19708: Add supervision of os_mon | master | ns_server | ABANDONED | 0 | +1
64732,1 | MB-19708: Start os_mon_sup from couchdb_server_sup | master | couchdb | ABANDONED | 0 | -1