Details
Type: Bug
Resolution: Fixed
Priority: Major
Affects Version: 6.5.0
Triage: Untriaged
Is this a Regression?: Unknown
Description
Build : 6.5.0-2686
This is seen intermittently in the very first test of the sanity suite. The first step is to set up a cluster by adding a node to an existing single-node cluster. There are no buckets at that point. This rebalance operation occasionally fails.
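For reference, the failing step boils down to two calls against ns_server's REST API: POST /controller/addNode on the existing node, then POST /controller/rebalance with comma-separated otpNode names. The sketch below only builds the form-encoded request bodies; the host names, credentials, and service list are illustrative placeholders, not values from this cluster:

```python
# Sketch of the sanity-suite setup step: add a node to a one-node cluster
# and trigger a rebalance via ns_server's REST API. Host names, credentials
# and otpNode names below are illustrative placeholders.
from urllib.parse import urlencode

def add_node_request(new_node_host):
    """Form body for POST /controller/addNode on the existing node."""
    return "/controller/addNode", urlencode({
        "hostname": new_node_host,
        "user": "Administrator",      # placeholder credentials
        "password": "password",
        "services": "kv,index",       # the failing service here is 'index'
    })

def rebalance_request(known_nodes, ejected_nodes=()):
    """Form body for POST /controller/rebalance; node lists are
    comma-separated otpNode names (e.g. 'ns_1@172.23.120.173')."""
    return "/controller/rebalance", urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    })

path, body = rebalance_request(["ns_1@172.23.120.173", "ns_1@172.23.120.174"])
print(path)
print(body)
```

In the failing run, the rebalance request is accepted, but the index service's agent dies before the operation completes, as the logs below show.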
[ns_server:error,2019-03-20T16:02:08.194-07:00,ns_1@172.23.120.173:service_agent-index<0.2183.0>:service_agent:handle_info:231]Linked process <0.2471.0> died with reason {timeout,
    {gen_server,call,
        [<0.2220.0>,
            {call,"ServiceAPI.GetTaskList",
                #Fun<json_rpc_connection.0.102434519>},
            60000]}}. Terminating

[ns_server:error,2019-03-20T16:02:08.194-07:00,ns_1@172.23.120.173:service_agent-index<0.2183.0>:service_agent:terminate:260]Terminating abnormally

[ns_server:error,2019-03-20T16:02:08.194-07:00,ns_1@172.23.120.173:service_agent-index<0.2183.0>:service_agent:terminate:265]Terminating json rpc connection for index: <0.2220.0>

[ns_server:error,2019-03-20T16:02:08.195-07:00,ns_1@172.23.120.173:service_rebalancer-index<0.4013.0>:service_rebalancer:run_rebalance:82]Agent terminated during the rebalance: {'DOWN',
    #Ref<0.2176371542.1558970372.43169>,
    process,<0.2183.0>,
    {linked_process_died,<0.2471.0>,
        {timeout,
            {gen_server,call,
                [<0.2220.0>,
                    {call,"ServiceAPI.GetTaskList",
                        #Fun<json_rpc_connection.0.102434519>},
                    60000]}}}}

[ns_server:error,2019-03-20T16:02:08.195-07:00,ns_1@172.23.120.173:service_agent-index<0.5226.0>:service_agent:handle_call:182]Got rebalance-only call {if_rebalance,<0.4013.0>,unset_rebalancer} that doesn't match rebalancer pid undefined

[ns_server:error,2019-03-20T16:02:08.196-07:00,ns_1@172.23.120.173:service_rebalancer-index<0.4013.0>:service_agent:process_bad_results:810]Service call unset_rebalancer (service index) failed on some nodes:
    [{'ns_1@172.23.120.173',nack}]

[ns_server:warn,2019-03-20T16:02:08.196-07:00,ns_1@172.23.120.173:service_rebalancer-index<0.4013.0>:service_rebalancer:run_rebalance:91]Failed to unset rebalancer on some nodes:
    {error,{bad_nodes,index,unset_rebalancer,[{'ns_1@172.23.120.173',nack}]}}

[ns_server:warn,2019-03-20T16:02:08.195-07:00,ns_1@172.23.120.173:<0.4034.0>:async:maybe_log_down_message:480]Monitored process <0.4027.0> terminated abnormally (reason = {linked_process_died,
    <0.2471.0>,
    {timeout,
        {gen_server,call,
            [<0.2220.0>,
                {call,"ServiceAPI.GetTaskList",
                    #Fun<json_rpc_connection.0.102434519>},
                60000]}}})

[ns_server:warn,2019-03-20T16:02:08.196-07:00,ns_1@172.23.120.173:<0.4035.0>:async:maybe_log_down_message:480]Monitored process <0.4027.0> terminated abnormally (reason = {linked_process_died,
    <0.2471.0>,
    {timeout,
        {gen_server,call,
            [<0.2220.0>,
                {call,"ServiceAPI.GetTaskList",
                    #Fun<json_rpc_connection.0.102434519>},
                60000]}}})

[ns_server:info,2019-03-20T16:02:08.196-07:00,ns_1@172.23.120.173:leader_registry<0.3547.0>:leader_registry_server:handle_down:253]Process <0.3806.0> registered as 'ns_rebalance_observer' terminated.

[user:error,2019-03-20T16:02:08.196-07:00,ns_1@172.23.120.173:<0.3573.0>:ns_orchestrator:do_log_rebalance_completion:1243]Rebalance exited with reason {service_rebalance_failed,index,
    {linked_process_died,<0.2471.0>,
        {timeout,
            {gen_server,call,
                [<0.2220.0>,
                    {call,"ServiceAPI.GetTaskList",
                        #Fun<json_rpc_connection.0.102434519>},
                    60000]}}}}. Operation Id = 1cd752172291bf432f8b439c471033c8
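The failure signature above is ns_server's service_agent making a synchronous gen_server:call to the index service over its json-rpc connection with a 60 s (60000 ms) deadline; the peer never answers, the call times out, and the linked processes are torn down. The shape of that race can be sketched in Python, with plain sockets standing in for the json-rpc channel and a short hypothetical deadline in place of the real 60 s:

```python
# Minimal illustration of a blocking RPC call hitting its deadline because
# the peer accepted the connection but never answers (it is mid-restart).
import socket
import threading

def silent_peer(server_sock):
    # Accept the connection, then go quiet -- standing in for a service
    # that is resetting its connection and never sends a response.
    conn, _ = server_sock.accept()
    threading.Event().wait(1.0)   # hold the socket open, say nothing
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))     # ephemeral port; the real service uses 9100
server.listen(1)
threading.Thread(target=silent_peer, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
client.settimeout(0.5)            # stands in for the 60000 ms gen_server:call deadline
client.sendall(b'{"method":"ServiceAPI.GetTaskList"}')
try:
    client.recv(4096)             # blocks until the deadline expires
    outcome = "reply"
except socket.timeout:
    outcome = "timeout"           # the condition ns_server logged, after 60 s
print(outcome)                    # -> timeout
client.close()
```

The index logs below suggest why the peer went quiet: the indexer was resetting its connections at that moment.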
|
In the index logs around the same time, we see -
2019-03-20T16:02:06.827-07:00 [Info] PeriodicStats = {"cpu_utilization":0.33330000333300003,"index_not_found_errcount":0,"indexer_state":"Active","memory_free":43749376,"memory_quota":268435456,"memory_rss":32002048,"memory_total":3962036224,"memory_total_storage":4775936,"memory_used":60612608,"memory_used_queue":0,"memory_used_storage":163840,"needs_restart":false,"num_connections":0,"num_cpu_core":4,"storage_mode":"plasma","timings/stats_response":"70 6378342 665682935552","uptime":"1m0.06985698s"}
2019-03-20T16:02:06.848-07:00 [Info] janitor: running cleanup.
2019-03-20T16:02:06.848-07:00 [Info] janitor: go-routine terminates.
2019-03-20T16:02:06.848-07:00 [Info] builder: recovering scheduled index
2019-03-20T16:02:06.848-07:00 [Info] builder: go-routine terminates.
2019-03-20T16:02:06.848-07:00 [Info] manager ResetConnection: closing metadata repo
2019-03-20T16:02:06.848-07:00 [INFO][FDB] Forestdb closed database file /opt/couchbase/var/lib/couchbase/data/@2i/MetadataStore
2019-03-20T16:02:06.848-07:00 [Error] PeerListener.listen(): Error in accepting new connection. Error = accept tcp [::]:9100: use of closed network connection. Terminate.
2019-03-20T16:02:06.849-07:00 [Info] memstats {"Alloc":52625032, "TotalAlloc":54099552, "Sys":60315896, "Lookups":49, "Mallocs":26069,"Frees":14271, "HeapAlloc":52625032, "HeapSys":54427648, "HeapIdle":712704, "HeapInuse":53714944,"HeapReleased":0, "HeapObjects":11798,"GCSys":1957888, "LastGC":1553122866875326778,"PauseTotalNs":17648245, "PauseNs":[2314538, 1552393, 13781314], "NumGC":3}
2019-03-20T16:02:07.848-07:00 [Info] manager ResetConnection: creating new lifecycle mgr
2019-03-20T16:02:07.856-07:00 [INFO][FDB] Forestdb opened database file /opt/couchbase/var/lib/couchbase/data/@2i/MetadataStore
2019-03-20T16:02:07.856-07:00 [Info] manager ResetConnection: restarting metadadta repo
2019-03-20T16:02:07.856-07:00 [Info] manager ResetConnection: restarting lifecycle mgr
2019-03-20T16:02:07.856-07:00 [Info] handleSecurityChange: restarting mutation manager
2019-03-20T16:02:07.856-07:00 [Info] handleSecurityChange: restarting scan coordinator
2019-03-20T16:02:07.856-07:00 [Info] [Queryport ":9101"] ... stopped
2019-03-20T16:02:07.856-07:00 [Info] [Queryport ":9101"] ... restarting listener
2019-03-20T16:02:07.857-07:00 [Error] [Queryport ":9101"] failed starting listener :9101 listen tcp :9101: bind: address already in use !!
2019-03-20T16:02:07.857-07:00 [Error] [Queryport ":9101"] Accept() Error: accept tcp [::]:9101: use of closed network connection
2019-03-20T16:02:07.865-07:00 [Info] updator: updating service map. server group=Group 1, indexerVersion=4 nodeAddr 172.23.120.173:8091 clusterVersion 4 excludeNode %!(EXTRA uint64=2)
2019/03/20 16:02:08 revrpc: Got error (EOF) and will retry in 1s
2019-03-20T16:02:08.252-07:00 [Info] serviceChangeNotifier: received PoolChangeNotification
2019-03-20T16:02:08.857-07:00 [Error] HTTPS Server terminates on error: accept tcp [::]:19102: use of closed network connection
2019-03-20T16:02:09.196-07:00 [Info] ServiceMgr::GetTaskList []
2019-03-20T16:02:09.857-07:00 [Info] handleSecurityChange: restarting http/https server
2019-03-20T16:02:09.857-07:00 [Info] indexer:: SSL server started: :19102
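A secondary error in the index log above is the Queryport listener failing to come back on :9101 during handleSecurityChange ("bind: address already in use"). That is the ordinary re-bind race when a new listener is started before the old socket is fully released, reproducible in miniature (an ephemeral port stands in for :9101; this sketch does not claim to show the indexer's actual listener code):

```python
# Sketch of the ":9101 bind: address already in use" failure: restarting a
# listener before the previous socket is released fails to bind.
import errno
import socket

old_listener = socket.socket()
old_listener.bind(("127.0.0.1", 0))   # ephemeral port, standing in for :9101
old_listener.listen(1)
port = old_listener.getsockname()[1]

new_listener = socket.socket()
try:
    new_listener.bind(("127.0.0.1", port))   # old socket still open
    result = "bound"
except OSError as e:
    result = "EADDRINUSE" if e.errno == errno.EADDRINUSE else str(e)
print(result)   # -> EADDRINUSE
old_listener.close()
```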
Logs for both the nodes are attached.
Not marking this as a test blocker as it is not consistently reproducible.