Details
-
Bug
-
Resolution: Fixed
-
Test Blocker
-
6.5.0
-
Untriaged
-
Unknown
-
KV-Engine Mad-Hatter Beta
Description
testrunner make simple-test failing for all rebalance test of views.
[2019-07-16 10:03:50,653] - [rest_client] [140160049493760] - ERROR - {u'node': u'n_0@127.0.0.1', u'code': 0, u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.18573.0>,\n {{dcp_wait_for_data_move_failed,"default",\n 1023,\'n_0@127.0.0.1\',\n [\'n_1@127.0.0.1\'],\n {error,no_stats_for_this_vbucket}},\n [{ns_single_vbucket_mover,\n \'-wait_dcp_data_move/5-fun-0-\',5,\n [{file,"src/ns_single_vbucket_mover.erl"},\n {line,465}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,232}]}]}}}}.\nRebalance Operation Id = 27fff0232ec662a5b2f54e27ced5c9a4', u'shortText': u'message', u'serverTime': u'2019-07-16T10:03:45.468Z', u'module': u'ns_orchestrator', u'tstamp': 1563251625468, u'type': u'critical'}
|
[2019-07-16 10:03:50,653] - [rest_client] [140160049493760] - ERROR - {u'node': u'n_0@127.0.0.1', u'code': 0, u'text': u'Worker <0.18477.0> (for action {move,{1023,\n [\'n_0@127.0.0.1\',undefined],\n [\'n_1@127.0.0.1\',\'n_0@127.0.0.1\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.18573.0>,\n {{dcp_wait_for_data_move_failed,\n "default",\n 1023,\n \'n_0@127.0.0.1\',\n [\'n_1@127.0.0.1\'],\n {error,\n no_stats_for_this_vbucket}},\n [{ns_single_vbucket_mover,\n \'-wait_dcp_data_move/5-fun-0-\',\n 5,\n [{file,\n "src/ns_single_vbucket_mover.erl"},\n {line,\n 465}]},\n {proc_lib,\n init_p,3,\n [{file,\n "proc_lib.erl"},\n {line,\n 232}]}]}}}', u'shortText': u'message', u'serverTime': u'2019-07-16T10:03:45.454Z', u'module': u'ns_vbucket_mover', u'tstamp': 1563251625454, u'type': u'critical'}
|
Test started failing from 2019-07-15 22:32
During this time there were 2 new commits on kv_engine. I tried by reverting these 2 patches and all test passed.
http://ci-views.northscale.in/views-15.07.2019-21.30.fail.html
kv_engine is at a9f305b, changes since last good build:
|
a9f305bdf MB-35053: Set correct allowedDuplicatePrepareSeqnos
|
4834fe37b MB-34873: disk snapshots - Ack snapEnd seqno once persisted
|
From info.log, ns_server reported dcp_wait_for_data_move_failed.
[ns_server:info,2019-07-16T09:59:18.250+05:30,n_0@127.0.0.1:janitor_agent-default<0.1170.0>:janitor_agent:handle_info:942]Rebalancer <0.3367.0> died with reason {unexpected_exit,
|
{'EXIT',<0.3474.0>,
|
{{dcp_wait_for_data_move_failed,
|
"default",1021,'n_0@127.0.0.1',
|
['n_3@127.0.0.1','n_2@127.0.0.1'],
|
{error,no_stats_for_this_vbucket}},
|
[{ns_single_vbucket_mover,
|
'-wait_dcp_data_move/5-fun-0-',5,
|
[{file,
|
"src/ns_single_vbucket_mover.erl"},
|
{line,465}]},
|
{proc_lib,init_p,3,
|
[{file,"proc_lib.erl"},
|
{line,232}]}]}}}. Undoing temporary vbucket states caused by rebalance
|
And after this ns_server is unable to connect to memcached port. This happened for all tests related to rebalance.
[user:info,2019-07-16T09:59:24.455+05:30,n_0@127.0.0.1:ns_memcached-default<0.1161.0>:ns_memcached:do_terminate:726]Control connection to memcached on 'n_0@127.0.0.1' disconnected: {lost_connection,
|
[{ns_memcached,
|
worker_loop,
|
3,
|
[{file,
|
"src/ns_memcached.erl"},
|
{line,
|
197}]},
|
{proc_lib,
|
init_p_do_apply,
|
3,
|
[{file,
|
"proc_lib.erl"},
|
{line,
|
247}]}]}
|
|
[ns_server:warn,2019-07-16T09:59:25.556+05:30,n_0@127.0.0.1:<0.4013.0>:ns_memcached:connect:1051]Unable to connect: {error,{badmatch,{error,econnrefused}}}, retrying.
|
[ns_server:warn,2019-07-16T09:59:25.702+05:30,n_0@127.0.0.1:<0.3944.0>:ns_memcached:connect:1051]Unable to connect: {error,{badmatch,{error,econnrefused}}}, retrying.
|
Around this time memcached reported exception. It seems like this is the reason for the rebalance operation to exit
3269:2019-07-16T09:59:18.212183+05:30 ERROR 45: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 127.0.0.1:43285 - 127.0.0.1:11999 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007f2bb5c96810","ewouldblock":false,"packet":{"bodylen":8,"cas":0,"datatype":"raw","extlen":8,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":1,"opcode":"DCP_SEQNO_ACKNOWLEDGED","vbucket":1021},"refcount":1}] - closing connection ([ 127.0.0.1:43285 - 127.0.0.1:11999 (<ud>@ns_server</ud>) ]): ActiveDurabilityMonitor::State::processSeqnoAck vb:1021 seqno(58) is greater than lastTrackedSeqno(0)
|
3279:2019-07-16T09:59:18.238653+05:30 ERROR 52: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 127.0.0.1:35871 - 127.0.0.1:11999 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007f2bb6245810","ewouldblock":false,"packet":{"bodylen":8,"cas":0,"datatype":"raw","extlen":8,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":1,"opcode":"DCP_SEQNO_ACKNOWLEDGED","vbucket":1021},"refcount":1}] - closing connection ([ 127.0.0.1:35871 - 127.0.0.1:11999 (<ud>@ns_server</ud>) ]): ActiveDurabilityMonitor::State::processSeqnoAck vb:1021 seqno(58) is greater than lastTrackedSeqno(0)
|
Attachments
Issue Links
- is duplicated by
-
MB-35086 rebalance failure in build sanity - dcp_wait_for_data_move_failed error with mover_crashed
- Closed
For Gerrit Dashboard: MB-35096 | ||||||
---|---|---|---|---|---|---|
# | Subject | Branch | Project | Status | CR | V |
112053,6 | MB-35096: Check received seqnoAck <= highest sent item | master | kv_engine | Status: MERGED | +2 | +1 |
112200,7 | MB-35096: Restore tests checking seqnoAck above lastTrackedWrite | master | kv_engine | Status: MERGED | +2 | +1 |