Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 6.5.0, 6.5.1, 6.6.0
6.6.0-7880-enterprise
couchbase-transactions-1.1.0-SNAPSHOT.jar
java-client-3.0.6-SNAPSHOT.jar
Untriaged
Centos 64-bit
1
No
Description
Build: 6.6.0-7880-enterprise
Scenario:
- 4 node cluster, Couchbase bucket (replica=2)
- Rebalance out 1 node from the cluster
- Initiate transaction in parallel to rebalance_out operation
+----------------+-----------------+--------------+
| Nodes | Services | Status |
+----------------+-----------------+--------------+
| 172.23.107.52 | index, kv, n1ql | Cluster node |
| 172.23.123.101 | kv | --- OUT ---> |
| 172.23.123.102 | kv | Cluster node |
| 172.23.123.100 | kv | Cluster node |
+----------------+-----------------+--------------+
Observation:
Seeing rebalance failure followed by memcached crash on master node - 172.23.107.52
Service 'memcached' exited with status 134. Restarting. Messages:
2020-07-14T23:30:54.406403-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f2e4bcfd000+0x8f213]
2020-07-14T23:30:54.406414-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xccc10]
2020-07-14T23:30:54.406426-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xc805a]
2020-07-14T23:30:54.406434-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xca463]
2020-07-14T23:30:54.406441-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0x18f5a0]
2020-07-14T23:30:54.406447-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xcf98d]
2020-07-14T23:30:54.406454-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0x12b864]
2020-07-14T23:30:54.406459-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f2e4ddac000+0x8f17]
2020-07-14T23:30:54.406467-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f2e4b5c8000+0x7dd5]
2020-07-14T23:30:54.406499-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f2e4b1fb000+0xfdead]
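The backtrace frames above carry no symbol names for ep.so, only a load base and an offset in the form [base+offset]. As a rough sketch (not part of the original report), the frames could be split into module, symbol, base, and offset so that the offsets can be fed to a symbolizer. The regular expression and the `parse_frame` helper are illustrative assumptions based only on the line layout shown here.

```python
import re

# Matches frames such as:
#   ... CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xccc10]
# The pattern is inferred from the log lines in this report, not from any spec.
FRAME_RE = re.compile(
    r"CRITICAL\s+(?P<module>\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[(?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\]"
)


def parse_frame(line):
    """Return (module, symbol, base, offset) for a backtrace line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return (
        m.group("module"),
        m.group("symbol"),
        int(m.group("base"), 16),
        int(m.group("offset"), 16),
    )


frames = [
    "2020-07-14T23:30:54.406414-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xccc10]",
    "2020-07-14T23:30:54.406499-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f2e4b1fb000+0xfdead]",
]

for line in frames:
    module, symbol, base, offset = parse_frame(line)
    # Print the module-relative offset, which is the part that stays stable
    # across runs even when the load base changes.
    print(f"{module} +{offset:#x} {symbol}")
```

Given debug symbols that match build 6.6.0-7880, the module-relative offsets are what a symbolizer would typically need to recover function names.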
Rebalance exited with reason {mover_crashed,
{unexpected_exit,
{'EXIT',<0.6670.0>,
{{{{{child_interrupted,
{'EXIT',<17502.2478.0>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1,
[{file,"src/dcp_replicator.erl"}, {line,266}]},
{dcp_replicator,handle_call,3,
[{file,"src/dcp_replicator.erl"}, {line,121}]},
{gen_server,try_handle_call,4,
[{file,"gen_server.erl"},{line,636}]},
{gen_server,handle_msg,6,
[{file,"gen_server.erl"},{line,665}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,247}]}]},
{gen_server,call,
[<17502.2476.0>,get_partitions,infinity]}},
{gen_server,call,
['dcp_replication_manager-default',
{get_replicator_pid,543}, infinity]}},
{gen_server,call,
[{'janitor_agent-default',
'ns_1@172.23.123.102'},
{if_rebalance,<0.3620.0>,
{update_vbucket_state,979,active,paused, undefined,
[['ns_1@172.23.123.102', 'ns_1@172.23.123.101', 'ns_1@172.23.107.52']]}},
infinity]}}}}}.
Rebalance Operation Id = 322d92a2335598e144eb0bb97f14f1a3
Worker <0.6325.0> (for action {move,{979,
['ns_1@172.23.123.102',
'ns_1@172.23.123.101',
'ns_1@172.23.107.52'],
['ns_1@172.23.107.52',
'ns_1@172.23.123.100',
'ns_1@172.23.123.102'],
[]}}) exited with reason {unexpected_exit,
{'EXIT', <0.6670.0>,
{{{{{child_interrupted,
{'EXIT', <17502.2478.0>, socket_closed}},
[{dcp_replicator, spawn_and_wait, 1,
[{file, "src/dcp_replicator.erl"}, {line, 266}]},
{dcp_replicator, handle_call, 3,
[{file, "src/dcp_replicator.erl"}, {line, 121}]},
{gen_server, try_handle_call, 4,
[{file, "gen_server.erl"}, {line, 636}]},
{gen_server, handle_msg, 6,
[{file, "gen_server.erl"}, {line, 665}]},
{proc_lib, init_p_do_apply, 3,
[{file, "proc_lib.erl"}, {line, 247}]}]},
{gen_server, call,
[<17502.2476.0>,
get_partitions, infinity]}},
{gen_server, call,
['dcp_replication_manager-default',
{get_replicator_pid, 543}, infinity]}},
{gen_server, call,
[{'janitor_agent-default',
'ns_1@172.23.123.102'},
{if_rebalance, <0.3620.0>,
{update_vbucket_state,
979, active, paused, undefined,
[['ns_1@172.23.123.102',
'ns_1@172.23.123.101',
'ns_1@172.23.107.52']]}},
infinity]}}}}
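The failing worker was moving vBucket 979 between the two node lists in the move action. A small sketch of how those two lists compare is given below. It assumes, as is conventional for vBucket chains, that the first entry of each list is the active node and the rest are replicas. The `diff_chain` helper is hypothetical and exists only for illustration.

```python
def diff_chain(old_chain, new_chain):
    """Summarise how a vBucket replication chain changes in a move.

    Assumption: index 0 of each chain is the active node, the remaining
    entries are replicas.
    """
    return {
        "old_active": old_chain[0],
        "new_active": new_chain[0],
        "removed": [n for n in old_chain if n not in new_chain],
        "added": [n for n in new_chain if n not in old_chain],
    }


# Chains copied from the {move,{979, ...}} action in the worker message.
old = ["ns_1@172.23.123.102", "ns_1@172.23.123.101", "ns_1@172.23.107.52"]
new = ["ns_1@172.23.107.52", "ns_1@172.23.123.100", "ns_1@172.23.123.102"]

summary = diff_chain(old, new)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under that assumption the move drops 172.23.123.101 (the node being rebalanced out), brings in 172.23.123.100, and shifts the active copy from 172.23.123.102 to 172.23.107.52, which matches the update_vbucket_state call pausing the active vBucket on 172.23.123.102.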
Test to run:
Atomicity.doc_isolation.IsolationDocTest.test_transaction_with_rebalance,nodes_init=4,replicas=2,num_items=20000,rebalance_type=out,nodes_out=1,doc_op=create,durability=PERSIST_TO_MAJORITY,services_init=kv;n1ql;index,rerun=False
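For readers unfamiliar with the test string, it appears to be a dotted test path followed by comma-separated key=value parameters. The sketch below splits it on that assumption. `parse_test_spec` is a hypothetical helper, not part of the test framework.

```python
def parse_test_spec(spec):
    """Split 'module.Class.test,k=v,...' into (test_path, params).

    Assumes the first comma-separated field is the test path and every
    following field has the form key=value.
    """
    test_path, *pairs = spec.strip().split(",")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return test_path, params


spec = (
    "Atomicity.doc_isolation.IsolationDocTest.test_transaction_with_rebalance,"
    "nodes_init=4,replicas=2,num_items=20000,rebalance_type=out,nodes_out=1,"
    "doc_op=create,durability=PERSIST_TO_MAJORITY,services_init=kv;n1ql;index,"
    "rerun=False"
)

test_path, params = parse_test_spec(spec)
print(test_path)
for key, value in params.items():
    print(f"  {key} = {value}")
```

Read this way, the parameters line up with the scenario: four initial nodes, two replicas, one node rebalanced out, and document creates at PERSIST_TO_MAJORITY durability.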
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Description | Observation read "Seeing rebalance failure followed by memcached crash on master node" | Observation now names the node: "Seeing rebalance failure followed by memcached crash on master node - 172.23.107.52"; otherwise unchanged |
Assignee | Daniel Owen [ owend ] | Ben Huddleston [ ben.huddleston ] |
Due Date | 20/Jul/20 |
Description | No "Test to run" section | Added the "Test to run" section with the test string shown in the Description above; otherwise unchanged |
Summary | [Doc isolation] Seeing rebalance failure with reason "mover crashed" followed by memcached crash | [Doc isolation] failed as no HashTable item found with key:<ud>cid:0x0:test_docs-00020907</ud> prepare_seqno:29, commit_seqno: 30 |
Summary | [Doc isolation] failed as no HashTable item found with key:<ud>cid:0x0:test_docs-00020907</ud> prepare_seqno:29, commit_seqno: 30 | [Doc isolation] failed as no HashTable item found with key:.... prepare_seqno:29, commit_seqno: 30 |
Due Date | 20/Jul/20 | 24/Jul/20 |
Labels | 6.6.0 Transactions functional-test | 6.6.0 Transactions approved-for-6.6.0 functional-test |
Link | This issue blocks MB-38724 [ MB-38724 ] |
Attachment | Screenshot 2020-07-21 at 09.12.47.png [ 101663 ] |
Attachment | Screenshot 2020-07-21 at 09.58.48.png [ 101666 ] |
Affects Version/s | 6.5.1 [ 16622 ] | |
Affects Version/s | 6.5.0 [ 15037 ] |
Summary | [Doc isolation] failed as no HashTable item found with key:.... prepare_seqno:29, commit_seqno: 30 | Non-complete, unpersisted, "deleted" prepare can be removed from HashTable by the persistence of previous abort |
Assignee | Ben Huddleston [ ben.huddleston ] | Ashwin Govindarajulu [ ashwin.govindarajulu ] |
Resolution | Fixed [ 1 ] | |
Status | Open [ 1 ] | Resolved [ 5 ] |
VERIFICATION STEPS | Ran the same test a few times and did not hit this issue. Job: http://qa.sc.couchbase.com/job/oel6-4node-rebalance_in_jython/1128/console |
Status | Resolved [ 5 ] | Closed [ 6 ] |
On 172.23.107.52 seeing:
2020-07-14T23:30:47.535365-07:00 ERROR 92: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.107.52->ns_1@172.23.123.101:default - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"keylen":0,"magic":"ClientResponse","opaque":124,"opcode":"DCP_COMMIT","status":"Not found"}
2020-07-14T23:30:49.918099-07:00 ERROR 62: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.107.52->ns_1@172.23.123.101:default - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"keylen":0,"magic":"ClientResponse","opaque":124,"opcode":"DCP_COMMIT","status":"Not found"}
2020-07-14T23:30:54.336833-07:00 ERROR (default) VBucket::abort (vb:243) - active failed as no HashTableitem found with key:<ud>cid:0x0:test_docs-00007567</ud>
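The two DcpProducer::handleResponse errors above embed the rejected response as JSON. A minimal sketch for pulling that payload out of such a line is shown below, so the opcode and status can be checked programmatically. The marker string is taken from these log lines. `parse_dcp_response` is a hypothetical helper.

```python
import json


def parse_dcp_response(line):
    """Return the JSON response embedded in a handleResponse error, or None."""
    marker = "received unexpected response:"
    _, found, payload = line.partition(marker)
    if not found:
        return None
    return json.loads(payload)


line = (
    "2020-07-14T23:30:47.535365-07:00 ERROR 92: (default) DCP (Producer) "
    "eq_dcpq:replication:ns_1@172.23.107.52->ns_1@172.23.123.101:default - "
    "DcpProducer::handleResponse disconnecting, received unexpected response:"
    '{"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"keylen":0,'
    '"magic":"ClientResponse","opaque":124,"opcode":"DCP_COMMIT",'
    '"status":"Not found"}'
)

response = parse_dcp_response(line)
print(response["opcode"], "->", response["status"])
```

In both logged errors the replica at 172.23.123.101 answered a DCP_COMMIT with "Not found", after which the producer disconnected the stream.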