Couchbase Server / MB-40480

Non-complete, unpersisted, "deleted" prepare can be removed from HashTable by the persistence of previous abort

Details

    Description

      Build: 6.6.0-7880-enterprise

      Scenario:

      • 4 node cluster, Couchbase bucket (replica=2)
      • Rebalance out 1 node from the cluster
      • Initiate transaction in parallel to rebalance_out operation

        +----------------+-----------------+--------------+
        | Nodes          | Services        | Status       |
        +----------------+-----------------+--------------+
        | 172.23.107.52  | index, kv, n1ql | Cluster node |
        | 172.23.123.101 | kv              | --- OUT ---> |
        | 172.23.123.102 | kv              | Cluster node |
        | 172.23.123.100 | kv              | Cluster node |
        +----------------+-----------------+--------------+
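The rebalance-out step in the scenario above can be sketched as follows. This is only an illustration, not how the test framework drives the cluster (the report does not show that). It assumes the cluster manager's `POST /controller/rebalance` endpoint with `knownNodes` and `ejectedNodes` form fields, and that nodes are named `ns_1@<ip>` as they appear in the logs of this report. The helper names `otp_name` and `rebalance_out_form` are hypothetical, and the request is built but not sent.

```python
from urllib.parse import urlencode


def otp_name(host: str) -> str:
    # Assumption: node names follow the 'ns_1@<ip>' form seen in this report's logs.
    return f"ns_1@{host}"


def rebalance_out_form(cluster_nodes, nodes_out):
    """Build the form fields for a rebalance that ejects `nodes_out`.

    `cluster_nodes` is every node currently in the cluster, including the
    ones being removed.
    """
    unknown = set(nodes_out) - set(cluster_nodes)
    if unknown:
        raise ValueError(f"nodes not in cluster: {sorted(unknown)}")
    return {
        "knownNodes": ",".join(otp_name(h) for h in cluster_nodes),
        "ejectedNodes": ",".join(otp_name(h) for h in nodes_out),
    }


# The four nodes from the table above; 172.23.123.101 is the one going out.
nodes = ["172.23.107.52", "172.23.123.101", "172.23.123.102", "172.23.123.100"]
form = rebalance_out_form(nodes, ["172.23.123.101"])

# URL-encoded body that would be posted to /controller/rebalance (not sent here).
body = urlencode(form)
print(body)
```

The transaction workload would run concurrently with this request; that part depends on the SDK and test harness and is not sketched here.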

      Observation:

      The rebalance fails, followed by a memcached crash on the master node (172.23.107.52).

      Service 'memcached' exited with status 134. Restarting. Messages:
      2020-07-14T23:30:54.406403-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f2e4bcfd000+0x8f213]
      2020-07-14T23:30:54.406414-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xccc10]
      2020-07-14T23:30:54.406426-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xc805a]
      2020-07-14T23:30:54.406434-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xca463]
      2020-07-14T23:30:54.406441-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0x18f5a0]
      2020-07-14T23:30:54.406447-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xcf98d]
      2020-07-14T23:30:54.406454-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0x12b864]
      2020-07-14T23:30:54.406459-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f2e4ddac000+0x8f17]
      2020-07-14T23:30:54.406467-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f2e4b5c8000+0x7dd5]
      2020-07-14T23:30:54.406499-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f2e4b1fb000+0xfdead]
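The backtrace above is unsymbolized: each frame gives only a module path and a `[base+offset]` pair. As a first triage step, the frames can be split into module and offset so the offsets can be fed to a symbolizer together with the matching debug symbols. The sketch below is a minimal parser for the line format shown in this report; the regular expression and the `parse_frame` helper are illustrative, not part of any Couchbase tool. It also decodes exit status 134 under the common convention that a status above 128 means 128 plus a signal number, which is an assumption about how the status was reported here.

```python
import posixpath
import re
import signal

# Matches e.g. "... CRITICAL /path/to/ep.so() [0x7f2e466a5000+0xccc10]"
FRAME_RE = re.compile(
    r"CRITICAL\s+(?P<path>\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[(?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\]"
)


def parse_frame(line):
    """Return module, symbol, load base and offset for one backtrace line."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        # normpath collapses the "bin/../lib/../lib" segments in the path.
        "module": posixpath.basename(posixpath.normpath(m["path"])),
        "symbol": m["symbol"] or None,
        "base": int(m["base"], 16),
        "offset": int(m["offset"], 16),
    }


# Three frames copied from the crash messages in this report.
lines = [
    "2020-07-14T23:30:54.406403-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f2e4bcfd000+0x8f213]",
    "2020-07-14T23:30:54.406414-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xccc10]",
    "2020-07-14T23:30:54.406499-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f2e4b1fb000+0xfdead]",
]
frames = [parse_frame(line) for line in lines]

# Assumption: status 134 follows the "128 + signal number" convention.
exit_signal = signal.Signals(134 - 128).name

for frame in frames:
    print(frame["module"], hex(frame["offset"]), frame["symbol"])
print(exit_signal)
```

Most frames fall inside `ep.so`, so symbolizing those offsets against the `ep.so` from build 6.6.0-7880 is the natural next step.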
       
      Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.6670.0>,
      {{{{{child_interrupted,
      {'EXIT',<17502.2478.0>,socket_closed}},
      [{dcp_replicator,spawn_and_wait,1,
      [{file,"src/dcp_replicator.erl"}, {line,266}]},
      {dcp_replicator,handle_call,3,
      [{file,"src/dcp_replicator.erl"}, {line,121}]},
      {gen_server,try_handle_call,4,
      [{file,"gen_server.erl"},{line,636}]},
      {gen_server,handle_msg,6,
      [{file,"gen_server.erl"},{line,665}]},
      {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,247}]}]},
      {gen_server,call,
      [<17502.2476.0>,get_partitions,infinity]}},
      {gen_server,call,
      ['dcp_replication_manager-default',
      {get_replicator_pid,543}, infinity]}},
      {gen_server,call,
      [{'janitor_agent-default',
      'ns_1@172.23.123.102'},
      {if_rebalance,<0.3620.0>,
      {update_vbucket_state,979,active,paused, undefined,
      [['ns_1@172.23.123.102', 'ns_1@172.23.123.101', 'ns_1@172.23.107.52']]}},
      infinity]}}}}}.
      Rebalance Operation Id = 322d92a2335598e144eb0bb97f14f1a3 
       
      Worker <0.6325.0> (for action {move,{979,
      ['ns_1@172.23.123.102',
      'ns_1@172.23.123.101',
      'ns_1@172.23.107.52'],
      ['ns_1@172.23.107.52',
      'ns_1@172.23.123.100',
      'ns_1@172.23.123.102'],
      []}}) exited with reason {unexpected_exit,
      {'EXIT', <0.6670.0>,
      {{{{{child_interrupted,
      {'EXIT', <17502.2478.0>, socket_closed}},
      [{dcp_replicator, spawn_and_wait, 1,
      [{file, "src/dcp_replicator.erl"}, {line, 266}]},
      {dcp_replicator, handle_call, 3,
      [{file, "src/dcp_replicator.erl"}, {line, 121}]},
      {gen_server, try_handle_call, 4,
      [{file, "gen_server.erl"}, {line, 636}]},
      {gen_server, handle_msg, 6,
      [{file, "gen_server.erl"}, {line, 665}]},
      {proc_lib, init_p_do_apply, 3,
      [{file, "proc_lib.erl"}, {line, 247}]}]},
      {gen_server, call,
      [<17502.2476.0>,
      get_partitions, infinity]}},
      {gen_server, call,
      ['dcp_replication_manager-default',
      {get_replicator_pid, 543}, infinity]}},
      {gen_server, call,
      [{'janitor_agent-default',
      'ns_1@172.23.123.102'},
      {if_rebalance, <0.3620.0>,
      {update_vbucket_state,
      979, active, paused, undefined,
      [['ns_1@172.23.123.102',
      'ns_1@172.23.123.101',
      'ns_1@172.23.107.52']]}},
      infinity]}}}}
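The failed worker was moving vBucket 979 between two replication chains. The sketch below compares the two chains copied from the log to show which node leaves and which joins. It assumes the first entry of a chain is the active copy and the rest are replicas, which matches the `update_vbucket_state, 979, active` call on 'ns_1@172.23.123.102' in the same log. `describe_move` is a hypothetical helper.

```python
# Chains for vBucket 979, copied from the {move, {979, Old, New, []}} action in the log.
old_chain = ["ns_1@172.23.123.102", "ns_1@172.23.123.101", "ns_1@172.23.107.52"]
new_chain = ["ns_1@172.23.107.52", "ns_1@172.23.123.100", "ns_1@172.23.123.102"]


def describe_move(old, new):
    """Summarize a vBucket move. Assumes chain[0] is the active node."""
    return {
        "active_before": old[0],
        "active_after": new[0],
        "leaving": [node for node in old if node not in new],
        "joining": [node for node in new if node not in old],
    }


move = describe_move(old_chain, new_chain)
for key, value in move.items():
    print(f"{key}: {value}")
```

Under that assumption, the move takes the active copy from 172.23.123.102 to 172.23.107.52 (the node where memcached crashed), drops the outgoing node 172.23.123.101 from the chain, and adds 172.23.123.100 as a replica.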

      Test to run:

      Atomicity.doc_isolation.IsolationDocTest.test_transaction_with_rebalance,nodes_init=4,replicas=2,num_items=20000,rebalance_type=out,nodes_out=1,doc_op=create,durability=PERSIST_TO_MAJORITY,services_init=kv;n1ql;index,rerun=False
      

          People

            ashwin.govindarajulu Ashwin Govindarajulu