Couchbase Server / MB-40480

Non-complete, unpersisted, "deleted" prepare can be removed from HashTable by the persistence of previous abort

Details

    Description

      Build: 6.6.0-7880-enterprise

      Scenario:

      • 4 node cluster, Couchbase bucket (replica=2)
      • Rebalance out 1 node from the cluster
      • Initiate transaction in parallel to rebalance_out operation

        +----------------+-----------------+--------------+
        | Nodes          | Services        | Status       |
        +----------------+-----------------+--------------+
        | 172.23.107.52  | index, kv, n1ql | Cluster node |
        | 172.23.123.101 | kv              | --- OUT ---> |
        | 172.23.123.102 | kv              | Cluster node |
        | 172.23.123.100 | kv              | Cluster node |
        +----------------+-----------------+--------------+

      Observation:

      Rebalance fails, followed by a memcached crash on the master node (172.23.107.52).

      Service 'memcached' exited with status 134. Restarting. Messages:
      2020-07-14T23:30:54.406403-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f2e4bcfd000+0x8f213]
      2020-07-14T23:30:54.406414-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xccc10]
      2020-07-14T23:30:54.406426-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xc805a]
      2020-07-14T23:30:54.406434-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xca463]
      2020-07-14T23:30:54.406441-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0x18f5a0]
      2020-07-14T23:30:54.406447-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xcf98d]
      2020-07-14T23:30:54.406454-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0x12b864]
      2020-07-14T23:30:54.406459-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f2e4ddac000+0x8f17]
      2020-07-14T23:30:54.406467-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f2e4b5c8000+0x7dd5]
      2020-07-14T23:30:54.406499-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f2e4b1fb000+0xfdead]
       
      Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.6670.0>,
      {{{{{child_interrupted,
      {'EXIT',<17502.2478.0>,socket_closed}},
      [{dcp_replicator,spawn_and_wait,1,
      [{file,"src/dcp_replicator.erl"}, {line,266}]},
      {dcp_replicator,handle_call,3,
      [{file,"src/dcp_replicator.erl"}, {line,121}]},
      {gen_server,try_handle_call,4,
      [{file,"gen_server.erl"},{line,636}]},
      {gen_server,handle_msg,6,
      [{file,"gen_server.erl"},{line,665}]},
      {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,247}]}]},
      {gen_server,call,
      [<17502.2476.0>,get_partitions,infinity]}},
      {gen_server,call,
      ['dcp_replication_manager-default',
      {get_replicator_pid,543}, infinity]}},
      {gen_server,call,
      [{'janitor_agent-default',
      'ns_1@172.23.123.102'},
      {if_rebalance,<0.3620.0>,
      {update_vbucket_state,979,active,paused, undefined,
      [['ns_1@172.23.123.102', 'ns_1@172.23.123.101', 'ns_1@172.23.107.52']]}},
      infinity]}}}}}.
      Rebalance Operation Id = 322d92a2335598e144eb0bb97f14f1a3 
       
      Worker <0.6325.0> (for action {move,{979,
      ['ns_1@172.23.123.102',
      'ns_1@172.23.123.101',
      'ns_1@172.23.107.52'],
      ['ns_1@172.23.107.52',
      'ns_1@172.23.123.100',
      'ns_1@172.23.123.102'],
      []}}) exited with reason {unexpected_exit,
      {'EXIT', <0.6670.0>,
      {{{{{child_interrupted,
      {'EXIT', <17502.2478.0>, socket_closed}},
      [{dcp_replicator, spawn_and_wait, 1,
      [{file, "src/dcp_replicator.erl"}, {line, 266}]},
      {dcp_replicator, handle_call, 3,
      [{file, "src/dcp_replicator.erl"}, {line, 121}]},
      {gen_server, try_handle_call, 4,
      [{file, "gen_server.erl"}, {line, 636}]},
      {gen_server, handle_msg, 6,
      [{file, "gen_server.erl"}, {line, 665}]},
      {proc_lib, init_p_do_apply, 3,
      [{file, "proc_lib.erl"}, {line, 247}]}]},
      {gen_server, call,
      [<17502.2476.0>,
      get_partitions, infinity]}},
      {gen_server, call,
      ['dcp_replication_manager-default',
      {get_replicator_pid, 543}, infinity]}},
      {gen_server, call,
      [{'janitor_agent-default',
      'ns_1@172.23.123.102'},
      {if_rebalance, <0.3620.0>,
      {update_vbucket_state,
      979, active, paused, undefined,
      [['ns_1@172.23.123.102',
      'ns_1@172.23.123.101',
      'ns_1@172.23.107.52']]}},
      infinity]}}}}

      Test to run:

      Atomicity.doc_isolation.IsolationDocTest.test_transaction_with_rebalance,nodes_init=4,replicas=2,num_items=20000,rebalance_type=out,nodes_out=1,doc_op=create,durability=PERSIST_TO_MAJORITY,services_init=kv;n1ql;index,rerun=False
      

        Activity

          drigby Dave Rigby added a comment -

          This is most likely a latent bug with SyncDeletes. It has surfaced now because changing transactions to store staged mutations as deleted has increased the likelihood of hitting it.

          Can you confirm, and if so update the affectsVersion to include 6.5.0 and 6.5.1 please?

          ben.huddleston Ben Huddleston added a comment -

          Affects 6.5.0 and 6.5.1.

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-6.6.0-7888 contains kv_engine commit 6842000 with commit message:
          MB-40480: Compare seqno at VBucket::deletedOnDiskCbk
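The commit message suggests the fix is a seqno comparison in the callback that runs once a delete has been persisted. The following is only a toy sketch of that idea, not the kv_engine code: `StoredItem`, `ToyHashTable`, `onDeletePersisted` and `runScenario` are hypothetical names, and the model assumes the callback removes a deleted in-memory entry for the key once the flusher reports a persisted delete.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical, heavily simplified stand-in for an in-memory HashTable entry.
struct StoredItem {
    std::uint64_t seqno;  // sequence number of the item currently in memory
    bool deleted;         // item is a delete (tombstone-like) entry
    bool pendingPrepare;  // prepare that has not completed yet (informational)
};

using ToyHashTable = std::unordered_map<std::string, StoredItem>;

// Invoked after a delete for `key` at `persistedSeqno` reached disk.
// Returns true if the in-memory entry was removed.
// With compareSeqno == false the entry is removed whenever it is deleted,
// even if it is a newer item than the one that was just persisted.
bool onDeletePersisted(ToyHashTable& ht,
                       const std::string& key,
                       std::uint64_t persistedSeqno,
                       bool compareSeqno) {
    auto it = ht.find(key);
    if (it == ht.end() || !it->second.deleted) {
        return false;
    }
    if (compareSeqno && it->second.seqno != persistedSeqno) {
        // The entry in memory is not the item that was persisted: keep it.
        return false;
    }
    ht.erase(it);
    return true;
}

// Replays the sequence from the issue title: an abort at seqno 1 is queued
// for persistence, then a new "deleted" prepare at seqno 2 takes its place in
// memory before the abort's persistence callback fires.
// Returns true if the unpersisted prepare is still present afterwards.
bool runScenario(bool compareSeqno) {
    ToyHashTable ht;
    ht["doc"] = StoredItem{2, true, true};  // newer, unpersisted prepare
    onDeletePersisted(ht, "doc", /*persistedSeqno=*/1, compareSeqno);
    const bool survived = ht.count("doc") == 1;
    if (survived) {
        // Persisting the prepare's own seqno is still allowed to remove it.
        assert(onDeletePersisted(ht, "doc", 2, compareSeqno));
    }
    return survived;
}
```

In this model `runScenario(false)` loses the prepare, while `runScenario(true)` keeps it until its own seqno is persisted.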

          ashwin.govindarajulu Ashwin Govindarajulu added a comment -

          Not seeing this issue in the latest build. Verified using 6.6.0-7891.

          Hence closing this ticket.

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-7.0.0-2712 contains kv_engine commit 6842000 with commit message:
          MB-40480: Compare seqno at VBucket::deletedOnDiskCbk

          People

            ashwin.govindarajulu Ashwin Govindarajulu
            ashwin.govindarajulu Ashwin Govindarajulu