  1. Couchbase Server
  2. MB-40480

Non-complete, unpersisted, "deleted" prepare can be removed from HashTable by the persistence of previous abort

    Details

      Description

      Build: 6.6.0-7880-enterprise

      Scenario:

      • 4 node cluster, Couchbase bucket (replica=2)
      • Rebalance out 1 node from the cluster
      • Initiate transaction in parallel to rebalance_out operation

        +----------------+-----------------+--------------+
        | Nodes          | Services        | Status       |
        +----------------+-----------------+--------------+
        | 172.23.107.52  | index, kv, n1ql | Cluster node |
        | 172.23.123.101 | kv              | --- OUT ---> |
        | 172.23.123.102 | kv              | Cluster node |
        | 172.23.123.100 | kv              | Cluster node |
        +----------------+-----------------+--------------+

      Observation:

      Rebalance fails, followed by a memcached crash on the master node (172.23.107.52)

      Service 'memcached' exited with status 134. Restarting. Messages:
      2020-07-14T23:30:54.406403-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f2e4bcfd000+0x8f213]
      2020-07-14T23:30:54.406414-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xccc10]
      2020-07-14T23:30:54.406426-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xc805a]
      2020-07-14T23:30:54.406434-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xca463]
      2020-07-14T23:30:54.406441-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0x18f5a0]
      2020-07-14T23:30:54.406447-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0xcf98d]
      2020-07-14T23:30:54.406454-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f2e466a5000+0x12b864]
      2020-07-14T23:30:54.406459-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f2e4ddac000+0x8f17]
      2020-07-14T23:30:54.406467-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f2e4b5c8000+0x7dd5]
      2020-07-14T23:30:54.406499-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f2e4b1fb000+0xfdead]
       
      Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.6670.0>,
      {{{{{child_interrupted,
      {'EXIT',<17502.2478.0>,socket_closed}},
      [{dcp_replicator,spawn_and_wait,1,
      [{file,"src/dcp_replicator.erl"}, {line,266}]},
      {dcp_replicator,handle_call,3,
      [{file,"src/dcp_replicator.erl"}, {line,121}]},
      {gen_server,try_handle_call,4,
      [{file,"gen_server.erl"},{line,636}]},
      {gen_server,handle_msg,6,
      [{file,"gen_server.erl"},{line,665}]},
      {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,247}]}]},
      {gen_server,call,
      [<17502.2476.0>,get_partitions,infinity]}},
      {gen_server,call,
      ['dcp_replication_manager-default',
      {get_replicator_pid,543}, infinity]}},
      {gen_server,call,
      [{'janitor_agent-default',
      'ns_1@172.23.123.102'},
      {if_rebalance,<0.3620.0>,
      {update_vbucket_state,979,active,paused, undefined,
      [['ns_1@172.23.123.102', 'ns_1@172.23.123.101', 'ns_1@172.23.107.52']]}},
      infinity]}}}}}.
      Rebalance Operation Id = 322d92a2335598e144eb0bb97f14f1a3 
       
      Worker <0.6325.0> (for action {move,{979,
      ['ns_1@172.23.123.102',
      'ns_1@172.23.123.101',
      'ns_1@172.23.107.52'],
      ['ns_1@172.23.107.52',
      'ns_1@172.23.123.100',
      'ns_1@172.23.123.102'],
      []}}) exited with reason {unexpected_exit,
      {'EXIT', <0.6670.0>,
      {{{{{child_interrupted,
      {'EXIT', <17502.2478.0>, socket_closed}},
      [{dcp_replicator, spawn_and_wait, 1,
      [{file, "src/dcp_replicator.erl"}, {line, 266}]},
      {dcp_replicator, handle_call, 3,
      [{file, "src/dcp_replicator.erl"}, {line, 121}]},
      {gen_server, try_handle_call, 4,
      [{file, "gen_server.erl"}, {line, 636}]},
      {gen_server, handle_msg, 6,
      [{file, "gen_server.erl"}, {line, 665}]},
      {proc_lib, init_p_do_apply, 3,
      [{file, "proc_lib.erl"}, {line, 247}]}]},
      {gen_server, call,
      [<17502.2476.0>,
      get_partitions, infinity]}},
      {gen_server, call,
      ['dcp_replication_manager-default',
      {get_replicator_pid, 543}, infinity]}},
      {gen_server, call,
      [{'janitor_agent-default',
      'ns_1@172.23.123.102'},
      {if_rebalance, <0.3620.0>,
      {update_vbucket_state,
      979, active, paused, undefined,
      [['ns_1@172.23.123.102',
      'ns_1@172.23.123.101',
      'ns_1@172.23.107.52']]}},
      infinity]}}}}

      Test to run:

      Atomicity.doc_isolation.IsolationDocTest.test_transaction_with_rebalance,nodes_init=4,replicas=2,num_items=20000,rebalance_type=out,nodes_out=1,doc_op=create,durability=PERSIST_TO_MAJORITY,services_init=kv;n1ql;index,rerun=False
      


          Activity

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-7.0.0-2712 contains kv_engine commit 6842000 with commit message:
          MB-40480: Compare seqno at VBucket::deletedOnDiskCbk
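
The ticket title and this commit message together suggest the shape of the fix: when a deletion reaches disk, the callback should only drop the in-memory item if it is the same item (by seqno) that was just persisted. The following is a minimal sketch of that idea, not the kv_engine implementation. `StoredItem`, `HashTable`, the `compareSeqno` flag and `prepareSurvives` are hypothetical names introduced for illustration; only the name `deletedOnDiskCbk` comes from the commit message, and its real signature is not shown in this ticket.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for a HashTable entry (not the kv_engine type).
struct StoredItem {
    int64_t seqno;   // sequence number of the in-memory item
    bool deleted;    // true for a "deleted" item, e.g. a SyncDelete prepare
    bool isPrepare;  // true while the prepare has not been completed
};

using HashTable = std::unordered_map<std::string, StoredItem>;

// Models the callback run once a deletion at `persistedSeqno` has reached disk.
// compareSeqno == false models the assumed pre-fix behaviour: any deleted item
// found under the key is removed, whatever its seqno.
void deletedOnDiskCbk(HashTable& ht, const std::string& key,
                      int64_t persistedSeqno, bool compareSeqno) {
    auto it = ht.find(key);
    if (it == ht.end() || !it->second.deleted) {
        return;  // nothing to clean up
    }
    if (compareSeqno && it->second.seqno != persistedSeqno) {
        // The in-memory item is not the one that was just persisted; keep it.
        return;
    }
    ht.erase(it);
}

// Returns true if a newer, unpersisted "deleted" prepare is still in the
// HashTable after an earlier abort for the same key has been persisted.
bool prepareSurvives(bool compareSeqno) {
    HashTable ht;
    const int64_t abortSeqno = 2;  // the previous abort, now on disk
    ht["doc"] = StoredItem{3, true, true};  // new prepare at a higher seqno
    assert(ht.count("doc") == 1);
    deletedOnDiskCbk(ht, "doc", abortSeqno, compareSeqno);
    return ht.count("doc") == 1;
}
```

In this model, `prepareSurvives(false)` loses the prepare, which matches the failure described in the title, while `prepareSurvives(true)` keeps it because the seqnos differ.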

          ashwin.govindarajulu Ashwin Govindarajulu added a comment -

          Not seeing this issue in latest build. Verified using 6.6.0-7891.

          Hence closing this ticket.

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-6.6.0-7888 contains kv_engine commit 6842000 with commit message:
          MB-40480: Compare seqno at VBucket::deletedOnDiskCbk

          ben.huddleston Ben Huddleston added a comment -

          Affects 6.5.0 and 6.5.1.

          drigby Dave Rigby added a comment -

          This is most likely a latent bug with SyncDeletes that has come up now as changing transactions to store staged mutations as deleted has increased the likelihood of hitting it.

          Can you confirm, and if so update the affectsVersion to include 6.5.0 and 6.5.1 please?


            People

            Assignee:
            ashwin.govindarajulu Ashwin Govindarajulu
            Reporter:
            ashwin.govindarajulu Ashwin Govindarajulu