Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-44199

ADM::updateHighCompletedSeqno - invariant failed: new value (2235) breaks invariant on current value (5933)

    XMLWordPrintable

Details

    Description

      Build: 7.0.0-4390

      Scenario:

      1. 5 node cluster
      2. Ephemeral bucket with 0 replicas and initial load for 50K docs
      3. Update replicas from 0 -> 3 and again from 3 -> 0 in sequential way

      Observation:

      During replica update of 2 -> 3, rebalance failed due to memcached crash

      {'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.', 'status': 'none'} - rebalance failed
      Latest logs from UI on 172.23.122.0:
      {'code': 0, 'module': 'ns_memcached', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861326343L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:06.343Z', 'text': 'Bucket "default" loaded on node 'ns_1@172.23.122.0' in 0 seconds.'}
      {'code': 0, 'module': 'ns_orchestrator', 'type': 'critical', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861325543L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:05.543Z', 
      'text': 'Rebalance exited with reason
       {mover_crashed, {unexpected_exit, {'EXIT',<0.26741.5>,
        {{{{{child_interrupted,{'EXIT',<21750.15290.0>,socket_closed}},
        [{dcp_replicator,spawn_and_wait,1, [{file,"src/dcp_replicator.erl"}, {line,265}]},
         {dcp_replicator,handle_call,3, [{file,"src/dcp_replicator.erl"}, {line,121}]},
         {gen_server,try_handle_call,4, [{file,"gen_server.erl"},{line,661}]},
         {gen_server,handle_msg,6, [{file,"gen_server.erl"},{line,690}]},
         {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,249}]}]},
         {gen_server,call, [<21750.15288.0>, {setup_replication, [103,104,105,106,107,108,109,110,111,112,113,114,115,143,144,145,146,147,149,150,151,152,153,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,409,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,564,918,919,920,972,989,990,991,992,993,997,998,999,1000,1001,1002,1003,1004,1005,1012]},infinity]}},{gen_server,call,['replication_manager-default',{change_vbucket_replication,1012,'ns_1@172.23.122.0'},infinity]}},{gen_server,call,[{'janitor_agent-default','ns_1@172.23.121.93'},{if_rebalance,<0.22841.5>,{update_vbucket_state,478,active,paused,undefined,[['ns_1@172.23.121.93','ns_1@172.23.121.94','ns_1@172.23.122.0'],['ns_1@172.23.122.0','ns_1@172.23.121.94','ns_1@172.23.121.93','ns_1@172.23.121.89']]}},infinity]}}}}}.
         Rebalance Operation Id = 516c27e651d6752ba09189b3a77c09d5'}
      {'code': 0, 'module': 'ns_vbucket_mover', 'type': 'critical', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861325327L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:05.327Z', 
      'text': 'Worker <0.26180.5> 
      (for action {move,{478,['ns_1@172.23.121.93','ns_1@172.23.121.94','ns_1@172.23.122.0'],['ns_1@172.23.122.0','ns_1@172.23.121.94','ns_1@172.23.121.93','ns_1@172.23.121.89'],[]}})
      exited with reason 
      {unexpected_exit,
       {'EXIT',<0.26741.5>,
        {{{{{child_interrupted,{'EXIT',<21750.15290.0>,socket_closed}},
         [{dcp_replicator,spawn_and_wait,1,[{file,"src/dcp_replicator.erl"},{line,265}]},
          {dcp_replicator,handle_call,3,[{file,"src/dcp_replicator.erl"},{line,121}]},
          {gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,661}]},
          {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,690}]},
          {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}
         ]},
         {gen_server,call,[<21750.15288.0>,{setup_replication,[103,104,105,106,107,108,109,110,111,112,113,114,115,143,144,145,146,147,149,150,151,152,153,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,409,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,564,918,919,920,972,989,990,991,992,993,997,998,999,1000,1001,1002,1003,1004,1005,1012]},infinity]}},
         {gen_server,call,['replication_manager-default',{change_vbucket_replication,1012,'ns_1@172.23.122.0'},infinity]}},
         {gen_server,call,[{'janitor_agent-default','ns_1@172.23.121.93'},{if_rebalance,<0.22841.5>,{update_vbucket_state,478,active,paused,undefined,[['ns_1@172.23.121.93','ns_1@172.23.121.94','ns_1@172.23.122.0'],['ns_1@172.23.122.0',ns_1@172.23.121.94','ns_1@172.23.121.93','ns_1@172.23.121.89']]}},infinity]
      }}}}'}
      {'code': 0, 'module': 'ns_log', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861325310L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:05.310Z', 'text': u"Service 'memcached' exited with status 134. 
      Restarting. Messages:
      2021-02-09T01:02:05.244758-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f8d7d09a000+0x8efd1]
      2021-02-09T01:02:05.244770-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x16ee03]
      2021-02-09T01:02:05.244781-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x168e62]
      2021-02-09T01:02:05.244793-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x2e97f6]
      2021-02-09T01:02:05.244821-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x2d1cea]
      2021-02-09T01:02:05.244831-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x2ec919]
      2021-02-09T01:02:05.244840-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f8d809cb000+0x1672a3]
      2021-02-09T01:02:05.244867-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f8d7d09a000+0xb9dcf]
      2021-02-09T01:02:05.244879-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f8d7c965000+0x7e65]
      2021-02-09T01:02:05.244909-08:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f8d7c597000+0xfe88d]"}
      {'code': 0, 'module': 'ns_memcached', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861325299L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:05.299Z', 'text': u"Control connection to memcached on 'ns_1@172.23.122.0' disconnected. Check logs for details."}
      {'code': 0, 'module': 'ns_vbucket_mover', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861322977L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:02.977Z', 'text': 'Bucket "default" rebalance does not seem to be swap rebalance'}
      {'code': 0, 'module': 'ns_rebalancer', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861322331L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:02.331Z', 'text': 'Started rebalancing bucket default'}
      {'code': 0, 'module': 'ns_orchestrator', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861321935L, 'shortText': 'message', 'serverTime': '2021-02-09T01:02:01.935Z', 'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.121.94','ns_1@172.23.121.89', 'ns_1@172.23.121.93','ns_1@172.23.122.157','ns_1@172.23.122.0'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 516c27e651d6752ba09189b3a77c09d5"}
      {'code': 0, 'module': 'menelaus_web_buckets', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861308637L, 'shortText': 'message', 'serverTime': '2021-02-09T01:01:48.637Z', 'text': 'Updated bucket "default" (of type ephemeral) properties:[{num_replicas,3},{ram_quota,2660237312},{storage_mode,ephemeral}]'}
      {'code': 0, 'module': 'ns_orchestrator', 'type': 'info', 'node': 'ns_1@172.23.122.0', 'tstamp': 1612861162239L, 'shortText': 'message', 'serverTime': '2021-02-09T00:59:22.239Z', 'text': 'Rebalance completed successfully.Rebalance Operation Id = 6c277267a1792518345685f9cf925dc6'}
      Rebalance Failed: {'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.', 'status': 'none'} - rebalance failed
      

      Attached Memcached back trace with the ticket

      TAF test:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.21850.ini bucket_type=ephemeral,num_items=50000,durability=MAJORITY,sdk_timeout=60,num_items=500000,GROUP=P0,upgrade_version=7.0.0-4390 -t bucket_param_tests.bucket_param_update.BucketParamTest.test_replica_update,bucket_type=ephemeral,upgrade_version=7.0.0-4390,rerun=True,GROUP=P0,doc_ops=update;create;delete,sdk_timeout=60,get-cbcollect-info=False,replicas=0,durability=MAJORITY,log_level=debug,nodes_init=5,num_items=500000,infra_log_level=critical'
      

       

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            ashwin.govindarajulu Ashwin Govindarajulu created issue -
            owend Daniel Owen added a comment -

            On node 172.23.122.0 see

            2021-02-09T01:02:05.002691-08:00 ERROR (default) (vb:409) ActiveDurabilityMonitor::State:::updateHighCompletedSeqno: Monotonic<l> (ActiveDM(vb:409)::State::highCompletedSeqno) invariant failed: new value (2235) breaks invariant on current value (5933)
            #trackedWrites:0 highPreparedSeqno:5934 highCompletedSeqno:5933 lastTrackedSeqno:5933 lastCommittedSeqno:2235 lastAbortedSeqno:0 trackedWrites:[
            ]
            firstChain: Chain[0x7f8cedb84d10] name:First majority:3 active:ns_1@172.23.122.0 maxAllowedReplicas:2 positions:[
                ns_1@172.23.121.94: {lastAck:5933 lastWrite:2235 it:<end>}
                ns_1@172.23.122.157: {lastAck:0 lastWrite:0 it:<end>}
                ns_1@172.23.122.0: {lastAck:0 lastWrite:5934 it:<end>}
                ns_1@172.23.121.93: {lastAck:5933 lastWrite:2235 it:<end>}
            ]
            secondChain: <null>
            

            Marking as dup of MB-44079

            owend Daniel Owen added a comment - On node 172.23.122.0 see 2021-02-09T01:02:05.002691-08:00 ERROR (default) (vb:409) ActiveDurabilityMonitor::State:::updateHighCompletedSeqno: Monotonic<l> (ActiveDM(vb:409)::State::highCompletedSeqno) invariant failed: new value (2235) breaks invariant on current value (5933) #trackedWrites:0 highPreparedSeqno:5934 highCompletedSeqno:5933 lastTrackedSeqno:5933 lastCommittedSeqno:2235 lastAbortedSeqno:0 trackedWrites:[ ] firstChain: Chain[0x7f8cedb84d10] name:First majority:3 active:ns_1@172.23.122.0 maxAllowedReplicas:2 positions:[ ns_1@172.23.121.94: {lastAck:5933 lastWrite:2235 it:<end>} ns_1@172.23.122.157: {lastAck:0 lastWrite:0 it:<end>} ns_1@172.23.122.0: {lastAck:0 lastWrite:5934 it:<end>} ns_1@172.23.121.93: {lastAck:5933 lastWrite:2235 it:<end>} ] secondChain: <null> Marking as dup of MB-44079
            owend Daniel Owen made changes -
            Field Original Value New Value
            Link This issue duplicates MB-44079 [ MB-44079 ]
            owend Daniel Owen made changes -
            Summary Rebalance failed during "change_vbucket_replication::update_vbucket_state" ADM::updateHighCompletedSeqno - invariant failed: new value (2235) breaks invariant on current value (5933)
            owend Daniel Owen made changes -
            Assignee Daniel Owen [ owend ] Ashwin Govindarajulu [ ashwin.govindarajulu ]
            Resolution Duplicate [ 3 ]
            Status Open [ 1 ] Resolved [ 5 ]

            Closing this ticket.

            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Closing this ticket.
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            paolo.cocchi Paolo Cocchi added a comment - - edited

            For reference, it would be useful to get a reproduction on builds>=7.0.0-4412 + collecting pcaps (same as MB-44079). Thanks

            paolo.cocchi Paolo Cocchi added a comment - - edited For reference, it would be useful to get a reproduction on builds>=7.0.0-4412 + collecting pcaps (same as MB-44079 ). Thanks
            lynn.straus Lynn Straus made changes -
            Fix Version/s 7.0.0 [ 17233 ]
            lynn.straus Lynn Straus made changes -
            Fix Version/s Cheshire-Cat [ 15915 ]

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              ashwin.govindarajulu Ashwin Govindarajulu
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty