Couchbase Server, MB-40370

[Doc_Isolation]: xattr::utils::check_len(2634346613) exceeds 287

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • No

    Description

       

      Build: 6.6.0-7861

      Scenario:

      1. Single-node cluster with one Couchbase bucket (replica=0)
      2. Start two parallel transactions (one succeeds, the other rolls back)
      3. Rebalance 3 nodes into the cluster while the transactions are running

      Observation:

      The rebalance fails with reason "mover crashed - bulk_set_vbucket_state_failed".

      cbcollect logs:
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/rebalance_failure/collectinfo-2020-07-09T180702-ns_1%40172.23.105.155.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/rebalance_failure/collectinfo-2020-07-09T180702-ns_1%40172.23.105.159.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/rebalance_failure/collectinfo-2020-07-09T180702-ns_1%40172.23.105.205.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/rebalance_failure/collectinfo-2020-07-09T180702-ns_1%40172.23.105.206.zip
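To match each archive to the node names that appear in the crash output, the URL-encoded file names can be decoded. This is a minimal sketch, assuming the `collectinfo-<timestamp>-<node>.zip` naming seen in the links above; `node_from_url` is a hypothetical helper, not part of cbcollect_info or testrunner.

```python
import re
from urllib.parse import unquote, urlparse

# The four cbcollect archive URLs listed in this ticket.
BASE = "https://cb-jira.s3.us-east-2.amazonaws.com/logs/rebalance_failure/"
LOG_URLS = [
    BASE + "collectinfo-2020-07-09T180702-ns_1%40172.23.105.155.zip",
    BASE + "collectinfo-2020-07-09T180702-ns_1%40172.23.105.159.zip",
    BASE + "collectinfo-2020-07-09T180702-ns_1%40172.23.105.205.zip",
    BASE + "collectinfo-2020-07-09T180702-ns_1%40172.23.105.206.zip",
]


def node_from_url(url):
    """Return the node name (e.g. ns_1@1.2.3.4) encoded in an archive URL.

    Hypothetical helper: it relies only on the file-name pattern visible
    in the links above and returns None if the pattern does not match.
    """
    # Take the last path segment and turn %40 back into '@'.
    filename = unquote(urlparse(url).path.rsplit("/", 1)[-1])
    match = re.search(r"(ns_1@[\d.]+)\.zip$", filename)
    return match.group(1) if match else None


nodes = [node_from_url(url) for url in LOG_URLS]
```

In the output below, `ns_1@172.23.105.206` and `ns_1@172.23.105.159` are the nodes whose `janitor_agent-default` calls failed, so their archives are likely the first ones to open.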

      Rebalance failure output:

      Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.23194.0>,
      {{bulk_set_vbucket_state_failed,
      [{'ns_1@172.23.105.206',
      {'EXIT',
      {{{{{badmatch, [{<17504.5562.0>,
      {done,exit, {socket_closed,
      {gen_server,call,
      [<17504.5215.0>,
      {maybe_close_stream,905}, infinity]}},
      [{gen_server,call,3,
      [{file,"gen_server.erl"}, {line,214}]},
      {dcp_replicator, '-handle_call/3-fun-1-',2, 
      [{file, "src/dcp_replicator.erl"}, {line,128}]},
      {dcp_replicator, '-spawn_and_wait/1-fun-0-',1,
      [{file, "src/dcp_replicator.erl"}, {line,243}]}]}}]},
      [{misc, sync_shutdown_many_i_am_trapping_exits, 1,
      [{file,"src/misc.erl"}, {line,1374}]},
      {dcp_replicator,spawn_and_wait,1,
      [{file,"src/dcp_replicator.erl"}, {line,265}]},
      {dcp_replicator,handle_call,3,
      [{file,"src/dcp_replicator.erl"}, {line,127}]},
      {gen_server,try_handle_call,4,
      [{file,"gen_server.erl"}, {line,636}]},
      {gen_server,handle_msg,6,
      [{file,"gen_server.erl"}, {line,665}]},
      {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"}, {line,247}]}]},
      {gen_server,call,
      [<17504.5214.0>,get_partitions, infinity]}},
      {gen_server,call,
      ['dcp_replication_manager-default',
      {get_replicator_pid,903},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default',
      'ns_1@172.23.105.206'},
      {if_rebalance,<0.3370.0>,
      {update_vbucket_state,900,pending,
      passive,'ns_1@172.23.105.155'}},
      infinity]}}}}]},
      [{janitor_agent,bulk_set_vbucket_state,4,
      [{file,"src/janitor_agent.erl"}, {line,403}]},
      {proc_lib,init_p,3,
      [{file,"proc_lib.erl"},{line,232}]}]}}}}.
      Rebalance Operation Id = 7a1aaa2b7b2c47644b91cb745f74d3be
       
      Worker <0.16520.0> (for action 
      {move,{433,
      ['ns_1@172.23.105.155'],
      ['ns_1@172.23.105.159'],
      []}}) 
      exited with reason {unexpected_exit,
      {'EXIT', <0.17484.0>,
      {{{{{badmatch, [{<25299.4924.0>,
      {done, exit,
      {socket_closed,
      {gen_server, call,
      [<25299.4462.0>, {maybe_close_stream, 433}, infinity]}},
      [{gen_server, call, 3,
      [{file, "gen_server.erl"}, {line, 214}]},
      {dcp_replicator, '-handle_call/3-fun-1-', 2,
      [{file, "src/dcp_replicator.erl"}, {line, 128}]},
      {dcp_replicator, '-spawn_and_wait/1-fun-0-', 1,
      [{file, "src/dcp_replicator.erl"}, {line, 243}]}]}}]},
      [{misc, sync_shutdown_many_i_am_trapping_exits, 1,
      [{file, "src/misc.erl"}, {line, 1374}]},
      {dcp_replicator, spawn_and_wait, 1,
      [{file, "src/dcp_replicator.erl"}, {line, 265}]},
      {dcp_replicator, handle_call, 3,
      [{file, "src/dcp_replicator.erl"}, {line, 127}]},
      {gen_server, try_handle_call, 4,
      [{file, "gen_server.erl"}, {line, 636}]},
      {gen_server, handle_msg, 6,
      [{file, "gen_server.erl"}, {line, 665}]},
      {proc_lib, init_p_do_apply, 3,
      [{file, "proc_lib.erl"}, {line, 247}]}]},
      {gen_server, call,
      [<25299.4461.0>,
      get_partitions, infinity]}},
      {gen_server, call,
      ['dcp_replication_manager-default',
      {get_replicator_pid, 431}, infinity]}},
      {gen_server, call,
      [{'janitor_agent-default', 'ns_1@172.23.105.159'},
      {if_rebalance, <0.4111.0>,
      {dcp_takeover, 'ns_1@172.23.105.155', 433}}, infinity]}}}} 

      Testcase:

      ./testrunner -i /tmp/5-centos-nodes-jython.ini rerun=False,get-cbcollect-info=False -t Atomicity.doc_isolation.IsolationDocTest.test_transaction_with_rebalance,nodes_init=1,replicas=0,rebalance_type=in,nodes_in=3,doc_op=create,GROUP=P1
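The `-t` argument packs the test path and its parameters into one comma-separated string. As a reading aid only, the sketch below splits that string into the test path and a parameter dictionary. `parse_test_spec` is a hypothetical helper and is not how testrunner itself parses its arguments.

```python
# The -t argument from the command above, split across two literals.
TEST_SPEC = (
    "Atomicity.doc_isolation.IsolationDocTest.test_transaction_with_rebalance,"
    "nodes_init=1,replicas=0,rebalance_type=in,nodes_in=3,doc_op=create,GROUP=P1"
)


def parse_test_spec(spec):
    """Split 'test.path,key=value,...' into (test_path, params).

    Values are kept as strings; no type conversion is attempted.
    """
    test_path, *pairs = spec.split(",")
    params = dict(pair.split("=", 1) for pair in pairs)
    return test_path, params


test_path, params = parse_test_spec(TEST_SPEC)
for key, value in params.items():
    print(f"{key} = {value}")
```

The parameters line up with the scenario: `nodes_init=1` and `replicas=0` describe the starting cluster, while `rebalance_type=in` and `nodes_in=3` describe the rebalance.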
      

       

       

      Attachments

        1. __dcp_prepare.png (355 kB)
        2. __subdoc.png (298 kB)
        3. pcaps+test_log.zip (8.85 MB)


            People

              ashwin.govindarajulu Ashwin Govindarajulu
              Votes: 0
              Watchers: 5
