Couchbase Server / MB-42780

[Upgrade] Rebalance_in failed with reason "bulk_set_vbucket_state_failed :: sync_shutdown_many_i_am_trapping_exits"


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.6.0
    • Fix Version/s: 6.6.1
    • Component/s: couchbase-bucket
    • Environment:
      Enterprise Edition 6.0.1 build 2037 (Existing 4 node cluster)
      Enterprise Edition 6.6.1 build 9182 (New node coming in)

      Description

       Scenario:

      1. Start with a 4-node cluster (6.0.1 build 2037).
      2. Create a couchbase-bucket with replica=1, size=100M.
      3. Load the bucket into DGM (disk greater than memory) using cbworkloadgen:

        ./cbworkloadgen -n 172.23.105.155:8091 -b default -u Administrator -p password --max-items=1200000 -r .95 -l

      4. Rebalance in a 6.6.1-9182 node (172.23.105.244) to the cluster.
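For anyone scripting the load in step 3, the following is a minimal Python sketch that only assembles the same cbworkloadgen command line; it does not run it. The helper name `build_cbworkloadgen_cmd` and its parameter names are hypothetical. The flags are exactly those in the command above, and the comments on `-r` (ratio of sets) and `-l` (loop until interrupted) reflect my reading of the tool's options and should be checked against `cbworkloadgen --help` for the build in use.

```python
import shlex


def build_cbworkloadgen_cmd(node, bucket, user, password, max_items, set_ratio, loop=True):
    """Assemble the cbworkloadgen argument list from step 3 (hypothetical helper)."""
    cmd = [
        "./cbworkloadgen",
        "-n", node,                   # host:port of a node in the existing cluster
        "-b", bucket,                 # target bucket
        "-u", user,
        "-p", password,
        f"--max-items={max_items}",   # number of items to generate
        "-r", str(set_ratio),         # assumed: fraction of operations that are sets
    ]
    if loop:
        cmd.append("-l")              # assumed: keep looping until interrupted
    return cmd


cmd = build_cbworkloadgen_cmd(
    "172.23.105.155:8091", "default", "Administrator", "password", 1200000, ".95"
)
# Print the command for inspection; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell-quoting problems if the password or bucket name contains special characters.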

      Observation:

      Rebalance failed with the following reason:

      Worker <0.3486.0> (for action {move,{577,
      ['ns_1@172.23.105.212',
      'ns_1@172.23.105.155'],
      ['ns_1@172.23.105.155',
      'ns_1@172.23.105.244'],
      []}}) exited with reason {unexpected_exit,
      {'EXIT', <0.3512.0>,
      {{bulk_set_vbucket_state_failed,
      [{'ns_1@172.23.105.244',
      {'EXIT',
      {{{{{badmatch,
      [{<0.3659.0>,
      {done, exit, {socket_closed,
      {gen_server, call, [<0.2333.0>,
      {setup_streams, [83, 339]}, infinity]}},
      [{gen_server, call, 3,
      [{file, "gen_server.erl"}, {line, 214}]},
      {dcp_replicator, '-spawn_and_wait/1-fun-0-', 1,
      [{file, "src/dcp_replicator.erl"}, {line, 243}]}]}}]},
      [{misc, sync_shutdown_many_i_am_trapping_exits, 1, 
      [{file, "src/misc.erl"}, {line, 1374}]},
      {dcp_replicator, spawn_and_wait, 1,
      [{file, "src/dcp_replicator.erl"}, {line, 265}]},
      {dcp_replicator, handle_call, 3,
      [{file, "src/dcp_replicator.erl"}, {line, 121}]},
      {gen_server, try_handle_call, 4,
      [{file, "gen_server.erl"}, {line, 636}]},
      {gen_server, handle_msg, 6,
      [{file, "gen_server.erl"}, {line, 665}]},
      {proc_lib, init_p_do_apply, 3,
      [{file, "proc_lib.erl"}, {line, 247}]}]},
      {gen_server, call,
      [<0.2332.0>, get_partitions, infinity]}},
      {gen_server, call,
      ['dcp_replication_manager-default',
      {get_replicator_pid, 83}, infinity]}},
      {gen_server, call,
      [{'janitor_agent-default', 'ns_1@172.23.105.244'},
      {if_rebalance, <0.2246.0>,
      {update_vbucket_state, 577, replica,
      undefined, 'ns_1@172.23.105.212'}}, infinity]}}}}]},
      [{janitor_agent, bulk_set_vbucket_state, 4,
      [{file, "src/janitor_agent.erl"}, {line, 403}]},
      {proc_lib, init_p,3,
      [{file, "proc_lib.erl"}, {line, 232}]}]}}} 
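To make the failing move easier to read, here is a small Python sketch that compares the two node lists in the `{move, {577, ...}}` action from the trace. It assumes the usual reading of that tuple: vBucket id, old replication chain, new replication chain, with the first node in each chain being the active copy. That interpretation is an assumption, not something the trace states; the helper name `chain_changes` is hypothetical.

```python
def chain_changes(old_chain, new_chain):
    """Summarise how a vBucket's chain changes (assumes chain[0] is the active node)."""
    return {
        "old_active": old_chain[0],
        "new_active": new_chain[0],
        "added": [n for n in new_chain if n not in old_chain],
        "removed": [n for n in old_chain if n not in new_chain],
    }


# Values copied from the move action in the trace above.
vbucket = 577
old_chain = ["ns_1@172.23.105.212", "ns_1@172.23.105.155"]
new_chain = ["ns_1@172.23.105.155", "ns_1@172.23.105.244"]

summary = chain_changes(old_chain, new_chain)
print(vbucket, summary)
```

Under that assumption, the only node added to the chain is the incoming 6.6.1 node (172.23.105.244), which matches the `update_vbucket_state, 577, replica` call to `janitor_agent-default` on that node in the trace.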

      Note: Hit this while trying to validate MB-41283

      CC Richard deMellow


            Activity

            ashwin.govindarajulu Ashwin Govindarajulu created issue -
            paolo.cocchi Paolo Cocchi made changes -
            Assignee: Daniel Owen [ owend ] -> Paolo Cocchi [ paolo.cocchi ]
            owend Daniel Owen made changes -
            Priority: Major [ 3 ] -> Critical [ 2 ]
            paolo.cocchi Paolo Cocchi made changes -
            Affects Version/s: 6.6.0 [ 16787 ]
            paolo.cocchi Paolo Cocchi made changes -
            Link: This issue is triggering MB-42805 [ MB-42805 ]
            owend Daniel Owen made changes -
            Due Date: 19/Nov/20
            owend Daniel Owen made changes -
            Labels: functional-test -> approved-for-6.6.1 functional-test
            owend Daniel Owen made changes -
            Is this a Regression?: Unknown [ 10452 ] -> Yes [ 10450 ]
            owend Daniel Owen made changes -
            Link: This issue blocks MB-40528 [ MB-40528 ]
            richard.demellow Richard deMellow made changes -
            Link: This issue relates to MB-41283 [ MB-41283 ]
            richard.demellow Richard deMellow made changes -
            Link: This issue causes CBSE-9239 [ CBSE-9239 ]
            paolo.cocchi Paolo Cocchi made changes -
            Triage: Untriaged [ 10351 ] -> Triaged [ 10350 ]
            Assignee: Paolo Cocchi [ paolo.cocchi ] -> Ashwin Govindarajulu [ ashwin.govindarajulu ]
            Resolution: Fixed [ 1 ]
            Status: Open [ 1 ] -> Resolved [ 5 ]
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Status: Resolved [ 5 ] -> Closed [ 6 ]
            drigby Dave Rigby made changes -
            Affects Version/s: 6.6.1 [ 17002 ]
            richard.demellow Richard deMellow made changes -
            Link: This issue relates to CBSE-9397 [ CBSE-9397 ]
            James Flather James Flather made changes -
            Link: This issue blocks CBSE-9711 [ CBSE-9711 ]

              People

              Assignee:
              ashwin.govindarajulu Ashwin Govindarajulu
              Reporter:
              ashwin.govindarajulu Ashwin Govindarajulu