  1. Couchbase Server
  2. MB-55582

CDC: Graceful failover+delta recovery rebalance failed with error 'sync_shutdown_many_i_am_trapping_exits'

Details

    Description

      Build: 7.2.0-5161

      Scenario:

      • 5-node cluster, all nodes running the kv service

        +----------------+----------+-----------------+-----------+-----------+----------------------+
        | Node           | Services | CPU_utilization | Mem_total | Mem_free  | Swap_mem_used        |
        +----------------+----------+-----------------+-----------+-----------+----------------------+
        | 172.23.107.217 | kv       | 1.22186325735   | 23.36 GiB | 20.74 GiB | 0.0 Byte / 3.50 GiB  |
        | 172.23.107.222 | kv       | 1.45520788984   | 23.36 GiB | 20.40 GiB | 2.00 MiB / 3.50 GiB  |
        | 172.23.107.102 | kv       | 0.40489700185   | 23.36 GiB | 21.29 GiB | 0.0 Byte / 3.50 GiB  |
        | 172.23.107.99  | kv       | 0.427595256557  | 23.36 GiB | 20.85 GiB | 56.69 MiB / 3.50 GiB |
        | 172.23.107.223 | kv       | 1.76710509235   | 23.36 GiB | 21.18 GiB | 0.0 Byte / 0.0 Byte  |
        +----------------+----------+-----------------+-----------+-----------+----------------------+
        

      • Create 3 buckets (2 magma and 1 couchstore) and load data into all of them

        +---------+-----------+-----------------+----------+
        | Bucket  | Type      | Storage Backend | Replicas |
        +---------+-----------+-----------------+----------+
        | bucket1 | couchbase | couchstore      | 2        |
        | bucket2 | couchbase | magma           | 2        |
        | default | couchbase | magma           | 2        |
        +---------+-----------+-----------------+----------+

      • Perform a graceful failover of node '172.23.107.102'. The failover succeeded:

        Graceful failover completed successfully.
        Rebalance Operation Id = 75fd16e61c926a649a9052058fa18271 [6:06:56 PM 13 Feb, 2023]



      • Start a delta recovery rebalance operation

      The rebalance operation failed with the following error:

       

      [rebalance:debug,2023-02-13T19:08:33.023-08:00,ns_1@172.23.107.222:<0.29660.327>:janitor_agent:bulk_set_vbucket_state:370]bulk vbucket state change failed for:
      [{'ns_1@172.23.107.99',
           {'EXIT',
               {{{{{badmatch,
                       [{<33312.6965.225>,
                         {done,exit,
                             {socket_closed,
                                 {gen_server,call,
                                     [<33312.31609.224>,
                                      {setup_streams,[149,150,151,152,153]},
                                      infinity]}},
                             [{gen_server,call,3,
                                  [{file,"gen_server.erl"},{line,247}]},
                              {dcp_replicator,'-spawn_and_wait/1-fun-0-',1,
                                  [{file,"src/dcp_replicator.erl"},{line,336}]}]}}]},
                   [{misc,sync_shutdown_many_i_am_trapping_exits,1,
                        [{file,"src/misc.erl"},{line,1458}]},
                    {dcp_replicator,spawn_and_wait,1,
                        [{file,"src/dcp_replicator.erl"},{line,357}]},
                    {dcp_replicator,handle_call,3,
                        [{file,"src/dcp_replicator.erl"},{line,146}]},
                    {gen_server,try_handle_call,4,
                        [{file,"gen_server.erl"},{line,721}]},
                    {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,750}]},
                    {proc_lib,init_p_do_apply,3,
                        [{file,"proc_lib.erl"},{line,226}]}]},
                  {gen_server,call,[<33312.31597.224>,get_partitions,infinity]}},
                 {gen_server,call,
                     ['dcp_replication_manager-default',
                      {get_replicator_pid,146},
                      infinity]}},
                {gen_server,call,
                    [{'janitor_agent-default','ns_1@172.23.107.99'},
                     {if_rebalance,<0.21548.326>,
                         {update_vbucket_state,149,replica,undefined,
                             'ns_1@172.23.107.102'}},
                     infinity]}}}}]
       
      [rebalance:debug,2023-02-13T19:08:33.024-08:00,ns_1@172.23.107.222:<0.30437.327>:janitor_agent:bulk_set_vbucket_state:370]bulk vbucket state change failed for:
      [{'ns_1@172.23.107.99',
           {'EXIT',
               {{{{{badmatch,
                       [{<33312.6965.225>,
                         {done,exit,
                             {socket_closed,
                                 {gen_server,call,
                                     [<33312.31609.224>,
                                      {setup_streams,[149,150,151,152,153]},
                                      infinity]}},
                             [{gen_server,call,3,
                                  [{file,"gen_server.erl"},{line,247}]},
                              {dcp_replicator,'-spawn_and_wait/1-fun-0-',1,
                                  [{file,"src/dcp_replicator.erl"},{line,336}]}]}}]},
                   [{misc,sync_shutdown_many_i_am_trapping_exits,1,
                        [{file,"src/misc.erl"},{line,1458}]},
                    {dcp_replicator,spawn_and_wait,1,
                        [{file,"src/dcp_replicator.erl"},{line,357}]},
                    {dcp_replicator,handle_call,3,
                        [{file,"src/dcp_replicator.erl"},{line,146}]},
                    {gen_server,try_handle_call,4,
                        [{file,"gen_server.erl"},{line,721}]},
                    {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,750}]},
                    {proc_lib,init_p_do_apply,3,
                        [{file,"proc_lib.erl"},{line,226}]}]},
                  {gen_server,call,[<33312.31597.224>,get_partitions,infinity]}},
                 {gen_server,call,
                     ['dcp_replication_manager-default',
                      {get_replicator_pid,146},
                      infinity]}},
                {gen_server,call,
                    [{'janitor_agent-default','ns_1@172.23.107.99'},
                     {if_rebalance,<0.21548.326>,
                         {update_vbucket_state,656,replica,undefined,
                             'ns_1@172.23.107.223'}},
                     infinity]}}}}]
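
For triage, the two entries above can be located in a larger log by their bracketed header. The snippet below is a minimal sketch that splits such a header into its fields. The field layout is inferred only from the entries quoted in this issue, and the names `HEADER_RE`, `LogHeader` and `parse_header` are hypothetical helpers, not part of ns_server.

```python
import re
from typing import NamedTuple, Optional

# Header layout inferred from the entries quoted in this issue (an assumption,
# not a documented format):
#   [category:level,timestamp,node:<pid>:module:function:line]message
HEADER_RE = re.compile(
    r"^\[(?P<category>[^:,\]]+):(?P<level>[^,\]]+),"
    r"(?P<timestamp>[^,\]]+),"
    r"(?P<node>[^:\]]+):(?P<pid><[^>]*>):"
    r"(?P<module>[^:\]]+):(?P<function>[^:\]]+):(?P<line>\d+)\]"
    r"(?P<message>.*)$"
)


class LogHeader(NamedTuple):
    category: str
    level: str
    timestamp: str
    node: str
    pid: str
    module: str
    function: str
    line: int
    message: str


def parse_header(raw: str) -> Optional[LogHeader]:
    """Return the parsed header of a log entry, or None for continuation lines."""
    match = HEADER_RE.match(raw.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])  # source line number as an integer
    return LogHeader(**fields)


if __name__ == "__main__":
    sample = (
        "[rebalance:debug,2023-02-13T19:08:33.023-08:00,"
        "ns_1@172.23.107.222:<0.29660.327>:"
        "janitor_agent:bulk_set_vbucket_state:370]"
        "bulk vbucket state change failed for:"
    )
    print(parse_header(sample))
```

Lines that belong to the Erlang term following a header (for example the `[{'ns_1@172.23.107.99',` line) do not match and return `None`, so a caller can group them under the most recent header.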

       

      Crash report from ns_server_debug.log

       

      [error_logger:error,2023-02-13T19:08:33.024-08:00,ns_1@172.23.107.222:<0.21586.326>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_single_vbucket_mover:'-wait_dcp_data_move/5-fun-0-'/0
          pid: <0.21586.326>
          registered_name: []
          exception exit: {{{{{badmatch,
                               [{<33312.6965.225>,
                                 {done,exit,
                                  {socket_closed,
                                   {gen_server,call,
                                    [<33312.31609.224>,
                                     {setup_streams,[149,150,151,152,153]},
                                     infinity]}},
                                  [{gen_server,call,3,
                                    [{file,"gen_server.erl"},{line,247}]},
                                   {dcp_replicator,'-spawn_and_wait/1-fun-0-',1,
                                    [{file,"src/dcp_replicator.erl"},
                                     {line,336}]}]}}]},
                              [{misc,sync_shutdown_many_i_am_trapping_exits,1,
                                [{file,"src/misc.erl"},{line,1458}]},
                               {dcp_replicator,spawn_and_wait,1,
                                [{file,"src/dcp_replicator.erl"},{line,357}]},
                               {dcp_replicator,handle_call,3,
                                [{file,"src/dcp_replicator.erl"},{line,146}]},
                               {gen_server,try_handle_call,4,
                                [{file,"gen_server.erl"},{line,721}]},
                               {gen_server,handle_msg,6,
                                [{file,"gen_server.erl"},{line,750}]},
                               {proc_lib,init_p_do_apply,3,
                                [{file,"proc_lib.erl"},{line,226}]}]},
                             {gen_server,call,
                              [<33312.31597.224>,get_partitions,infinity]}},
                            {gen_server,call,
                             ['dcp_replication_manager-default',
                              {get_replicator_pid,146},
                              infinity]}},
                            {gen_server,call,
                            [{'janitor_agent-default','ns_1@172.23.107.99'},
                             {if_rebalance,<0.21548.326>,
                              {wait_dcp_data_move,
                               ['ns_1@172.23.107.102','ns_1@172.23.107.223'],
                               204}},
                             infinity]}}
            in function  gen_server:call/3 (gen_server.erl, line 247)
            in call from ns_single_vbucket_mover:'-wait_dcp_data_move/5-fun-0-'/5 (src/ns_single_vbucket_mover.erl, line 444)
          ancestors: [<0.20826.326>,<0.21548.326>,<0.12591.324>]
          message_queue_len: 0
          messages: []
          links: [<0.20826.326>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 1598
          stack_size: 29
          reductions: 3478
        neighbours:
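
The scenario steps at the top of this description (graceful failover, then delta recovery, then rebalance) can be sketched as a sequence of cluster REST calls. This issue does not say how the steps were driven, so the following is an assumption-laden sketch: it only builds the three requests and sends nothing. The admin port 8091, the choice of 172.23.107.217 as the node to contact, and the helper names `RestCall`, `otp_node` and `delta_recovery_calls` are illustrative assumptions.

```python
from typing import List, NamedTuple
from urllib.parse import urlencode


class RestCall(NamedTuple):
    method: str
    url: str
    body: str  # application/x-www-form-urlencoded payload


def otp_node(host: str) -> str:
    """Build the otpNode name in the form seen in the logs (ns_1@<ip>)."""
    return f"ns_1@{host}"


def delta_recovery_calls(
    base_url: str, all_hosts: List[str], failed_host: str
) -> List[RestCall]:
    """Return the requests for graceful failover, delta recovery and rebalance.

    Nothing is sent here. A real driver would also have to wait for the
    graceful failover to finish before setting the recovery type.
    """
    node = otp_node(failed_host)
    known = ",".join(otp_node(h) for h in all_hosts)
    return [
        RestCall(
            "POST",
            f"{base_url}/controller/startGracefulFailover",
            urlencode({"otpNode": node}),
        ),
        RestCall(
            "POST",
            f"{base_url}/controller/setRecoveryType",
            urlencode({"otpNode": node, "recoveryType": "delta"}),
        ),
        RestCall(
            "POST",
            f"{base_url}/controller/rebalance",
            # The recovered node stays in knownNodes; no node is ejected.
            urlencode({"knownNodes": known, "ejectedNodes": ""}),
        ),
    ]


if __name__ == "__main__":
    hosts = [
        "172.23.107.217",
        "172.23.107.222",
        "172.23.107.102",
        "172.23.107.99",
        "172.23.107.223",
    ]
    # Port 8091 and the contacted node are assumptions for illustration.
    for call in delta_recovery_calls(
        "http://172.23.107.217:8091", hosts, "172.23.107.102"
    ):
        print(call.method, call.url, call.body)
```

Keeping request construction separate from sending makes the sequence easy to inspect or replay with any HTTP client when trying to reproduce the failure.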

       

       

       

            People

              ashwin.govindarajulu Ashwin Govindarajulu