Couchbase Server / MB-56256

CDC: flusher deduplicates an abort, so history backfill can then produce two consecutive prepares without an intervening abort


Details

    • Untriaged
    • Centos 64-bit
    • 0
    • Unknown

    Description

      Build: 7.2.0-5284

      Steps:

      • 5 node kv cluster

        +----------------+-----------------+-----------+-----------+
        | Node           | CPU_utilization | Mem_total | Mem_free  |
        +----------------+-----------------+-----------+-----------+
        | 172.23.107.217 | 1.11707161163   | 23.36 GiB | 21.81 GiB |
        | 172.23.107.222 | 1.2470319506    | 23.36 GiB | 21.87 GiB |
        | 172.23.107.102 | 2.46339231162   | 23.36 GiB | 22.26 GiB |
        | 172.23.107.99  | 3.04053523462   | 23.36 GiB | 22.15 GiB |
        | 172.23.107.223 | 1.21592981087   | 23.36 GiB | 22.15 GiB |
        +----------------+-----------------+-----------+-----------+
        

      • 3 buckets (2 Magma & 1 Couchstore) with replicas=2

        +---------+-----------------+----------+---------+-----------+------------+------------+---------------+
        | Bucket  | Storage Backend | Replicas | Items   | RAM Quota | RAM Used   | Disk Used  | ARR           |
        +---------+-----------------+----------+---------+-----------+------------+------------+---------------+
        | bucket1 | couchstore      | 2        | 100000  | 9.77 GiB  | 319.14 MiB | 230.10 MiB | 100           |
        | bucket2 | magma           | 2        | 50000   | 4.88 GiB  | 449.42 MiB | 365.12 MiB | 100           |
        | default | magma           | 2        | 8140000 | 2.50 GiB  | 1.65 GiB   | 17.03 GiB  | 20.9565356265 |
        +---------+-----------------+----------+---------+-----------+------------+------------+---------------+
        

      • Initial load, then load history data using continuous upserts (all with durability=MAJORITY)
      • Gracefully fail over node '172.23.107.102' (succeeded)
      • Add the node back using 'Delta' recovery and trigger a rebalance

        Operation Id = 5b2a77f71136e936a2128d26358ee30b
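
The failover, delta recovery, and rebalance steps above could also be scripted outside TAF. The following is a minimal sketch, assuming the standard cluster-management REST endpoints (`/controller/startGracefulFailover`, `/controller/setRecoveryType`, `/controller/rebalance`) and the default `ns_1@<ip>` OTP node naming. The helper name `recovery_requests` is hypothetical, and the sketch only builds the requests rather than sending them.

```python
from urllib.parse import urlencode


def otp_node(ip):
    """Build the OTP node name used by ns_server (assumes the default 'ns_1@' prefix)."""
    return f"ns_1@{ip}"


def recovery_requests(failed_ip, all_ips, recovery_type="delta"):
    """Return (path, form_body) pairs for graceful failover, add-back, and rebalance.

    Hypothetical helper: it only describes the POST requests. A real script
    must also wait for the graceful failover to finish before sending the
    setRecoveryType request.
    """
    known = ",".join(otp_node(ip) for ip in all_ips)
    return [
        ("/controller/startGracefulFailover",
         urlencode({"otpNode": otp_node(failed_ip)})),
        ("/controller/setRecoveryType",
         urlencode({"otpNode": otp_node(failed_ip),
                    "recoveryType": recovery_type})),
        ("/controller/rebalance",
         urlencode({"knownNodes": known, "ejectedNodes": ""})),
    ]


nodes = ["172.23.107.217", "172.23.107.222", "172.23.107.102",
         "172.23.107.99", "172.23.107.223"]
requests_to_send = recovery_requests("172.23.107.102", nodes)
```

Each pair would be sent as an authenticated POST to the cluster manager port of any active node; authentication and task polling are omitted here.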

      Observation:

      From ns_server logs

       

      [error_logger:error,2023-03-31T01:06:14.802-07:00,ns_1@172.23.107.222:<0.25969.34>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_single_vbucket_mover:'-wait_dcp_data_move/5-fun-0-'/0
          pid: <0.25969.34>
          registered_name: []
          exception error: {dcp_wait_for_data_move_failed,"default",457,
                               'ns_1@172.23.107.222',
                               ['ns_1@172.23.107.102','ns_1@172.23.107.99'],
                               {error,no_stats_for_this_vbucket}}
            in function  ns_single_vbucket_mover:'-wait_dcp_data_move/5-fun-0-'/5 (src/ns_single_vbucket_mover.erl, line 451)
          ancestors: [<0.26092.34>,<0.25649.34>,<0.6191.33>]
          message_queue_len: 0
          messages: []
          links: [<0.26092.34>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 987
          stack_size: 29
          reductions: 3479
        neighbours:
       
      [ns_server:error,2023-03-31T01:06:14.802-07:00,ns_1@172.23.107.222:<0.26092.34>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.25969.34>,
                                     {{dcp_wait_for_data_move_failed,"default",457,
                                          'ns_1@172.23.107.222',
                                          ['ns_1@172.23.107.102',
                                           'ns_1@172.23.107.99'],
                                          {error,no_stats_for_this_vbucket}},
                                      [{ns_single_vbucket_mover,
                                           '-wait_dcp_data_move/5-fun-0-',5,
                                           [{file,"src/ns_single_vbucket_mover.erl"},
                                            {line,451}]},
                                       {proc_lib,init_p,3,
                                           [{file,"proc_lib.erl"},{line,211}]}]}}
      [ns_server:error,2023-03-31T01:06:14.802-07:00,ns_1@172.23.107.222:<0.26092.34>:misc:sync_shutdown_many_i_am_trapping_exits:1456]Shutdown of the following failed: [{<0.25969.34>,
                                          {{dcp_wait_for_data_move_failed,
                                            "default",457,'ns_1@172.23.107.222',
                                            ['ns_1@172.23.107.102',
                                             'ns_1@172.23.107.99'],
                                            {error,no_stats_for_this_vbucket}},
                                           [{ns_single_vbucket_mover,
                                             '-wait_dcp_data_move/5-fun-0-',5,
                                             [{file,
                                               "src/ns_single_vbucket_mover.erl"},
                                              {line,451}]},
                                            {proc_lib,init_p,3,
                                             [{file,"proc_lib.erl"},{line,211}]}]}}]
      [ns_server:error,2023-03-31T01:06:14.802-07:00,ns_1@172.23.107.222:<0.26092.34>:misc:try_with_maybe_ignorant_after:1491]Eating exception from ignorant after-block:
      {error,
          {badmatch,
              [{<0.25969.34>,
                {{dcp_wait_for_data_move_failed,"default",457,
                     'ns_1@172.23.107.222',
                     ['ns_1@172.23.107.102','ns_1@172.23.107.99'],
                     {error,no_stats_for_this_vbucket}},
                 [{ns_single_vbucket_mover,'-wait_dcp_data_move/5-fun-0-',5,
                      [{file,"src/ns_single_vbucket_mover.erl"},{line,451}]},
                  {proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,211}]}]}}]},
          [{misc,sync_shutdown_many_i_am_trapping_exits,1,
               [{file,"src/misc.erl"},{line,1458}]},
           {misc,try_with_maybe_ignorant_after,2,
               [{file,"src/misc.erl"},{line,1489}]},
           {ns_single_vbucket_mover,mover,6,
               [{file,"src/ns_single_vbucket_mover.erl"},{line,49}]},
           {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
      

      [error_logger:error,2023-03-31T01:06:14.803-07:00,ns_1@172.23.107.222:<0.26092.34>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_single_vbucket_mover:mover/6
          pid: <0.26092.34>
          registered_name: []
          exception exit: {unexpected_exit,
                              {'EXIT',<0.25969.34>,
                                  {{dcp_wait_for_data_move_failed,"default",457,
                                       'ns_1@172.23.107.222',
                                       ['ns_1@172.23.107.102','ns_1@172.23.107.99'],
                                       {error,no_stats_for_this_vbucket}},
                                   [{ns_single_vbucket_mover,
                                        '-wait_dcp_data_move/5-fun-0-',5,
                                        [{file,"src/ns_single_vbucket_mover.erl"},
                                         {line,451}]},
                                    {proc_lib,init_p,3,
                                        [{file,"proc_lib.erl"},{line,211}]}]}}}
            in function  ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
            in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 152)
            in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52)
            in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487)
            in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49)
          ancestors: [<0.25649.34>,<0.6191.33>]
          message_queue_len: 0
          messages: []
          links: [<0.25649.34>]
          dictionary: [{cleanup_list,[<0.25969.34>]}]
          trap_exit: true
          status: running
          heap_size: 6772
          stack_size: 29
          reductions: 19619
        neighbours:
      

      [rebalance:error,2023-03-31T01:06:14.803-07:00,ns_1@172.23.107.222:<0.25649.34>:ns_vbucket_mover:handle_info:212]Worker <0.26092.34> (for action {move,{457,
                                        ['ns_1@172.23.107.222',
                                         'ns_1@172.23.107.99',
                                         'ns_1@172.23.107.102'],
                                        ['ns_1@172.23.107.222',
                                         'ns_1@172.23.107.102',
                                         'ns_1@172.23.107.99'],
                                        []}}) exited with reason {unexpected_exit,
                                          {'EXIT',
                                           <0.25969.34>,
                                           {{dcp_wait_for_data_move_failed,
                                             "default",
                                             457,
                                             'ns_1@172.23.107.222',
                                             ['ns_1@172.23.107.102',
                                              'ns_1@172.23.107.99'],
                                             {error,
                                              no_stats_for_this_vbucket}},
                                            [{ns_single_vbucket_mover,
                                              '-wait_dcp_data_move/5-fun-0-',
                                              5,
                                              [{file,
                                                "src/ns_single_vbucket_mover.erl"},
                                               {line,
                                                451}]},
                                             {proc_lib,
                                              init_p,3,
                                              [{file,
                                                "proc_lib.erl"},
                                               {line,
                                                211}]}]}}}
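
The same `dcp_wait_for_data_move_failed` tuple repeats in every log entry above. For triage across a larger log, it can be pulled out with a regular expression. This is a rough sketch that assumes the tuple always has the shape seen in these logs (bucket, vbucket, node, node list, error reason). The names `FAILURE_RE` and `parse_failures` are hypothetical and not part of ns_server.

```python
import re

# Matches {dcp_wait_for_data_move_failed,"bucket",vb,'node',[nodes],{error,reason}}
# with arbitrary whitespace or line breaks between the elements.
FAILURE_RE = re.compile(
    r'\{dcp_wait_for_data_move_failed,\s*"(?P<bucket>[^"]+)",\s*(?P<vbucket>\d+),\s*'
    r"'(?P<node>[^']+)',\s*\[(?P<nodes>[^\]]*)\],\s*"
    r"\{error,\s*(?P<reason>\w+)\}\}"
)


def parse_failures(log_text):
    """Return the distinct data-move failures found in log_text, in first-seen order."""
    seen = set()
    failures = []
    for match in FAILURE_RE.finditer(log_text):
        record = (
            match.group("bucket"),
            int(match.group("vbucket")),
            match.group("node"),
            tuple(re.findall(r"'([^']+)'", match.group("nodes"))),
            match.group("reason"),
        )
        if record not in seen:
            seen.add(record)
            failures.append(record)
    return failures
```

Run over the entries above, this should collapse them into a single record for vbucket 457 of the "default" bucket.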
      

      Note: The same test passes with a very small amount of historical data.
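
The failure mode named in the issue title can be illustrated with a toy model. This is only a sketch under stated assumptions: it treats a flush batch as a list of (seqno, key, op) items and assumes the abort lands in the same batch as the next prepare for that key, so key-based deduplication drops it. The `Item`, `flush_batch`, and `find_unresolved_prepares` names are hypothetical and do not correspond to KV-engine code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    seqno: int
    key: str
    op: str  # "prepare", "abort", or "commit"


def flush_batch(batch, dedupe):
    """Return the items a flush would persist, in seqno order.

    With dedupe=True only the highest-seqno item per key survives, which is
    the behaviour assumed to be wrong for a history-retaining (CDC) bucket.
    """
    if not dedupe:
        return sorted(batch, key=lambda item: item.seqno)
    latest = {}
    for item in batch:
        current = latest.get(item.key)
        if current is None or item.seqno > current.seqno:
            latest[item.key] = item
    return sorted(latest.values(), key=lambda item: item.seqno)


def find_unresolved_prepares(stream):
    """Return (earlier_seqno, later_seqno) for prepares with no abort or commit between them."""
    open_prepare = {}
    violations = []
    for item in stream:
        if item.op == "prepare":
            if item.key in open_prepare:
                violations.append((open_prepare[item.key].seqno, item.seqno))
            open_prepare[item.key] = item
        else:
            # An abort or commit resolves the outstanding prepare for this key.
            open_prepare.pop(item.key, None)
    return violations


batch1 = [Item(1, "k", "prepare")]
batch2 = [Item(2, "k", "abort"), Item(3, "k", "prepare")]

buggy_history = flush_batch(batch1, True) + flush_batch(batch2, True)
expected_history = flush_batch(batch1, False) + flush_batch(batch2, False)
```

In this model the deduplicated history contains the prepares at seqnos 1 and 3 with nothing between them, which is the sequence a backfill from disk would then replay to a consumer. With deduplication disabled, the abort at seqno 2 is retained.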

      TAF test:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i node.ini -p rerun=False,get-cbcollect-info=False,skip_cluster_reset=True,upgrade_version=7.2.0-5284 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_recovery,doc_size=512,data_load_stage=during,bucket_history_retention_seconds=300,bucket_spec=magma_dgm.20_percent_dgm.5_node_2_replica_magma_512,bucket_history_retention_bytes=2147483648,nodes_failover=1,durability=MAJORITY,recovery_type=delta,skip_validations=False,nodes_init=5,default_history_retention_for_collections=false,disk_optimized_thread_settings=True,autoCompactionDefined=true,randomize_value=True,override_spec_params=durability,disk_optimized_thread_settings=True,get-cbcollect-info=False,autoCompactionDefined=true,dedupe_update_itrs=3000,log_level=debug,infra_log_level=debug'

       


          People

            ashwin.govindarajulu Ashwin Govindarajulu