  1. Couchbase Server
  2. MB-40053

[BucketDurability]: KeyEexists received from DCP Consumer for DCP_COMMIT

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Unknown

    Description

      Build: 7.0.0-2351

      Scenario:

      1. Four-node cluster, Couchbase bucket (replica=0, min_durability_level=persistToMajority)
      2. Increase the replica count incrementally (0 --> 1 --> 2), running doc_loading after each replica-update rebalance
      3. Decrease the replica count back to zero incrementally (2 --> 1 --> 0), again with doc_loading
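
The scenario above can be sketched as a sequence of REST calls. This is a minimal illustration, not the test framework's actual code: it assumes the standard admin port 8091 and the `replicaNumber`, `knownNodes` and `ejectedNodes` form parameters of the bucket-edit and rebalance endpoints. The helper names are hypothetical, and the bucket-edit call may need further parameters depending on the server version. It only builds the requests; it does not send them.

```python
from urllib.parse import urlencode

# Assumption: default Couchbase admin REST port; adjust for your cluster.
ADMIN_PORT = 8091


def replica_sequence(start=0, peak=2):
    """Replica counts to apply in order: up to `peak`, then back down to `start`."""
    going_up = list(range(start + 1, peak + 1))
    going_down = list(range(peak - 1, start - 1, -1))
    return going_up + going_down


def bucket_update_request(host, bucket, replicas):
    """Build (url, form body) for changing a bucket's replica count (hypothetical helper)."""
    url = f"http://{host}:{ADMIN_PORT}/pools/default/buckets/{bucket}"
    return url, urlencode({"replicaNumber": replicas})


def rebalance_request(host, known_nodes):
    """Build (url, form body) for a rebalance that ejects no nodes (hypothetical helper)."""
    url = f"http://{host}:{ADMIN_PORT}/controller/rebalance"
    body = urlencode({"knownNodes": ",".join(known_nodes), "ejectedNodes": ""})
    return url, body


if __name__ == "__main__":
    # Node names taken from the "Starting rebalance" log entry in this report.
    nodes = [
        "ns_1@172.23.123.124",
        "ns_1@172.23.123.125",
        "ns_1@172.23.123.119",
        "ns_1@172.23.123.121",
    ]
    orchestrator = "172.23.123.125"
    for replicas in replica_sequence(0, 2):
        print("POST", *bucket_update_request(orchestrator, "default", replicas))
        print("POST", *rebalance_request(orchestrator, nodes))
        print("# wait for the rebalance to finish, then run doc loading")
```

The failure in this report corresponds to the last iteration of that loop, the 1 --> 0 replica update.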

      Observation:

      While updating the replica count from 1 --> 0 in step 3, the rebalance failed with the error below:

      'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.', 'status': 'none' - rebalance failed
      Latest logs from UI on 172.23.123.125:
      'code': 0, 'module': 'ns_orchestrator', 'type': 'critical', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561059861L, 'shortText': 'message', 'serverTime': '2020-06-19T03:04:19.861Z', 'text': 'Rebalance exited with reason {mover_crashed,
                        {unexpected_exit,
                         {\'EXIT\',<0.11370.7>,
                          {{{{{child_interrupted,
                               {\'EXIT\',<0.21116.1>,socket_closed}},
                              [{dcp_replicator,spawn_and_wait,1,
                                [{file,"src/dcp_replicator.erl"},
                                 {line,266}]},
                               {dcp_replicator,handle_call,3,
                                [{file,"src/dcp_replicator.erl"},
                                 {line,121}]},
                               {gen_server,try_handle_call,4,
                                [{file,"gen_server.erl"},{line,636}]},
                               {gen_server,handle_msg,6,
                                [{file,"gen_server.erl"},{line,665}]},
                               {proc_lib,init_p_do_apply,3,
                                [{file,"proc_lib.erl"},{line,247}]}]},
                             {gen_server,call,
                              [<0.21071.1>,
                               {setup_replication,
                                [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,
                                 19,20,21,22,23,24,25,26,27,28,29,30,31,
                                 32,33,34,35,36,37,38,39,40,41,42,43,44,
                                 45,46,47,48,49,50,51,52,53,54,55,56,57,
                                 58,59,60,61,62,63,64,65,66,67,68,69,70,
                                 71,72,73,74,75,76,77,78,79,80,81,82,83,
                                 84]},
                               infinity]}},
                            {gen_server,call,
                             [\'replication_manager-default\',
                              {change_vbucket_replication,682,undefined},
                              infinity]}},
                           {gen_server,call,
                            [{\'janitor_agent-default\',
                              \'ns_1@172.23.123.125\'},
                             {if_rebalance,<0.2074.7>,
                              {update_vbucket_state,849,active,
                               undefined,undefined,
                               [[\'ns_1@172.23.123.125\',
                                 \'ns_1@172.23.123.121\'],
                                [\'ns_1@172.23.123.125\']]}},
                             infinity]}}}}}.
      Rebalance Operation Id = 1a5386077bf37ecdd570807354619b6e'}
      'code': 0, 'module': 'ns_vbucket_mover', 'type': 'critical', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561059856L, 'shortText': 'message', 'serverTime': '2020-06-19T03:04:19.856Z', 'text': 'Worker <0.11303.7> (for action {move,{849,
                                [\'ns_1@172.23.123.125\',
                                 \'ns_1@172.23.123.121\'],
                                [\'ns_1@172.23.123.125\'],
                                []}}) exited with reason {unexpected_exit,
                                      {\'EXIT\',
                                       <0.11370.7>,
                                       {{{{{child_interrupted,
                                            {\'EXIT\',
                                             <0.21116.1>,
                                             socket_closed}},
                                           [{dcp_replicator,
                                             spawn_and_wait,
                                             1,
                                             [{file,
                                               "src/dcp_replicator.erl"},
                                              {line,
                                               266}]},
                                            {dcp_replicator,
                                             handle_call,
                                             3,
                                             [{file,
                                               "src/dcp_replicator.erl"},
                                              {line,
                                               121}]},
                                            {gen_server,
                                             try_handle_call,
                                             4,
                                             [{file,
                                               "gen_server.erl"},
                                              {line,
                                               636}]},
                                            {gen_server,
                                             handle_msg,
                                             6,
                                             [{file,
                                               "gen_server.erl"},
                                              {line,
                                               665}]},
                                            {proc_lib,
                                             init_p_do_apply,
                                             3,
                                             [{file,
                                               "proc_lib.erl"},
                                              {line,
                                               247}]}]},
                                          {gen_server,
                                           call,
                                           [<0.21071.1>,
                                            {setup_replication,
                                             [4,5,6,
                                              7,8,9,
                                              10,11,
                                              12,13,
                                              14,15,
                                              16,17,
                                              18,19,
                                              20,21,
                                              22,23,
                                              24,25,
                                              26,27,
                                              28,29,
                                              30,31,
                                              32,33,
                                              34,35,
                                              36,37,
                                              38,39,
                                              40,41,
                                              42,43,
                                              44,45,
                                              46,47,
                                              48,49,
                                              50,51,
                                              52,53,
                                              54,55,
                                              56,57,
                                              58,59,
                                              60,61,
                                              62,63,
                                              64,65,
                                              66,67,
                                              68,69,
                                              70,71,
                                              72,73,
                                              74,75,
                                              76,77,
                                              78,79,
                                              80,81,
                                              82,83,
                                              84]},
                                            infinity]}},
                                         {gen_server,
                                          call,
                                          [\'replication_manager-default\',
                                           {change_vbucket_replication,
                                            682,
                                            undefined},
                                           infinity]}},
                                        {gen_server,
                                         call,
                                         [{\'janitor_agent-default\',
                                           \'ns_1@172.23.123.125\'},
                                          {if_rebalance,
                                           <0.2074.7>,
                                            {update_vbucket_state,
                                            849,
                                            active,
                                            undefined,
                                            undefined,
                                            [[\'ns_1@172.23.123.125\',
                                              \'ns_1@172.23.123.121\'],
                                             [\'ns_1@172.23.123.125\']]}},
                                          infinity]}}}}'
      'code': 0, 'module': 'ns_vbucket_mover', 'type': 'info', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561057634L, 'shortText': 'message', 'serverTime': '2020-06-19T03:04:17.634Z', 'text': 'Bucket "default" rebalance does not seem to be swap rebalance'
      'code': 0, 'module': 'ns_rebalancer', 'type': 'info', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561057562L, 'shortText': 'message', 'serverTime': '2020-06-19T03:04:17.562Z', 'text': 'Started rebalancing bucket default'
      'code': 0, 'module': 'ns_orchestrator', 'type': 'info', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561057439L, 'shortText': 'message', 'serverTime': '2020-06-19T03:04:17.439Z', 'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.123.124','ns_1@172.23.123.125',
                                       'ns_1@172.23.123.119','ns_1@172.23.123.121'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 1a5386077bf37ecdd570807354619b6e"
      'code': 0, 'module': 'menelaus_web_buckets', 'type': 'info', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561053371L, 'shortText': 'message', 'serverTime': '2020-06-19T03:04:13.371Z', 'text': 'Updated bucket "default" (of type couchbase) properties:
      [{num_replicas,0},{ram_quota,1493172224},{storage_mode,couchstore}]'
      'code': 0, 'module': 'ns_orchestrator', 'type': 'info', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561045087L, 'shortText': 'message', 'serverTime': '2020-06-19T03:04:05.087Z', 'text': 'Rebalance completed successfully.
      Rebalance Operation Id = e4e7e898047642367b4bfecef7f52fc8'
      'code': 0, 'module': 'ns_vbucket_mover', 'type': 'info', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561036461L, 'shortText': 'message', 'serverTime': '2020-06-19T03:03:56.461Z', 'text': 'Bucket "default" rebalance does not seem to be swap rebalance (repeated 1 times, last seen 0.752024 secs ago)'
      'code': 0, 'module': 'ns_rebalancer', 'type': 'info', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561036461L, 'shortText': 'message', 'serverTime': '2020-06-19T03:03:56.461Z', 'text': 'Started rebalancing bucket default (repeated 1 times, last seen 0.855567 secs ago)'
      'code': 0, 'module': 'menelaus_web_buckets', 'type': 'info', 'node': 'ns_1@172.23.123.125', 'tstamp': 1592561036461L, 'shortText': 'message', 'serverTime': '2020-06-19T03:03:56.461Z', 'text': 'Updated bucket "default" (of type couchbase) properties:
      [{num_replicas,1},{ram_quota,1493172224},{storage_mode,couchstore}] (repeated 1 times, last seen 5.654143 secs ago)'
      Rebalance Failed: 'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.', 'status': 'none' - rebalance failed
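
The UI log entries above are listed newest first. As a reading aid, the sketch below reorders them by their `tstamp` field (epoch milliseconds) and prints each entry's offset from the earliest one. The tuples are hand-copied from the entries in this report with the message text shortened; the function name is hypothetical.

```python
# (tstamp in epoch ms, module, type, shortened text) copied from the UI log above.
EVENTS = [
    (1592561059861, "ns_orchestrator", "critical", "Rebalance exited with reason {mover_crashed, ...}"),
    (1592561059856, "ns_vbucket_mover", "critical", "Worker <0.11303.7> exited with reason {unexpected_exit, ...}"),
    (1592561057634, "ns_vbucket_mover", "info", "rebalance does not seem to be swap rebalance"),
    (1592561057562, "ns_rebalancer", "info", "Started rebalancing bucket default"),
    (1592561057439, "ns_orchestrator", "info", "Starting rebalance"),
    (1592561053371, "menelaus_web_buckets", "info", "Updated bucket properties: num_replicas 0"),
    (1592561045087, "ns_orchestrator", "info", "Rebalance completed successfully"),
]


def timeline(events):
    """Return (offset_ms, module, type, text) tuples, oldest first."""
    ordered = sorted(events, key=lambda event: event[0])
    first = ordered[0][0]
    return [(ts - first, module, level, text) for ts, module, level, text in ordered]


if __name__ == "__main__":
    for offset_ms, module, level, text in timeline(EVENTS):
        print(f"+{offset_ms / 1000:7.3f}s  {level:<8}  {module:<22}  {text}")
```

Read in this order, the previous rebalance completes, the bucket is updated to num_replicas 0, the new rebalance starts, and the two critical entries follow about 2.4 seconds after "Starting rebalance".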
      

      cbcollect_logs:
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/rebalance_fail/collectinfo-2020-06-19T200246-ns_1%40172.23.105.155.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/rebalance_fail/collectinfo-2020-06-19T200246-ns_1%40172.23.105.159.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/rebalance_fail/collectinfo-2020-06-19T200246-ns_1%40172.23.105.205.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/rebalance_fail/collectinfo-2020-06-19T200246-ns_1%40172.23.105.206.zip
      Test execution link: http://qa.sc.couchbase.com/job/oel6-4node-rebalance_in_jython/983/console

       

      Attachments

        1. test.log
          317 kB
          Ashwin Govindarajulu

            People

              owend Daniel Owen
              ashwin.govindarajulu Ashwin Govindarajulu