Couchbase Server / MB-60576

[Upgrade] : Rebalance exited with reason {mover_crashed,{unexpected_exit,{'EXIT',<0.2089.12>,{{dcp_wait_for_data_move_failed,"bucket-0"

    Description

      Steps to reproduce

      1. Created a 3-node KV cluster on 7.1.1
      2. Created 2 Magma buckets, "bucket-1" and "bucket-0"
      3. Loaded bucket-0 with 100,000,000 documents so that it reaches a 1% active resident ratio
      4. Upserted documents to increase the fragmentation value
      5. Added a new node, 172.23.217.193, running 7.6.0-2054 to the cluster
      6. Attempted a swap rebalance, removing 172.23.217.190
      7. Reset node 172.23.217.190 and installed 7.6.0-2054 on it
      8. Added this node back in and attempted a rebalance. The rebalance fails with the errors below.
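
For anyone reproducing steps 5 to 8 by hand rather than through TAF, the two rebalances can be sketched as form bodies for Couchbase Server's `POST /controller/rebalance` endpoint, which takes `knownNodes` and `ejectedNodes`. This is only an illustration: the helper names are hypothetical, the `ns_1@<ip>` node names are taken from the logs in this ticket, and the original cluster membership (.190, .191, .192) is inferred from those logs rather than stated.

```python
from urllib.parse import urlencode


def otp_node(ip: str) -> str:
    """Return the ns_server node name for an IP, e.g. ns_1@172.23.217.190."""
    return f"ns_1@{ip}"


def rebalance_form(known_ips, ejected_ips):
    """Build the form fields for POST /controller/rebalance (hypothetical helper)."""
    return {
        "knownNodes": ",".join(otp_node(ip) for ip in known_ips),
        "ejectedNodes": ",".join(otp_node(ip) for ip in ejected_ips),
    }


# Assumed membership: the three original nodes plus the new 7.6.0-2054 node.
ALL_FOUR = ["172.23.217.190", "172.23.217.191", "172.23.217.192", "172.23.217.193"]

# Steps 5-6: .193 has been added; rebalance it in while ejecting .190.
swap_out = rebalance_form(ALL_FOUR, ejected_ips=["172.23.217.190"])

# Steps 7-8: .190 has been re-added on 7.6.0-2054; rebalance with nothing ejected.
add_back = rebalance_form(ALL_FOUR, ejected_ips=[])

print(urlencode(swap_out))
print(urlencode(add_back))
```

The second request is the one that fails in this ticket, while vBucket 131 is being moved onto 172.23.217.190.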

      2024-01-28T22:54:38.436-08:00, ns_vbucket_mover:0:critical:message(ns_1@172.23.217.193) - Worker <0.1978.12> (for action {move,{131,
          ['ns_1@172.23.217.193','ns_1@172.23.217.192','ns_1@172.23.217.191'],
          ['ns_1@172.23.217.190','ns_1@172.23.217.192','ns_1@172.23.217.193'],
          []}}) exited with reason {unexpected_exit,
        {'EXIT',<0.2089.12>,
         {{dcp_wait_for_data_move_failed,"bucket-0",131,
           'ns_1@172.23.217.193',
           ['ns_1@172.23.217.190','ns_1@172.23.217.192'],
           {error,
            {unexpected_status,<<"connection_does_not_exist">>},
            "Error getting dcp stats on 'ns_1@172.23.217.193' for bucket \"bucket-0\", partition 131, connection \"replication:ns_1@172.23.217.193->ns_1@172.23.217.190:bucket-0\": {unexpected_status,\n <<\"connection_does_not_exist\">>}"}},
          [{ns_single_vbucket_mover,'-wait_dcp_data_move/5-fun-0-',5,
            [{file,"src/ns_single_vbucket_mover.erl"},{line,453}]},
           {proc_lib,init_p,3,
            [{file,"proc_lib.erl"},{line,225}]}]}}}

      2024-01-28T22:54:38.490-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.217.193) - Rebalance exited with reason {mover_crashed,
        {unexpected_exit,
         {'EXIT',<0.2089.12>,
          {{dcp_wait_for_data_move_failed,"bucket-0",131,
            'ns_1@172.23.217.193',
            ['ns_1@172.23.217.190','ns_1@172.23.217.192'],
            {error,
             {unexpected_status,<<"connection_does_not_exist">>},
             "Error getting dcp stats on 'ns_1@172.23.217.193' for bucket \"bucket-0\", partition 131, connection \"replication:ns_1@172.23.217.193->ns_1@172.23.217.190:bucket-0\": {unexpected_status,\n <<\"connection_does_not_exist\">>}"}},
           [{ns_single_vbucket_mover,'-wait_dcp_data_move/5-fun-0-',5,
             [{file,"src/ns_single_vbucket_mover.erl"},{line,453}]},
            {proc_lib,init_p,3,
             [{file,"proc_lib.erl"},{line,225}]}]}}}}.
      Rebalance Operation Id = 2388851ecdd861129881e4ea1d5f3b52

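For triage, the useful fields in the ns_server error are the bucket, the partition, and the two ends of the replication connection that no longer exists. The snippet below is a minimal sketch of pulling them out with a regular expression. The message text is copied from the log above with the Erlang string escapes removed, and the pattern and group names are illustrative only, not anything ns_server defines.

```python
import re

# Message text from the ns_server log, with Erlang escaping removed.
MESSAGE = (
    "Error getting dcp stats on 'ns_1@172.23.217.193' for bucket \"bucket-0\", "
    "partition 131, connection "
    "\"replication:ns_1@172.23.217.193->ns_1@172.23.217.190:bucket-0\": "
    "{unexpected_status, <<\"connection_does_not_exist\">>}"
)

# Illustrative pattern; the group names are our own.
PATTERN = re.compile(
    r"Error getting dcp stats on '(?P<node>[^']+)' "
    r"for bucket \"(?P<bucket>[^\"]+)\", "
    r"partition (?P<partition>\d+), "
    r"connection \"replication:(?P<src>.+?)->(?P<dst>[^:]+):(?P<conn_bucket>[^\"]+)\""
)

match = PATTERN.search(MESSAGE)
fields = match.groupdict() if match else {}

for name, value in fields.items():
    print(f"{name}: {value}")
```

Read this way, the stats call was made on 172.23.217.193 for the DCP connection replicating partition 131 of bucket-0 from 172.23.217.193 to 172.23.217.190, and memcached reported that the connection did not exist.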
      The memcached log on node 172.23.217.193 shows the following ERROR line about 12 ms before the ns_server failure:

      2024-01-28T22:54:38.424663-08:00 ERROR 21637: Exception occurred during packet execution. Closing connection [ {"ip":"172.23.217.190","port":51480} - {"ip":"172.23.217.193","port":11206} (System, @ns_server) ]: ThrowExceptionUnderflowPolicy current:3576435 arg:3580981.
      Cookies: [{"aiostat":"success","ewouldblock":false,"packet":{"bodylen":4,"cas":0,"datatype":"raw","extlen":4,"extras":{"buffer_bytes":3580981},"keylen":0,"magic":"ClientRequest","opaque":1898,"opcode":"DCP_BUFFER_ACKNOWLEDGEMENT","vbucket":0},"refcount":1,"started":"1212755907350281 (9120 us ago)","throttled":false}]
      Exception thrown from: [
        "#0  /opt/couchbase/bin/memcached() [0x400000+0x1440ff]",
        "#1  /opt/couchbase/bin/memcached() [0x400000+0x4b5767]",
        "#2  /opt/couchbase/bin/memcached() [0x400000+0x4b60cd]",
        "#3  /opt/couchbase/bin/memcached() [0x400000+0x4b62b5]",
        "#4  /opt/couchbase/bin/memcached() [0x400000+0x4b65a5]",
        "#5  /opt/couchbase/bin/memcached() [0x400000+0x342e04]",
        "#6  /opt/couchbase/bin/memcached() [0x400000+0x1d307d]",
        "#7  /opt/couchbase/bin/memcached() [0x400000+0x2a6bf0]",
        "#8  /opt/couchbase/bin/memcached() [0x400000+0x22217b]",
        "#9  /opt/couchbase/bin/memcached() [0x400000+0x2088d8]",
        "#10 /opt/couchbase/bin/memcached() [0x400000+0x20d735]",
        "#11 /opt/couchbase/bin/memcached() [0x400000+0x217c67]",
        "#12 /opt/couchbase/bin/../lib/libevent_core-2.1.so.7() [0x7f364d000000+0xf84e]",
        "#13 /opt/couchbase/bin/../lib/libevent_core-2.1.so.7() [0x7f364d000000+0x18b99]",
        "#14 /opt/couchbase/bin/../lib/libevent_core-2.1.so.7(event_base_loop+0x357) [0x7f364d000000+0x19287]",
        "#15 /opt/couchbase/bin/memcached() [0x400000+0x981436]",
        "#16 /opt/couchbase/bin/memcached() [0x400000+0x98193e]",
        "#17 /opt/couchbase/bin/memcached() [0x400000+0x983e08]",
        "#18 /opt/couchbase/bin/memcached() [0x400000+0x231769]",
        "#19 /opt/couchbase/bin/memcached() [0x400000+0x8d0f0c]",
        "#20 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f364ba00000+0xe4aa3]",
        "#21 /lib/x86_64-linux-gnu/libc.so.6() [0x7f364e81f000+0x89044]",
        "#22 /lib/x86_64-linux-gnu/libc.so.6() [0x7f364e81f000+0x10961c]"]
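
The numbers in the exception can be read as a subtraction that would have gone negative: the `DCP_BUFFER_ACKNOWLEDGEMENT` packet carries `buffer_bytes` of 3580981 (the `arg`), while the counter it was applied to held only 3576435 (the `current`). The sketch below reproduces that arithmetic with a hypothetical counter class. It is not the memcached implementation, and treating `current` as the producer's count of outstanding unacknowledged bytes is an assumption drawn from the opcode name.

```python
class NonNegativeCounter:
    """Hypothetical counter that refuses to drop below zero."""

    def __init__(self, value: int = 0) -> None:
        if value < 0:
            raise ValueError("initial value must be non-negative")
        self.value = value

    def subtract(self, arg: int) -> None:
        # Throw instead of wrapping or clamping when the result would be negative.
        if arg > self.value:
            raise ArithmeticError(f"underflow current:{self.value} arg:{arg}")
        self.value -= arg


CURRENT = 3576435  # "current" from the log line
ACKED = 3580981    # "arg", equal to extras.buffer_bytes in the packet

outstanding = NonNegativeCounter(CURRENT)
message = None
try:
    outstanding.subtract(ACKED)
except ArithmeticError as exc:
    message = str(exc)

# How many more bytes were acknowledged than the counter held.
shortfall = ACKED - CURRENT
print(message)
print(f"acknowledged {shortfall} bytes more than the counter held")
```

Under this reading, the acknowledgement from 172.23.217.190 exceeded what 172.23.217.193 was tracking by 4546 bytes, the exception closed the connection, and the later DCP stats request for that connection returned connection_does_not_exist.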


       

      TAF Script to reproduce

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.110966.ini doc_size=1024,randomize_value=True,bucket_spec=upgrade_test.1_percent_multi,dur_level=majority,rebalance_op=all,alternate_load=True,magma_upgrade=True,GROUP=P0;rebalance_7_1_1,upgrade_version=7.6.0-2054,sirius_url=http://172.23.120.103:4000 -t upgrade.durability_upgrade.UpgradeTests.test_upgrade,doc_size=1024,retry_get_process_num=500,upgrade_version=7.6.0-2054,skip_buckets_handle=True,sdk_client_pool=True,GROUP=P0;rebalance_7_1_1,upgrade_chain=7.1.1,upgrade_with_data_load=True,bucket_spec=upgrade_test.1_percent_multi,alternate_load=True,magma_upgrade=True,update_nodes=kv,enable_tls=True,rebalance_op=all,get-cbcollect-info=True,log_level=info,tls_level=all,upgrade_type=online_rebalance_in_out,dur_level=majority,nodes_init=3,sirius_url=http://172.23.120.103:4000,randomize_value=True,infra_log_level=info'

      Job name : debian-magma_online_upgrade_rebalance_in_multi_bucket_1DGM_7.1.1_P0

      Job ref : http://qe-jenkins1.sc.couchbase.com/job/test_suite_executor-TAF/47652/consoleText

            People

              raghav.sk Raghav S K