  1. Couchbase Server
  2. MB-56178

[System Test] : Analytics swap rebalance fails

Details

    • Untriaged
    • Centos 64-bit
    • 0
    • Yes
    • Analytics Sprint 16

    Description

      Steps to Repro

      ./sequoia -client 172.23.104.27:2375 -provider file:centos_pine.yml -test tests/integration/7.2/test_7.2.yml -scope tests/integration/7.2/scope_7.2_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.2.0-5275 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
      

      Started a longevity test on 7.2.0-5275. After 5 hours, when the test reached the swap rebalance of an Analytics node, the rebalance failed. A similar issue was seen in MB-56060, where it was noted that the issue had been fixed and would not be seen in 7.2. The error message here looks similar to the one in that ticket.

      172.23.108.103 3:25:57 AM 28 Mar, 2023

      Starting rebalance, KeepNodes = ['ns_1@172.23.104.155','ns_1@172.23.104.157',
      'ns_1@172.23.104.5','ns_1@172.23.104.67',
      'ns_1@172.23.104.69','ns_1@172.23.104.70',
      'ns_1@172.23.105.107','ns_1@172.23.105.111',
      'ns_1@172.23.106.100','ns_1@172.23.106.188',
      'ns_1@172.23.108.103','ns_1@172.23.120.107',
      'ns_1@172.23.120.245','ns_1@172.23.121.117',
      'ns_1@172.23.123.28','ns_1@172.23.96.148',
      'ns_1@172.23.96.192','ns_1@172.23.96.252',
      'ns_1@172.23.96.253','ns_1@172.23.97.119',
      'ns_1@172.23.97.121','ns_1@172.23.97.122',
      'ns_1@172.23.97.239','ns_1@172.23.99.11',
      'ns_1@172.23.99.20','ns_1@172.23.99.21',
      'ns_1@172.23.99.25'], EjectNodes = ['ns_1@172.23.104.137'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = a5cacdd0d6356aca105ea2f3e8e125d9
      

      172.23.104.137 3:27:27 AM 28 Mar, 2023

      Analytics Service unable to successfully rebalance ee5d415339b47cd7ceacde0af4e59251 due to 'CBAS0001: Analytics collections in different partitions have different DCP states. Mutations needed to catch up = 13216. User action: Try again later'; see analytics_info.log for details
      

      172.23.108.103 3:27:28 AM 28 Mar, 2023

      Rebalance exited with reason {service_rebalance_failed,cbas,
      {worker_died,
      {'EXIT',<0.3409.175>,
      {rebalance_failed,
      {service_error,
      <<"Rebalance ee5d415339b47cd7ceacde0af4e59251 failed: CBAS0001: Analytics collections in different partitions have different DCP states. Mutations needed to catch up = 13216. User action: Try again later">>}}}}}.
      Rebalance Operation Id = a5cacdd0d6356aca105ea2f3e8e125d9
      

      cbcollect_info attached. This was not seen on the run we had on 7.2.0-5263.
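
For comparing this failure against MB-56060 or later runs, the two messages above can be reduced to a few fields. The following is a minimal sketch, not taken from the test framework: `RebalanceFailure` and `parse_failure` are hypothetical names, and the patterns assume only the two message shapes quoted in this ticket.

```python
import re
from typing import NamedTuple, Optional


class RebalanceFailure(NamedTuple):
    """Fields pulled out of an Analytics rebalance failure message."""
    rebalance_id: str
    error_code: str
    mutations_behind: Optional[int]


# Matches both quoted shapes (assumption: no other shapes are handled):
#   "... rebalance <id> due to 'CBAS0001: ..."
#   "Rebalance <id> failed: CBAS0001: ..."
_HEADER = re.compile(
    r"[Rr]ebalance\s+(?P<id>[0-9a-f]{32})\s+(?:due to '|failed: )(?P<code>CBAS\d+):"
)
_BACKLOG = re.compile(r"Mutations needed to catch up = (?P<n>\d+)")


def parse_failure(line: str) -> Optional[RebalanceFailure]:
    """Return the parsed failure, or None if the line is not a failure message."""
    header = _HEADER.search(line)
    if header is None:
        return None
    # Only look for the backlog count after the error code.
    backlog = _BACKLOG.search(line, header.end())
    behind = int(backlog["n"]) if backlog else None
    return RebalanceFailure(header["id"], header["code"], behind)


if __name__ == "__main__":
    msg = (
        "Analytics Service unable to successfully rebalance "
        "ee5d415339b47cd7ceacde0af4e59251 due to 'CBAS0001: Analytics "
        "collections in different partitions have different DCP states. "
        "Mutations needed to catch up = 13216. User action: Try again later'"
    )
    print(parse_failure(msg))
```

The backlog count is kept optional because the sketch does not assume that every CBAS0001 message reports one.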

          People

            Balakumaran.Gopal Balakumaran Gopal