Couchbase Server / MB-59937

[System Test] :- Analytics Service unable to successfully rebalance bfb4defe8d86335dcce63c184d5d5a8f due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [5b6cd7a602641448fb013ef6ee363711]


Details

    • Untriaged
    • Linux x86_64
    • 0
    • Unknown
    • Analytics Sprint 32

    Description

      Script to repro

      ./sequoia -client 172.23.110.181:2375 -provider file:debian_pine.yml -test tests/integration/7.6/test_7.6.yml -scope tests/integration/7.6/scope_7.6_magma.yml -scale 2 -repeat 0 -log_level 0 -version 7.6.0-1878 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=1209600 -show_topology=true
      

      Saw multiple rebalance failures like the ones below.

      172.23.121.87 9:41:59 PM 2 Dec, 2023

      Analytics Service unable to successfully rebalance bfb4defe8d86335dcce63c184d5d5a8f due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [5b6cd7a602641448fb013ef6ee363711], state: ACTIVE)'; see analytics_info.log for details
      

      172.23.96.203 9:41:59 PM 2 Dec, 2023

      Rebalance exited with reason {service_rebalance_failed,cbas,
      {worker_died,
      {'EXIT',<0.19887.1165>,
      {task_failed,rebalance,
      {service_error,
      <<"Rebalance bfb4defe8d86335dcce63c184d5d5a8f failed: timed out waiting for all nodes to join & cluster active (missing nodes: [172.23.104.227:8091 (5b6cd7a602641448fb013ef6ee363711)], state: ACTIVE)">>}}}}}.
      Rebalance Operation Id = a65aec09c53ec5a3841925d472342496
      

      172.23.121.87 10:02:21 PM 2 Dec, 2023

      Analytics Service unable to successfully rebalance affefd78be32cab3bc3b22dfa9f09cdc due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [5b6cd7a602641448fb013ef6ee363711], state: ACTIVE)'; see analytics_info.log for details
      

      172.23.96.203 10:02:22 PM 2 Dec, 2023

      Rebalance exited with reason {service_rebalance_failed,cbas,
      {worker_died,
      {'EXIT',<0.29729.1172>,
      {task_failed,rebalance,
      {service_error,
      <<"Rebalance affefd78be32cab3bc3b22dfa9f09cdc failed: timed out waiting for all nodes to join & cluster active (missing nodes: [172.23.104.227:8091 (5b6cd7a602641448fb013ef6ee363711)], state: ACTIVE)">>}}}}}.
      Rebalance Operation Id = 9f8a6651a09b277dbc351039ef7035f6
      

      MB-59824 and MB-59802 look similar, but both were fixed in 7.6.0-1857, and we are hitting this in 7.6.0-1878. The cbcollect_info output is attached.


          People

            Balakumaran.Gopal Balakumaran Gopal
            Balakumaran.Gopal Balakumaran Gopal
            Votes: 0
            Watchers: 7

