Couchbase Server / MB-41910

[CX] Rebalance reports success even though it failed


    Details

    • Triage:
      Untriaged
    • Story Points:
      1
    • Is this a Regression?:
      Unknown
    • Sprint:
      CX Sprint 221, CX Sprint 222

      Description

      This is also related to MB-30766

      As shown in the output below, the rebalance operation is reported as successful even though the rebalance fails (as indicated in the logs).

      2020-10-07T14:59:34.803-07:00 INFO UpgradeTestBase [main] Recreating 'cc' node (image = [build-docker.couchbase.com/couchbase/server-internal:6.6.1-9104], forceNewIp = false)...
      2020-10-07T14:59:34.807-07:00 INFO DockerTestBase [main] Stopping docker container: fa468d77d00a94a66511700c4f7266e9ca84cb987bfe71149531511f3296ee01
      2020-10-07T14:59:37.769-07:00 INFO DockerTestBase [main] Stopped docker container: fa468d77d00a94a66511700c4f7266e9ca84cb987bfe71149531511f3296ee01
      2020-10-07T14:59:37.769-07:00 INFO DockerTestBase [main] Removing docker container: fa468d77d00a94a66511700c4f7266e9ca84cb987bfe71149531511f3296ee01
      2020-10-07T14:59:37.847-07:00 INFO DockerTestBase [main] Removed docker container: fa468d77d00a94a66511700c4f7266e9ca84cb987bfe71149531511f3296ee01
      2020-10-07T14:59:37.858-07:00 INFO DockerTestBase [main] Creating docker container (test dir = Couchbase/mad_hatter/analytics/cbas/cbas-test/cbas-docker-test/target/com.couchbase.analytics.test.docker.upgrade.misc.SecondaryCompositeIndexITD/upgradeOfflineAndDowngrade/0005_cc_build-docker.couchbase.com_couchbase_server-internal_6.6.1-9104, network = { name = cbas-docker-test-0001-2e43f5f6-162e-4c6e-98a6-66f298732ada, id = 43d9db2cdcc2c89d07dd3ecc64164010673062af4b93ed4bf3307dca7dd03b94 })
      2020-10-07T14:59:39.705-07:00 INFO DockerTestBase [main] Container information for d34c7c8298f47f1762575b7711fcc0b31f2a717f50f8cbd284f80a179d6e0f0d: host: cc.couchbase.host, ip: 192.168.176.3, 127.0.0.1:34477->192.168.176.3:8091, 127.0.0.1:34473->192.168.176.3:8095
      2020-10-07T14:59:59.110-07:00 INFO DockerTestBase [main] Waiting for NCs == 2 & Cluster state in {REBALANCE_REQUIRED} for up to 360s...
      2020-10-07T15:00:56.210-07:00 INFO UpgradeTestBase [main] Recreating 'nc' node (image = [build-docker.couchbase.com/couchbase/server-internal:6.6.1-9104], forceNewIp = false)...
      2020-10-07T15:00:56.210-07:00 INFO DockerTestBase [main] Stopping docker container: c26c376ad6c897b3a659d0fb8d6f443b3ceac4347d754019d44c3836d6322ce5
      2020-10-07T15:00:58.148-07:00 INFO DockerTestBase [main] Stopped docker container: c26c376ad6c897b3a659d0fb8d6f443b3ceac4347d754019d44c3836d6322ce5
      2020-10-07T15:00:58.148-07:00 INFO DockerTestBase [main] Removing docker container: c26c376ad6c897b3a659d0fb8d6f443b3ceac4347d754019d44c3836d6322ce5
      2020-10-07T15:00:58.210-07:00 INFO DockerTestBase [main] Removed docker container: c26c376ad6c897b3a659d0fb8d6f443b3ceac4347d754019d44c3836d6322ce5
      2020-10-07T15:00:58.211-07:00 INFO DockerTestBase [main] Creating docker container (test dir = Couchbase/mad_hatter/analytics/cbas/cbas-test/cbas-docker-test/target/com.couchbase.analytics.test.docker.upgrade.misc.SecondaryCompositeIndexITD/upgradeOfflineAndDowngrade/0006_nc_build-docker.couchbase.com_couchbase_server-internal_6.6.1-9104, network = { name = cbas-docker-test-0001-2e43f5f6-162e-4c6e-98a6-66f298732ada, id = 43d9db2cdcc2c89d07dd3ecc64164010673062af4b93ed4bf3307dca7dd03b94 })
      2020-10-07T15:01:00.573-07:00 INFO DockerTestBase [main] Container information for 2297e4111fef57a230409d2be28d8f98ad4c4cffb5b76a08f5475686422a03c5: host: nc.couchbase.host, ip: 192.168.176.4, 127.0.0.1:34492->192.168.176.4:8091, 127.0.0.1:34488->192.168.176.4:8095
      2020-10-07T15:01:26.858-07:00 INFO DockerTestBase [main] Waiting for NCs == 2 & Cluster state in {REBALANCE_REQUIRED} for up to 360s...
      2020-10-07T15:01:26.978-07:00 INFO DockerTestBase [main] Waiting for 'compatibilityVersion promotion to 6.6.x' condition for up to 60 seconds...
      2020-10-07T15:01:30.309-07:00 INFO UpgradeTestBase [main] Running 'rebalance'
      2020-10-07T15:01:30.309-07:00 INFO DockerTestBase [main] Executing the command: docker exec d091656842ead482a83b56b484da13309007b20a79c41a8f05b4150815b4a5f8 /opt/couchbase/bin/couchbase-cli rebalance -c 192.168.176.5:8091 -u couchbase -p couchbase
      2020-10-07T15:07:37.022-07:00 INFO DockerTestBase [main+] >> Unable to display progress bar on this os
      2020-10-07T15:07:37.022-07:00 INFO DockerTestBase [main+] >> SUCCESS: Rebalance complete
      2020-10-07T15:07:37.547-07:00 INFO DockerTestBase [main] Waiting for 'rebalance completion' condition for up to 60 seconds...
      2020-10-07T15:07:37.617-07:00 INFO UpgradeTestBase [main] ... rebalance succeeded!
      2020-10-07T15:07:37.618-07:00 INFO DockerTestBase [main] Waiting for NCs == 2 & Cluster state in {ACTIVE} for up to 360s...
      2020-10-07T15:13:37.652-07:00 WARN DockerTestBase [main] Timed out waiting for condition: NCs == 2 (last count was 2) & Cluster [ACTIVE] (last state was REBALANCE_REQUIRED), attemptCount: 330
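
      For illustration only, the Analytics cluster state that the test waits on can also be checked directly against the node from the log above; this is a minimal sketch, assuming the couchbase/couchbase credentials used by the test and assuming the state is exposed by the Analytics admin endpoint /analytics/cluster on port 8095:

      # Ask the Analytics service for its cluster state right after the CLI prints
      # "SUCCESS: Rebalance complete". On builds without the fix, this still shows
      # REBALANCE_REQUIRED rather than ACTIVE.
      curl -s -u couchbase:couchbase http://192.168.176.3:8095/analytics/cluster | grep '"state"'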
      


            Activity

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9139 contains cbas commit 0a3ce96 with commit message:
            MB-41910: propagate rebalance failure when REBALANCE_REQUIRED

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-3545 contains cbas commit 0a3ce96 with commit message:
            MB-41910: propagate rebalance failure when REBALANCE_REQUIRED

            michael.blow Michael Blow added a comment -

            Ali Alsuliman, please close if you can verify; otherwise, I think that in order to reproduce this you would need to (a rough command sketch follows the list):

            • have a cluster of at least two nodes
            • have a secondary index, created on a version earlier than 6.6.1 (such as 6.5.1), that will need to be rebuilt
            • perform offline upgrade of nodes to 6.6.1
            • after cluster starts, observe analytics service is in REBALANCE_REQUIRED state
            • start a rebalance and before it completes, kill the NC (not the CC) – NOTE: this will only work if ns_server didn't choose the NC as the rebalance coordinator
            • observe that the rebalance fails and that the UI log doesn't indicate it was due to the coordinator crashing; I think the message will say REBALANCE_REQUIRED, but off the top of my head I am not certain
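
            A rough sketch of the rebalance/kill step on a Docker-based setup, assuming container names cc and nc and couchbase/couchbase credentials (the names and addresses are assumptions, not taken from the test):

            # Start the rebalance from the CC node in the background...
            docker exec cc /opt/couchbase/bin/couchbase-cli rebalance \
                -c localhost:8091 -u couchbase -p couchbase &

            # ...and kill the NC container before the rebalance can complete.
            sleep 10 && docker kill nc

            # Wait for the CLI to return so its reported result (success or failure) is visible.
            wait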
            ali.alsuliman Ali Alsuliman added a comment -

            Hi Umang,

            See the steps that Mike outlined. Can you try them and check? Before the fix, you should see the rebalance operation reported as successful even though Analytics is still in the REBALANCE_REQUIRED state, because rebalancing Analytics didn't actually succeed. After the fix, you should see the rebalance operation reported as failed.
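
            One way to check this from the command line, assuming the same address and credentials the test uses, and assuming the CLI signals the failure through its exit status (an expectation based on the description above, not something stated explicitly in this ticket):

            # Before the fix this prints "rebalance reported success" even though Analytics
            # never leaves REBALANCE_REQUIRED; after the fix it should print "rebalance reported failure".
            if /opt/couchbase/bin/couchbase-cli rebalance -c 192.168.176.5:8091 \
                   -u couchbase -p couchbase; then
                echo "rebalance reported success"
            else
                echo "rebalance reported failure"
            fi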

            umang.agrawal Umang added a comment -

            Verified with couchbase-server build 6.6.1-9194

            The following error is observed in the UI:

            Rebalance exited with reason {service_rebalance_failed,cbas,{agent_died,<14078.442.0>,
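
            For reference, the same message can be pulled without the UI by reading the cluster log over REST; the localhost address and credentials below are assumptions about a typical setup, not taken from this verification run:

            # The /logs endpoint on the cluster manager (port 8091) returns the entries
            # shown in the UI log; filter for the cbas rebalance failure.
            curl -s -u couchbase:couchbase http://localhost:8091/logs | grep -o 'service_rebalance_failed[^"]*'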


              People

              Assignee:
              umang.agrawal Umang
              Reporter:
              ali.alsuliman Ali Alsuliman
              Votes:
              0
              Watchers:
              4

