Couchbase Kubernetes / K8S-3590

[Upgrade]: Rebalance failures and failover during upgrade cause infinite pod creation and addition to cluster

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version: 2.7.0
    • Affects Versions: 2.4.0, 2.4.1, 2.5.0, 2.6.0, 2.4.2, 2.4.3, 2.5.1, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.5.2
    • Component: operator
    • Environment:
      Initial cluster version: 7.0.5-7659
      Upgrade cluster version: 7.2.3-6705
      Kubernetes version: v1.30.0
      CAO and operator: 2.7.0, built locally
      Environment: Kind cluster
    • Sprint: 15 - First Frontier
    • 0

    Description

      Cluster Setup

      • Kind cluster running locally on a Mac
      • 2 nodes with all services
      • 1 bucket
      • Initial Cluster version : 7.0.5-7659
      • Upgrade Cluster version : 7.2.3-6705

      Steps taken in the scenario

      • Created a cluster
      • Created 1 bucket
      • Issued an upgrade from 7.0.5-7659 to 7.2.3-6705 using swap rebalance
      • The swap rebalance fails
      • One of the nodes still on version 7.0.5 is failed over by the operator because of a memcached crash
      • A new pod is then spun up and added to the cluster, and the rebalance fails again
      • New pods keep being created and added to the cluster in an endless loop, and the rebalance keeps failing
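
      The loop described in the last two steps can be pictured with a toy model. This is only an illustrative sketch, not the operator's code: ToyCluster, reconcile, always_fails and max_passes are hypothetical names, and max_passes exists solely so that the demo terminates. The model assumes each pass adds one pod and retries the rebalance, and that nothing is ever removed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyCluster:
    """Toy stand-in for cluster state; not the operator's data model."""
    name: str = "cb-example"
    pods: List[str] = field(default_factory=list)

    def next_pod_name(self) -> str:
        # Pod names in the attached cluster logs follow <cluster>-NNNN.
        return f"{self.name}-{len(self.pods):04d}"


def reconcile(cluster: ToyCluster,
              rebalance: Callable[[ToyCluster], bool],
              max_passes: int) -> int:
    """Add one pod per pass and retry the rebalance.

    Returns the number of pods created. Without max_passes (a hypothetical
    demo guard) the loop would never end while the rebalance keeps failing.
    """
    created = 0
    for _ in range(max_passes):
        cluster.pods.append(cluster.next_pod_name())
        created += 1
        if rebalance(cluster):
            break  # a successful rebalance ends the loop
    return created


def always_fails(cluster: ToyCluster) -> bool:
    """Models the reported condition: every rebalance attempt fails."""
    return False


# Start from the two-node cluster described in the setup.
cluster = ToyCluster(pods=["cb-example-0000", "cb-example-0001"])
created = reconcile(cluster, always_fails, max_passes=6)
print(created, cluster.pods[-1])
```

      In this model, six failing passes take the cluster from cb-example-0001 up to cb-example-0007, which is the range of node names that appears in the cluster log links of this report. The number of passes the real operator made is not stated in the report.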

       


      Operator logs:

      https://cb-engineering.s3.amazonaws.com/upgrade_pod_loop/cbopinfo-20240722T132821+0530.tar.gz

      Cluster logs:
      https://cb-engineering.s3.amazonaws.com/upgrade_pod_loop/collectinfo-2024-07-22T075306-ns_1%40cb-example-0000.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/upgrade_pod_loop/collectinfo-2024-07-22T075306-ns_1%40cb-example-0001.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/upgrade_pod_loop/collectinfo-2024-07-22T075306-ns_1%40cb-example-0002.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/upgrade_pod_loop/collectinfo-2024-07-22T075306-ns_1%40cb-example-0003.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/upgrade_pod_loop/collectinfo-2024-07-22T075306-ns_1%40cb-example-0004.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/upgrade_pod_loop/collectinfo-2024-07-22T075306-ns_1%40cb-example-0005.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/upgrade_pod_loop/collectinfo-2024-07-22T075306-ns_1%40cb-example-0006.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/upgrade_pod_loop/collectinfo-2024-07-22T075306-ns_1%40cb-example-0007.cb-example.default.svc.zip


      The cao tool and the operator images were built locally from this commit:

      commit c2e920ddbcfa9b4819d47ad81d0a35c359dd1dc6 (HEAD -> master, origin/master, origin/HEAD)
      Author: usamah jassat <usamah.jassat@couchbase.com>
      Date:   Wed Jul 17 15:11:19 2024 +0100

          K8S-3581: don't attempt backend migration when rebalance required
          
          Change-Id: I2d2b6d6d4f8dbb0a30db5bd54a05631d17631eee
          Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/212890
          Reviewed-by: Yusuf Ramzan <yusuf.ramzan@couchbase.com>
          Tested-by: Build Bot <build@couchbase.com>

      People

        Assignee: Raghav S K
        Reporter: Raghav S K