Couchbase Kubernetes / K8S-3608

New pod added before failed over node is ejected and cluster is stabilized

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • 2.8.0
    • 2.7.0
    • operator
    • Initial Cluster version : 7.0.5-7659
      Upgrade Cluster version : 7.2.3-6705
      Kubernetes Version : v1.30.0
      CAO and operator : 2.7.0 built locally
      Environment : Kind cluster
    • 18 -Lost to Eternity
    • 2

    Description

      Cluster Setup

      • Kind cluster locally run on Mac
      • 2 nodes with kv,index,n1ql
      • 1 bucket
      • Initial Cluster version : 7.0.5-7659
      • Upgrade Cluster version : 7.2.3-6705

      Steps taken in the scenario

      • Created a cluster
      • Created 1 bucket
      • Issued an upgrade from 7.0.5-7659 to 7.2.3-6705 using swap rebalance
      • The swap rebalance fails when cb-example-0002 tries to replace cb-example-0001.
      • cb-example-0001, still on version 7.0.5, is failed over by the operator due to a memcached crash.
      • The pod is recovered and is in the inactiveAdded state.
      • A new pod, cb-example-0003, is then spun up and added to the cluster.
      • The rebalance then attempts to remove both cb-example-0001 (the failed-over pod) and cb-example-0000 (an existing pod).

      Issue

      • After the first swap rebalance fails, the failure should be rectified and the cluster returned to a balanced state before a new pod is added.
      • Adding new pods to an unbalanced cluster can cause a multitude of failures and is not recommended practice.
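
      The expected behaviour described above can be sketched as a pre-check that runs before any new pod is added. This is only an illustration, not the operator's actual logic: the function names are hypothetical, and the input is assumed to be a dict shaped like the JSON returned by Couchbase Server's /pools/default REST endpoint (fields balanced, rebalanceStatus, and per-node clusterMembership). The hostnames in the example are shortened for readability.

```python
from typing import Any, Dict, List

# Assumption: `pool` mirrors the /pools/default response, where each node
# reports clusterMembership as "active", "inactiveAdded" or "inactiveFailed".
ACTIVE = "active"


def blocking_reasons(pool: Dict[str, Any]) -> List[str]:
    """Return the reasons a new pod should not be added yet (empty means safe)."""
    reasons: List[str] = []
    # A rebalance that is still running must finish (or be resolved) first.
    if pool.get("rebalanceStatus", "none") != "none":
        reasons.append("rebalance in progress")
    # A failed swap rebalance leaves the cluster unbalanced.
    if not pool.get("balanced", False):
        reasons.append("cluster is not balanced")
    # Failed-over or recovered-but-not-rebalanced nodes must be dealt with first.
    for node in pool.get("nodes", []):
        membership = node.get("clusterMembership")
        if membership != ACTIVE:
            reasons.append(f"{node.get('hostname', '?')} is {membership}")
    return reasons


def can_add_pod(pool: Dict[str, Any]) -> bool:
    """Hypothetical guard: only add a pod when nothing is blocking."""
    return not blocking_reasons(pool)


# Roughly the state reported in this ticket after the failed swap rebalance.
observed = {
    "balanced": False,
    "rebalanceStatus": "none",
    "nodes": [
        {"hostname": "cb-example-0000", "clusterMembership": "active"},
        {"hostname": "cb-example-0001", "clusterMembership": "inactiveAdded"},
    ],
}

if __name__ == "__main__":
    print(can_add_pod(observed))
    for reason in blocking_reasons(observed):
        print(" -", reason)
```

      With a check along these lines, cb-example-0003 would not have been added while cb-example-0001 was still in the inactiveAdded state and the cluster was unbalanced.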

      Operator logs : https://cb-engineering.s3.amazonaws.com/K8S-3590/cbopinfo-20240802T100337+0530.tar.gz

      Cluster logs : 
      https://cb-engineering.s3.amazonaws.com/K8S-3590/collectinfo-2024-08-02T035818-ns_1%40cb-example-0000.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/K8S-3590/collectinfo-2024-08-02T035818-ns_1%40cb-example-0001.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/K8S-3590/collectinfo-2024-08-02T035818-ns_1%40cb-example-0002.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/K8S-3590/collectinfo-2024-08-02T035818-ns_1%40cb-example-0003.cb-example.default.svc.zip
      Cluster deployment : https://cb-engineering.s3.amazonaws.com/K8S-3590/couchbase-cluster.yaml

      Cluster upgrade deployment : https://cb-engineering.s3.amazonaws.com/K8S-3590/couchbase-cluster-upgraded.yaml

      Bucket deployment : https://cb-engineering.s3.amazonaws.com/K8S-3590/couchbase-buckets.yaml


      The cao tool and operator images were built locally from this commit:

      commit 16a1e4fe24d2a7791836b4add7a24962e3c56b8a (HEAD -> 2.7.x, origin/2.7.x)
      Author: usamah jassat <usamah.jassat@couchbase.com>
      Date:   Thu Aug 1 11:21:00 2024 +0100
       
          K8S-3602: Fix openshift tests
          
          Change-Id: I1f5ee7f1a2b7668cf8cfbcc04e3e5f8b2addb73f
          Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/213577
          Tested-by: Build Bot <build@couchbase.com>
          Reviewed-by: Justin Ashworth <justin.ashworth@couchbase.com>
      

          People

            justin.ashworth Justin Ashworth
            raghav.sk Raghav S K