Couchbase Kubernetes / K8S-3581

A rebalance failure during couchstore-to-magma migration with swap rebalance makes the operator create new pods and nodes in a loop

Details

    • 14 - Last Shot
    • 3

    Description

      Cluster Setup

      • Kind cluster running locally on a Mac
      • 5 nodes, each running all services
      • 1 bucket
      • Cluster version: 7.6.1

      Steps taken in the scenario

      • Created a cluster
      • Created a bucket
      • Issued a deployment to change the bucket storage backend from couchstore to magma
      • After the first few swap rebalances had completed for 4 nodes, manually failed the rebalance by killing memcached on one pod
      • Instead of retrying the rebalance, the operator created a new pod and added it to the cluster
      • Repeated the manual failure in a loop by killing memcached; after each rebalance failure, a new pod was created and added
      • In total, 15+ pods were created and added to the cluster
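
      Expected behaviour, for illustration only

The behaviour expected in the steps above (retry the failed rebalance before creating another pod) could be sketched roughly as below. This is a hypothetical model and not the operator's code: `ClusterState`, `NextAction`, `PodsCreated` and their fields are invented names, and a single swap is assumed to be in flight at a time.

```go
package main

// Action is what a hypothetical reconciler decides to do next.
type Action int

const (
	NoOp Action = iota
	RetryRebalance
	CreatePod
)

// ClusterState is an illustrative snapshot. The field names are
// assumptions and do not mirror the operator's real types.
type ClusterState struct {
	PendingReplacements int // replacement pods added but not yet rebalanced in
	NodesToMigrate      int // nodes still on the old storage backend
}

// NextAction finishes an in-flight swap before starting another one.
// A failed rebalance leaves PendingReplacements > 0, so the decision
// is to retry the rebalance instead of creating a further pod.
func NextAction(s ClusterState) Action {
	if s.PendingReplacements > 0 {
		return RetryRebalance
	}
	if s.NodesToMigrate > 0 {
		return CreatePod
	}
	return NoOp
}

// PodsCreated replays a single swap whose rebalance fails `failures`
// times before succeeding, and returns how many pods were created.
func PodsCreated(failures int) int {
	s := ClusterState{NodesToMigrate: 1}
	created := 0
	for {
		switch NextAction(s) {
		case CreatePod:
			created++
			s.PendingReplacements++
		case RetryRebalance:
			if failures > 0 {
				failures-- // rebalance fails, e.g. memcached was killed
				continue
			}
			// Rebalance succeeds, so the swap completes.
			s.PendingReplacements--
			s.NodesToMigrate--
		case NoOp:
			return created
		}
	}
}
```

Under this model, `PodsCreated(5)` returns the same value as `PodsCreated(0)`: the number of pods created does not depend on how many times the rebalance fails, which is the opposite of what was observed in this scenario.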

      Operator logs:

      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/cbopinfo-20240717T150023+0530.tar.gz

      Cluster logs:
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0004.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0005.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0006.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0007.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0008.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0009.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0010.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0011.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0012.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0013.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/couchstore_magma_issue/collectinfo-2024-07-17T092619-ns_1%40cb-example-0014.cb-example.default.svc.zip


      The cao tool and operator images were built locally from this commit:

      commit e00cf70597dbc0a7422c82f0efd0a1a28f75bfcd (HEAD -> master, origin/master, origin/HEAD)
      Author: usamah jassat <usamah.jassat@couchbase.com>
      Date:   Thu Jul 11 15:55:19 2024 +0100

          K8S-3564: fix TestServerGroupRescheduling when more SGs

          Change-Id: I13dabc775ad8f47e6f9f89b3445a19a4dd28112e
          Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/212585
          Reviewed-by: Justin Ashworth <justin.ashworth@couchbase.com>
          Tested-by: Build Bot <build@couchbase.com>

          People

            raghav.sk Raghav S K