  1. Couchbase Kubernetes
  2. K8S-3586

Operator ejects wrong pod if swap rebalance fails during couchstore to magma migration

Details

    • 19 - A Rock and a Hard Place
    • 5

    Description

      Cluster Setup

      • Kind cluster running locally on a Mac
      • 5 nodes, each running all services
      • 2 buckets
      • Cluster version: 7.6.0-2176

      Steps taken in the scenario

      • Created a cluster
      • Created 2 buckets
      • Changed the storage backend of one of the buckets in the cluster from couchstore to magma
      • Swap rebalance completed successfully for 4 pods
      • During the swap rebalance of the 5th pod (cb-example-0000.cb-example.default.svc being replaced by cb-example-0009.cb-example.default.svc), killed memcached on a pod, which caused the rebalance to fail
      • After the failure, the Operator retried the rebalance by ejecting the newly added pod cb-example-0009.cb-example.default.svc instead of cb-example-0000.cb-example.default.svc
      • A new pod was created (cb-example-0010.cb-example.default.svc)
      • During the swap rebalance of the 5th pod (cb-example-0000.cb-example.default.svc being replaced by cb-example-0010.cb-example.default.svc), killed memcached on a pod again, which caused the rebalance to fail
      • After the failure, the Operator retried the rebalance by ejecting the newly added pod cb-example-0010.cb-example.default.svc instead of cb-example-0000.cb-example.default.svc
      • A new pod was created (cb-example-0011.cb-example.default.svc)
      • The swap rebalance with this pod succeeded, because no chaos action was applied this time (memcached was not killed manually)

      Issue

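      • When a swap rebalance fails, the Operator should retry the same swap rebalance (ejecting the old pod, here cb-example-0000.cb-example.default.svc) rather than eject the newly added pod and create another replacement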
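
      The expected retry behaviour can be sketched in Go as follows. This is only an illustration of the behaviour requested above, not the Operator's actual code: the swapRebalance type and the ejectCandidate function are hypothetical names introduced for this example, and the sketch assumes the Operator tracks which pod is being replaced and which pod was added to replace it.

```go
package main

import "fmt"

// swapRebalance is a hypothetical record of one in-flight swap:
// Outgoing is the pod being replaced, Incoming is the pod added to replace it.
type swapRebalance struct {
	Outgoing string
	Incoming string
}

// ejectCandidate returns the member a retry should eject after a failed
// swap rebalance. As long as both pods are still cluster members, the retry
// keeps the original direction: the outgoing pod is ejected, never the new one.
func ejectCandidate(s swapRebalance, members []string) (string, error) {
	hasOutgoing, hasIncoming := false, false
	for _, m := range members {
		switch m {
		case s.Outgoing:
			hasOutgoing = true
		case s.Incoming:
			hasIncoming = true
		}
	}
	if !hasOutgoing {
		return "", fmt.Errorf("outgoing pod %q is not a cluster member", s.Outgoing)
	}
	if !hasIncoming {
		return "", fmt.Errorf("incoming pod %q is not a cluster member", s.Incoming)
	}
	return s.Outgoing, nil
}

func main() {
	swap := swapRebalance{
		Outgoing: "cb-example-0000.cb-example.default.svc",
		Incoming: "cb-example-0009.cb-example.default.svc",
	}
	members := []string{swap.Outgoing, swap.Incoming}

	// The first rebalance attempt failed; decide what the retry ejects.
	eject, err := ejectCandidate(swap, members)
	if err != nil {
		fmt.Println("cannot retry swap:", err)
		return
	}
	fmt.Println("retry ejects:", eject)
}
```

      With the pods from the scenario above, this sketch selects cb-example-0000.cb-example.default.svc for ejection on retry, so cb-example-0009.cb-example.default.svc would be kept and no further replacement pod would be needed.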

      Operator logs:

       https://cb-engineering.s3.amazonaws.com/K8S-3586/cbopinfo-20240718T080208+0530.tar.gz

      Cluster logs:
      https://cb-engineering.s3.amazonaws.com/K8S-3586/collectinfo-2024-07-18T023307-ns_1%40cb-example-0005.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/K8S-3586/collectinfo-2024-07-18T023307-ns_1%40cb-example-0006.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/K8S-3586/collectinfo-2024-07-18T023307-ns_1%40cb-example-0007.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/K8S-3586/collectinfo-2024-07-18T023307-ns_1%40cb-example-0008.cb-example.default.svc.zip
      https://cb-engineering.s3.amazonaws.com/K8S-3586/collectinfo-2024-07-18T023307-ns_1%40cb-example-0011.cb-example.default.svc.zip


      The cao tool and the Operator images were built locally from this commit:

      commit c2e920ddbcfa9b4819d47ad81d0a35c359dd1dc6 (HEAD -> master, origin/master, origin/HEAD)
      Author: usamah jassat <usamah.jassat@couchbase.com>
      Date:   Wed Jul 17 15:11:19 2024 +0100

          K8S-3581: don't attempt backend migration when rebalance required

          Change-Id: I2d2b6d6d4f8dbb0a30db5bd54a05631d17631eee
          Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/212890
          Reviewed-by: Yusuf Ramzan <yusuf.ramzan@couchbase.com>
          Tested-by: Build Bot <build@couchbase.com>


          People

            justin.ashworth Justin Ashworth
            raghav.sk Raghav S K