Couchbase Kubernetes / K8S-3503

[operator:2.6.4-113] Rebalance failure during K8S upgrade on worker nodes [1.25->1.26]


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 2.6.4
    • Affects Version/s: 2.6.4
    • Component/s: operator
    • Labels: None
    • Sprint: 11 - Race to Crashpoint Tower
    • Story Points: 2

    Description

      Couchbase Cluster Description

      • Set up the cluster as per the required specifications
      • Each node is an m5.4xlarge instance (16 vCPUs and 64 GB RAM).
      • 6 Data Service nodes and 4 Index Service and Query Service nodes.
      • 10 buckets (with 1 replica each), Full Eviction, and auto-failover set to 5s.
      • ~100 GB of data per bucket → ~1 TB of data loaded onto the cluster.
      • 50 primary indexes with 1 replica each (100 indexes in total).
      • Continuous data and query workload on all buckets during the update process.
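For reference, the sizing figures above can be cross-checked with a few lines of Python. This is only a sketch: the per-bucket unit is assumed to be GB (the description does not state it), and the variable names are illustrative.

```python
# Sanity check of the sizing figures quoted in the cluster description.
# Assumption: the per-bucket figure is ~100 GB (the unit is not stated in the ticket).
BUCKETS = 10
DATA_PER_BUCKET_GB = 100   # assumed unit: GB
PRIMARY_INDEXES = 50
INDEX_REPLICAS = 1         # replicas per primary index

# Active data across all buckets (bucket replicas not counted).
total_data_tb = BUCKETS * DATA_PER_BUCKET_GB / 1000

# Each primary index exists once plus once per replica.
total_indexes = PRIMARY_INDEXES * (1 + INDEX_REPLICAS)

print(f"active data: ~{total_data_tb:g} TB")
print(f"index instances: {total_indexes}")
```

Under that assumption the numbers agree with the description: about 1 TB of active data and 100 index instances.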

      Task: Upgrade EKS 1.25 -> 1.26

      Observation:

      Rebalance exited with reason {prepare_delta_recovery_failed, ["bucket7","bucket4","bucket1","bucket8", "bucket5","bucket2","bucket9","bucket6", "bucket3","bucket0"], {error, {failed_nodes, [{'ns_1@cb-example-0004.cb-example.default.svc', {error, {exit, {{nodedown, 'ns_1@cb-example-0004.cb-example.default.svc'}, {gen_server,call, [{rebalance_agent, 'ns_1@cb-example-0004.cb-example.default.svc'}, {prepare_delta_recovery,<0.32181.8>, ["bucket7","bucket4","bucket1", "bucket8","bucket5","bucket2", "bucket9","bucket6","bucket3", "bucket0"]}, infinity]}}}}}]}}}. Rebalance Operation Id = 31215c9647541fcf981c9f1989644c13
      (ns_orchestrator 000, ns_1@cb-example-0000.cb-example.default.svc, 7:39:45 AM 22 May, 2024)
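For triage, the key facts in this message (failure reason, node reported down, affected buckets, operation id) can be pulled out with a short script. The following is a minimal sketch, not an official tool: the function name `summarize_rebalance_failure` and the regular expressions are assumptions based only on the message text quoted in this ticket.

```python
import re

# Rebalance exit message as reported in this ticket (whitespace normalised).
MESSAGE = """Rebalance exited with reason {prepare_delta_recovery_failed,
["bucket7","bucket4","bucket1","bucket8","bucket5","bucket2","bucket9","bucket6","bucket3","bucket0"],
{error, {failed_nodes, [{'ns_1@cb-example-0004.cb-example.default.svc',
{error, {exit, {{nodedown, 'ns_1@cb-example-0004.cb-example.default.svc'},
{gen_server,call, [{rebalance_agent, 'ns_1@cb-example-0004.cb-example.default.svc'},
{prepare_delta_recovery,<0.32181.8>,
["bucket7","bucket4","bucket1","bucket8","bucket5","bucket2","bucket9","bucket6","bucket3","bucket0"]},
infinity]}}}}}]}}}. Rebalance Operation Id = 31215c9647541fcf981c9f1989644c13"""


def summarize_rebalance_failure(message: str) -> dict:
    """Extract the main fields from a rebalance exit message (hypothetical helper)."""
    reason = re.search(r"exited with reason \{(\w+)", message)
    op_id = re.search(r"Rebalance Operation Id = ([0-9a-f]+)", message)
    # Nodes reported as down, de-duplicated while keeping first-seen order.
    down = list(dict.fromkeys(re.findall(r"\{nodedown,\s*'([^']+)'\}", message)))
    # Bucket names are the only double-quoted strings in the message.
    buckets = sorted(set(re.findall(r'"([^"]+)"', message)))
    return {
        "reason": reason.group(1) if reason else None,
        "operation_id": op_id.group(1) if op_id else None,
        "nodes_down": down,
        "buckets": buckets,
    }


summary = summarize_rebalance_failure(MESSAGE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Read this way, the message says that delta-recovery preparation failed for all 10 buckets because the orchestrator (cb-example-0000) could not reach the rebalance agent on cb-example-0004, which was reported as `nodedown`.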

      k8s-3486-after-control-plane-upgrade → http://supportal.couchbase.com/snapshot/234b418a152b49ac9612e8c8272189fb::0

      Logs were successfully uploaded to the following URLs:

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0000.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0001.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0002.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0003.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0004.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0005.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0006.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0007.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0008.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-after-control-plane-upgrade/collectinfo-2024-05-22T062020-ns_1%40cb-example-0009.cb-example.default.svc.zip

      CB logs during worker node upgrade:

      Rebalance exited with reason {prepare_delta_recovery_failed, ["bucket7","bucket4","bucket1","bucket8", "bucket5","bucket2","bucket9","bucket6", "bucket3","bucket0"], {error, {failed_nodes, [{'ns_1@cb-example-0004.cb-example.default.svc', {error, {exit, {{nodedown, 'ns_1@cb-example-0004.cb-example.default.svc'}, {gen_server,call, [{rebalance_agent, 'ns_1@cb-example-0004.cb-example.default.svc'}, {prepare_delta_recovery,<0.32181.8>, ["bucket7","bucket4","bucket1", "bucket8","bucket5","bucket2", "bucket9","bucket6","bucket3", "bucket0"]}, infinity]}}}}}]}}}. Rebalance Operation Id = 31215c9647541fcf981c9f1989644c13
      (ns_orchestrator 000, ns_1@cb-example-0000.cb-example.default.svc, 7:39:45 AM 22 May, 2024)

      Logs were successfully uploaded to the following URLs:

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0000.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0001.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0002.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0003.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0004.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0005.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0006.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0007.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0008.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-during-eks-worker-node-upgrade/collectinfo-2024-05-22T073106-ns_1%40cb-example-0009.cb-example.default.svc.zip

      CB logs after worker node upgrade:

      CB log: https://supportal.couchbase.com/snapshot/234b418a152b49ac9612e8c8272189fb::0

      Logs were successfully uploaded to the following URLs:

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0000.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0001.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0002.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0003.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0004.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0005.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0006.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0007.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0008.cb-example.default.svc.zip

      https://cb-engineering.s3.amazonaws.com/K8S-3486-post-worker-node-upgdrade/collectinfo-2024-05-22T084916-ns_1%40cb-example-0009.cb-example.default.svc.zip

      Operator logs after EKS control plane upgrade:
      cbopinfo-2.6.4-113-after-eks-control-plane-upgrade.tar.gz

      Operator logs after EKS worker node upgrade:
      cbopinfo-after-eks-worker-node-upgrade.tar.gz

            People

              Assignee: Manik Mahajan
              Reporter: Manik Mahajan