MB-62099 (Couchbase Server)

Node registration hang in storage cleanup leads to rebalance failures

    Description

      2024-05-30T01:11:27.216+00:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-1] Rebalance 358853a7f9e4928b7424959b0434aace failed
      java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [svc-da-node-003.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com:8091 (a6ec405867e0cfefb2d902dab5f21473)], state: UNUSABLE)
              at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:631) ~[columnar-server.jar:1.0.0-2117]
              at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:805) ~[columnar-server.jar:1.0.0-2117]
              at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:242) ~[columnar-server.jar:1.0.0-2117]
              at com.couchbase.analytics.control.rebalance.Rebalance.runRebalance(Rebalance.java:202) ~[columnar-server.jar:1.0.0-2117]
              at com.couchbase.analytics.util.LockedCallable$2.doCall(LockedCallable.java:62) ~[columnar-common.jar:1.0.0-2117]
              at com.couchbase.analytics.util.LockedCallable.call(LockedCallable.java:75) ~[columnar-common.jar:1.0.0-2117]
              at com.couchbase.analytics.util.LockedCallable$1.doCall(LockedCallable.java:42) ~[columnar-common.jar:1.0.0-2117]
              at com.couchbase.analytics.util.LockedCallable.call(LockedCallable.java:75) ~[columnar-common.jar:1.0.0-2117]
              at com.couchbase.analytics.control.rebalance.Rebalance.lambda$start$11(Rebalance.java:518) ~[columnar-server.jar:1.0.0-2117]
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
              at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
      2024-05-30T01:11:27.221+00:00 WARN CBAS.cbas Error making cluster state request: Get "https://svc-da-node-001.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com:9111/analytics/cluster": tls: failed to verify certificate: x509: certificate signed by unknown authority
      2024-05-30T01:11:27.221+00:00 INFO CBAS.cbas requesting isBalanced for 358853a7f9e4928b7424959b0434aace from driver
      2024-05-30T01:11:27.222+00:00 INFO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-2] +post request: {"nodes":[{"nodeId":"447f164df3f2fff2d7057fb0c7f9962d","priority":281474976710912,"opaque":{"cbas-version":"1.0.0-2117","cc-http-port":"9111","controller-id":"0","host":"svc-da-node-001.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com","ns-server-port":"8091","num-iodevices":"16","server-group-uri":"/pools/default/serverGroups/e849f755c4fcdd3adfdf719d5621a1b6","starting-partition-id":"0","svc-http-port":"8095"}},{"nodeId":"b674db8e501e016ace5c4abdc266c906","priority":281474976710656,"opaque":{"cbas-version":"1.0.0-2117","cc-http-port":"9111","controller-id":"1","host":"svc-da-node-002.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com","ns-server-port":"8091","num-iodevices":"16","server-group-uri":"/pools/default/serverGroups/e849f755c4fcdd3adfdf719d5621a1b6","starting-partition-id":"16","svc-http-port":"8095"}},{"nodeId":"a6ec405867e0cfefb2d902dab5f21473","priority":281474976710656,"opaque":{"cbas-version":"1.0.0-2117","cc-http-port":"9111","controller-id":"2","host":"svc-da-node-003.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com","ns-server-port":"8091","num-iodevices":"16","server-group-uri":"/pools/default/serverGroups/e849f755c4fcdd3adfdf719d5621a1b6","starting-partition-id":"32","svc-http-port":"8095"}},{"nodeId":"46c7268352b256be81a47adff8aaa9a8","priority":281474976710656,"opaque":{"cbas-version":"1.0.0-2117","cc-http-port":"9111","controller-id":"3","host":"svc-da-node-004.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com","ns-server-port":"8091","num-iodevices":"16","server-group-uri":"/pools/default/serverGroups/e849f755c4fcdd3adfdf719d5621a1b6","starting-partition-id":"48","svc-http-port":"8095"}},{"nodeId":"c3c5cd54665b926211e1428e41f5f01b","priority":281474976710656,"opaque":{"cbas-version":"1.0.0-2117","cc-http-port":"9111","controller-id":"4","host":"svc-da-node-005.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com","ns-server-port":"8091","num-iodevices":"16","server-group-uri":"/pools/default/serverGroups/e849f755c4fcdd3adfdf719d5621a1b6","starting-partition-id":"64","svc-http-port":"8095"}},{"nodeId":"658a188705407574b90c67a556934a96","priority":281474976710656,"opaque":{"cbas-version":"1.0.0-2117","cc-http-port":"9111","controller-id":"5","host":"svc-da-node-006.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com","ns-server-port":"8091","num-iodevices":"16","server-group-uri":"/pools/default/serverGroups/e849f755c4fcdd3adfdf719d5621a1b6","starting-partition-id":"80","svc-http-port":"8095"}},{"nodeId":"1e8fe380d5729c2b0d55de0afd85a400","priority":281474976710656,"opaque":{"cbas-version":"1.0.0-2117","cc-http-port":"9111","controller-id":"6","host":"svc-da-node-007.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com","ns-server-port":"8091","num-iodevices":"16","server-group-uri":"/pools/default/serverGroups/e849f755c4fcdd3adfdf719d5621a1b6","starting-partition-id":"96","svc-http-port":"8095"}},{"nodeId":"8a8bae256861e865ee0252704600c56b","priority":281474976710656,"opaque":{"cbas-version":"1.0.0-2117","cc-http-port":"9111","controller-id":"7","host":"svc-da-node-008.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com","ns-server-port":"8091","num-iodevices":"16","server-group-uri":"/pools/default/serverGroups/e849f755c4fcdd3adfdf719d5621a1b6","starting-partition-id":"112","svc-http-port":"8095"}}],"id":"358853a7f9e4928b7424959b0434aace","type":"topology-change-rebalance","ccNodeId":"447f164df3f2fff2d7057fb0c7f9962d","metadataNodeId":"447f164df3f2fff2d7057fb0c7f9962d","metadataPartition":-1,"rev":9,"configVersion":1,"balanceState":"unknown","keepNodesUpdated":false,"keepNodes":["447f164df3f2fff2d7057fb0c7f9962d","b674db8e501e016ace5c4abdc266c906","a6ec405867e0cfefb2d902dab5f21473","46c7268352b256be81a47adff8aaa9a8","c3c5cd54665b926211e1428e41f5f01b","658a188705407574b90c67a556934a96","1e8fe380d5729c2b0d55de0afd85a400","8a8bae256861e865ee0252704600c56b"],"inPlaceNumReplicas":0,"balanceStateClusterCompat":458758,"orchestratorNodeId":"447f164df3f2fff2d7057fb0c7f9962d","serverGroupsVersion":67168263,"ejectNodes":[]}
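
For triage, it can help to confirm that the node reported as missing in the IllegalStateException is one the rebalance request expected to keep. The following is a minimal sketch in Python, not part of the product: the function names `parse_missing_nodes` and `cross_check` are hypothetical, the message format is inferred only from the single log line above, and the request passed in is assumed to be the JSON body logged after `+post request:`.

```python
import re

# Matches the "missing nodes" part of the timeout message seen in the log.
# The format is an assumption based on the one sample line in this ticket.
MISSING_RE = re.compile(r"missing nodes: \[(?P<nodes>.*?)\], state: (?P<state>\w+)")
# Each entry appears as "host:port (nodeId)".
NODE_RE = re.compile(r"(?P<host>\S+):(?P<port>\d+) \((?P<node_id>[0-9a-f]+)\)")


def parse_missing_nodes(message):
    """Return (state, [(host, port, node_id), ...]) parsed from the message."""
    m = MISSING_RE.search(message)
    if m is None:
        return None, []
    nodes = [
        (n["host"], int(n["port"]), n["node_id"])
        for n in NODE_RE.finditer(m["nodes"])
    ]
    return m["state"], nodes


def cross_check(missing, request):
    """Map each missing node id to whether the request lists it in keepNodes."""
    keep = set(request.get("keepNodes", []))
    return {node_id: node_id in keep for _, _, node_id in missing}


if __name__ == "__main__":
    message = (
        "timed out waiting for all nodes to join & cluster active "
        "(missing nodes: [svc-da-node-003.gsrvnvzgw1-2vk8m.sandbox."
        "nonprod-project-avengers.com:8091 "
        "(a6ec405867e0cfefb2d902dab5f21473)], state: UNUSABLE)"
    )
    # Abbreviated stand-in for the logged request; only keepNodes is used.
    request = {
        "keepNodes": [
            "447f164df3f2fff2d7057fb0c7f9962d",
            "a6ec405867e0cfefb2d902dab5f21473",
        ],
        "ejectNodes": [],
    }
    state, missing = parse_missing_nodes(message)
    print(state, cross_check(missing, request))
```

In this ticket the missing node a6ec405867e0cfefb2d902dab5f21473 (svc-da-node-003) does appear in `keepNodes`, so the request expected it to be part of the cluster and it had not joined when the wait timed out.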
       
      2024-05-30T01:11:27.221+00:00 WARN CBAS.server.HttpServerHandler [nioEventLoopGroup-3-5] Failure handling HTTP Request
      io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
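
The two WARN lines at 01:11:27.221 appear to be the two sides of one failed handshake on port 9111: the caller reports an unknown authority, and the server receives a bad_certificate alert. Whether this is related to the registration hang is not established here. As a way to reproduce the verification failure from another host, the sketch below attempts a TLS handshake with Python's standard `ssl` module. The helper names `build_context` and `probe`, and the idea of passing the cluster CA as `cafile`, are assumptions for illustration only.

```python
import socket
import ssl


def build_context(cafile=None):
    """Client context that verifies the peer's chain and hostname."""
    ctx = ssl.create_default_context()
    if cafile is not None:
        # Trust the given CA bundle in addition to the system defaults.
        ctx.load_verify_locations(cafile=cafile)
    return ctx


def probe(host, port, cafile=None, timeout=5.0):
    """Attempt a TLS handshake and return (ok, detail)."""
    ctx = build_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # Raised when the chain or hostname cannot be verified.
        return False, f"verification failed: {exc.verify_message}"
    except OSError as exc:
        # Covers refused connections, timeouts, and other TLS errors.
        return False, f"connection failed: {exc}"
```

Calling `probe(host, 9111, cafile=...)` with and without the CA that is expected to have signed the node certificate would show whether the certificate presented on that port chains to it.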
      

      2024-05-30T04:20:50.693+00:00 WARN CBAS.server.QueryServiceServlet [HttpExecutor(port:18095)-7] handleException: ASX0032: Cannot execute request, cluster is UNUSABLE: {"host":"svc-da-node-001.gsrvnvzgw1-2vk8m.sandbox.nonprod-project-avengers.com:18095","path":"/analytics/service","statement":"<ud>select count(*) cnt from remote_2LeC9_volCollection_0_smfjs;</ud>","pretty":false,"mode":"immediate","clientContextID":"2d399898-27d9-4998-a914-e8eb15850c49","clientType":"ASTERIX","dataverse":null,"format":"CLEAN_JSON","timeout":36000000,"maxResultReads":1,"planFormat":"JSON","expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"profile":"counts","signature":true,"multiStatement":false,"parseOnly":false,"readOnly":false,"maxWarnings":0,"sqlCompat":false,"source":null,"scanConsistency":"not_bounded","scanWait":null}
      

      cc: Shelby Ramsey

            People

              ritesh.agarwal Ritesh Agarwal