Details
Type: Bug
Resolution: Duplicate
Priority: Blocker
7.6.0
Operating System: Debian
Initial Version: Couchbase Enterprise Edition build 7.1.1-3175
Upgrade Version: Couchbase Enterprise Edition build 7.6.0-2149
Untriaged
Linux x86_64
0
Yes
Description
Steps to reproduce
- Created a cluster on Couchbase Enterprise Edition build 7.1.1-3175 with the following setup:
  - 172.23.122.233 - cbas
  - 172.23.122.222 - index, kv, n1ql
  - 172.23.122.195 - index, kv, n1ql
  - 172.23.122.207 - cbas
  - 172.23.122.232 - cbas
- Created a bucket called "bucket-0".
- Loaded 10000 items into it.
- Created dataverses, links, datasets, synonyms and indexes.
- Upgraded the whole cluster to 7.6.0-2149 by swap rebalance.
- At the end of the upgrade, the cluster was:
  - 172.23.122.194 - index, kv, n1ql
  - 172.23.122.222 - cbas
  - 172.23.122.195 - cbas
  - 172.23.122.207 - index, kv, n1ql
  - 172.23.122.232 - cbas
- Started a rebalance after the upgrade; the rebalance succeeded.
After that, Analytics requests are rejected with HTTP 503.
Logs from 172.23.122.222 (ns_server.analytics_access.log):
172.23.106.205 - Administrator [19/Feb/2024:02:16:50 -0800] "GET /analytics/cluster HTTP/1.1" 503 0 - "python-requests/2.24.0"
172.23.106.205 - Administrator [19/Feb/2024:02:16:52 -0800] "POST /analytics/service HTTP/1.1" 503 448 - "python-requests/2.24.0"
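To pick the rejected requests out of a longer ns_server.analytics_access.log, a minimal Python sketch like the one below could be used. It assumes every line follows the same layout as the two entries above; the regex and the helper names (`parse_access_line`, `rejected`) are hypothetical and are not part of any Couchbase tooling.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed line layout, taken from the two access-log entries above:
# <client> - <user> [<timestamp>] "<METHOD> <path> <protocol>" <status> <bytes> - "<user-agent>"
ACCESS_LINE = re.compile(
    r'(?P<client>\S+) - (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]+" (?P<status>\d{3}) (?P<size>\d+)'
)


class AccessEntry(NamedTuple):
    client: str
    user: str
    timestamp: str
    method: str
    path: str
    status: int


def parse_access_line(line: str) -> Optional[AccessEntry]:
    """Parse one access-log line; return None if it does not match the assumed layout."""
    m = ACCESS_LINE.match(line)
    if m is None:
        return None
    return AccessEntry(m["client"], m["user"], m["ts"], m["method"], m["path"], int(m["status"]))


def rejected(lines: Iterable[str]) -> Iterator[AccessEntry]:
    """Yield only the entries answered with 503 Service Unavailable."""
    for line in lines:
        entry = parse_access_line(line)
        if entry is not None and entry.status == 503:
            yield entry
```

Feeding it the two lines above would yield both the `GET /analytics/cluster` and the `POST /analytics/service` entries, since both were answered with 503.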
Errors observed on 172.23.122.222 (ns_server.analytics_error.log):
2024-02-19T02:16:22.512-08:00 ERRO CBAS.rebalance.Rebalance [Rebalancer (3872218fb1c66d12a827c641947edc1f)] Rebalance 3872218fb1c66d12a827c641947edc1f failed
java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [172.23.122.195:8091 (4d1ac180b136386c3a43feb317990027)], state: UNUSABLE)
    at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:580) ~[cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:748) ~[cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:240) ~[cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:201) [cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:88) [cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.util.WriteLockCallable.call(WriteLockCallable.java:40) [cbas-common-7.6.0-2149.jar:7.6.0-2149]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2024-02-19T02:16:23.216-08:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-4] Rebalance 3872218fb1c66d12a827c641947edc1f failed
java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [172.23.122.195:8091 (4d1ac180b136386c3a43feb317990027)], state: UNUSABLE)
    at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:580) ~[cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:748) ~[cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:240) ~[cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:201) ~[cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:88) ~[cbas-server-7.6.0-2149.jar:7.6.0-2149]
    at com.couchbase.analytics.util.WriteLockCallable.call(WriteLockCallable.java:40) ~[cbas-common-7.6.0-2149.jar:7.6.0-2149]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
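Both errors name the same missing node (172.23.122.195, one of the cbas nodes after the upgrade) and the cluster state UNUSABLE. When scanning many error logs across test runs, a small Python sketch like the following could pull those two facts out of the exception message. The regexes and the `missing_nodes` helper are hypothetical and assume the message keeps exactly the wording shown above.

```python
import re
from typing import List, Optional, Tuple

# Assumed message shape, copied from the IllegalStateException above:
# "... (missing nodes: [<host>:<port> (<node id>), ...], state: <STATE>)"
MISSING = re.compile(r"missing nodes: \[(?P<nodes>[^\]]*)\], state: (?P<state>\w+)")
# One "<host>:<port> (<hex node id>)" item inside the bracketed list.
NODE = re.compile(r"(?P<addr>[\w.\-]+:\d+) \((?P<uuid>[0-9a-f]+)\)")


def missing_nodes(message: str) -> Optional[Tuple[List[Tuple[str, str]], str]]:
    """Return ([(address, node id), ...], cluster state), or None if the message does not match."""
    m = MISSING.search(message)
    if m is None:
        return None
    nodes = [(n["addr"], n["uuid"]) for n in NODE.finditer(m["nodes"])]
    return nodes, m["state"]
```

For the messages above this would report 172.23.122.195:8091 with node id 4d1ac180b136386c3a43feb317990027 and state UNUSABLE.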
I would also like to understand how the rebalance reported success when the logs above show it failed.
Marking this as a blocker because it affects all upgrade tests involving cbas, irrespective of the initial version. It is a regression: this behaviour was not observed in runs on RC4 (7.6.0-2119).
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /data/workspace/debian-p0-analytics-vset00-00-analytics_upgrade_from_7.1.1_with_collections/testexec.5802.ini -p GROUP=7_1_1;online_upgrade,kv_quota_percent=70,bucket_storage=couchstore,key=test_collections,get-cbcollect-info=True,upgrade_version=7.6.0-2149,aws_access_key=xxxxxxx,aws_secret_key=xxxxxx,sirius_url=http://172.23.120.103:4000 -t upgrade.cbas_upgrade.UpgradeTests.test_upgrade,upgrade_chain=7.1.1,upgrade_type=online_swap,update_nodes=kv;cbas,nodes_init=5,services_init=kv:index:n1ql-kv:index:n1ql-cbas-cbas-cbas,pre_update_no_of_dv=2,pre_update_ds_per_dv=4,pre_update_no_of_synonym=5,pre_update_no_of_index=3,replica_num=3,override_spec_params=num_buckets;num_scopes;num_collections;replicas;num_items,num_items=10000,num_buckets=3,num_scopes=5,num_collections=5,no_of_dv=10,ds_per_dv=3,no_of_synonym=10,no_of_index=5,GROUP=7_1_1;online_upgrade'
Job: debian-analytics-analytics_upgrade_from_7.1.1_with_collections
Issue Links
- duplicates
  - MB-60840 Intermittent failure during authentication handshake failed initial rebalance (Closed)