Details
-
Bug
-
Resolution: User Error
-
Critical
-
7.2.4
-
Initial Version : Couchbase Community Edition build 7.2.0-5325
Upgrade Version : Couchbase Enterprise Edition build 7.2.4-7059
-
Untriaged
-
Linux x86_64
-
-
0
-
Yes
Description
- Installed Couchbase Community Edition 7.2.2-6401 and created a 3 node couchbase cluster with the following services
- Node 1 - 172.23.121.192 - kv, index,n1ql
- Node 2 - 172.23.121.100 - kv, index, n1ql, fts
- Node 3 - 172.23.121.189 - kv, index, n1ql, fts
- Created a couchstore bucket named default
- Created a fulltext-index index_default on 172.23.122.189
- Ran many fts queries using the fts index
- A new node 172.23.121.180 was installed and initialized with Couchbase Enterprise Edition 7.2.4-7059
- Started a swap rebalance on the cluster removing 172.23.121.192
- Cleaned and installed Couchbase Enterprise Edition 7.2.4-7059 on 172.23.121.192
- Started a swap rebalance on the cluster removing 172.23.121.100
Rebalance fails
2023-12-12T00:49:02.342-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.121.100) - Rebalance exited with reason {service_rebalance_failed,fts, {worker_died, {'EXIT',<0.3522.3>, {rebalance_failed, {service_error, <<"nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Content-Length\":[]string{\"50\"}, \"Content-Type\":[]string{\"text/plain; charset=utf-8\"}, \"Date\":[]string{\"Tue, 12 Dec 2023 08:49:02 GMT\"}}, Body:(*http.bodyEOFSignal)(0xc00060fc80), ContentLength:50, TransferEncoding:[]string(nil), Close:false, Uncompressed:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc006ab2c00), TLS:(*tls.ConnectionState)(0xc000138840)}, urlUUID: monitor.UrlUUID{Url:\"https://172.23.121.192:18094\", UUID:\"3e30e19303ea45e6204de316e5dba41a\"}, kind: /api/stats?partitions=true, err: <nil>">>}}}}}.Rebalance Operation Id = e862c38ea50b8c690e43318f8d49c2b1 |
Observing these CRASH REPORTS on 172.23.121.100
[error_logger:error,2023-12-12T00:49:02.340-08:00,ns_1@172.23.121.100:service_rebalancer-fts-worker<0.3522.3>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: service_rebalancer:'-run_rebalance_worker/1-fun-0-'/0 pid: <0.3522.3> registered_name: 'service_rebalancer-fts-worker' exception exit: {rebalance_failed, {service_error, <<"nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Conten"...>>}} in function service_rebalancer:wait_for_rebalance_completion_loop/2 (src/service_rebalancer.erl, line 178) ancestors: ['service_rebalancer-fts',<0.3710.2>] message_queue_len: 0 messages: [] links: [<0.3502.3>] dictionary: [] trap_exit: false status: running heap_size: 2586 stack_size: 29 reductions: 14875 neighbours: |
[ns_server:error,2023-12-12T00:49:02.340-08:00,ns_1@172.23.121.100:service_rebalancer-fts<0.3502.3>:service_rebalancer:run_rebalance_worker:119]Worker terminated abnormally: {'EXIT',<0.3522.3>, {rebalance_failed, {service_error, <<"nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Content-Length\":[]string{\"50\"}, \"Content-Type\":[]string{\"text/plain; charset=utf-8\"}, \"Date\":[]string{\"Tue, 12 Dec 2023 08:49:02 GMT\"}}, Body:(*http.bodyEOFSignal)(0xc00060fc80), ContentLength:50, TransferEncoding:[]string(nil), Close:false, Uncompressed:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc006ab2c00), TLS:(*tls.ConnectionState)(0xc000138840)}, urlUUID: monitor.UrlUUID{Url:\"https://172.23.121.192:18094\", UUID:\"3e30e19303ea45e6204de316e5dba41a\"}, kind: /api/stats?partitions=true, err: <nil>">>}}}[error_logger:error,2023-12-12T00:49:02.341-08:00,ns_1@172.23.121.100:service_rebalancer-fts<0.3502.3>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: misc:'-spawn_monitor/1-fun-0-'/0 pid: <0.3502.3> registered_name: 'service_rebalancer-fts' exception exit: {worker_died, {'EXIT',<0.3522.3>, {rebalance_failed, {service_error, <<"nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header"...>>}}}} in function service_rebalancer:run_rebalance/1 (src/service_rebalancer.erl, line 73) ancestors: [<0.3710.2>] message_queue_len: 0 messages: [] links: [] dictionary: [] trap_exit: false status: running heap_size: 6772 stack_size: 29 reductions: 14749 neighbours: |
[ns_server:error,2023-12-12T00:49:02.341-08:00,ns_1@172.23.121.100:service_agent-fts<0.3452.0>:service_agent:handle_info:234]Rebalancer <0.3502.3> died unexpectedly: {worker_died, {'EXIT',<0.3522.3>, {rebalance_failed, {service_error, <<"nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Content-Length\":[]string{\"50\"}, \"Content-Type\":[]string{\"text/plain; charset=utf-8\"}, \"Date\":[]string{\"Tue, 12 Dec 2023 08:49:02 GMT\"}}, Body:(*http.bodyEOFSignal)(0xc00060fc80), ContentLength:50, TransferEncoding:[]string(nil), Close:false, Uncompressed:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc006ab2c00), TLS:(*tls.ConnectionState)(0xc000138840)}, urlUUID: monitor.UrlUUID{Url:\"https://172.23.121.192:18094\", UUID:\"3e30e19303ea45e6204de316e5dba41a\"}, kind: /api/stats?partitions=true, err: <nil>">>}}}}[ns_server:info,2023-12-12T00:49:02.341-08:00,ns_1@172.23.121.100:rebalance_agent<0.3084.0>:rebalance_agent:handle_down:290]Rebalancer process <0.3710.2> died (reason {service_rebalance_failed,fts, {worker_died, {'EXIT',<0.3522.3>, {rebalance_failed, {service_error, <<"nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Content-Length\":[]string{\"50\"}, \"Content-Type\":[]string{\"text/plain; charset=utf-8\"}, \"Date\":[]string{\"Tue, 12 Dec 2023 08:49:02 GMT\"}}, Body:(*http.bodyEOFSignal)(0xc00060fc80), ContentLength:50, TransferEncoding:[]string(nil), Close:false, Uncompressed:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc006ab2c00), TLS:(*tls.ConnectionState)(0xc000138840)}, urlUUID: monitor.UrlUUID{Url:\"https://172.23.121.192:18094\", UUID:\"3e30e19303ea45e6204de316e5dba41a\"}, kind: /api/stats?partitions=true, err: <nil>">>}}}}}).[ns_server:debug,2023-12-12T00:49:02.341-08:00,ns_1@172.23.121.100:leader_activities<0.3056.0>:leader_activities:handle_activity_down:450]Activity terminated with reason {shutdown, {async_died, {raised, {exit, {service_rebalance_failed,fts, {worker_died, {'EXIT',<0.3522.3>, {rebalance_failed, {service_error, <<"nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Content-Length\":[]string{\"50\"}, \"Content-Type\":[]string{\"text/plain; charset=utf-8\"}, \"Date\":[]string{\"Tue, 12 Dec 2023 08:49:02 GMT\"}}, Body:(*http.bodyEOFSignal)(0xc00060fc80), ContentLength:50, TransferEncoding:[]string(nil), Close:false, Uncompressed:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc006ab2c00), TLS:(*tls.ConnectionState)(0xc000138840)}, urlUUID: monitor.UrlUUID{Url:\"https://172.23.121.192:18094\", UUID:\"3e30e19303ea45e6204de316e5dba41a\"}, kind: /api/stats?partitions=true, err: <nil>">>}}}}}, [{ns_rebalancer, '-rebalance_topology_aware_service/4-fun-1-', 5, [{file,"src/ns_rebalancer.erl"}, {line,395}]}, {misc,with_trap_exit,1, [{file,"src/misc.erl"},{line,2714}]}, {ns_rebalancer, '-rebalance_topology_aware_services/4-fun-1-', 4, [{file,"src/ns_rebalancer.erl"}, {line,360}]}, {lists,filtermap,2, [{file,"lists.erl"},{line,1321}]}, {ns_rebalancer,rebalance_services,2, [{file,"src/ns_rebalancer.erl"}, {line,279}]}, {ns_rebalancer,rebalance_body,5, [{file,"src/ns_rebalancer.erl"}, {line,526}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"}, {line,191}]}]}}}}. Activity:{activity,<0.3709.2>,#Ref<0.1630041773.2127298564.210993>,default, <<"8b80d44e1e68411e2044cbdcb74c5f22">>, [rebalance], majority,[]}[error_logger:error,2023-12-12T00:49:02.342-08:00,ns_1@172.23.121.100:<0.3706.2>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: erlang:apply/2 pid: <0.3706.2> registered_name: [] exception exit: {service_rebalance_failed,fts, {worker_died, {'EXIT',<0.3522.3>, {rebalance_failed, {service_error, <<"nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMin"...>>}}}}} in function ns_rebalancer:'-rebalance_topology_aware_service/4-fun-1-'/5 (src/ns_rebalancer.erl, line 395) in call from misc:with_trap_exit/1 (src/misc.erl, line 2714) in call from ns_rebalancer:'-rebalance_topology_aware_services/4-fun-1-'/4 (src/ns_rebalancer.erl, line 360) in call from lists:filtermap/2 (lists.erl, line 1321) in call from ns_rebalancer:rebalance_services/2 (src/ns_rebalancer.erl, line 279) in call from ns_rebalancer:rebalance_body/5 (src/ns_rebalancer.erl, line 526) in call from async:'-async_init/4-fun-1-'/3 (src/async.erl, line 191) ancestors: [<0.29804.1>,ns_orchestrator_child_sup,ns_orchestrator_sup, mb_master_sup,mb_master,leader_registry_sup, leader_services_sup,<0.3053.0>,ns_server_sup, ns_server_nodes_sup,<0.2744.0>,ns_server_cluster_sup, root_sup,<0.146.0>] message_queue_len: 0 messages: [] links: [<0.29804.1>] dictionary: [] trap_exit: false status: running heap_size: 1598 stack_size: 29 reductions: 1381 neighbours: |
[user:error,2023-12-12T00:49:02.342-08:00,ns_1@172.23.121.100:<0.29804.1>:ns_orchestrator:log_rebalance_completion:1433]Rebalance exited with reason {service_rebalance_failed,fts, {worker_died, {'EXIT',<0.3522.3>, {rebalance_failed, {service_error, <<"nodes: sample res.StatusCode not 200, res: &http.Response{Status:\"503 Service Unavailable\", StatusCode:503, Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Content-Length\":[]string{\"50\"}, \"Content-Type\":[]string{\"text/plain; charset=utf-8\"}, \"Date\":[]string{\"Tue, 12 Dec 2023 08:49:02 GMT\"}}, Body:(*http.bodyEOFSignal)(0xc00060fc80), ContentLength:50, TransferEncoding:[]string(nil), Close:false, Uncompressed:false, Trailer:http.Header(nil), Request:(*http.Request)(0xc006ab2c00), TLS:(*tls.ConnectionState)(0xc000138840)}, urlUUID: monitor.UrlUUID{Url:\"https://172.23.121.192:18094\", UUID:\"3e30e19303ea45e6204de316e5dba41a\"}, kind: /api/stats?partitions=true, err: <nil>">>}}}}}.Rebalance Operation Id = e862c38ea50b8c690e43318f8d49c2b1[ns_server:debug,2023-12-12T00:49:02.342-08:00,ns_1@172.23.121.100:<0.29804.1>:auto_rebalance:retry_rebalance:58]Retry rebalance is not enabled. Failed Rebalance with Id e862c38ea50b8c690e43318f8d49c2b1 will not be retried.[ns_server:debug,2023-12-12T00:49:02.343-08:00,ns_1@172.23.121.100:service_agent-fts<0.3452.0>:service_agent:cleanup_service:496]Cleaning up stale tasks:[[{<<"rev">>,<<"MTQ=">>}, {<<"id">>,<<"prepare:1e45761aa655cce953347f2d516ad8c5">>}, {<<"type">>,<<"task-prepared">>}, {<<"status">>,<<"task-running">>}, {<<"isCancelable">>,true}, {<<"progress">>,100}, {<<"description">>,<<"prepare topology change">>}, {<<"extra">>, {[{<<"topologyChange">>, {[{<<"id">>,<<"1e45761aa655cce953347f2d516ad8c5">>}, {<<"currentTopologyRev">>,null}, {<<"type">>,<<"topology-change-rebalance">>}, {<<"keepNodes">>, [{[{<<"nodeInfo">>, {[{<<"nodeId">>,<<"22d6c0d40bf022fd886ea82513d24795">>}, {<<"priority">>,0}, {<<"opaque">>,null}]}}, {<<"recoveryType">>,<<"recovery-full">>}]}, {[{<<"nodeInfo">>, {[{<<"nodeId">>,<<"3e30e19303ea45e6204de316e5dba41a">>}, {<<"priority">>,0}, {<<"opaque">>,null}]}}, {<<"recoveryType">>,<<"recovery-full">>}]}]}, {<<"ejectNodes">>, [{[{<<"nodeId">>,<<"722e13ec968b63a09feb6eae765bb6e8">>}, {<<"priority">>,0}, {<<"opaque">>,null}]}]}]}}]}}]][ns_server:debug,2023-12-12T00:49:02.366-08:00,ns_1@172.23.121.100:chronicle_kv_log<0.2812.0>:chronicle_kv_log:log:59]update (key: counters, rev: {<<"2a1e08cda0a3efb5ebfe0e5a718fbc43">>,525})[{rebalance_fail,{1702370942,1}}, {rebalance_start,{1702370732,4}}, {rebalance_success,{1702370578,3}}][ns_server:debug,2023-12-12T00:49:02.368-08:00,ns_1@172.23.121.100:<0.3001.0>:terse_cluster_info_uploader:handle_info:53]Refreshing terse cluster info with <<"{\"rev\":642,\"nodesExt\":[{\"services\":{\"capi\":8092,\"fts\":8094,\"ftsGRPC\":9130,\"indexAdmin\":9100,\"indexHttp\":9102,\"indexScan\":9101,\"indexStreamCatchup\":9104,\"indexStreamInit\":9103,\"indexStreamMaint\":9105,\"kv\":11210,\"mgmt\":8091,\"n1ql\":8093,\"projector\":9999},\"thisNode\":true,\"hostname\":\"172.23.121.100\"},{\"services\":{\"capi\":8092,\"capiSSL\":18092,\"indexAdmin\":9100,\"indexHttp\":9102,\"indexHttps\":19102,\"indexScan\":9101,\"indexStreamCatchup\":9104,\"indexStreamInit\":9103,\"indexStreamMaint\":9105,\"kv\":11210,\"kvSSL\":11207,\"mgmt\":8091,\"mgmtSSL\":18091,\"n1ql\":8093,\"n1qlSSL\":18093,\"projector\":9999},\"hostname\":\"172.23.121.180\"},{\"services\":{\"capi\":8092,\"fts\":8094,\"ftsGRPC\":9130,\"indexAdmin\":9100,\"indexHttp\":9102,\"indexScan\":9101,\"indexStreamCatchup\":9104,\"indexStreamInit\":9103,\"indexStreamMaint\":9105,\"kv\":11210,\"mgmt\":8091,\"n1ql\":8093,\"projector\":9999},\"hostname\":\"172.23.121.189\"},{\"services\":{\"capi\":8092,\"capiSSL\":18092,\"fts\":8094,\"ftsGRPC\":9130,\"ftsGRPCSSL\":19130,\"ftsSSL\":18094,\"kv\":11210,\"kvSSL\":11207,\"mgmt\":8091,\"mgmtSSL\":18091,\"projector\":9999},\"hostname\":\"172.23.121.192\"}],\"clusterCapabilitiesVer\":[1,0],\"clusterCapabilities\":{\"n1ql\":[\"enhancedPreparedStatements\"]},\"revEpoch\":1}">>[ns_server:debug,2023-12-12T00:49:02.480-08:00,ns_1@172.23.121.100:ns_config_log<0.266.0>:ns_config_log:log_common:275]config change:rebalance_reports ->[{'_vclock',[{<<"722e13ec968b63a09feb6eae765bb6e8">>,{1,63869589778}}, {<<"89c1bd5de68261e148ca1df22ef17432">>,{2,63869588669}}, {<<"af6906da7a32633087b639bb2cea7758">>,{1,63869590142}}]}, {<<"c42c72db63ae4573233135d5309d3719">>, [{node,'ns_1@172.23.121.180'}, {filename,"rebalance_report_20231212T084902.json"}]}, {<<"3fd428aa8e79f487eada7e1c683dcd1c">>, [{node,'ns_1@172.23.121.100'}, {filename,"rebalance_report_20231212T084258.json"}]}, {<<"af2369f00c39d6ce7d5a78929979bf4d">>, [{node,'ns_1@172.23.121.192'}, {filename,"rebalance_report_20231212T082429.json"}]}, {<<"61e0dbc06ea364d258466ff8eb809554">>, [{node,'ns_1@172.23.121.192'}, {filename,"rebalance_report_20231212T082338.json"}]}][ns_server:debug,2023-12-12T00:49:02.480-08:00,ns_1@172.23.121.100:ns_config_log<0.266.0>:ns_config_log:log_common:275]config change:{local_changes_count,<<"af6906da7a32633087b639bb2cea7758">>} ->[{'_vclock',[{<<"af6906da7a32633087b639bb2cea7758">>,{15,63869590142}}]}][ns_server:info,2023-12-12T00:49:02.499-08:00,ns_1@172.23.121.100:<0.10564.3>:diag_handler:log_all_dcp_stats:199]logging dcp stats[ns_server:debug,2023-12-12T00:49:02.499-08:00,ns_1@172.23.121.100:chronicle_kv_log<0.2812.0>:chronicle_kv_log:log:59]update (key: rebalance_status, rev: {<<"2a1e08cda0a3efb5ebfe0e5a718fbc43">>, 526}){none,<<"Rebalance failed. See logs for detailed reason. You can try again.">>} |
Marking this as a regression since this issue was not seen in 7.2.4-7014
Testrunner script to reproduce
./testrunner -i /data/workspace/debian-p0-upgrade-vset00-00-ce_ee-new-from-72x-2b/testexec.101666.ini -p skip_init_check_cbserver=true,released_upgrade_version=7.2.4-7059,get-cbcollect-info=false,data_analysis=True,initial_build_type=community,ce_to_ee=true,skip_log_scan=true,upgrade_test=true,get-cbcollect-info=True,bucket_storage=couchstore,sirius_url=http://172.23.120.103:4000 -t upgrade.upgrade_tests.UpgradeTests.test_upgrade,items=20000,initial_version=7.2.0-5325,nodes_init=3,initialize_events=create_fts_index_query_compare,initial-services-setting=kv,index,n1ql-kv,n1ql,index,fts-kv,index,n1ql,fts,upgrade_services_in=same,after_events=rebalance_in-run_fts_query_and_compare,after_upgrade_services_in=kv,index,n1ql,fts,disable_HTP=True,upgrade_test=True,init_nodes=False |
Job name : debian-upgrade-ce_ee-new-from-72x-2b