Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version: 7.6.0
- Environment: Couchbase Enterprise Edition build 7.6.0-1878
- Triage: Untriaged
- Operating System: Linux x86_64
- Is this a Regression?: Unknown
Description
Steps to reproduce
- Create a single-node cluster on IPv6 with the services backup, fts, index, kv, n1ql
- Add a second IPv6 node with the same set of services

Rebalance fails with:
2023-12-05T06:07:33.632-08:00, ns_orchestrator:0:critical:message(ns_1@s10505-ip6.qe.couchbase.com) - Rebalance exited with reason {service_rebalance_failed,backup,
  {{badmatch,
    {error,
     {bad_nodes,backup,set_service_manager,
      [{'ns_1@s10501-ip6.qe.couchbase.com',
        {exit,
         {{linked_process_died,<35160.3685.0>,
           {'ns_1@s10501-ip6.qe.couchbase.com',
            {no_connection,"backup-service_api"}}},
          {gen_server,call,
           [{'service_agent-backup',
             'ns_1@s10501-ip6.qe.couchbase.com'},
            {set_service_manager,<0.3996.0>},
            infinity]}}}}]}}},
  [{service_manager,set_service_manager,1,
    [{file,"src/service_manager.erl"}, {line,188}]},
   {service_manager,run_op,1,
    [{file,"src/service_manager.erl"}, {line,146}]},
   {proc_lib,init_p,3,
    [{file,"proc_lib.erl"},{line,225}]}]}}.
Rebalance Operation Id = e0896f4384930db698c12d2f5b1a20c2
Observing many backup service shutdowns and restarts in ns_server.debug.log
[ns_server:debug,2023-12-05T06:07:22.400-08:00,ns_1@s10501-ip6.qe.couchbase.com:json_rpc_connection-backup-cbauth<0.5742.0>:json_rpc_connection:init:71]Observed revrpc connection: label "backup-cbauth", handling process <0.5742.0>
[ns_server:debug,2023-12-05T06:07:22.400-08:00,ns_1@s10501-ip6.qe.couchbase.com:menelaus_cbauth<0.591.0>:menelaus_cbauth:handle_cast:201]Observed json rpc process {"backup-cbauth",[{internal,true}],<0.5742.0>} started
[ns_server:debug,2023-12-05T06:07:29.907-08:00,ns_1@s10501-ip6.qe.couchbase.com:json_rpc_connection-backup-cbauth<0.5742.0>:json_rpc_connection:handle_info:142]Socket closed
[ns_server:debug,2023-12-05T06:07:29.908-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.5744.0>:ns_pubsub:do_subscribe_link_continue:150]Parent process of subscription {chronicle_compat_event_manager,<0.5742.0>} exited with reason shutdown
[ns_server:debug,2023-12-05T06:07:29.909-08:00,ns_1@s10501-ip6.qe.couchbase.com:menelaus_cbauth<0.591.0>:menelaus_cbauth:handle_info:254]Observed json rpc process {rpc_process,"backup-cbauth",internal, #Ref<0.641258218.2764832771.222624>,undefined, -576460632908} died with reason shutdown
[user:info,2023-12-05T06:07:29.909-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.431.0>:ns_log:consume_log:76]Service 'backup' exited with status 1. Restarting. Messages:
2023-12-05T06:07:22.404-08:00 DEBUG (REST) (Attempt 1) (GET) (200) Received response from 'http://[::1]:8091/pools/default/nodeServices'
2023-12-05T06:07:24.905-08:00 WARN (REST) (Attempt 8891712) Failed to get credentials due to error: address ::1:8091: too many colons in address
2023-12-05T06:07:27.406-08:00 WARN (REST) (Attempt 8891712) Failed to get credentials due to error: address ::1:8091: too many colons in address
2023-12-05T06:07:29.906-08:00 WARN (REST) (Attempt 1) (GET) Request to endpoint '/pools' failed due to error: failed to prepare request: failed to set auth headers: exhausted retry count after 3 attempts: address ::1:8091: too many colons in address
2023-12-05T06:07:29.906-08:00 ERROR (Main) Failed to run node {"err": "could not create REST client: failed to get cluster information: failed to get cluster metadata: failed to execute request: failed to execute request: exhausted retry count after 3 retries, last error: failed to set auth headers: exhausted retry count after 3 attempts: address ::1:8091: too many colons in address"}
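The repeated "too many colons in address" failures above match the error Go's net package returns when an IPv6 literal is concatenated with a port without brackets ("::1" + ":8091" gives "::1:8091"). Below is a minimal sketch of that failure mode; it is an illustration only and not the backup service's actual credential-fetching code.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Naive concatenation of an IPv6 loopback host with a port produces
	// "::1:8091", which net.SplitHostPort cannot parse unambiguously.
	bad := "::1" + ":" + "8091"
	if _, _, err := net.SplitHostPort(bad); err != nil {
		fmt.Println(err) // address ::1:8091: too many colons in address
	}

	// net.JoinHostPort brackets IPv6 literals, so the result round-trips
	// and is also usable in URLs such as http://[::1]:8091.
	good := net.JoinHostPort("::1", "8091")
	fmt.Println(good) // [::1]:8091
	host, port, _ := net.SplitHostPort(good)
	fmt.Println(host, port) // ::1 8091
}
```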
Observing CRASH reports for the backup service at the time of failure in ns_server.debug.log
[ns_server:debug,2023-12-05T06:07:30.141-08:00,ns_1@s10501-ip6.qe.couchbase.com:json_rpc_connection-backup-cbauth<0.6058.0>:json_rpc_connection:init:71]Observed revrpc connection: label "backup-cbauth", handling process <0.6058.0>
[ns_server:debug,2023-12-05T06:07:30.141-08:00,ns_1@s10501-ip6.qe.couchbase.com:menelaus_cbauth<0.591.0>:menelaus_cbauth:handle_cast:201]Observed json rpc process {"backup-cbauth",[{internal,true}],<0.6058.0>} started
[ns_server:debug,2023-12-05T06:07:32.225-08:00,ns_1@s10501-ip6.qe.couchbase.com:compaction_daemon<0.680.0>:compaction_daemon:process_scheduler_message:1316]No buckets to compact for compact_kv. Rescheduling compaction.
[ns_server:debug,2023-12-05T06:07:32.225-08:00,ns_1@s10501-ip6.qe.couchbase.com:compaction_daemon<0.680.0>:compaction_scheduler:schedule_next:51]Finished compaction for compact_kv too soon. Next run will be in 30s
[ns_server:debug,2023-12-05T06:07:32.225-08:00,ns_1@s10501-ip6.qe.couchbase.com:compaction_daemon<0.680.0>:compaction_daemon:process_scheduler_message:1316]No buckets to compact for compact_views. Rescheduling compaction.
[ns_server:debug,2023-12-05T06:07:32.225-08:00,ns_1@s10501-ip6.qe.couchbase.com:compaction_daemon<0.680.0>:compaction_scheduler:schedule_next:51]Finished compaction for compact_views too soon. Next run will be in 30s
[ns_server:error,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.3685.0>:service_agent:wait_for_connection_loop:387]No connection with label "backup-service_api" after 60000ms. Exiting.
[error_logger:error,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.3685.0>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
  crasher:
    initial call: service_agent:'-spawn_connection_waiter/2-fun-0-'/0
    pid: <0.3685.0>
    registered_name: []
    exception exit: {no_connection,"backup-service_api"}
      in function service_agent:wait_for_connection_loop/3 (src/service_agent.erl, line 389)
    ancestors: ['service_agent-backup',service_agent_children_sup, service_agent_sup,ns_server_sup,ns_server_nodes_sup, <0.289.0>,ns_server_cluster_sup,root_sup,<0.155.0>]
    message_queue_len: 0
    messages: []
    links: [<0.3684.0>,<0.3687.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 28
    reductions: 2854
  neighbours:
[ns_server:error,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:service_agent-backup<0.3684.0>:service_agent:handle_info:320]Linked process <0.3685.0> died with reason {no_connection, "backup-service_api"}. Terminating
[ns_server:debug,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.3687.0>:ns_pubsub:do_subscribe_link_continue:150]Parent process of subscription {json_rpc_events,<0.3685.0>} exited with reason {no_connection, "backup-service_api"}
[ns_server:error,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:service_agent-backup<0.3684.0>:service_agent:terminate:350]Terminating abnormally
[error_logger:error,2023-12-05T06:07:33.630-08:00,ns_1@s10501-ip6.qe.couchbase.com:service_agent-backup<0.3684.0>:ale_error_logger_handler:do_log:101]
=========================ERROR REPORT=========================
** Generic server 'service_agent-backup' terminating
** Last message in was {'EXIT',<0.3685.0>, {no_connection,"backup-service_api"}}
** When Server state == {state,backup, {dict,4,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, {{[],[], [[{uuid,<<"b2920bb0d6ee8cbde3ea5d0bfde3b54f">>}| 'ns_1@s10505-ip6.qe.couchbase.com']], [],[],[], [[{node,'ns_1@s10501-ip6.qe.couchbase.com'}| <<"37945d5df8d57072eff2147d32728ad3">>], [{node,'ns_1@s10505-ip6.qe.couchbase.com'}| <<"b2920bb0d6ee8cbde3ea5d0bfde3b54f">>]], [],[],[],[],[],[],[],[], [[{uuid, <<"37945d5df8d57072eff2147d32728ad3">>}| 'ns_1@s10501-ip6.qe.couchbase.com']]}}}, undefined,undefined,<35530.3996.0>, #Ref<0.641258218.2764832771.219692>,<0.3750.0>, {[{<35530.4004.0>, [alias|#Ref<35530.2355886961.2765422594.254650>]}], []}, undefined,undefined,undefined,undefined,undefined, undefined}
** Reason for termination ==
** {linked_process_died,<0.3685.0>, {'ns_1@s10501-ip6.qe.couchbase.com', {no_connection,"backup-service_api"}}}
[error_logger:error,2023-12-05T06:07:33.630-08:00,ns_1@s10501-ip6.qe.couchbase.com:service_agent-backup<0.3684.0>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
  crasher:
    initial call: service_agent:init/1
    pid: <0.3684.0>
    registered_name: 'service_agent-backup'
    exception exit: {linked_process_died,<0.3685.0>, {'ns_1@s10501-ip6.qe.couchbase.com', {no_connection,"backup-service_api"}}}
      in function gen_server:handle_common_reply/8 (gen_server.erl, line 1241)
    ancestors: [service_agent_children_sup,service_agent_sup,ns_server_sup, ns_server_nodes_sup,<0.289.0>,ns_server_cluster_sup, root_sup,<0.155.0>]
    message_queue_len: 1
    messages: [{'EXIT',<0.3750.0>, {linked_process_died,<0.3685.0>, {'ns_1@s10501-ip6.qe.couchbase.com', {no_connection,"backup-service_api"}}}}]
    links: [<0.3686.0>,<0.604.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 10958
    stack_size: 28
    reductions: 30699
  neighbours:
Testrunner script to reproduce
./testrunner -i /data/workspace/debian-p0-ipv6-vset00-00-sanity-mix/testexec.17578.ini -p get-cbcollect-info=False,enable_ipv6=True,get-cbcollect-info=True,get-cbcollect-info=True,sirius_url=http://172.23.120.103:4000 -t xdcr.uniXDCR.unidirectional.load_with_ops,items=5000,ctopology=chain,rdirection=unidirection,update=C1,delete=C1
Job name: debian-ipv6_sanity-mix
Job ref link: http://cb-logs-qe.s3-website-us-west-2.amazonaws.com/7.6.0-1878/jenkins_logs/test_suite_executor/649150/
For Gerrit Dashboard: MB-60018

| # | Subject | Branch | Project | Status | CR | V |
|---|---------|--------|---------|--------|----|---|
| 202372,4 | MB-60018 Wrap IPv6 addrs in auth provider correctly | master | cbbs | MERGED | +2 | +1 |
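The subject of the merged change ("Wrap IPv6 addrs in auth provider correctly") suggests the fix is to bracket IPv6 literals before the port is appended when building REST addresses. The sketch below illustrates that kind of change under stated assumptions; the helper name hostPort is hypothetical and does not come from the cbbs source.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hostPort is a hypothetical helper: it brackets bare IPv6 literals so the
// resulting host:port string parses with net.SplitHostPort and can be used
// in URLs such as http://[::1]:8091/pools.
func hostPort(host, port string) string {
	if strings.Contains(host, ":") && !strings.HasPrefix(host, "[") {
		return net.JoinHostPort(host, port) // "[::1]:8091"
	}
	return host + ":" + port
}

func main() {
	fmt.Println(hostPort("::1", "8091"))       // [::1]:8091
	fmt.Println(hostPort("127.0.0.1", "8091")) // 127.0.0.1:8091
	fmt.Println(hostPort("[::1]", "8091"))     // [::1]:8091 (already bracketed)
}
```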