Description
What's the issue?
I currently can't set up a cluster_run cluster with two nodes running the backup service; the rebalance fails.
Steps to reproduce
1) cluster_run --nodes 2 --dont-rename
2) cluster_connect -n 2 -s 1024 -M plasma -T n0:kv+backup,n1:kv+backup
Observations
1) This works as expected with 1, 3 and 4 nodes, just not 2
2) The second node hasn't yet created a backup service log file
3) This does appear to work in 7.0.0
4) This does appear to work with other services
Heartbeat Failure
[ns_server:error,2021-11-23T18:51:34.323Z,n_0@127.0.0.1:ns_heart_slow_status_updater<0.499.0>:ns_heart:grab_one_service_status:409]Failed to grab service backup status: {exit,
{timeout,
{gen_server,call,
['service_agent-backup',get_status,
2000]}},
[{gen_server,call,3,
[{file,"gen_server.erl"},{line,247}]},
{ns_heart,grab_one_service_status,1,
[{file,"src/ns_heart.erl"},
{line,406}]},
{ns_heart,
'-grab_service_statuses/0-lc$^1/1-1-',
1,
[{file,"src/ns_heart.erl"},
{line,402}]},
{ns_heart,current_status_slow_inner,
0,
[{file,"src/ns_heart.erl"},
{line,276}]},
{ns_heart,current_status_slow,1,
[{file,"src/ns_heart.erl"},
{line,235}]},
{ns_heart,slow_updater_loop,0,
[{file,"src/ns_heart.erl"},
{line,229}]}]}
CBAuth 500 status codes
2021/11/23 18:51:09 revrpc: Got error (Need 200 status!. Got {500 Internal Server Error 500 HTTP/1.1 1 1 map[Cache-Control:[no-cache,no-store,must-revalidate] Content-Length:[44] Content-Type:[application/json] Date:[Tue, 23 Nov 2021 18:51:08 GMT] Expires:[Thu, 01 Jan 1970 00:00:00 GMT] Pragma:[no-cache] Server:[Couchbase Server] X-Content-Type-Options:[nosniff] X-Frame-Options:[DENY] X-Permitted-Cross-Domain-Policies:[none] X-Xss-Protection:[1; mode=block]] 0xc000159540 44 [] true false map[] 0xc00018e000 <nil>}) and will retry in 1s
...
2021-11-23T18:51:09.477Z INFO (Main) Running node version backup-0.0.0-0000-bd0ebcd with options: -http-port=7100 -grpc-port=7200 -https-port=17100 -cert-path=/home/couchbase/Projects/couchbase-build/ns_server/data/n_0/config/certs/chain.pem -key-path=/home/couchbase/Projects/couchbase-build/ns_server/data/n_0/config/certs/pkey.pem -ca-path=/home/couchbase/Projects/couchbase-build/ns_server/data/n_0/config/certs/ca.pem -ipv4=required -ipv6=optional -cbm=/home/couchbase/Projects/couchbase-build/install/bin/cbbackupmgr -node-uuid=85ce9cbd1ede710f468d8ad026c12e62 -public-address=127.0.0.1 -admin-port=9000 -log-file=none -log-level=debug -integrated-mode -integrated-mode-host=http://127.0.0.1:9000 -secure-integrated-mode-host=https://127.0.0.1:19000 -integrated-mode-user=@backup -tmp-dir=/home/couchbase/Projects/couchbase-build/ns_server/tmp -cbauth-host=127.0.0.1:9000
ns_server crash report
[error_logger:error,2021-11-23T18:53:17.790Z,n_0@127.0.0.1:service_rebalancer-backup<0.5126.0>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
crasher:
initial call: misc:'-spawn_monitor/1-fun-0-'/0
pid: <0.5126.0>
registered_name: 'service_rebalancer-backup'
exception exit: {agent_died,<0.5049.0>,
{linked_process_died,<0.5050.0>,
{'n_0@127.0.0.1',
{no_connection,"backup-service_api"}}}}
in function service_rebalancer:run_rebalance/1 (src/service_rebalancer.erl, line 73)
ancestors: [cleanup_process,ns_janitor_server,ns_orchestrator_child_sup,
ns_orchestrator_sup,mb_master_sup,mb_master,
leader_registry_sup,leader_services_sup,<0.678.0>,
ns_server_sup,ns_server_nodes_sup,<0.273.0>,
ns_server_cluster_sup,root_sup,<0.146.0>]
message_queue_len: 0
messages: []
links: []
dictionary: []
trap_exit: false
status: running
heap_size: 2586
stack_size: 29
reductions: 7423
neighbours:

[ns_server:error,2021-11-23T18:53:17.791Z,n_0@127.0.0.1:cleanup_process<0.5125.0>:service_janitor:init_topology_aware_service:91]Initial rebalance for `backup` failed: {error,
{initial_rebalance_failed,backup,
{agent_died,<0.5049.0>,
{linked_process_died,<0.5050.0>,
{'n_0@127.0.0.1',
{no_connection,
"backup-service_api"}}}}}}
Rebalance Cancelled
2021-11-23T18:51:17.784Z INFO (Rebalance) Cancelling rebalance
2021-11-23T18:51:17.784Z ERROR (Rebalance) Couldn't confirm node was added {"nodeID": "85ce9cbd1ede710f468d8ad026c12e62", "err": "could not add self: retries aborted after 1 attempts: context canceled"}
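Log triage sketch
When re-running the reproduction, it may help to check quickly whether the same four failure signatures show up again. The following is a minimal sketch, not part of ns_server or the backup service: the `SIGNATURES` labels and the `classify` helper are hypothetical names, and the patterns are copied from the log excerpts in this report. It assumes the relevant logs have been read into a single string.

```python
import re

# Patterns copied from the excerpts in this report; the labels are illustrative only.
SIGNATURES = {
    "heartbeat_timeout": re.compile(r"Failed to grab service backup status"),
    "cbauth_500": re.compile(r"revrpc: Got error \(Need 200 status!"),
    # \s* tolerates the tuple being wrapped across lines by the Erlang pretty-printer.
    "no_connection": re.compile(r'no_connection,\s*"backup-service_api"'),
    "rebalance_cancelled": re.compile(r"could not add self: .*context canceled"),
}


def classify(text):
    """Count how often each known failure signature occurs in the log text."""
    return {label: len(pattern.findall(text)) for label, pattern in SIGNATURES.items()}
```

Calling `classify(open(path).read())` on each log file returns a dict of counts per signature; a count of zero for every label would suggest the run failed in a different way than the one reported here.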
Issue Links
- relates to MB-49946 [CBBS] Add unit testing for the top-level rebalance functions (Open)