Couchbase Server / MB-49732

[CBBS] cluster_run/cluster_connect does not work with 2 nodes (both using cbbs)


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: 7.1.0
    • Fix Version/s: 7.1.0
    • Component/s: tools
    • None
    • Untriaged
    • 1
    • Yes
    • Tools 2021 Dec

    Description

      What's the issue?
      I can't currently set up a cluster_run cluster with two nodes running the backup service; the rebalance fails.

      Steps to reproduce
      1) cluster_run --nodes 2 --dont-rename
      2) cluster_connect -n 2 -s 1024 -M plasma -T n0:kv+backup,n1:kv+backup

      Observations
      1) This works as expected with 1, 3, and 4 nodes, just not with 2
      2) The second node hasn't yet created a backup service log file
      3) This appears to work in 7.0.0
      4) This appears to work with other services

      Heartbeat Failure

      [ns_server:error,2021-11-23T18:51:34.323Z,n_0@127.0.0.1:ns_heart_slow_status_updater<0.499.0>:ns_heart:grab_one_service_status:409]Failed to grab service backup status: {exit,
                                             {timeout,
                                              {gen_server,call,
                                               ['service_agent-backup',get_status,
                                                2000]}},
                                             [{gen_server,call,3,
                                               [{file,"gen_server.erl"},{line,247}]},
                                              {ns_heart,grab_one_service_status,1,
                                               [{file,"src/ns_heart.erl"},
                                                {line,406}]},
                                              {ns_heart,
                                               '-grab_service_statuses/0-lc$^1/1-1-',
                                               1,
                                               [{file,"src/ns_heart.erl"},
                                                {line,402}]},
                                              {ns_heart,current_status_slow_inner,
                                               0,
                                               [{file,"src/ns_heart.erl"},
                                                {line,276}]},
                                              {ns_heart,current_status_slow,1,
                                               [{file,"src/ns_heart.erl"},
                                                {line,235}]},
                                              {ns_heart,slow_updater_loop,0,
                                               [{file,"src/ns_heart.erl"},
                                                {line,229}]}]}
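
      The heartbeat failure above is ns_server's slow status updater calling the backup service agent's get_status with a 2000 ms timeout and never getting an answer. A minimal Go sketch of that mechanism, assuming illustrative names (getStatus, the ready channel) rather than the real ns_server/cbbs API:

      ```go
      package main

      import (
      	"context"
      	"errors"
      	"fmt"
      	"time"
      )

      // getStatus simulates asking a service agent for its status: it blocks
      // until the service answers on ready, or the caller's deadline expires.
      func getStatus(ctx context.Context, ready <-chan string) (string, error) {
      	select {
      	case s := <-ready:
      		return s, nil
      	case <-ctx.Done():
      		return "", errors.New("timeout: get_status did not answer within deadline")
      	}
      }

      func main() {
      	// The channel is never written to, modelling a backup service whose
      	// API endpoint never came up on the second node.
      	ready := make(chan string)
      	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
      	defer cancel()
      	if _, err := getStatus(ctx, ready); err != nil {
      		fmt.Println("Failed to grab service backup status:", err)
      	}
      }
      ```

      Every heartbeat repeats the same call, so the error recurs until the service comes up or the rebalance is abandoned.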
      

      CBAuth 500 status codes

      2021/11/23 18:51:09 revrpc: Got error (Need 200 status!. Got {500 Internal Server Error 500 HTTP/1.1 1 1 map[Cache-Control:[no-cache,no-store,must-revalidate] Content-Length:[44] Content-Type:[application/json] Date:[Tue, 23 Nov 2021 18:51:08 GMT] Expires:[Thu, 01 Jan 1970 00:00:00 GMT] Pragma:[no-cache] Server:[Couchbase Server] X-Content-Type-Options:[nosniff] X-Frame-Options:[DENY] X-Permitted-Cross-Domain-Policies:[none] X-Xss-Protection:[1; mode=block]] 0xc000159540 44 [] true false map[] 0xc00018e000 <nil>}) and will retry in 1s
      ...
      2021-11-23T18:51:09.477Z INFO (Main) Running node version backup-0.0.0-0000-bd0ebcd with options: -http-port=7100 -grpc-port=7200 -https-port=17100 -cert-path=/home/couchbase/Projects/couchbase-build/ns_server/data/n_0/config/certs/chain.pem -key-path=/home/couchbase/Projects/couchbase-build/ns_server/data/n_0/config/certs/pkey.pem -ca-path=/home/couchbase/Projects/couchbase-build/ns_server/data/n_0/config/certs/ca.pem -ipv4=required -ipv6=optional -cbm=/home/couchbase/Projects/couchbase-build/install/bin/cbbackupmgr -node-uuid=85ce9cbd1ede710f468d8ad026c12e62 -public-address=127.0.0.1 -admin-port=9000 -log-file=none -log-level=debug -integrated-mode -integrated-mode-host=http://127.0.0.1:9000 -secure-integrated-mode-host=https://127.0.0.1:19000 -integrated-mode-user=@backup -tmp-dir=/home/couchbase/Projects/couchbase-build/ns_server/tmp -cbauth-host=127.0.0.1:9000
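
      The revrpc log shows the classic retry-until-200 pattern: any non-200 response from ns_server is treated as a connection failure and retried after a fixed delay. A hedged Go sketch of that loop, with illustrative names (connectWithRetry, dial) that are not the actual revrpc code:

      ```go
      package main

      import (
      	"fmt"
      	"time"
      )

      // connectWithRetry keeps calling dial until it reports HTTP 200 or the
      // attempt budget is exhausted, sleeping between attempts.
      func connectWithRetry(dial func() int, attempts int, delay time.Duration) error {
      	for i := 0; i < attempts; i++ {
      		code := dial()
      		if code == 200 {
      			return nil
      		}
      		fmt.Printf("revrpc: Need 200 status! Got %d, will retry in %s\n", code, delay)
      		time.Sleep(delay)
      	}
      	return fmt.Errorf("gave up after %d attempts", attempts)
      }

      func main() {
      	calls := 0
      	// First two attempts fail with 500 (as in the log), the third succeeds.
      	dial := func() int {
      		calls++
      		if calls < 3 {
      			return 500
      		}
      		return 200
      	}
      	if err := connectWithRetry(dial, 5, 10*time.Millisecond); err != nil {
      		fmt.Println(err)
      	}
      }
      ```

      In the failing run the 500s never stop, so cbauth on the second node never authenticates and the backup service API never becomes reachable.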
      

      ns_server crash report

      [error_logger:error,2021-11-23T18:53:17.790Z,n_0@127.0.0.1:service_rebalancer-backup<0.5126.0>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: misc:'-spawn_monitor/1-fun-0-'/0
          pid: <0.5126.0>
          registered_name: 'service_rebalancer-backup'
          exception exit: {agent_died,<0.5049.0>,
                              {linked_process_died,<0.5050.0>,
                                  {'n_0@127.0.0.1',
                                      {no_connection,"backup-service_api"}}}}
            in function  service_rebalancer:run_rebalance/1 (src/service_rebalancer.erl, line 73)
          ancestors: [cleanup_process,ns_janitor_server,ns_orchestrator_child_sup,
                        ns_orchestrator_sup,mb_master_sup,mb_master,
                        leader_registry_sup,leader_services_sup,<0.678.0>,
                        ns_server_sup,ns_server_nodes_sup,<0.273.0>,
                        ns_server_cluster_sup,root_sup,<0.146.0>]
          message_queue_len: 0
          messages: []
          links: []
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 2586
          stack_size: 29
          reductions: 7423
        neighbours:
       
      [ns_server:error,2021-11-23T18:53:17.791Z,n_0@127.0.0.1:cleanup_process<0.5125.0>:service_janitor:init_topology_aware_service:91]Initial rebalance for `backup` failed: {error,
                                              {initial_rebalance_failed,backup,
                                               {agent_died,<0.5049.0>,
                                                {linked_process_died,<0.5050.0>,
                                                 {'n_0@127.0.0.1',
                                                  {no_connection,
                                                   "backup-service_api"}}}}}}
      

      Rebalance Cancelled

      2021-11-23T18:51:17.784Z INFO (Rebalance) Cancelling rebalance
      2021-11-23T18:51:17.784Z ERROR (Rebalance) Couldn't confirm node was added {"nodeID": "85ce9cbd1ede710f468d8ad026c12e62", "err": "could not add self: retries aborted after 1 attempts: context canceled"}
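
      The "retries aborted after 1 attempts: context canceled" error is the add-self retry loop stopping as soon as the rebalance context is cancelled. A minimal sketch of that shape, assuming illustrative names (addSelf, op) rather than the actual cbbs code:

      ```go
      package main

      import (
      	"context"
      	"fmt"
      	"time"
      )

      // addSelf retries op until it succeeds or ctx is cancelled, returning an
      // error that records how many attempts ran (mirroring the log line).
      func addSelf(ctx context.Context, op func() error, delay time.Duration) error {
      	attempts := 0
      	for {
      		attempts++
      		if err := op(); err == nil {
      			return nil
      		}
      		select {
      		case <-ctx.Done():
      			return fmt.Errorf("could not add self: retries aborted after %d attempts: %w",
      				attempts, ctx.Err())
      		case <-time.After(delay):
      		}
      	}
      }

      func main() {
      	ctx, cancel := context.WithCancel(context.Background())
      	cancel() // the rebalance is cancelled almost immediately, as in the log
      	err := addSelf(ctx, func() error { return fmt.Errorf("not yet registered") }, 10*time.Millisecond)
      	fmt.Println(err)
      	// → could not add self: retries aborted after 1 attempts: context canceled
      }
      ```

      So the "Couldn't confirm node was added" error is a downstream symptom: the rebalance was already being torn down when the node tried to register itself.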
      

      Attachments

      Issue Links

      People

        Assignee: james.lee James Lee
        Reporter: james.lee James Lee
        Votes: 0
        Watchers: 2