Couchbase Server / MB-59216

[Upgrade] Swap Rebalance exits with the reason 'error: 500 reason: unknown ["Unexpected server error, request logged."]' when upgrading from 7.2.0 -> 7.6.0 with 30 buckets in the cluster

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 7.6.0
    • 7.6.0
    • ns_server
    • OS - Debian 10
      Initial version - 7.2.0-5325
      Upgrade version - 7.6.0-1666

    Description

      Steps:

      1. Install 7.2.0-5325 on 2 nodes and initialise the cluster with those 2 nodes running just the KV service. The nodes used in this test have 8 CPU cores each.
      2. Create 30 Magma buckets of 256MB RAM Quota each.
      3. Load 1000 documents of 1024 bytes each into all the buckets.
      4. Install 7.6.0-1666 with a provisioned profile on a spare node which is not a part of the cluster.
      5. Perform a swap rebalance of the spare node with a node inside the cluster.
      6. Swap Rebalance fails with the reason "error: 500 reason: unknown ["Unexpected server error, request logged."]"
      7. Logs were collected with cbcollect_info, and the following error was observed in http_access.log:

      [ns_server:error,2023-10-18T22:13:14.943-07:00,ns_1@172.23.106.201:<0.20565.9>:menelaus_util:reply_server_error_before_close:211]Server error during processing: ["web request failed",
                                       {path,"/controller/rebalance"},
                                       {method,'POST'},
                                       {type,error},
                                       {what,
                                        {case_clause,
                                         {not_enough_cores_for_num_buckets,
                                          <<"The following node(s) being added have insufficient cpu cores for the number of buckets already in the cluster: ns_1@172.23.106.203">>}}},
                                       {trace,
                                        [{menelaus_web_cluster,do_handle_rebalance,
                                          4,
                                          [{file,"src/menelaus_web_cluster.erl"},
                                           {line,962}]},
                                         {request_tracker,request,2,
                                          [{file,"src/request_tracker.erl"},
                                           {line,40}]}, 

      Under the bucket guardrail enforced in 7.6.0, an 8-core node supports at most 8 / 0.4 = 20 buckets (0.4 cores per bucket), so the 30 buckets in this cluster exceed the limit.
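      The arithmetic above can be sketched as follows. This is only an illustration of the calculation in this report, not ns_server code: the function names are hypothetical, and rounding down to a whole bucket count is an assumption (it makes no difference for 8 cores).

```python
import math


def max_buckets_for_cores(cpu_cores: int, min_cores_per_bucket: float = 0.4) -> int:
    """Return the bucket limit for a node: cores divided by cores per bucket.

    The 0.4 default is the value used in this report. Rounding down is an
    assumption made for this sketch.
    """
    # The small epsilon guards against float rounding just below a whole number.
    return math.floor(cpu_cores / min_cores_per_bucket + 1e-9)


def exceeds_guardrail(num_buckets: int, cpu_cores: int) -> bool:
    """True when the bucket count is above the limit for the given core count."""
    return num_buckets > max_buckets_for_cores(cpu_cores)


if __name__ == "__main__":
    print(max_buckets_for_cores(8))   # limit for the 8-core nodes in this test
    print(exceeds_guardrail(30, 8))   # the 30 buckets created in step 2
```

      For the nodes in this test, the limit is 20, so the 30 existing buckets are over it.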

      1. The cluster already exceeds the limit on 7.2.0, but because the upgrade to 7.6.0 is a swap rebalance between identical nodes, it should not fail. Guardrails should be enforced only for buckets created after the upgrade completes.
      2. This is a regression from 7.6.0-1640: the same test passed when upgrading to 7.6.0-1640 but fails with 7.6.0-1666.
      3. The cbcollect_info logs are attached.
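      The behaviour expected in point 1 could be sketched roughly as below. This is a hypothetical illustration of the reporter's expectation, not the ns_server implementation or the actual fix; every function and parameter name here is an assumption.

```python
def bucket_limit(cpu_cores: int, min_cores_per_bucket: float = 0.4) -> int:
    """Bucket limit for a node, using the 0.4 cores-per-bucket figure from this report."""
    # Epsilon guards against float rounding just below a whole number.
    return int(cpu_cores / min_cores_per_bucket + 1e-9)


def swap_rebalance_allowed(existing_buckets: int,
                           incoming_node_cores: int,
                           outgoing_node_cores: int) -> bool:
    """Hypothetical policy: do not fail a swap of like-for-like nodes.

    If the cluster is already over the limit, the swap is still allowed as
    long as the incoming node has at least as many cores as the node it
    replaces, because the swap does not make the situation worse.
    """
    if existing_buckets <= bucket_limit(incoming_node_cores):
        return True
    return incoming_node_cores >= outgoing_node_cores


def bucket_creation_allowed(existing_buckets: int, cpu_cores: int) -> bool:
    """Hypothetical policy: the guardrail still applies to newly created buckets."""
    return existing_buckets + 1 <= bucket_limit(cpu_cores)
```

      Under these assumptions, the swap in this test (30 buckets, 8 cores in, 8 cores out) would proceed, while creating a 31st bucket after the upgrade would still be rejected.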

          People

            vibhav.sp Vibhav S P