Couchbase Server / MB-48366

[BP 7.0.2 MB-48351] - [Enforce-TLS] 'Rebalance exited with reason {service_rebalance_failed,index}..'

Details

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/durability_volume.ini  -t volumetests.Collections.volume.test_volume_taf,nodes_init=6,bucket_spec=volume_templates.buckets_scalable_stats_for_volume_test,iterations=1,rerun=False,get-cbcollect-info=True,skip_validations=True,services_for_rebalance_in=kv:index,services_init=kv-n1ql-n1ql-kv:index-kv:index-kv:index,number_of_indexes=300,quota_percent=80,use_https=True,enforce_tls=True'
      

      Steps
      1. Create a 6-node cluster with node-to-node (n2n) encryption set to strict

      2021-09-06 22:32:40,663 | test  | INFO    | pool-3-thread-9 | [table_view:display:72] Rebalance Overview
      +----------------+--------------+-----------------------+---------------+--------------+
      | Nodes          | Services     | Version               | CPU           | Status       |
      +----------------+--------------+-----------------------+---------------+--------------+
      | 172.23.105.175 | kv           | 7.0.2-6644-enterprise | 2.13032581454 | Cluster node |
      | 172.23.106.233 | ['n1ql']     |                       |               | <--- IN ---  |
      | 172.23.106.236 | ['n1ql']     |                       |               | <--- IN ---  |
      | 172.23.106.238 | ['kv,index'] |                       |               | <--- IN ---  |
      | 172.23.106.250 | ['kv,index'] |                       |               | <--- IN ---  |
      | 172.23.106.251 | ['kv,index'] |                       |               | <--- IN ---  |
      +----------------+--------------+-----------------------+---------------+--------------+
      

      2. Create 15 buckets with 1000 collections with a few documents
      3. Flush the documents
      4. Create 300 GSI indexes on collections

      2021-09-06 23:15:55,128 | test  | INFO    | MainThread | [Collections:build_deferred_indexes:224] online indexes count: 300

      5. Load a few documents
      6. Rebalance in a node with the kv and index services while data loading is in progress

      2021-09-06 23:20:55,394 | test  | INFO    | pool-3-thread-1 | [table_view:display:72] Rebalance Overview
      +----------------+--------------+-----------------------+---------------+--------------+
      | Nodes          | Services     | Version               | CPU           | Status       |
      +----------------+--------------+-----------------------+---------------+--------------+
      | 172.23.105.175 | kv           | 7.0.2-6644-enterprise | 11.5120711563 | Cluster node |
      | 172.23.106.250 | index, kv    | 7.0.2-6644-enterprise | 10.356448477  | Cluster node |
      | 172.23.106.236 | n1ql         | 7.0.2-6644-enterprise | 3.07789740342 | Cluster node |
      | 172.23.106.251 | index, kv    | 7.0.2-6644-enterprise | 10.9372979961 | Cluster node |
      | 172.23.106.233 | n1ql         | 7.0.2-6644-enterprise | 2.97769893563 | Cluster node |
      | 172.23.106.238 | index, kv    | 7.0.2-6644-enterprise | 9.12857697786 | Cluster node |
      | 172.23.121.78  | ['kv,index'] |                       |               | <--- IN ---  |
      +----------------+--------------+-----------------------+---------------+--------------+
      

      The rebalance failed at 43% progress.

      2021-09-06 23:24:05,075 | test  | ERROR   | pool-3-thread-1 | [rest_client:_rebalance_status_and_progress:1639] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'da3dd761c86b379aa8bb6f4ec3475051', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=d890e2a55b0366d043c9bab3b7cd7bed', u'status': u'notRunning'} - rebalance failed
      2021-09-06 23:24:05,125 | test  | INFO    | pool-3-thread-1 | [rest_client:print_UI_logs:2785] Latest logs from UI on 172.23.105.175:
      2021-09-06 23:24:05,125 | test  | ERROR   | pool-3-thread-1 | [rest_client:print_UI_logs:2787] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.105.175', u'tstamp': 1630995838453L, u'shortText': u'message', u'serverTime': u'2021-09-06T23:23:58.453Z', u'text': u'Rebalance exited with reason {service_rebalance_failed,index,\n                              {agent_died,<32558.27011.3>,\n                               {linked_process_died,<32558.10764.9>,\n                                {\'ns_1@172.23.121.78\',\n                                 {timeout,\n                                  {gen_server,call,\n                                   [<32558.32400.3>,\n                                    {call,"ServiceAPI.StartTopologyChange",\n                                     #Fun<json_rpc_connection.0.77329884>},\n                                    60000]}}}}}}.\nRebalance Operation Id = 7a4d174bbdc32df7b48a00f09fe881a8'}
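
The nested Erlang term in the message above is hard to scan. As a rough triage aid, the sketch below pulls out the failing service, the node, the RPC that timed out, and the `gen_server` call timeout in milliseconds. The function name `summarize_rebalance_failure` and the regular expressions are illustrative assumptions, not part of testrunner or ns_server, and they are only matched against the message shape seen in this report.

```python
import re


def summarize_rebalance_failure(text):
    """Extract key fields from an ns_orchestrator rebalance failure message.

    Hypothetical helper: the patterns are tuned to the message in this report.
    """
    patterns = {
        "service": r"service_rebalance_failed,\s*(\w+)",
        "node": r"'([^'@\s]+@[^'\s]+)'",
        "rpc_call": r'\{call,\s*"([^"]+)"',
        # Last element of the gen_server call argument list (milliseconds).
        "timeout_ms": r"(\d+)\s*\]",
    }
    summary = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        summary[field] = match.group(1) if match else None
    if summary["timeout_ms"] is not None:
        summary["timeout_ms"] = int(summary["timeout_ms"])
    return summary


# Message text copied from the UI log entry in this report.
message = """Rebalance exited with reason {service_rebalance_failed,index,
 {agent_died,<32558.27011.3>,
  {linked_process_died,<32558.10764.9>,
   {'ns_1@172.23.121.78',
    {timeout,
     {gen_server,call,
      [<32558.32400.3>,
       {call,"ServiceAPI.StartTopologyChange",
        #Fun<json_rpc_connection.0.77329884>},
       60000]}}}}}}.
Rebalance Operation Id = 7a4d174bbdc32df7b48a00f09fe881a8"""

print(summarize_rebalance_failure(message))
```

For this message the summary points at the `ServiceAPI.StartTopologyChange` call to the index service on the incoming node 172.23.121.78, which did not return within the 60000 ms call timeout.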

      However, retrying the rebalance after a few hours succeeded.

          Activity

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.2-6676 contains indexing commit a0d6a82 with commit message:
            MB-48366 : [BP 7.0.2 MB 48351] Disable encryption for GET calls to ns_server

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.2-6676 contains indexing commit 2414625 with commit message:
            MB-48366 : [BP 7.0.2 MB 48351] Increase TLSHandshakeTimeout from 10 to 60 secs.

            sai.teja Sai Krishna Teja added a comment - To verify, we must check that ns_server accesses work with strict encryption and check whether they are going to the non-TLS port.
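
One way to support the check described above is to classify the ns_server URLs seen in logs or traces by port. The sketch below assumes the default Couchbase Server cluster manager ports (8091 for non-TLS, 18091 for TLS). The helper name `classify_ns_server_url` and the sample URLs are hypothetical, and a cluster configured with custom ports would need a different mapping.

```python
from urllib.parse import urlsplit

# Assumed default ns_server REST ports; adjust if the cluster uses custom ports.
NS_SERVER_PORTS = {8091: "non-TLS", 18091: "TLS"}


def classify_ns_server_url(url):
    """Return "TLS", "non-TLS", or "unknown" based on the URL's port."""
    port = urlsplit(url).port  # None when the URL carries no explicit port
    return NS_SERVER_PORTS.get(port, "unknown")


# Hypothetical sample URLs for illustration only.
for url in (
    "http://127.0.0.1:8091/pools/default",
    "https://127.0.0.1:18091/pools/default",
):
    print(url, "->", classify_ns_server_url(url))
```

This only inspects the port in the URL; it does not confirm that a TLS handshake actually took place on the connection.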

            sumedh.basarkod Sumedh Basarkod added a comment - Verified with a run on 7.0.2-6676. Closing.

            People

              sumedh.basarkod Sumedh Basarkod
              jeelan.poola Jeelan Poola