Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-51319

Some nodes in the cluster are rendered unusable post cleanup

    XMLWordPrintable

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Unknown
    • KV 2022-Feb, KV March-22

    Description

      Script to Repro

      There is not a particular test that repro's this. Basically cleanup can fail for any test rendering one or more nodes in the cluster unusable.
      

      Logs before node became unusable.

      2022-03-06 13:49:41,844 | test  | INFO    | MainThread | [basetestcase:log_setup_status:647] ========= BaseTestCase setup started for test #5 test_data_load_collections_with_graceful_failover_rebalance_out =========
      2022-03-06 13:50:22,311 | test  | INFO    | MainThread | [rest_client:monitorRebalance:1610] Rebalance done. Taken 11.0540001392 seconds to complete
      2022-03-06 13:50:22,319 | test  | INFO    | MainThread | [common_lib:sleep:23] Sleep 5 seconds. Reason: Wait after rebalance complete
      2022-03-06 13:50:27,359 | test  | ERROR   | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body:  headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
      http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
      2022-03-06 13:50:27,362 | test  | ERROR   | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
      2022-03-06 13:50:32,378 | test  | ERROR   | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body:  headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
      http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
      2022-03-06 13:50:32,380 | test  | ERROR   | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
      2022-03-06 13:50:37,394 | test  | ERROR   | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body:  headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
      http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
      2022-03-06 13:50:37,395 | test  | ERROR   | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
      2022-03-06 13:50:42,404 | test  | ERROR   | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
      2022-03-06 13:50:42,404 | test  | ERROR   | MainThread | [cluster_ready_functions:cleanup_cluster:232] Can't create rest connection after rebalance out for ejected nodes, will retry after 10 seconds according to MB-8430: Unable to reach the host @ 172.23.100.35
      2022-03-06 13:50:42,411 | test  | INFO    | MainThread | [common_lib:sleep:23] Sleep 10 seconds. Reason: MB-8430
      2022-03-06 13:50:52,420 | test  | ERROR   | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body:  headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
      http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
      2022-03-06 13:50:52,421 | test  | ERROR   | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
      2022-03-06 13:50:57,436 | test  | ERROR   | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body:  headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
      http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
      2022-03-06 13:50:57,437 | test  | ERROR   | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
      2022-03-06 13:51:02,453 | test  | ERROR   | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body:  headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
      http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
      2022-03-06 13:51:02,456 | test  | ERROR   | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
      2022-03-06 13:51:07,463 | test  | ERROR   | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
      Traceback (most recent call last):
        File "pytests/basetestcase.py", line 363, in setUp
          self.cluster_util.cluster_cleanup(cluster,
        File "pytests/basetestcase.py", line 363, in setUp
          self.cluster_util.cluster_cleanup(cluster,
        File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 169, in cluster_cleanup
          self.cleanup_cluster(cluster, master=cluster.master)
        File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 237, in cleanup_cluster
          rest = RestConnection(removed)
        File "lib/membase/api/rest_client.py", line 319, in __init__
          raise ServerUnavailableException(self.ip)
      ServerUnavailableException: Unable to reach the host @ 172.23.100.35
      

      172.23.100.35

      [ns_server:debug,2022-03-06T21:12:03.979-08:00,ns_1@172.23.100.35:<0.17157.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
      [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
      [{ets,lookup,
            [ns_server_stats,{c,{<<"rest_request_enters">>,[]}}],
            [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
       {ns_server_stats,'-report_ns_server_lc_stats/1-fun-0-',2,
                        [{file,"src/ns_server_stats.erl"},{line,257}]},
       {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
       {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                        [{file,"src/ns_server_stats.erl"},{line,170}]},
       {ns_server_stats,report_prom_stats,2,
                        [{file,"src/ns_server_stats.erl"},{line,180}]},
       {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
      [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]system stats reporting exception: exit:{noproc,
                                              {gen_server,call,
                                               [ns_server_stats,get_stats]}}
      [{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
       {ns_server_stats,report_system_stats,1,
                        [{file,"src/ns_server_stats.erl"},{line,188}]},
       {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                        [{file,"src/ns_server_stats.erl"},{line,170}]},
       {ns_server_stats,report_prom_stats,2,
                        [{file,"src/ns_server_stats.erl"},{line,182}]},
       {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
      [ns_server:debug,2022-03-06T21:12:04.110-08:00,ns_1@172.23.100.35:<0.17158.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
      [ns_server:error,2022-03-06T21:12:04.324-08:00,ns_1@172.23.100.35:<0.17166.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
      [{ets,safe_fixtable,
            [ns_server_stats,true],
            [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
       {ets,foldl,3,[{file,"ets.erl"},{line,625}]},
       {ns_server_stats,report_ns_server_hc_stats,1,
                        [{file,"src/ns_server_stats.erl"},{line,264}]},
       {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                        [{file,"src/ns_server_stats.erl"},{line,170}]},
       {ns_server_stats,report_prom_stats,2,
                        [{file,"src/ns_server_stats.erl"},{line,178}]},
       {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
      [ns_server:error,2022-03-06T21:12:04.651-08:00,ns_1@172.23.100.35:<0.17058.46>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
                                       {path,"/pools/default"},
                                       {method,'GET'},
                                       {type,exit},
                                       {what,
                                        {noproc,
                                         {gen_server,call,
                                          ['service_status_keeper-index',
                                           get_version]}}},
                                       {trace,
                                        [{gen_server,call,2,
                                          [{file,"gen_server.erl"},{line,239}]},
                                         {menelaus_web_pools,do_build_pool_info,4,
                                          [{file,"src/menelaus_web_pools.erl"},
                                           {line,211}]},
                                         {menelaus_web_pools,pool_info,6,
                                          [{file,"src/menelaus_web_pools.erl"},
                                           {line,106}]},
                                         {menelaus_web_pools,handle_pool_info_wait,
                                          5,
                                          [{file,"src/menelaus_web_pools.erl"},
                                           {line,118}]},
                                         {request_tracker,request,2,
                                          [{file,"src/request_tracker.erl"},
                                           {line,40}]},
                                         {menelaus_util,handle_request,2,
                                          [{file,"src/menelaus_util.erl"},
                                           {line,221}]},
                                         {mochiweb_http,headers,6,
                                          [{file,
                                            "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                           {line,153}]},
                                         {proc_lib,init_p_do_apply,3,
                                          [{file,"proc_lib.erl"},{line,226}]}]}]
      
      

      Maybe it is another side effect of MB-49512 which is also hit frequently during cleanups related to bucket not being dropped completely.
      cbcollect_info attached.

      Attachments

        1. consoleText_MB-51319.txt
          1.98 MB
        2. debug.log.1
          40.00 MB
        3. latest-diag.log
          1.52 MB
        4. memcached.log.000021.txt
          7.94 MB
        5. node35.gdb.log
          789 kB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          Balakumaran.Gopal Balakumaran Gopal created issue -
          Balakumaran.Gopal Balakumaran Gopal made changes -
          Field Original Value New Value
          Summary Certain nodes are not accessible post cleanup of the cluster Some nodes in the cluster are rendered unusable post cleanup
          Balakumaran.Gopal Balakumaran Gopal made changes -
          Description +Script to Repro+
          {noformat}
          There is not a particular test that repro's this. Basically cleanup can fail for any test rendering one or more nodes in the cluster in unusable state.
          {noformat}

          +172.23.100.35+
          {noformat}
          [ns_server:debug,2022-03-06T21:12:03.979-08:00,ns_1@172.23.100.35:<0.17157.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,lookup,
                [ns_server_stats,{c,{<<"rest_request_enters">>,[]}}],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ns_server_stats,'-report_ns_server_lc_stats/1-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,257}]},
           {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,180}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]system stats reporting exception: exit:{noproc,
                                                  {gen_server,call,
                                                   [ns_server_stats,get_stats]}}
          [{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
           {ns_server_stats,report_system_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,188}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,182}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:debug,2022-03-06T21:12:04.110-08:00,ns_1@172.23.100.35:<0.17158.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.324-08:00,ns_1@172.23.100.35:<0.17166.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,safe_fixtable,
                [ns_server_stats,true],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ets,foldl,3,[{file,"ets.erl"},{line,625}]},
           {ns_server_stats,report_ns_server_hc_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,264}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,178}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.651-08:00,ns_1@172.23.100.35:<0.17058.46>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
                                           {path,"/pools/default"},
                                           {method,'GET'},
                                           {type,exit},
                                           {what,
                                            {noproc,
                                             {gen_server,call,
                                              ['service_status_keeper-index',
                                               get_version]}}},
                                           {trace,
                                            [{gen_server,call,2,
                                              [{file,"gen_server.erl"},{line,239}]},
                                             {menelaus_web_pools,do_build_pool_info,4,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,211}]},
                                             {menelaus_web_pools,pool_info,6,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,106}]},
                                             {menelaus_web_pools,handle_pool_info_wait,
                                              5,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,118}]},
                                             {request_tracker,request,2,
                                              [{file,"src/request_tracker.erl"},
                                               {line,40}]},
                                             {menelaus_util,handle_request,2,
                                              [{file,"src/menelaus_util.erl"},
                                               {line,221}]},
                                             {mochiweb_http,headers,6,
                                              [{file,
                                                "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                               {line,153}]},
                                             {proc_lib,init_p_do_apply,3,
                                              [{file,"proc_lib.erl"},{line,226}]}]}]

          {noformat}

          Maybe it is another side effect of MB-49512 which is also hit frequently during cleanups related to bucket not being dropped completely.
          cbcollect_info attached.
          +Script to Repro+
          {noformat}
          There is not a particular test that repro's this. Basically cleanup can fail for any test rendering one or more nodes in the cluster unusable.
          {noformat}

          +172.23.100.35+
          {noformat}
          [ns_server:debug,2022-03-06T21:12:03.979-08:00,ns_1@172.23.100.35:<0.17157.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,lookup,
                [ns_server_stats,{c,{<<"rest_request_enters">>,[]}}],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ns_server_stats,'-report_ns_server_lc_stats/1-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,257}]},
           {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,180}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]system stats reporting exception: exit:{noproc,
                                                  {gen_server,call,
                                                   [ns_server_stats,get_stats]}}
          [{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
           {ns_server_stats,report_system_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,188}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,182}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:debug,2022-03-06T21:12:04.110-08:00,ns_1@172.23.100.35:<0.17158.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.324-08:00,ns_1@172.23.100.35:<0.17166.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,safe_fixtable,
                [ns_server_stats,true],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ets,foldl,3,[{file,"ets.erl"},{line,625}]},
           {ns_server_stats,report_ns_server_hc_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,264}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,178}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.651-08:00,ns_1@172.23.100.35:<0.17058.46>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
                                           {path,"/pools/default"},
                                           {method,'GET'},
                                           {type,exit},
                                           {what,
                                            {noproc,
                                             {gen_server,call,
                                              ['service_status_keeper-index',
                                               get_version]}}},
                                           {trace,
                                            [{gen_server,call,2,
                                              [{file,"gen_server.erl"},{line,239}]},
                                             {menelaus_web_pools,do_build_pool_info,4,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,211}]},
                                             {menelaus_web_pools,pool_info,6,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,106}]},
                                             {menelaus_web_pools,handle_pool_info_wait,
                                              5,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,118}]},
                                             {request_tracker,request,2,
                                              [{file,"src/request_tracker.erl"},
                                               {line,40}]},
                                             {menelaus_util,handle_request,2,
                                              [{file,"src/menelaus_util.erl"},
                                               {line,221}]},
                                             {mochiweb_http,headers,6,
                                              [{file,
                                                "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                               {line,153}]},
                                             {proc_lib,init_p_do_apply,3,
                                              [{file,"proc_lib.erl"},{line,226}]}]}]

          {noformat}

          Maybe it is another side effect of MB-49512 which is also hit frequently during cleanups related to bucket not being dropped completely.
          cbcollect_info attached.
          Balakumaran.Gopal Balakumaran Gopal made changes -
          Attachment consoleText_MB-51319.txt [ 179076 ]
          Balakumaran.Gopal Balakumaran Gopal made changes -
          Description +Script to Repro+
          {noformat}
          There is not a particular test that repro's this. Basically cleanup can fail for any test rendering one or more nodes in the cluster unusable.
          {noformat}

          +172.23.100.35+
          {noformat}
          [ns_server:debug,2022-03-06T21:12:03.979-08:00,ns_1@172.23.100.35:<0.17157.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,lookup,
                [ns_server_stats,{c,{<<"rest_request_enters">>,[]}}],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ns_server_stats,'-report_ns_server_lc_stats/1-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,257}]},
           {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,180}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]system stats reporting exception: exit:{noproc,
                                                  {gen_server,call,
                                                   [ns_server_stats,get_stats]}}
          [{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
           {ns_server_stats,report_system_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,188}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,182}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:debug,2022-03-06T21:12:04.110-08:00,ns_1@172.23.100.35:<0.17158.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.324-08:00,ns_1@172.23.100.35:<0.17166.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,safe_fixtable,
                [ns_server_stats,true],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ets,foldl,3,[{file,"ets.erl"},{line,625}]},
           {ns_server_stats,report_ns_server_hc_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,264}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,178}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.651-08:00,ns_1@172.23.100.35:<0.17058.46>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
                                           {path,"/pools/default"},
                                           {method,'GET'},
                                           {type,exit},
                                           {what,
                                            {noproc,
                                             {gen_server,call,
                                              ['service_status_keeper-index',
                                               get_version]}}},
                                           {trace,
                                            [{gen_server,call,2,
                                              [{file,"gen_server.erl"},{line,239}]},
                                             {menelaus_web_pools,do_build_pool_info,4,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,211}]},
                                             {menelaus_web_pools,pool_info,6,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,106}]},
                                             {menelaus_web_pools,handle_pool_info_wait,
                                              5,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,118}]},
                                             {request_tracker,request,2,
                                              [{file,"src/request_tracker.erl"},
                                               {line,40}]},
                                             {menelaus_util,handle_request,2,
                                              [{file,"src/menelaus_util.erl"},
                                               {line,221}]},
                                             {mochiweb_http,headers,6,
                                              [{file,
                                                "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                               {line,153}]},
                                             {proc_lib,init_p_do_apply,3,
                                              [{file,"proc_lib.erl"},{line,226}]}]}]

          {noformat}

          Maybe it is another side effect of MB-49512 which is also hit frequently during cleanups related to bucket not being dropped completely.
          cbcollect_info attached.
          +Script to Repro+
          {noformat}
          There is not a particular test that repro's this. Basically cleanup can fail for any test rendering one or more nodes in the cluster unusable.
          {noformat}

          Sequence of steps that led to the node being unusable.

          {noformat}
          2022-03-06 13:49:41,844 | test | INFO | MainThread | [basetestcase:log_setup_status:647] ========= BaseTestCase setup started for test #5 test_data_load_collections_with_graceful_failover_rebalance_out =========
          2022-03-06 13:50:22,311 | test | INFO | MainThread | [rest_client:monitorRebalance:1610] Rebalance done. Taken 11.0540001392 seconds to complete
          2022-03-06 13:50:22,319 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 5 seconds. Reason: Wait after rebalance complete
          2022-03-06 13:50:27,359 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:27,362 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:32,378 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:32,380 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:37,394 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:37,395 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:42,404 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
          2022-03-06 13:50:42,404 | test | ERROR | MainThread | [cluster_ready_functions:cleanup_cluster:232] Can't create rest connection after rebalance out for ejected nodes, will retry after 10 seconds according to MB-8430: Unable to reach the host @ 172.23.100.35
          2022-03-06 13:50:42,411 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 10 seconds. Reason: MB-8430
          2022-03-06 13:50:52,420 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:52,421 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:57,436 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:57,437 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:51:02,453 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:51:02,456 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:51:07,463 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
          Traceback (most recent call last):
            File "pytests/basetestcase.py", line 363, in setUp
              self.cluster_util.cluster_cleanup(cluster,
            File "pytests/basetestcase.py", line 363, in setUp
              self.cluster_util.cluster_cleanup(cluster,
            File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 169, in cluster_cleanup
              self.cleanup_cluster(cluster, master=cluster.master)
            File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 237, in cleanup_cluster
              rest = RestConnection(removed)
            File "lib/membase/api/rest_client.py", line 319, in __init__
              raise ServerUnavailableException(self.ip)
          ServerUnavailableException: Unable to reach the host @ 172.23.100.35
          {noformat}

          +172.23.100.35+
          {noformat}
          [ns_server:debug,2022-03-06T21:12:03.979-08:00,ns_1@172.23.100.35:<0.17157.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,lookup,
                [ns_server_stats,{c,{<<"rest_request_enters">>,[]}}],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ns_server_stats,'-report_ns_server_lc_stats/1-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,257}]},
           {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,180}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]system stats reporting exception: exit:{noproc,
                                                  {gen_server,call,
                                                   [ns_server_stats,get_stats]}}
          [{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
           {ns_server_stats,report_system_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,188}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,182}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:debug,2022-03-06T21:12:04.110-08:00,ns_1@172.23.100.35:<0.17158.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.324-08:00,ns_1@172.23.100.35:<0.17166.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,safe_fixtable,
                [ns_server_stats,true],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ets,foldl,3,[{file,"ets.erl"},{line,625}]},
           {ns_server_stats,report_ns_server_hc_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,264}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,178}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.651-08:00,ns_1@172.23.100.35:<0.17058.46>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
                                           {path,"/pools/default"},
                                           {method,'GET'},
                                           {type,exit},
                                           {what,
                                            {noproc,
                                             {gen_server,call,
                                              ['service_status_keeper-index',
                                               get_version]}}},
                                           {trace,
                                            [{gen_server,call,2,
                                              [{file,"gen_server.erl"},{line,239}]},
                                             {menelaus_web_pools,do_build_pool_info,4,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,211}]},
                                             {menelaus_web_pools,pool_info,6,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,106}]},
                                             {menelaus_web_pools,handle_pool_info_wait,
                                              5,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,118}]},
                                             {request_tracker,request,2,
                                              [{file,"src/request_tracker.erl"},
                                               {line,40}]},
                                             {menelaus_util,handle_request,2,
                                              [{file,"src/menelaus_util.erl"},
                                               {line,221}]},
                                             {mochiweb_http,headers,6,
                                              [{file,
                                                "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                               {line,153}]},
                                             {proc_lib,init_p_do_apply,3,
                                              [{file,"proc_lib.erl"},{line,226}]}]}]

          {noformat}

          Maybe it is another side effect of MB-49512 which is also hit frequently during cleanups related to bucket not being dropped completely.
          cbcollect_info attached.
          Balakumaran.Gopal Balakumaran Gopal made changes -
          Description +Script to Repro+
          {noformat}
          There is not a particular test that repro's this. Basically cleanup can fail for any test rendering one or more nodes in the cluster unusable.
          {noformat}

          Sequence of steps that led to the node being unusable.

          {noformat}
          2022-03-06 13:49:41,844 | test | INFO | MainThread | [basetestcase:log_setup_status:647] ========= BaseTestCase setup started for test #5 test_data_load_collections_with_graceful_failover_rebalance_out =========
          2022-03-06 13:50:22,311 | test | INFO | MainThread | [rest_client:monitorRebalance:1610] Rebalance done. Taken 11.0540001392 seconds to complete
          2022-03-06 13:50:22,319 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 5 seconds. Reason: Wait after rebalance complete
          2022-03-06 13:50:27,359 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:27,362 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:32,378 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:32,380 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:37,394 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:37,395 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:42,404 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
          2022-03-06 13:50:42,404 | test | ERROR | MainThread | [cluster_ready_functions:cleanup_cluster:232] Can't create rest connection after rebalance out for ejected nodes, will retry after 10 seconds according to MB-8430: Unable to reach the host @ 172.23.100.35
          2022-03-06 13:50:42,411 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 10 seconds. Reason: MB-8430
          2022-03-06 13:50:52,420 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:52,421 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:57,436 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:57,437 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:51:02,453 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:51:02,456 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:51:07,463 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
          Traceback (most recent call last):
            File "pytests/basetestcase.py", line 363, in setUp
              self.cluster_util.cluster_cleanup(cluster,
            File "pytests/basetestcase.py", line 363, in setUp
              self.cluster_util.cluster_cleanup(cluster,
            File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 169, in cluster_cleanup
              self.cleanup_cluster(cluster, master=cluster.master)
            File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 237, in cleanup_cluster
              rest = RestConnection(removed)
            File "lib/membase/api/rest_client.py", line 319, in __init__
              raise ServerUnavailableException(self.ip)
          ServerUnavailableException: Unable to reach the host @ 172.23.100.35
          {noformat}

          +172.23.100.35+
          {noformat}
          [ns_server:debug,2022-03-06T21:12:03.979-08:00,ns_1@172.23.100.35:<0.17157.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,lookup,
                [ns_server_stats,{c,{<<"rest_request_enters">>,[]}}],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ns_server_stats,'-report_ns_server_lc_stats/1-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,257}]},
           {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,180}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]system stats reporting exception: exit:{noproc,
                                                  {gen_server,call,
                                                   [ns_server_stats,get_stats]}}
          [{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
           {ns_server_stats,report_system_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,188}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,182}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:debug,2022-03-06T21:12:04.110-08:00,ns_1@172.23.100.35:<0.17158.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.324-08:00,ns_1@172.23.100.35:<0.17166.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,safe_fixtable,
                [ns_server_stats,true],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ets,foldl,3,[{file,"ets.erl"},{line,625}]},
           {ns_server_stats,report_ns_server_hc_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,264}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,178}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.651-08:00,ns_1@172.23.100.35:<0.17058.46>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
                                           {path,"/pools/default"},
                                           {method,'GET'},
                                           {type,exit},
                                           {what,
                                            {noproc,
                                             {gen_server,call,
                                              ['service_status_keeper-index',
                                               get_version]}}},
                                           {trace,
                                            [{gen_server,call,2,
                                              [{file,"gen_server.erl"},{line,239}]},
                                             {menelaus_web_pools,do_build_pool_info,4,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,211}]},
                                             {menelaus_web_pools,pool_info,6,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,106}]},
                                             {menelaus_web_pools,handle_pool_info_wait,
                                              5,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,118}]},
                                             {request_tracker,request,2,
                                              [{file,"src/request_tracker.erl"},
                                               {line,40}]},
                                             {menelaus_util,handle_request,2,
                                              [{file,"src/menelaus_util.erl"},
                                               {line,221}]},
                                             {mochiweb_http,headers,6,
                                              [{file,
                                                "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                               {line,153}]},
                                             {proc_lib,init_p_do_apply,3,
                                              [{file,"proc_lib.erl"},{line,226}]}]}]

          {noformat}

          Maybe it is another side effect of MB-49512 which is also hit frequently during cleanups related to bucket not being dropped completely.
          cbcollect_info attached.
          +Script to Repro+
          {noformat}
          There is not a particular test that repro's this. Basically cleanup can fail for any test rendering one or more nodes in the cluster unusable.
          {noformat}

          Logs before node became unusable.
          {noformat}
          2022-03-06 13:49:41,844 | test | INFO | MainThread | [basetestcase:log_setup_status:647] ========= BaseTestCase setup started for test #5 test_data_load_collections_with_graceful_failover_rebalance_out =========
          2022-03-06 13:50:22,311 | test | INFO | MainThread | [rest_client:monitorRebalance:1610] Rebalance done. Taken 11.0540001392 seconds to complete
          2022-03-06 13:50:22,319 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 5 seconds. Reason: Wait after rebalance complete
          2022-03-06 13:50:27,359 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:27,362 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:32,378 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:32,380 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:37,394 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:37,395 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:42,404 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
          2022-03-06 13:50:42,404 | test | ERROR | MainThread | [cluster_ready_functions:cleanup_cluster:232] Can't create rest connection after rebalance out for ejected nodes, will retry after 10 seconds according to MB-8430: Unable to reach the host @ 172.23.100.35
          2022-03-06 13:50:42,411 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 10 seconds. Reason: MB-8430
          2022-03-06 13:50:52,420 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:52,421 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:50:57,436 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:50:57,437 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:51:02,453 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
          http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
          2022-03-06 13:51:02,456 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
          2022-03-06 13:51:07,463 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
          Traceback (most recent call last):
            File "pytests/basetestcase.py", line 363, in setUp
              self.cluster_util.cluster_cleanup(cluster,
            File "pytests/basetestcase.py", line 363, in setUp
              self.cluster_util.cluster_cleanup(cluster,
            File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 169, in cluster_cleanup
              self.cleanup_cluster(cluster, master=cluster.master)
            File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 237, in cleanup_cluster
              rest = RestConnection(removed)
            File "lib/membase/api/rest_client.py", line 319, in __init__
              raise ServerUnavailableException(self.ip)
          ServerUnavailableException: Unable to reach the host @ 172.23.100.35
          {noformat}

          +172.23.100.35+
          {noformat}
          [ns_server:debug,2022-03-06T21:12:03.979-08:00,ns_1@172.23.100.35:<0.17157.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,lookup,
                [ns_server_stats,{c,{<<"rest_request_enters">>,[]}}],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ns_server_stats,'-report_ns_server_lc_stats/1-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,257}]},
           {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,180}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]system stats reporting exception: exit:{noproc,
                                                  {gen_server,call,
                                                   [ns_server_stats,get_stats]}}
          [{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
           {ns_server_stats,report_system_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,188}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,182}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:debug,2022-03-06T21:12:04.110-08:00,ns_1@172.23.100.35:<0.17158.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
          [ns_server:error,2022-03-06T21:12:04.324-08:00,ns_1@172.23.100.35:<0.17166.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
          [{ets,safe_fixtable,
                [ns_server_stats,true],
                [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
           {ets,foldl,3,[{file,"ets.erl"},{line,625}]},
           {ns_server_stats,report_ns_server_hc_stats,1,
                            [{file,"src/ns_server_stats.erl"},{line,264}]},
           {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
                            [{file,"src/ns_server_stats.erl"},{line,170}]},
           {ns_server_stats,report_prom_stats,2,
                            [{file,"src/ns_server_stats.erl"},{line,178}]},
           {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
          [ns_server:error,2022-03-06T21:12:04.651-08:00,ns_1@172.23.100.35:<0.17058.46>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
                                           {path,"/pools/default"},
                                           {method,'GET'},
                                           {type,exit},
                                           {what,
                                            {noproc,
                                             {gen_server,call,
                                              ['service_status_keeper-index',
                                               get_version]}}},
                                           {trace,
                                            [{gen_server,call,2,
                                              [{file,"gen_server.erl"},{line,239}]},
                                             {menelaus_web_pools,do_build_pool_info,4,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,211}]},
                                             {menelaus_web_pools,pool_info,6,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,106}]},
                                             {menelaus_web_pools,handle_pool_info_wait,
                                              5,
                                              [{file,"src/menelaus_web_pools.erl"},
                                               {line,118}]},
                                             {request_tracker,request,2,
                                              [{file,"src/request_tracker.erl"},
                                               {line,40}]},
                                             {menelaus_util,handle_request,2,
                                              [{file,"src/menelaus_util.erl"},
                                               {line,221}]},
                                             {mochiweb_http,headers,6,
                                              [{file,
                                                "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                               {line,153}]},
                                             {proc_lib,init_p_do_apply,3,
                                              [{file,"proc_lib.erl"},{line,226}]}]}]

          {noformat}

          Maybe it is another side effect of MB-49512 which is also hit frequently during cleanups related to bucket not being dropped completely.
          cbcollect_info attached.
          Balakumaran.Gopal Balakumaran Gopal made changes -
          Labels affects-neo-testing functional-test affects-neo-testing functional-test magma
          Balakumaran.Gopal Balakumaran Gopal made changes -
          Assignee Balakumaran Gopal [ balakumaran.gopal ] Daniel Owen [ owend ]
          owend Daniel Owen made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          owend Daniel Owen made changes -
          Component/s couchbase-bucket [ 10173 ]
          Component/s ns_server [ 10019 ]
          owend Daniel Owen made changes -
          Status In Progress [ 3 ] Open [ 1 ]
          owend Daniel Owen made changes -
          Assignee Daniel Owen [ owend ] Dave Finlay [ dfinlay ]
          dfinlay Dave Finlay made changes -
          Assignee Dave Finlay [ dfinlay ] Daniel Owen [ owend ]
          dfinlay Dave Finlay made changes -
          Component/s couchbase-bucket [ 10173 ]
          dfinlay Dave Finlay made changes -
          Attachment latest-diag.log [ 179095 ]
          Attachment debug.log.1 [ 179094 ]
          Attachment memcached.log.000021.txt [ 179093 ]
          owend Daniel Owen made changes -
          Component/s ns_server [ 10019 ]
          Component/s storage-engine [ 10175 ]
          owend Daniel Owen made changes -
          Assignee Daniel Owen [ owend ] Sarath Lakshman [ sarath ]
          owend Daniel Owen made changes -
          Assignee Sarath Lakshman [ sarath ] Paolo Cocchi [ paolo.cocchi ]
          paolo.cocchi Paolo Cocchi made changes -
          Attachment node35.gdb.log [ 179104 ]
          paolo.cocchi Paolo Cocchi made changes -
          Assignee Paolo Cocchi [ paolo.cocchi ] Balakumaran Gopal [ balakumaran.gopal ]
          owend Daniel Owen made changes -
          Assignee Balakumaran Gopal [ balakumaran.gopal ] Srinath Duvuru [ srinath.duvuru ]
          sarath Sarath Lakshman made changes -
          Assignee Srinath Duvuru [ srinath.duvuru ] Sarath Lakshman [ sarath ]
          owend Daniel Owen made changes -
          Sprint KV Post-Neo [ 2016 ]
          owend Daniel Owen made changes -
          Sprint KV Post-Neo [ 2016 ] KV 2022-Feb [ 2002 ]
          owend Daniel Owen made changes -
          Rank Ranked higher
          owend Daniel Owen made changes -
          Sprint KV 2022-Feb [ 2002 ] KV 2022-Feb, KV Post-Neo 2 [ 2002, 2050 ]
          owend Daniel Owen made changes -
          Component/s couchbase-bucket [ 10173 ]
          srinath.duvuru Srinath Duvuru made changes -
          Due Date 10/Mar/22
          sarath Sarath Lakshman made changes -
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Resolved [ 5 ]
          ritam.sharma Ritam Sharma made changes -
          Assignee Sarath Lakshman [ sarath ] Balakumaran Gopal [ balakumaran.gopal ]
          Balakumaran.Gopal Balakumaran Gopal made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

          People

            Balakumaran.Gopal Balakumaran Gopal
            Balakumaran.Gopal Balakumaran Gopal
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty