  1. Couchbase Server
  2. MB-60335

[Rebalance][Windows] : Rebalance stopped by janitor

Details

    Description

      Steps to reproduce

      1. Created a 4-node cluster with the following services:
        1. 172.23.138.230 - kv
        2. 172.23.138.231 - index, n1ql
        3. 172.23.138.232 - kv
        4. 172.23.138.234 - index
      2. Created a Couchbase bucket named "default" and loaded a few documents.
      3. Created indexes and ran a few queries.
      4. Added node 172.23.138.233 to the cluster.
      5. The rebalance was stopped by the janitor.

       

      [ns_server:debug,2023-12-15T00:26:25.724-08:00,ns_1@172.23.138.232:chronicle_kv_log<0.491.0>:chronicle_kv_log:log:59]update (key: rebalance_status, rev: {<<"54976bb716bbce5001f45d19d8b21416">>,58})
      {none,<<"Rebalance stopped by janitor.">>}
       
      2023-12-15T00:26:26.329-08:00, ns_janitor:0:info:message(ns_1@172.23.138.230) - Resetting rebalance status since it's not really running 

      Node 172.23.138.233 shows many errors in ns_server.error.log:

      [ns_server:error,2023-12-14T22:32:35.313-08:00,ns_1@cb.local:menelaus_web_cache<0.556.0>:menelaus_web_cache:read_package_variant:53]Failed to read '"c:/Program Files/Couchbase/Server/bin/../VARIANT.txt"': {error,enoent}
      [ns_server:error,2023-12-14T22:32:37.404-08:00,ns_1@cb.local:<0.557.0>:prometheus:post_async:200]Prometheus http request failed:
        URL: http://127.0.0.1:9123/api/v1/query
        Body: query=%7Bname%3D~%60kv_curr_items%7Ckv_curr_items_tot%7Ckv_mem_used_bytes%7Ccouch_docs_actual_disk_size%7Ccouch_views_actual_disk_size%7Ckv_ep_db_data_size_bytes%7Ckv_ep_bg_fetched%60%7D+or+kv_vb_curr_items%7Bstate%3D%27replica%27%7D+or+kv_vb_num_non_resident%7Bstate%3D%27active%27%7D+or+label_replace%28sum+by+%28bucket%2C+name%29+%28irate%28kv_ops%7Bop%3D%60get%60%7D%5B1m%5D%29%29%2C+%60name%60%2C%60cmd_get%60%2C+%60%60%2C+%60%60%29+or+label_replace%28irate%28kv_ops%7Bop%3D%60get%60%2Cresult%3D%60hit%60%7D%5B1m%5D%29%2C%60name%60%2C%60get_hits%60%2C%60%60%2C%60%60%29+or+label_replace%28sum+by+%28bucket%29+%28irate%28kv_cmd_lookup%5B1m%5D%29+or+irate%28kv_ops%7Bop%3D~%60set%7Cincr%7Cdecr%7Cdelete%7Cdel_meta%7Cget_meta%7Cset_meta%7Cset_ret_meta%7Cdel_ret_meta%60%7D%5B1m%5D%29%29%2C+%60name%60%2C+%60ops%60%2C+%60%60%2C+%60%60%29+or+sum+by+%28bucket%2C+name%29+%28%7Bname%3D~%60index_data_size%7Cindex_disk_size%7Ccouch_spatial_data_size%7Ccouch_spatial_disk_size%7Ccouch_views_data_size%60%7D%29&timeout=5s
        Reason: <<"Service Unavailable">>
      [ns_server:error,2023-12-14T22:34:01.976-08:00,ns_1@cb.local:<0.3017.0>:ns_server_stats:report_prom_stats:170]ns_server stats reporting exception: exit:normal
        [{mochiweb_request,send,2,[{file,"c:/Jenkins/workspace/couchbase-server-windows/couchdb/src/mochiweb/mochiweb_request.erl"},{line,264}]},
         {lists,foreach_1,2,[{file,"lists.erl"},{line,1442}]},
         {ns_server_stats,'-report_prom_stats/2-fun-0-',2,[{file,"src/ns_server_stats.erl"},{line,168}]},
         {ns_server_stats,report_prom_stats,2,[{file,"src/ns_server_stats.erl"},{line,178}]},
         {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
      [ns_server:error,2023-12-14T22:34:01.976-08:00,ns_1@cb.local:<0.3017.0>:ns_server_stats:report_prom_stats:170]audit stats reporting exception: exit:normal
        [{mochiweb_request,send,2,[{file,"c:/Jenkins/workspace/couchbase-server-windows/couchdb/src/mochiweb/mochiweb_request.erl"},{line,264}]},
         {ns_server_stats,report_audit_stats,1,[{file,"src/ns_server_stats.erl"},{line,295}]},
         {ns_server_stats,'-report_prom_stats/2-fun-0-',2,[{file,"src/ns_server_stats.erl"},{line,168}]},
         {ns_server_stats,report_prom_stats,2,[{file,"src/ns_server_stats.erl"},{line,179}]},
         {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
      [ns_server:error,2023-12-14T22:34:01.976-08:00,ns_1@cb.local:<0.2999.0>:ns_server_stats:report_prom_stats:170]ns_server stats reporting exception: exit:normal
        [{mochiweb_request,send,2,[{file,"c:/Jenkins/workspace/couchbase-server-windows/couchdb/src/mochiweb/mochiweb_request.erl"},{line,264}]},
         {lists,foreach_1,2,[{file,"lists.erl"},{line,1442}]},
         {ns_server_stats,'-report_prom_stats/2-fun-0-',2,[{file,"src/ns_server_stats.erl"},{line,168}]},
         {ns_server_stats,report_prom_stats,2,[{file,"src/ns_server_stats.erl"},{line,178}]},
         {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
      [ns_server:error,2023-12-14T22:34:01.976-08:00,ns_1@cb.local:<0.3009.0>:ns_server_stats:report_prom_stats:170]ns_server stats reporting exception: exit:normal
        [{mochiweb_request,send,2,[{file,"c:/Jenkins/workspace/couchbase-server-windows/couchdb/src/mochiweb/mochiweb_request.erl"},{line,264}]},
         {lists,foreach_1,2,[{file,"lists.erl"},{line,1442}]},
         {ns_server_stats,'-report_prom_stats/2-fun-0-',2,[{file,"src/ns_server_stats.erl"},{line,168}]},
         {ns_server_stats,report_prom_stats,2,[{file,"src/ns_server_stats.erl"},{line,178}]},
         {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
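The failed Prometheus request in the error log carries a URL-encoded body, which is hard to read as logged. A minimal Python sketch for decoding such a body follows. The `body` string is an abbreviated excerpt built from the logged request (most metric names and the later `label_replace` clauses are left out for brevity), and `decode_body` is a hypothetical helper name, not part of ns_server.

```python
from urllib.parse import parse_qs

# Abbreviated excerpt of the logged request body (assumption: shortened
# by hand for readability; the full body is in the log entry).
body = (
    "query=%7Bname%3D~%60kv_curr_items%7Ckv_curr_items_tot%60%7D"
    "+or+kv_vb_curr_items%7Bstate%3D%27replica%27%7D"
    "&timeout=5s"
)


def decode_body(raw):
    """Decode a form-encoded body into {field: first value}."""
    return {key: values[0] for key, values in parse_qs(raw).items()}


fields = decode_body(body)
print(fields["query"])    # the PromQL expression in readable form
print(fields["timeout"])  # the request timeout sent with the query
```

Decoding the full logged body the same way shows the complete PromQL expression that ns_server sent to the local Prometheus endpoint when it received "Service Unavailable".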

      Node 172.23.138.233 also shows many errors in ns_server.babysitter.log:

      [ns_server:debug,2023-12-14T23:14:33.015-08:00,babysitter_of_ns_1@cb.local:<0.220.0>:restartable:start_child:92]Started child process <0.221.0>
        MFA: {supervisor_cushion,start_link,[kv,5000,infinity,ns_port_server,start_link,[#Fun<ns_child_ports_sup.4.11883217>]]}
      [error_logger:info,2023-12-14T23:14:33.015-08:00,babysitter_of_ns_1@cb.local:ns_child_ports_sup<0.141.0>:ale_error_logger_handler:do_log:101]
      =========================PROGRESS REPORT=========================
        supervisor: {local,ns_child_ports_sup}
        started: [{pid,<0.220.0>},
                  {id,{kv,"c:/Program Files/Couchbase/Server/bin/projector.exe",
                       ["--httpsPort=9999",
                        "--certFile=c:/Program Files/Couchbase/Server/var/lib/couchbase/config/certs/chain.pem",
                        "--keyFile=c:/Program Files/Couchbase/Server/var/lib/couchbase/config/certs/pkey.pem",
                        "--caFile=c:/Program Files/Couchbase/Server/var/lib/couchbase/config/certs/ca.pem",
                        "-ipv4=required","-ipv6=optional",
                        "-kvaddrs=127.0.0.1:11210","-adminport=:9999",
                        "-diagDir=c:/Program Files/Couchbase/Server/var/lib/couchbase/crash",
                        "127.0.0.1:8091"],
                       [via_goport,exit_status,stderr_to_stdout,
                        {env,[{"GODEBUG","madvdontneed=1"},
                              {"GOTRACEBACK","single"},
                              {"CBAUTH_REVRPC_URL",<<197,61,150,9,78,244,213,87,1,135,126,235,112,22,149,206,212,91,250,118,19,229,28,145,59,254,186,4,190,255,...>>}]},
                        {log,"projector.log"}]}},
                  {mfargs,{restartable,start_link,[{supervisor_cushion,start_link,[kv,5000,infinity,ns_port_server,start_link,[#Fun<ns_child_ports_sup.4.11883217>]]},86400000]}},
                  {restart_type,permanent},
                  {significant,false},
                  {shutdown,infinity},
                  {child_type,worker}]
      [error_logger:error,2023-12-14T23:23:04.855-08:00,babysitter_of_ns_1@cb.local:<0.147.0>:ale_error_logger_handler:do_log:101]
      =========================ERROR REPORT=========================
      ** Generic server <0.147.0> terminating
      ** Last message in was {#Port<0.13>,{exit_status,1}}
      ** When Server state == {state,#Port<0.13>,7304,
                               {memcached,"c:/Program Files/Couchbase/Server/bin/memcached",
                                ["-C","c:/Program Files/Couchbase/Server/var/lib/couchbase/config/memcached.json"],
                                [{env,[{"EVENT_NOSELECT","1"},
                                       {"CBSASL_PWFILE","c:/Program Files/Couchbase/Server/var/lib/couchbase/isasl.pw"},
                                       {"JE_MALLOC_CONF","narenas:1"}]},
                                 use_stdio,stderr_to_stdout,exit_status,stream]},
                               {ringbuffer,0,1024,{[],[]}},
                               undefined,undefined,[],0}
      ** Reason for termination ==
      ** {abnormal,1}
      [error_logger:error,2023-12-14T23:23:04.871-08:00,babysitter_of_ns_1@cb.local:<0.147.0>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_port_server:init/1
          pid: <0.147.0>
          registered_name: []
          exception exit: {abnormal,1}
            in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1241)
          ancestors: [<0.146.0>,<0.145.0>,ns_child_ports_sup,ns_babysitter_sup,<0.122.0>]
          message_queue_len: 1
          messages: [{'EXIT',#Port<0.13>,normal}]
          links: [<0.146.0>]
          dictionary: []
          trap_exit: true
          status: running
          heap_size: 1598
          stack_size: 28
          reductions: 23991
        neighbours:
      [ns_server:info,2023-12-14T23:23:04.871-08:00,babysitter_of_ns_1@cb.local:<0.146.0>:supervisor_cushion:handle_info:54]Cushion managed supervisor for memcached failed:  {abnormal,1}
      [error_logger:error,2023-12-14T23:23:04.871-08:00,babysitter_of_ns_1@cb.local:<0.146.0>:ale_error_logger_handler:do_log:101]
      =========================ERROR REPORT=========================
      ** Generic server <0.146.0> terminating
      ** Last message in was {die,{abnormal,1}}
      ** When Server state == {state,memcached,5000,141110201,undefined,infinity}
      ** Reason for termination ==
      ** {abnormal,1}
      [error_logger:error,2023-12-14T23:23:04.871-08:00,babysitter_of_ns_1@cb.local:<0.146.0>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: supervisor_cushion:init/1
          pid: <0.146.0>
          registered_name: []
          exception exit: {abnormal,1}
            in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1241)
          ancestors: [<0.145.0>,ns_child_ports_sup,ns_babysitter_sup,<0.122.0>]
          message_queue_len: 0
          messages: []
          links: [<0.145.0>]
          dictionary: []
          trap_exit: true
          status: running
          heap_size: 4185
          stack_size: 28
          reductions: 13338
        neighbours:
      [error_logger:error,2023-12-14T23:23:04.871-08:00,babysitter_of_ns_1@cb.local:<0.145.0>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: erlang:apply/2
          pid: <0.145.0>
          registered_name: []
          exception exit: {abnormal,1}
            in function  restartable:loop/4 (src/restartable.erl, line 63)
          ancestors: [ns_child_ports_sup,ns_babysitter_sup,<0.122.0>]
          message_queue_len: 0
          messages: []
          links: [<0.141.0>]
          dictionary: []
          trap_exit: true
          status: running
          heap_size: 1598
          stack_size: 28
          reductions: 3228
        neighbours:
      [error_logger:error,2023-12-14T23:23:04.871-08:00,babysitter_of_ns_1@cb.local:ns_child_ports_sup<0.141.0>:ale_error_logger_handler:do_log:101]
      =========================SUPERVISOR REPORT=========================
        supervisor: {local,ns_child_ports_sup}
        errorContext: child_terminated
        reason: {abnormal,1}
        offender: [{pid,<0.145.0>},
                   {id,{memcached,"c:/Program Files/Couchbase/Server/bin/memcached",
                        ["-C","c:/Program Files/Couchbase/Server/var/lib/couchbase/config/memcached.json"],
                        [{env,[{"EVENT_NOSELECT","1"},
                               {"CBSASL_PWFILE","c:/Program Files/Couchbase/Server/var/lib/couchbase/isasl.pw"},
                               {"JE_MALLOC_CONF","narenas:1"}]},
                         use_stdio,stderr_to_stdout,exit_status,port_server_dont_start,stream]}},
                   {mfargs,{restartable,start_link,[{supervisor_cushion,start_link,[memcached,5000,infinity,ns_port_server,start_link,[#Fun<ns_child_ports_sup.4.11883217>]]},86400000]}},
                   {restart_type,permanent},
                   {significant,false},
                   {shutdown,infinity},
                   {child_type,worker}]
      [ns_server:debug,2023-12-14T23:23:04.871-08:00,babysitter_of_ns_1@cb.local:<0.421.0>:supervisor_cushion:init:33]starting ns_port_server with delay of 5000
      [ns_server:debug,2023-12-14T23:23:04.871-08:00,babysitter_of_ns_1@cb.local:<0.420.0>:restartable:start_child:92]Started child process <0.421.0>
        MFA: {supervisor_cushion,start_link,[memcached,5000,infinity,ns_port_server,start_link,[#Fun<ns_child_ports_sup.4.11883217>]]}
      [error_logger:info,2023-12-14T23:23:04.871-08:00,babysitter_of_ns_1@cb.local:ns_child_ports_sup<0.141.0>:ale_error_logger_handler:do_log:101]
      =========================PROGRESS REPORT=========================
        supervisor: {local,ns_child_ports_sup}
        started: [{pid,<0.420.0>},
                  {id,{memcached,"c:/Program Files/Couchbase/Server/bin/memcached",
                       ["-C","c:/Program Files/Couchbase/Server/var/lib/couchbase/config/memcached.json"],
                       [{env,[{"EVENT_NOSELECT","1"},
                              {"CBSASL_PWFILE","c:/Program Files/Couchbase/Server/var/lib/couchbase/isasl.pw"},
                              {"JE_MALLOC_CONF","narenas:1"}]},
                        use_stdio,stderr_to_stdout,exit_status,port_server_dont_start,stream]}},
                  {mfargs,{restartable,start_link,[{supervisor_cushion,start_link,[memcached,5000,infinity,ns_port_server,start_link,[#Fun<ns_child_ports_sup.4.11883217>]]},86400000]}},
                  {restart_type,permanent},
                  {significant,false},
                  {shutdown,infinity},
                  {child_type,worker}]
      [ns_server:info,2023-12-14T23:28:24.877-08:00,babysitter_of_ns_1@cb.local:<0.98.0>:ns_babysitter_bootstrap:stop:30]3980: got shutdown request. Terminating.
      [ns_server:debug,2023-12-14T23:28:24.877-08:00,babysitter_of_ns_1@cb.local:<0.222.0>:ns_port_server:terminate:198]Shutting down port kv
      [ns_server:debug,2023-12-14T23:28:24.893-08:00,babysitter_of_ns_1@cb.local:projector.exe-goport<0.228.0>:goport:handle_eof:585]Stream 'stdout' closed
      [ns_server:debug,2023-12-14T23:28:24.893-08:00,babysitter_of_ns_1@cb.local:projector.exe-goport<0.228.0>:goport:handle_eof:585]Stream 'stderr' closed
      [ns_server:info,2023-12-14T23:28:24.909-08:00,babysitter_of_ns_1@cb.local:projector.exe-goport<0.228.0>:goport:handle_process_exit:566]Port exited with status 0.
      [ns_server:info,2023-12-14T23:28:24.912-08:00,babysitter_of_ns_1@cb.local:<0.222.0>:ns_port_server:handle_info:149]Got {exit_status,0} from port kv. Exiting normally
      [ns_server:debug,2023-12-14T23:28:24.912-08:00,babysitter_of_ns_1@cb.local:<0.222.0>:ns_port_server:terminate:201]kv has exited
      [ns_server:debug,2023-12-14T23:28:24.912-08:00,babysitter_of_ns_1@cb.local:<0.220.0>:restartable:shutdown_child:114]Successfully terminated process <0.221.0>
      [ns_server:debug,2023-12-14T23:28:24.912-08:00,babysitter_of_ns_1@cb.local:<0.153.0>:ns_port_server:terminate:198]Shutting down port goxdcr
      [ns_server:debug,2023-12-14T23:28:24.914-08:00,babysitter_of_ns_1@cb.local:goxdcr.exe-goport<0.159.0>:goport:handle_eof:585]Stream 'stdout' closed
      [ns_server:debug,2023-12-14T23:28:24.914-08:00,babysitter_of_ns_1@cb.local:goxdcr.exe-goport<0.159.0>:goport:handle_eof:585]Stream 'stderr' closed
      [ns_server:info,2023-12-14T23:28:24.940-08:00,babysitter_of_ns_1@cb.local:goxdcr.exe-goport<0.159.0>:goport:handle_process_exit:566]Port exited with status 0.
      [ns_server:info,2023-12-14T23:28:24.940-08:00,babysitter_of_ns_1@cb.local:<0.153.0>:ns_port_server:handle_info:149]Got {exit_status,0} from port goxdcr. Exiting normally
      [ns_server:debug,2023-12-14T23:28:24.940-08:00,babysitter_of_ns_1@cb.local:<0.153.0>:ns_port_server:terminate:201]goxdcr has exited
      [ns_server:debug,2023-12-14T23:28:24.940-08:00,babysitter_of_ns_1@cb.local:<0.151.0>:restartable:shutdown_child:114]Successfully terminated process <0.152.0>
      [ns_server:debug,2023-12-14T23:28:24.941-08:00,babysitter_of_ns_1@cb.local:<0.150.0>:ns_port_server:terminate:198]Shutting down port saslauthd_port
      [ns_server:debug,2023-12-14T23:28:24.941-08:00,babysitter_of_ns_1@cb.local:<0.150.0>:ns_port_server:port_shutdown:317]Shutdown command: "shutdown"
      [ns_server:info,2023-12-14T23:28:24.942-08:00,babysitter_of_ns_1@cb.local:<0.150.0>:ns_port_server:handle_info:149]Got {exit_status,0} from port saslauthd_port. Exiting normally
      [ns_server:debug,2023-12-14T23:28:24.942-08:00,babysitter_of_ns_1@cb.local:<0.150.0>:ns_port_server:terminate:201]saslauthd_port has exited
      [ns_server:info,2023-12-14T23:28:24.942-08:00,babysitter_of_ns_1@cb.local:<0.150.0>:ns_port_server:log:226]saslauthd_port<0.150.0>: 2023/12/14 23:28:24 Got EOL. Exiting
      [ns_server:debug,2023-12-14T23:28:24.942-08:00,babysitter_of_ns_1@cb.local:<0.148.0>:restartable:shutdown_child:114]Successfully terminated process <0.149.0>
      [ns_server:debug,2023-12-14T23:28:24.942-08:00,babysitter_of_ns_1@cb.local:<0.422.0>:ns_port_server:terminate:198]Shutting down port memcached
      [ns_server:debug,2023-12-14T23:28:24.942-08:00,babysitter_of_ns_1@cb.local:<0.422.0>:ns_port_server:port_shutdown:317]Shutdown command: "shutdown"
      [ns_server:info,2023-12-14T23:28:25.144-08:00,babysitter_of_ns_1@cb.local:<0.422.0>:ns_port_server:log:226]memcached<0.422.0>: EOL on stdin.  Initiating shutdown
      [ns_server:info,2023-12-14T23:28:25.394-08:00,babysitter_of_ns_1@cb.local:<0.422.0>:ns_port_server:handle_info:149]Got {exit_status,0} from port memcached. Exiting normally
      [ns_server:debug,2023-12-14T23:28:25.394-08:00,babysitter_of_ns_1@cb.local:<0.422.0>:ns_port_server:terminate:201]memcached has exited
      [ns_server:debug,2023-12-14T23:28:25.394-08:00,babysitter_of_ns_1@cb.local:<0.420.0>:restartable:shutdown_child:114]Successfully terminated process <0.421.0>
      [ns_server:debug,2023-12-14T23:28:25.394-08:00,babysitter_of_ns_1@cb.local:<0.140.0>:ns_port_server:terminate:198]Shutting down port ns_server
      [ns_server:debug,2023-12-14T23:28:25.394-08:00,babysitter_of_ns_1@cb.local:<0.140.0>:ns_port_server:port_shutdown:317]Shutdown command: "shutdown"
      [error_logger:info,2023-12-14T23:28:25.472-08:00,babysitter_of_ns_1@cb.local:net_kernel<0.71.0>:ale_error_logger_handler:do_log:101]
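When log text is pasted into a ticket, entries can run together without line breaks, which makes the babysitter log hard to scan. A rough Python sketch for splitting such text back into entries and reading each header is shown here. It assumes the header shape seen in this log, `[logger:level,timestamp,source]message`; `split_entries` and `parse_entry` are hypothetical helper names, and the sample is three consecutive entries copied from the shutdown sequence in the babysitter log.

```python
import re
from collections import Counter

# Assumed header shape, taken from the entries in this report:
#   [logger:level,timestamp,source]message
# The zero-width lookahead splits before each header without consuming it.
ENTRY_START = re.compile(r"(?=\[(?:ns_server|error_logger):\w+,\d{4}-\d{2}-\d{2}T)")
HEADER = re.compile(
    r"\[(?P<logger>\w+):(?P<level>\w+),(?P<timestamp>[^,]+),(?P<source>[^\]]+)\]"
    r"(?P<message>.*)",
    re.DOTALL,
)


def split_entries(blob):
    """Split a run of fused log text into one string per entry."""
    return [part.strip() for part in ENTRY_START.split(blob) if part.strip()]


def parse_entry(entry):
    """Return header fields and message of one entry, or None if it does not match."""
    match = HEADER.match(entry)
    return match.groupdict() if match else None


# Three consecutive entries from the babysitter log, fused as in the paste.
sample = (
    "[ns_server:info,2023-12-14T23:28:24.909-08:00,babysitter_of_ns_1@cb.local:"
    "projector.exe-goport<0.228.0>:goport:handle_process_exit:566]Port exited with status 0."
    "[ns_server:info,2023-12-14T23:28:24.912-08:00,babysitter_of_ns_1@cb.local:"
    "<0.222.0>:ns_port_server:handle_info:149]Got {exit_status,0} from port kv. Exiting normally"
    "[ns_server:debug,2023-12-14T23:28:24.912-08:00,babysitter_of_ns_1@cb.local:"
    "<0.222.0>:ns_port_server:terminate:201]kv has exited"
)

entries = [parse_entry(e) for e in split_entries(sample)]
levels = Counter(e["level"] for e in entries if e)
for e in entries:
    if e:
        print(e["timestamp"], e["level"], e["message"])
print(dict(levels))
```

Filtering the parsed entries on `level == "error"` or on messages containing `{abnormal,1}` would isolate the memcached exit at 23:23:04 from the later clean shutdown at 23:28:24.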

      Testrunner Script to reproduce

      ./testrunner -i /tmp/testexec.19367.ini -p get-cbcollect-info=False,get-logs=False,get-coredumps=False,GROUP=set2,get-cbcollect-info=True,get-cbcollect-info=True -t gsi.recovery_gsi.SecondaryIndexingRecoveryTests.test_rebalance_in_out,before=create_index,in_between=query,after=query_with_explain:verify_explain_result-query:verify_query_result:verify_explain_result,groups=simple,dataset=default,doc-per-day=10,nodes_in=1,nodes_out=1,services_in=kv,services_init=kv-kv-index-index:n1ql,nodes_out_dist=kv:1,nodes_init=4,targetMaster=True,GROUP=REB-IN-OUT;P0;set2
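The `-p` argument in the command above repeats some keys, so it can be useful to see which parameters appear more than once and with which values. A small Python sketch follows. `repeated_keys` is a hypothetical helper, and the sketch makes no assumption about which of the repeated values testrunner actually applies.

```python
from collections import defaultdict

# The -p value copied from the testrunner command above.
P_ARG = (
    "get-cbcollect-info=False,get-logs=False,get-coredumps=False,"
    "GROUP=set2,get-cbcollect-info=True,get-cbcollect-info=True"
)


def repeated_keys(arg):
    """Map each key that occurs more than once to its values, in order of appearance."""
    values = defaultdict(list)
    for item in arg.split(","):
        key, _, value = item.partition("=")
        values[key].append(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


print(repeated_keys(P_ARG))
```

Here the only repeated key is `get-cbcollect-info`, given once as `False` and twice as `True`.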

      Job name : windows11-os_certify-2i_2

      Job ref : http://cb-logs-qe.s3-website-us-west-2.amazonaws.com/7.2.4-7059/jenkins_logs/test_suite_executor-dynvm/18758/

            People

              bryan.mccoid Bryan McCoid
              raghav.sk Raghav S K