Couchbase Server / MB-44229

ns_server crashes after reinstallation on CentOS 7


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: 7.0.0
    • Fix Version/s: Cheshire-Cat
    • Component/s: installer
    • Environment: CentOS Linux release 7.9.2009 (Core), Vagrants, IPv4
    • 1
    • Sprint: Build Team 2021 Sprint 5

    Description

      Reinstalling Couchbase and then attempting to start it results in ns_server crashing.

      Steps to reproduce

      1. Spawn a completely new Vagrant virtual machine on CentOS 7 with build 7.0.0-4390 installed.
      2. Stop and remove Couchbase.

        systemctl stop couchbase-server
        yum -y remove couchbase-server
        

      3. Download build 7.0.0-4390

        wget http://172.23.126.166/builds/latestbuilds/couchbase-server/cheshire-cat/4390/couchbase-server-enterprise-7.0.0-4390-centos7.x86_64.rpm
        

      4. Reinstall and start Couchbase

        yum -y install couchbase-server-enterprise-7.0.0-4390-centos7.x86_64.rpm
        systemctl start couchbase-server
        

      What happens:

      The installation succeeds, but ns_server crashes. The output of 'systemctl status couchbase-server -l' shows the following:

      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: client_info: undefined
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: label: {gen_server,terminate}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: last_message: {die,{abnormal,3}}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: log: []
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: name: <0.120.0>
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: reason: {abnormal,3}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: state: {state,ns_server,5000,-576460751640103199,undefined,infinity}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: 2021-02-10 13:22:44.346049 crash_report        #{label=>{proc_lib,crash},report=>[[{initial_call,{supervisor_cushion,init,['Argument__1']}},{pid,<0.120.0>},{registered_name,[]},{error_info,{exit,{abnormal,3},[{gen_server,handle_common_reply,8,[{file,"gen_server.erl"},{line,751}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}},{ancestors,[<0.119.0>,child_ns_server_sup,ns_babysitter_sup,<0.105.0>]},{message_queue_len,0},{messages,[]},{links,[<0.119.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,1598},{stack_size,27},{reductions,8769}],[]]}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: 2021-02-10 13:22:44.346634 crash_report        #{label=>{proc_lib,crash},report=>[[{initial_call,{erlang,apply,['Argument__1','Argument__2']}},{pid,<0.119.0>},{registered_name,[]},{error_info,{exit,{abnormal,3},[{restartable,loop,4,[{file,"src/restartable.erl"},{line,69}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}},{ancestors,[child_ns_server_sup,ns_babysitter_sup,<0.105.0>]},{message_queue_len,0},{messages,[]},{links,[<0.118.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,10958},{stack_size,27},{reductions,3018}],[]]}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: 2021-02-10 13:22:44.347059 supervisor_report   #{label=>{supervisor,child_terminated},report=>[{supervisor,{local,child_ns_server_sup}},{errorContext,child_terminated},{reason,{abnormal,3}},{offender,[{pid,<0.119.0>},{id,{ns_server,"/opt/couchbase/lib/erlang/bin/erl",["+A","16","-smp","enable","+sbt","u","+P","327680","+K","true","+swt","low","+sbwt","none","+MMmcs","30","+e102400","-setcookie","nocookie","-kernel","logger","[{handler, default, undefined}]","-user","user_io","-run","child_erlang","child_start","ns_bootstrap","--","-smp","enable","-kernel","logger","[{handler, default, undefined}]","inetrc","\"/opt/couchbase/etc/couchbase/hosts.cfg\"","dist_config_file","\"/opt/couchbase/var/lib/couchbase/config/dist_cfg\"","-proto_dist","cb","-epmd_module","cb_epmd","-ssl_dist_optfile","/opt/couchbase/etc/couchbase/ssl_dist_opts","-couch_ini","/opt/couchbase/etc/couchdb/default.ini","/opt/couchbase/etc/couchdb/default.d/capi.ini","/opt/couchbase/etc/couchdb/default.d/geocouch.ini","/opt/couchbase/etc/couchdb/local.ini"],[{env,[{"NS_SERVER_BABYSITTER_PID","7748"},{"CHILD_ERLANG_ENV_ARGS","[{ns_babysitter,\n     [{cookiefile,\n          \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.cookie\"},\n      {pidfile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.pid\"},\n      {nodefile,\n          \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.node\"}]},\n {ns_server,\n     [{loglevel_stats,debug},\n      {error_logger_mf_dir,\"/opt/couchbase/var/lib/couchbase/logs\"},\n      {max_t,10},\n      {loglevel_views,debug},\n      {path_config_etcdir,\"/opt/couchbase/etc/couchbase\"},\n      {path_config_secdir,\"/opt/couchbase/etc/security\"},\n      {path_config_datadir,\"/opt/couchbase/var/lib/couchbase\"},\n      {loglevel_cluster,debug},\n      {loglevel_error_logger,debug},\n      {loglevel_user,debug},\n      {loglevel_ns_server,debug},\n      
{path_config_libdir,\"/opt/couchbase/lib\"},\n      {loglevel_cbas,debug},\n      {max_r,20},\n      {config_path,\"/opt/couchbase/etc/couchbase/static_config\"},\n      {disk_sink_opts_json_rpc,\n          [{rotation,\n               [{compress,true},\n                {size,41943040},\n                {num_files,2},\n                {buffer_size_max,52428800}]}]},\n      {loglevel_access,info},\n      {loglevel_couchdb,info},\n      {path_config_tmpdir,\"/opt/couchbase/var/lib/couchbase/tmp\"},\n      {net_kernel_verbosity,10},\n      {loglevel_rebalance,debug},\n      {loglevel_default,debug},\n      {nodefile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.node\"},\n      {loglevel_xdcr,debug},\n      {loglevel_ns_doctor,debug},\n      {path_config_bindir,\"/opt/couchbase/bin\"},\n      {loglevel_menelaus,debug},\n      {disk_sink_opts,\n          [{rotation,\n               [{compress,true},\n                {size,41943040},\n                {num_files,10},\n                {buffer_size_max,52428800}]}]},\n      {loglevel_mapreduce_errors,debug}]}]"},{"ERL_CRASH_DUMP","erl_crash.dump.1612963342.7748.ns_server"}]},exit_status,use_stdio,stream,eof]}},{mfargs,{restartable,start_link,[{supervisor_cushion,start_link,[ns_server,5000,infinity,ns_port_server,start_link,[#Fun<ns_child_ports_sup.4.12324699>]]},86400000]}},{restart_type,permanent},{shutdown,infinity},{child_type,worker}]}]}

      Furthermore, HTTP requests to 'localhost:8091' using curl or wget failed.
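      The kind of check I ran can be sketched as below. The '/pools' endpoint and the '--write-out' formatting are my additions for illustration; the point is that while ns_server is down the connection is refused rather than returning an HTTP status:

```shell
# Hypothetical reachability probe of the admin REST port (8091).
# While ns_server is down, curl cannot connect, so the fallback
# message is printed instead of an HTTP status code.
curl -s --max-time 5 -o /dev/null -w '%{http_code}\n' \
    http://localhost:8091/pools || echo "connection refused"
```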

      The ns_server.error log file reports additional errors (see appendix).

      A possible reason why this happens:

      With help from Daniel Owen, I was able to narrow down the likely cause.

      After uninstallation, the 'chronicle' folder is left behind:

      [root@node1-cheshire-cat-testing-centos7 ~]# tree /opt/couchbase/var/lib/couchbase/config/chronicle/
      /opt/couchbase/var/lib/couchbase/config/chronicle/
      ├── logs
      │   └── 0.log
      └── snapshots
      

      Deleting this folder after the uninstall and before the reinstall prevents the crash:

       rm -rf /opt/couchbase/var/lib/couchbase/config/chronicle/
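      For anyone scripting the reinstall, the workaround can be wrapped in a small helper. The 'clean_chronicle' function name and its optional path argument are my own additions; the default path is the one observed in this report:

```shell
#!/bin/sh
# Workaround helper: remove leftover chronicle state between uninstall
# and reinstall. The optional path argument is illustrative only; the
# default is the directory this report found left behind.
clean_chronicle() {
    dir="${1:-/opt/couchbase/var/lib/couchbase/config/chronicle}"
    if [ -d "$dir" ]; then
        rm -rf "$dir"
        echo "removed $dir"
    else
        echo "nothing to remove at $dir"
    fi
}

clean_chronicle "$@"
```

      Calling 'clean_chronicle' with no argument targets the default path; passing an explicit path makes it easy to dry-run the logic elsewhere.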

      What I expected to happen:

      I expected ns_server to start successfully after reinstallation.

      Logs:

      Logs from vagrant (based on the steps to reproduce): output_vagrant.zip

      I observed this issue on a test vm first, the logs for that can be found here: output_testing_vm.zip

      Steps to reproduce, in video format:

      Part 1.mov

      Part 2.mov

      Part 3.mov

      Appendix

      The last 100 entries from 'ns_server.error.log':


      [ns_server:error,2021-02-10T13:19:21.744Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:19:36.748Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:19:36.748Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:19:51.423Z,ns_1@10.112.210.101:<0.12005.0>:service_agent:wait_for_connection_loop:351]No connection with label "index-service_api" after 60000ms. Exiting.
      [ns_server:error,2021-02-10T13:19:51.424Z,ns_1@10.112.210.101:service_agent-index<0.12004.0>:service_agent:handle_info:283]Linked process <0.12005.0> died with reason {no_connection,
                                                   "index-service_api"}. Terminating
      [ns_server:error,2021-02-10T13:19:51.424Z,ns_1@10.112.210.101:service_agent-index<0.12004.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:19:51.750Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:19:51.750Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:20:06.752Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:20:06.752Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:20:21.754Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:20:21.754Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:20:36.756Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:20:36.756Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:20:51.427Z,ns_1@10.112.210.101:<0.14861.0>:service_agent:wait_for_connection_loop:351]No connection with label "index-service_api" after 60000ms. Exiting.
      [ns_server:error,2021-02-10T13:20:51.427Z,ns_1@10.112.210.101:service_agent-index<0.14860.0>:service_agent:handle_info:283]Linked process <0.14861.0> died with reason {no_connection,
                                                   "index-service_api"}. Terminating
      [ns_server:error,2021-02-10T13:20:51.427Z,ns_1@10.112.210.101:service_agent-index<0.14860.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:20:51.758Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:20:51.758Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:21:06.761Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:21:06.761Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:21:21.763Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:21:21.763Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:21:27.584Z,ns_1@10.112.210.101:service_agent-backup<0.505.0>:service_agent:handle_info:291]Lost json rpc connection for service backup, reason shutdown. Terminating.
      [ns_server:error,2021-02-10T13:21:27.584Z,ns_1@10.112.210.101:service_agent-backup<0.505.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:21:27.595Z,ns_1@10.112.210.101:service_agent-eventing<0.509.0>:service_agent:handle_info:291]Lost json rpc connection for service eventing, reason shutdown. Terminating.
      [ns_server:error,2021-02-10T13:21:27.596Z,ns_1@10.112.210.101:service_agent-eventing<0.509.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:21:27.601Z,ns_1@10.112.210.101:service_agent-fts<0.513.0>:service_agent:handle_info:291]Lost json rpc connection for service fts, reason shutdown. Terminating.
      [ns_server:error,2021-02-10T13:21:27.601Z,ns_1@10.112.210.101:service_agent-fts<0.513.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:21:27.622Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,
                                                   {econnrefused,
                                                    [{lhttpc_client,send_request,1,
                                                      [{file,
                                                        "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                       {line,220}]},
                                                     {lhttpc_client,execute,9,
                                                      [{file,
                                                        "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                       {line,169}]},
                                                     {lhttpc_client,request,9,
                                                      [{file,
                                                        "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                       {line,93}]}]}}
      [ns_server:error,2021-02-10T13:21:27.622Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:21:27.622Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (fts) api/nsstatus failed: {error,
                                             {econnrefused,
                                              [{lhttpc_client,send_request,1,
                                                [{file,
                                                  "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                 {line,220}]},
                                               {lhttpc_client,execute,9,
                                                [{file,
                                                  "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                 {line,169}]},
                                               {lhttpc_client,request,9,
                                                [{file,
                                                  "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                 {line,93}]}]}}
      [ns_server:error,2021-02-10T13:21:27.623Z,ns_1@10.112.210.101:service_status_keeper-fts<0.613.0>:service_status_keeper:handle_cast:132]Service service_fts returned incorrect status
      [ns_server:error,2021-02-10T13:21:27.626Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (eventing) api/v1/functions failed: {error,
                                                      {econnrefused,
                                                       [{lhttpc_client,
                                                         send_request,1,
                                                         [{file,
                                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                          {line,220}]},
                                                        {lhttpc_client,execute,9,
                                                         [{file,
                                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                          {line,169}]},
                                                        {lhttpc_client,request,9,
                                                         [{file,
                                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                          {line,93}]}]}}
      [ns_server:error,2021-02-10T13:21:27.626Z,ns_1@10.112.210.101:service_status_keeper-eventing<0.618.0>:service_status_keeper:handle_cast:132]Service service_eventing returned incorrect status
      [ns_server:error,2021-02-10T13:21:28.697Z,ns_1@10.112.210.101:<0.354.0>:ns_port_server:handle_info:157]Got unexpected exit signal from port: {'EXIT',<0.360.0>,normal}. Exiting.
      [ns_server:error,2021-02-10T13:21:28.697Z,ns_1@10.112.210.101:prometheus_cfg<0.349.0>:prometheus_cfg:handle_info:376]Received exit from Prometheus port server - <0.354.0>: normal. Restarting Prometheus...
      [ns_server:error,2021-02-10T13:21:29.082Z,ns_1@10.112.210.101:<0.19440.0>:prometheus:post_async:194]Prometheus http request failed:
      URL: http://127.0.0.1:9123/api/v1/query
      Body: query=%7Bname%3D~%60kv_curr_items%7Ckv_curr_items_tot%7Ckv_vb_replica_curr_items%7Ckv_mem_used_bytes%7Ccouch_docs_actual_disk_size%7Ccouch_views_actual_disk_size%7Ckv_ep_db_data_size_bytes%7Ckv_vb_active_num_non_resident%7Ckv_ep_bg_fetched%60%7D+or+label_replace%28sum+by+%28bucket%2C+name%29+%28irate%28kv_ops%7Bop%3D%60get%60%7D%5B1m%5D%29%29%2C+%60name%60%2C%60cmd_get%60%2C+%60%60%2C+%60%60%29+or+label_replace%28irate%28kv_ops%7Bop%3D%60get%60%2Cresult%3D%60hit%60%7D%5B1m%5D%29%2C%60name%60%2C%60get_hits%60%2C%60%60%2C%60%60%29+or+label_replace%28sum+by+%28bucket%29+%28irate%28kv_cmd_lookup%5B1m%5D%29+or+irate%28kv_ops%7Bop%3D~%60set%7Cincr%7Cdecr%7Cdelete%7Cdel_meta%7Cget_meta%7Cset_meta%7Cset_ret_meta%7Cdel_ret_meta%60%7D%5B1m%5D%29%29%2C+%60name%60%2C+%60ops%60%2C+%60%60%2C+%60%60%29+or+sum+by+%28bucket%2C+name%29+%28%7Bname%3D~%60index_data_size%7Cindex_disk_size%7Ccouch_spatial_data_size%7Ccouch_spatial_disk_size%7Ccouch_views_data_size%60%7D%29&timeout=5s
      Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
                               {inet,[inet],econnrefused}]}
      [ns_server:error,2021-02-10T13:21:29.131Z,ns_1@10.112.210.101:<0.19443.0>:prometheus:post_async:194]Prometheus http request failed:
      URL: http://127.0.0.1:9123/api/v1/query
      Body: query=%7Bcategory%3D%60system-processes%60%2Cname%3D~%60sysproc_mem_resident%7Csysproc_mem_size%7Csysproc_cpu_utilization%7Csysproc_major_faults_raw%60%7D&timeout=5s
      Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
                               {inet,[inet],econnrefused}]}
      [ns_server:error,2021-02-10T13:21:29.169Z,ns_1@10.112.210.101:<0.19448.0>:prometheus:post_async:194]Prometheus http request failed:
      URL: http://127.0.0.1:9123/api/v1/query
      Body: query=%7Bcategory%3D%60system%60%2Cname%3D~%60sys_cpu_utilization_rate%7Csys_cpu_stolen_rate%7Csys_swap_total%7Csys_swap_used%7Csys_mem_total%7Csys_mem_free%7Csys_mem_limit%7Csys_cpu_cores_available%7Csys_allocstall%60%7D&timeout=5s
      Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
                               {inet,[inet],econnrefused}]}
      [ns_server:error,2021-02-10T13:21:38.564Z,ns_1@10.112.210.101:prometheus_cfg<0.349.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:22:42.736Z,ns_1@127.0.0.1:prometheus_cfg<0.358.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:23:02.447Z,ns_1@127.0.0.1:prometheus_cfg<0.349.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:23:22.128Z,ns_1@127.0.0.1:prometheus_cfg<0.349.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:23:41.783Z,ns_1@127.0.0.1:prometheus_cfg<0.347.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:24:01.892Z,ns_1@127.0.0.1:prometheus_cfg<0.352.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      


      Attachments

        1. output_testing_vm.zip
          21.50 MB
        2. output_vagrant.zip
          8.55 MB
        3. Part 1.mov
          12.52 MB
        4. Part 2.mov
          23.06 MB
        5. Part 3.mov
          16.39 MB


            People

              Assignee: asad.zaidi Asad Zaidi (Inactive)
              Reporter: asad.zaidi Asad Zaidi (Inactive)
              Votes: 0
              Watchers: 10
