Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-44229

ns_server crashes after reinstallation on Centos7

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.0.0
    • Cheshire-Cat
    • installer
    • CentOS Linux release 7.9.2009 (Core), Vagrants, IPv4
    • 1
    • Build Team 2021 Sprint 5

    Description

      Description

      Reinstalling Couchbase and then attempting to start it results in ns_server crashing.

      Steps to reproduce

      1. Spawn a completely new vagrant virtual machine running 7.0.0-4390 on Centos7.
      2. Stop and remove Couchbase.

        systemctl stop couchbase-server
        yum -y remove couchbase-server
        

      3. Download build 7.0.0-4390

        wget http://172.23.126.166/builds/latestbuilds/couchbase-server/cheshire-cat/4390/couchbase-server-enterprise-7.0.0-4390-centos7.x86_64.rpm
        

      4. Reinstall and start Couchbase

        yum -y install couchbase-server-enterprise-7.0.0-4390-centos7.x86_64.rpm
        systemctl start couchbase-server
        

      What happens:

      The installation succeeds, but ns_server crashes. The output of 'systemctl status couchbase-server -l' shows the following:

      systemctl status couchbase-server -l

      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: client_info: undefined
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: label: {gen_server,terminate}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: last_message: {die,{abnormal,3}}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: log: []
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: name: <0.120.0>
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: reason: {abnormal,3}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: state: {state,ns_server,5000,-576460751640103199,undefined,infinity}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: 2021-02-10 13:22:44.346049 crash_report        #{label=>{proc_lib,crash},report=>[[{initial_call,{supervisor_cushion,init,['Argument__1']}},{pid,<0.120.0>},{registered_name,[]},{error_info,{exit,{abnormal,3},[{gen_server,handle_common_reply,8,[{file,"gen_server.erl"},{line,751}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}},{ancestors,[<0.119.0>,child_ns_server_sup,ns_babysitter_sup,<0.105.0>]},{message_queue_len,0},{messages,[]},{links,[<0.119.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,1598},{stack_size,27},{reductions,8769}],[]]}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: 2021-02-10 13:22:44.346634 crash_report        #{label=>{proc_lib,crash},report=>[[{initial_call,{erlang,apply,['Argument__1','Argument__2']}},{pid,<0.119.0>},{registered_name,[]},{error_info,{exit,{abnormal,3},[{restartable,loop,4,[{file,"src/restartable.erl"},{line,69}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}},{ancestors,[child_ns_server_sup,ns_babysitter_sup,<0.105.0>]},{message_queue_len,0},{messages,[]},{links,[<0.118.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,10958},{stack_size,27},{reductions,3018}],[]]}
      Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: 2021-02-10 13:22:44.347059 supervisor_report   #{label=>{supervisor,child_terminated},report=>[{supervisor,{local,child_ns_server_sup}},{errorContext,child_terminated},{reason,{abnormal,3}},{offender,[{pid,<0.119.0>},{id,{ns_server,"/opt/couchbase/lib/erlang/bin/erl",["+A","16","-smp","enable","+sbt","u","+P","327680","+K","true","+swt","low","+sbwt","none","+MMmcs","30","+e102400","-setcookie","nocookie","-kernel","logger","[{handler, default, undefined}]","-user","user_io","-run","child_erlang","child_start","ns_bootstrap","--","-smp","enable","-kernel","logger","[{handler, default, undefined}]","inetrc","\"/opt/couchbase/etc/couchbase/hosts.cfg\"","dist_config_file","\"/opt/couchbase/var/lib/couchbase/config/dist_cfg\"","-proto_dist","cb","-epmd_module","cb_epmd","-ssl_dist_optfile","/opt/couchbase/etc/couchbase/ssl_dist_opts","-couch_ini","/opt/couchbase/etc/couchdb/default.ini","/opt/couchbase/etc/couchdb/default.d/capi.ini","/opt/couchbase/etc/couchdb/default.d/geocouch.ini","/opt/couchbase/etc/couchdb/local.ini"],[{env,[{"NS_SERVER_BABYSITTER_PID","7748"},{"CHILD_ERLANG_ENV_ARGS","[{ns_babysitter,\n     [{cookiefile,\n          \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.cookie\"},\n      {pidfile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.pid\"},\n      {nodefile,\n          \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.node\"}]},\n {ns_server,\n     [{loglevel_stats,debug},\n      {error_logger_mf_dir,\"/opt/couchbase/var/lib/couchbase/logs\"},\n      {max_t,10},\n      {loglevel_views,debug},\n      {path_config_etcdir,\"/opt/couchbase/etc/couchbase\"},\n      {path_config_secdir,\"/opt/couchbase/etc/security\"},\n      {path_config_datadir,\"/opt/couchbase/var/lib/couchbase\"},\n      {loglevel_cluster,debug},\n      {loglevel_error_logger,debug},\n      {loglevel_user,debug},\n      {loglevel_ns_server,debug},\n      {path_config_libdir,\"/opt/couchbase/lib\"},\n      {loglevel_cbas,debug},\n      {max_r,20},\n      {config_path,\"/opt/couchbase/etc/couchbase/static_config\"},\n      {disk_sink_opts_json_rpc,\n          [{rotation,\n               [{compress,true},\n                {size,41943040},\n                {num_files,2},\n                {buffer_size_max,52428800}]}]},\n      {loglevel_access,info},\n      {loglevel_couchdb,info},\n      {path_config_tmpdir,\"/opt/couchbase/var/lib/couchbase/tmp\"},\n      {net_kernel_verbosity,10},\n      {loglevel_rebalance,debug},\n      {loglevel_default,debug},\n      {nodefile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.node\"},\n      {loglevel_xdcr,debug},\n      {loglevel_ns_doctor,debug},\n      {path_config_bindir,\"/opt/couchbase/bin\"},\n      {loglevel_menelaus,debug},\n      {disk_sink_opts,\n          [{rotation,\n               [{compress,true},\n                {size,41943040},\n                {num_files,10},\n                {buffer_size_max,52428800}]}]},\n      {loglevel_mapreduce_errors,debug}]}]"},{"ERL_CRASH_DUMP","erl_crash.dump.1612963342.7748.ns_server"}]},exit_status,use_stdio,stream,eof]}},{mfargs,{restartable,start_link,[{supervisor_cushion,start_link,[ns_server,5000,infinity,ns_port_server,start_link,[#Fun<ns_child_ports_sup.4.12324699>]]},86400000]}},{restart_type,permanent},{shutdown,infinity},{child_type,worker}]}]}

      Furthermore, I could not make http requests to 'localhost:8091' using curl or wget.

      The ns_server.error log file reports additional errors (see appendix).

      A reason for why this may happen:

      With help from Daniel Owen, I was able to narrow down what might be the cause.

      Post uninstallation, the 'chronicle' folder is left behind:

      [root@node1-cheshire-cat-testing-centos7 ~]# tree /opt/couchbase/var/lib/couchbase/config/chronicle/
      /opt/couchbase/var/lib/couchbase/config/chronicle/
      ├── logs
      │   └── 0.log
      └── snapshots
      

      Deleting this folder after the uninstall and before the reinstall prevents the crash after reinstallation.

       rm -rf /opt/couchbase/var/lib/couchbase/config/chronicle/

      What I expected to happen:

      I expected ns_server to start successfully after reinstallation.

      Logs:

      Logs from vagrant (based on the steps to reproduce): output_vagrant.zip

      I observed this issue on a test vm first, the logs for that can be found here: output_testing_vm.zip

      Steps to reproduce in a video format:

      Part 1.mov

      Part 2.mov

      Part 3.mov

      Appendix

      The last 100 entries from 'ns_server.error.log':

      ns_server.error.log

      [ns_server:error,2021-02-10T13:19:21.744Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:19:36.748Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:19:36.748Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:19:51.423Z,ns_1@10.112.210.101:<0.12005.0>:service_agent:wait_for_connection_loop:351]No connection with label "index-service_api" after 60000ms. Exiting.
      [ns_server:error,2021-02-10T13:19:51.424Z,ns_1@10.112.210.101:service_agent-index<0.12004.0>:service_agent:handle_info:283]Linked process <0.12005.0> died with reason {no_connection,
                                                   "index-service_api"}. Terminating
      [ns_server:error,2021-02-10T13:19:51.424Z,ns_1@10.112.210.101:service_agent-index<0.12004.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:19:51.750Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:19:51.750Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:20:06.752Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:20:06.752Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:20:21.754Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:20:21.754Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:20:36.756Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:20:36.756Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:20:51.427Z,ns_1@10.112.210.101:<0.14861.0>:service_agent:wait_for_connection_loop:351]No connection with label "index-service_api" after 60000ms. Exiting.
      [ns_server:error,2021-02-10T13:20:51.427Z,ns_1@10.112.210.101:service_agent-index<0.14860.0>:service_agent:handle_info:283]Linked process <0.14861.0> died with reason {no_connection,
                                                   "index-service_api"}. Terminating
      [ns_server:error,2021-02-10T13:20:51.427Z,ns_1@10.112.210.101:service_agent-index<0.14860.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:20:51.758Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:20:51.758Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:21:06.761Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:21:06.761Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:21:21.763Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
      [ns_server:error,2021-02-10T13:21:21.763Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:21:27.584Z,ns_1@10.112.210.101:service_agent-backup<0.505.0>:service_agent:handle_info:291]Lost json rpc connection for service backup, reason shutdown. Terminating.
      [ns_server:error,2021-02-10T13:21:27.584Z,ns_1@10.112.210.101:service_agent-backup<0.505.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:21:27.595Z,ns_1@10.112.210.101:service_agent-eventing<0.509.0>:service_agent:handle_info:291]Lost json rpc connection for service eventing, reason shutdown. Terminating.
      [ns_server:error,2021-02-10T13:21:27.596Z,ns_1@10.112.210.101:service_agent-eventing<0.509.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:21:27.601Z,ns_1@10.112.210.101:service_agent-fts<0.513.0>:service_agent:handle_info:291]Lost json rpc connection for service fts, reason shutdown. Terminating.
      [ns_server:error,2021-02-10T13:21:27.601Z,ns_1@10.112.210.101:service_agent-fts<0.513.0>:service_agent:terminate:312]Terminating abnormally
      [ns_server:error,2021-02-10T13:21:27.622Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,
                                                   {econnrefused,
                                                    [{lhttpc_client,send_request,1,
                                                      [{file,
                                                        "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                       {line,220}]},
                                                     {lhttpc_client,execute,9,
                                                      [{file,
                                                        "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                       {line,169}]},
                                                     {lhttpc_client,request,9,
                                                      [{file,
                                                        "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                       {line,93}]}]}}
      [ns_server:error,2021-02-10T13:21:27.622Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
      [ns_server:error,2021-02-10T13:21:27.622Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (fts) api/nsstatus failed: {error,
                                             {econnrefused,
                                              [{lhttpc_client,send_request,1,
                                                [{file,
                                                  "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                 {line,220}]},
                                               {lhttpc_client,execute,9,
                                                [{file,
                                                  "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                 {line,169}]},
                                               {lhttpc_client,request,9,
                                                [{file,
                                                  "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                 {line,93}]}]}}
      [ns_server:error,2021-02-10T13:21:27.623Z,ns_1@10.112.210.101:service_status_keeper-fts<0.613.0>:service_status_keeper:handle_cast:132]Service service_fts returned incorrect status
      [ns_server:error,2021-02-10T13:21:27.626Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (eventing) api/v1/functions failed: {error,
                                                      {econnrefused,
                                                       [{lhttpc_client,
                                                         send_request,1,
                                                         [{file,
                                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                          {line,220}]},
                                                        {lhttpc_client,execute,9,
                                                         [{file,
                                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                          {line,169}]},
                                                        {lhttpc_client,request,9,
                                                         [{file,
                                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                                          {line,93}]}]}}
      [ns_server:error,2021-02-10T13:21:27.626Z,ns_1@10.112.210.101:service_status_keeper-eventing<0.618.0>:service_status_keeper:handle_cast:132]Service service_eventing returned incorrect status
      [ns_server:error,2021-02-10T13:21:28.697Z,ns_1@10.112.210.101:<0.354.0>:ns_port_server:handle_info:157]Got unexpected exit signal from port: {'EXIT',<0.360.0>,normal}. Exiting.
      [ns_server:error,2021-02-10T13:21:28.697Z,ns_1@10.112.210.101:prometheus_cfg<0.349.0>:prometheus_cfg:handle_info:376]Received exit from Prometheus port server - <0.354.0>: normal. Restarting Prometheus...
      [ns_server:error,2021-02-10T13:21:29.082Z,ns_1@10.112.210.101:<0.19440.0>:prometheus:post_async:194]Prometheus http request failed:
      URL: http://127.0.0.1:9123/api/v1/query
      Body: query=%7Bname%3D~%60kv_curr_items%7Ckv_curr_items_tot%7Ckv_vb_replica_curr_items%7Ckv_mem_used_bytes%7Ccouch_docs_actual_disk_size%7Ccouch_views_actual_disk_size%7Ckv_ep_db_data_size_bytes%7Ckv_vb_active_num_non_resident%7Ckv_ep_bg_fetched%60%7D+or+label_replace%28sum+by+%28bucket%2C+name%29+%28irate%28kv_ops%7Bop%3D%60get%60%7D%5B1m%5D%29%29%2C+%60name%60%2C%60cmd_get%60%2C+%60%60%2C+%60%60%29+or+label_replace%28irate%28kv_ops%7Bop%3D%60get%60%2Cresult%3D%60hit%60%7D%5B1m%5D%29%2C%60name%60%2C%60get_hits%60%2C%60%60%2C%60%60%29+or+label_replace%28sum+by+%28bucket%29+%28irate%28kv_cmd_lookup%5B1m%5D%29+or+irate%28kv_ops%7Bop%3D~%60set%7Cincr%7Cdecr%7Cdelete%7Cdel_meta%7Cget_meta%7Cset_meta%7Cset_ret_meta%7Cdel_ret_meta%60%7D%5B1m%5D%29%29%2C+%60name%60%2C+%60ops%60%2C+%60%60%2C+%60%60%29+or+sum+by+%28bucket%2C+name%29+%28%7Bname%3D~%60index_data_size%7Cindex_disk_size%7Ccouch_spatial_data_size%7Ccouch_spatial_disk_size%7Ccouch_views_data_size%60%7D%29&timeout=5s
      Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
                               {inet,[inet],econnrefused}]}
      [ns_server:error,2021-02-10T13:21:29.131Z,ns_1@10.112.210.101:<0.19443.0>:prometheus:post_async:194]Prometheus http request failed:
      URL: http://127.0.0.1:9123/api/v1/query
      Body: query=%7Bcategory%3D%60system-processes%60%2Cname%3D~%60sysproc_mem_resident%7Csysproc_mem_size%7Csysproc_cpu_utilization%7Csysproc_major_faults_raw%60%7D&timeout=5s
      Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
                               {inet,[inet],econnrefused}]}
      [ns_server:error,2021-02-10T13:21:29.169Z,ns_1@10.112.210.101:<0.19448.0>:prometheus:post_async:194]Prometheus http request failed:
      URL: http://127.0.0.1:9123/api/v1/query
      Body: query=%7Bcategory%3D%60system%60%2Cname%3D~%60sys_cpu_utilization_rate%7Csys_cpu_stolen_rate%7Csys_swap_total%7Csys_swap_used%7Csys_mem_total%7Csys_mem_free%7Csys_mem_limit%7Csys_cpu_cores_available%7Csys_allocstall%60%7D&timeout=5s
      Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
                               {inet,[inet],econnrefused}]}
      [ns_server:error,2021-02-10T13:21:38.564Z,ns_1@10.112.210.101:prometheus_cfg<0.349.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:22:42.736Z,ns_1@127.0.0.1:prometheus_cfg<0.358.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:23:02.447Z,ns_1@127.0.0.1:prometheus_cfg<0.349.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:23:22.128Z,ns_1@127.0.0.1:prometheus_cfg<0.349.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:23:41.783Z,ns_1@127.0.0.1:prometheus_cfg<0.347.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      [ns_server:error,2021-02-10T13:24:01.892Z,ns_1@127.0.0.1:prometheus_cfg<0.352.0>:prometheus_cfg:terminate:397]Terminate: shutdown
      

       

      Attachments

        1. Part 3.mov
          16.39 MB
        2. Part 2.mov
          23.06 MB
        3. Part 1.mov
          12.52 MB
        4. output_vagrant.zip
          8.55 MB
        5. output_testing_vm.zip
          21.50 MB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              asad.zaidi Asad Zaidi (Inactive)
              asad.zaidi Asad Zaidi (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty