Details
Description
Reinstalling Couchbase and then attempting to start it results in ns_server crashing.
Steps to reproduce
- Spawn a completely new Vagrant virtual machine running 7.0.0-4390 on CentOS 7.
- Stop and remove Couchbase.
systemctl stop couchbase-server
yum -y remove couchbase-server
- Download build 7.0.0-4390
wget http://172.23.126.166/builds/latestbuilds/couchbase-server/cheshire-cat/4390/couchbase-server-enterprise-7.0.0-4390-centos7.x86_64.rpm
- Reinstall and start Couchbase
yum -y install couchbase-server-enterprise-7.0.0-4390-centos7.x86_64.rpm
systemctl start couchbase-server
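For convenience, here is the same reproduction as a single script. This is only a sketch of the steps above; the build URL, package name, and systemd unit name are taken verbatim from this report and assume the same CentOS 7 environment.
#!/usr/bin/env bash
# Reproduction sketch: uninstall, re-download, reinstall, and start the same build.
set -euo pipefail
systemctl stop couchbase-server
yum -y remove couchbase-server
wget http://172.23.126.166/builds/latestbuilds/couchbase-server/cheshire-cat/4390/couchbase-server-enterprise-7.0.0-4390-centos7.x86_64.rpm
yum -y install couchbase-server-enterprise-7.0.0-4390-centos7.x86_64.rpm
systemctl start couchbase-server
# Inspect the resulting crash:
systemctl status couchbase-server -l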
What happens:
The installation succeeds, but ns_server crashes. The output of 'systemctl status couchbase-server -l' shows the following:
systemctl status couchbase-server -l
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: client_info: undefined
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: label: {gen_server,terminate}
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: last_message: {die,{abnormal,3}}
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: log: []
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: name: <0.120.0>
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: reason: {abnormal,3}
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: state: {state,ns_server,5000,-576460751640103199,undefined,infinity}
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: 2021-02-10 13:22:44.346049 crash_report #{label=>{proc_lib,crash},report=>[[{initial_call,{supervisor_cushion,init,['Argument__1']}},{pid,<0.120.0>},{registered_name,[]},{error_info,{exit,{abnormal,3},[{gen_server,handle_common_reply,8,[{file,"gen_server.erl"},{line,751}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}},{ancestors,[<0.119.0>,child_ns_server_sup,ns_babysitter_sup,<0.105.0>]},{message_queue_len,0},{messages,[]},{links,[<0.119.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,1598},{stack_size,27},{reductions,8769}],[]]}
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: 2021-02-10 13:22:44.346634 crash_report #{label=>{proc_lib,crash},report=>[[{initial_call,{erlang,apply,['Argument__1','Argument__2']}},{pid,<0.119.0>},{registered_name,[]},{error_info,{exit,{abnormal,3},[{restartable,loop,4,[{file,"src/restartable.erl"},{line,69}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}},{ancestors,[child_ns_server_sup,ns_babysitter_sup,<0.105.0>]},{message_queue_len,0},{messages,[]},{links,[<0.118.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,10958},{stack_size,27},{reductions,3018}],[]]}
Feb 10 13:22:44 node1-cheshire-cat-testing-centos7.vagrants couchbase[7748]: 2021-02-10 13:22:44.347059 supervisor_report #{label=>{supervisor,child_terminated},report=>[{supervisor,{local,child_ns_server_sup}},{errorContext,child_terminated},{reason,{abnormal,3}},{offender,[{pid,<0.119.0>},{id,{ns_server,"/opt/couchbase/lib/erlang/bin/erl",["+A","16","-smp","enable","+sbt","u","+P","327680","+K","true","+swt","low","+sbwt","none","+MMmcs","30","+e102400","-setcookie","nocookie","-kernel","logger","[{handler, default, undefined}]","-user","user_io","-run","child_erlang","child_start","ns_bootstrap","--","-smp","enable","-kernel","logger","[{handler, default, undefined}]","inetrc","\"/opt/couchbase/etc/couchbase/hosts.cfg\"","dist_config_file","\"/opt/couchbase/var/lib/couchbase/config/dist_cfg\"","-proto_dist","cb","-epmd_module","cb_epmd","-ssl_dist_optfile","/opt/couchbase/etc/couchbase/ssl_dist_opts","-couch_ini","/opt/couchbase/etc/couchdb/default.ini","/opt/couchbase/etc/couchdb/default.d/capi.ini","/opt/couchbase/etc/couchdb/default.d/geocouch.ini","/opt/couchbase/etc/couchdb/local.ini"],[{env,[{"NS_SERVER_BABYSITTER_PID","7748"},{"CHILD_ERLANG_ENV_ARGS","[{ns_babysitter,\n [{cookiefile,\n \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.cookie\"},\n {pidfile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.pid\"},\n {nodefile,\n \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.node\"}]},\n {ns_server,\n [{loglevel_stats,debug},\n {error_logger_mf_dir,\"/opt/couchbase/var/lib/couchbase/logs\"},\n {max_t,10},\n {loglevel_views,debug},\n {path_config_etcdir,\"/opt/couchbase/etc/couchbase\"},\n {path_config_secdir,\"/opt/couchbase/etc/security\"},\n {path_config_datadir,\"/opt/couchbase/var/lib/couchbase\"},\n {loglevel_cluster,debug},\n {loglevel_error_logger,debug},\n {loglevel_user,debug},\n {loglevel_ns_server,debug},\n {path_config_libdir,\"/opt/couchbase/lib\"},\n {loglevel_cbas,debug},\n {max_r,20},\n {config_path,\"/opt/couchbase/etc/couchbase/static_config\"},\n {disk_sink_opts_json_rpc,\n [{rotation,\n [{compress,true},\n {size,41943040},\n {num_files,2},\n {buffer_size_max,52428800}]}]},\n {loglevel_access,info},\n {loglevel_couchdb,info},\n {path_config_tmpdir,\"/opt/couchbase/var/lib/couchbase/tmp\"},\n {net_kernel_verbosity,10},\n {loglevel_rebalance,debug},\n {loglevel_default,debug},\n {nodefile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.node\"},\n {loglevel_xdcr,debug},\n {loglevel_ns_doctor,debug},\n {path_config_bindir,\"/opt/couchbase/bin\"},\n {loglevel_menelaus,debug},\n {disk_sink_opts,\n [{rotation,\n [{compress,true},\n {size,41943040},\n {num_files,10},\n {buffer_size_max,52428800}]}]},\n {loglevel_mapreduce_errors,debug}]}]"},{"ERL_CRASH_DUMP","erl_crash.dump.1612963342.7748.ns_server"}]},exit_status,use_stdio,stream,eof]}},{mfargs,{restartable,start_link,[{supervisor_cushion,start_link,[ns_server,5000,infinity,ns_port_server,start_link,[#Fun<ns_child_ports_sup.4.12324699>]]},86400000]}},{restart_type,permanent},{shutdown,infinity},{child_type,worker}]}]}
Furthermore, I could not make HTTP requests to 'localhost:8091' using curl or wget.
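For reference, this is roughly the connectivity check I ran; the exact flags below are illustrative, but any request to port 8091 was refused:
# Illustrative check against the REST/UI port; both tools reported connection refused.
curl -v --max-time 5 http://localhost:8091/
wget -O- --timeout=5 http://localhost:8091/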
The ns_server.error log file reports additional errors (see appendix).
A reason why this may happen:
With help from Daniel Owen, I was able to narrow down the likely cause.
After uninstallation, the 'chronicle' directory is left behind:
[root@node1-cheshire-cat-testing-centos7 ~]# tree /opt/couchbase/var/lib/couchbase/config/chronicle/
/opt/couchbase/var/lib/couchbase/config/chronicle/
├── logs
│   └── 0.log
└── snapshots
Deleting this directory between the uninstall and the reinstall prevents the crash on the subsequent start:
rm -rf /opt/couchbase/var/lib/couchbase/config/chronicle/
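For anyone hitting the same problem, the full workaround sequence looks like this (a sketch; the package name and paths are the ones from the steps above):
yum -y remove couchbase-server
# Remove the stale chronicle data left behind by the uninstall:
rm -rf /opt/couchbase/var/lib/couchbase/config/chronicle/
yum -y install couchbase-server-enterprise-7.0.0-4390-centos7.x86_64.rpm
# ns_server now starts without crashing:
systemctl start couchbase-server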
What I expected to happen:
I expected ns_server to start successfully after reinstallation.
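As a rough success criterion (the exact checks below are my own, not taken from the logs), a healthy start would look like:
systemctl is-active couchbase-server
# expected output: active
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8091/
# expected: an HTTP status code rather than a refused connection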
Logs:
Logs from the Vagrant VM (following the steps to reproduce above): output_vagrant.zip
I first observed this issue on a test VM; the logs for that run can be found here: output_testing_vm.zip
Steps to reproduce in video format:
Appendix
The last 100 entries from 'ns_server.error.log':
ns_server.error.log
[ns_server:error,2021-02-10T13:19:21.744Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:19:36.748Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
[ns_server:error,2021-02-10T13:19:36.748Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:19:51.423Z,ns_1@10.112.210.101:<0.12005.0>:service_agent:wait_for_connection_loop:351]No connection with label "index-service_api" after 60000ms. Exiting.
[ns_server:error,2021-02-10T13:19:51.424Z,ns_1@10.112.210.101:service_agent-index<0.12004.0>:service_agent:handle_info:283]Linked process <0.12005.0> died with reason {no_connection,
"index-service_api"}. Terminating
[ns_server:error,2021-02-10T13:19:51.424Z,ns_1@10.112.210.101:service_agent-index<0.12004.0>:service_agent:terminate:312]Terminating abnormally
[ns_server:error,2021-02-10T13:19:51.750Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
[ns_server:error,2021-02-10T13:19:51.750Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:20:06.752Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
[ns_server:error,2021-02-10T13:20:06.752Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:20:21.754Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
[ns_server:error,2021-02-10T13:20:21.754Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:20:36.756Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
[ns_server:error,2021-02-10T13:20:36.756Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:20:51.427Z,ns_1@10.112.210.101:<0.14861.0>:service_agent:wait_for_connection_loop:351]No connection with label "index-service_api" after 60000ms. Exiting.
[ns_server:error,2021-02-10T13:20:51.427Z,ns_1@10.112.210.101:service_agent-index<0.14860.0>:service_agent:handle_info:283]Linked process <0.14861.0> died with reason {no_connection,
"index-service_api"}. Terminating
[ns_server:error,2021-02-10T13:20:51.427Z,ns_1@10.112.210.101:service_agent-index<0.14860.0>:service_agent:terminate:312]Terminating abnormally
[ns_server:error,2021-02-10T13:20:51.758Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
[ns_server:error,2021-02-10T13:20:51.758Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:21:06.761Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
[ns_server:error,2021-02-10T13:21:06.761Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:21:21.763Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,timeout}
[ns_server:error,2021-02-10T13:21:21.763Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:21:27.584Z,ns_1@10.112.210.101:service_agent-backup<0.505.0>:service_agent:handle_info:291]Lost json rpc connection for service backup, reason shutdown. Terminating.
[ns_server:error,2021-02-10T13:21:27.584Z,ns_1@10.112.210.101:service_agent-backup<0.505.0>:service_agent:terminate:312]Terminating abnormally
[ns_server:error,2021-02-10T13:21:27.595Z,ns_1@10.112.210.101:service_agent-eventing<0.509.0>:service_agent:handle_info:291]Lost json rpc connection for service eventing, reason shutdown. Terminating.
[ns_server:error,2021-02-10T13:21:27.596Z,ns_1@10.112.210.101:service_agent-eventing<0.509.0>:service_agent:terminate:312]Terminating abnormally
[ns_server:error,2021-02-10T13:21:27.601Z,ns_1@10.112.210.101:service_agent-fts<0.513.0>:service_agent:handle_info:291]Lost json rpc connection for service fts, reason shutdown. Terminating.
[ns_server:error,2021-02-10T13:21:27.601Z,ns_1@10.112.210.101:service_agent-fts<0.513.0>:service_agent:terminate:312]Terminating abnormally
[ns_server:error,2021-02-10T13:21:27.622Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (indexer) getIndexStatus failed: {error,
{econnrefused,
[{lhttpc_client,send_request,1,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,220}]},
{lhttpc_client,execute,9,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,169}]},
{lhttpc_client,request,9,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,93}]}]}}
[ns_server:error,2021-02-10T13:21:27.622Z,ns_1@10.112.210.101:service_status_keeper-index<0.608.0>:service_status_keeper:handle_cast:132]Service service_index returned incorrect status
[ns_server:error,2021-02-10T13:21:27.622Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (fts) api/nsstatus failed: {error,
{econnrefused,
[{lhttpc_client,send_request,1,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,220}]},
{lhttpc_client,execute,9,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,169}]},
{lhttpc_client,request,9,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,93}]}]}}
[ns_server:error,2021-02-10T13:21:27.623Z,ns_1@10.112.210.101:service_status_keeper-fts<0.613.0>:service_status_keeper:handle_cast:132]Service service_fts returned incorrect status
[ns_server:error,2021-02-10T13:21:27.626Z,ns_1@10.112.210.101:service_status_keeper_worker<0.607.0>:rest_utils:get_json:59]Request to (eventing) api/v1/functions failed: {error,
{econnrefused,
[{lhttpc_client,
send_request,1,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,220}]},
{lhttpc_client,execute,9,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,169}]},
{lhttpc_client,request,9,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,93}]}]}}
[ns_server:error,2021-02-10T13:21:27.626Z,ns_1@10.112.210.101:service_status_keeper-eventing<0.618.0>:service_status_keeper:handle_cast:132]Service service_eventing returned incorrect status
[ns_server:error,2021-02-10T13:21:28.697Z,ns_1@10.112.210.101:<0.354.0>:ns_port_server:handle_info:157]Got unexpected exit signal from port: {'EXIT',<0.360.0>,normal}. Exiting.
[ns_server:error,2021-02-10T13:21:28.697Z,ns_1@10.112.210.101:prometheus_cfg<0.349.0>:prometheus_cfg:handle_info:376]Received exit from Prometheus port server - <0.354.0>: normal. Restarting Prometheus...
[ns_server:error,2021-02-10T13:21:29.082Z,ns_1@10.112.210.101:<0.19440.0>:prometheus:post_async:194]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bname%3D~%60kv_curr_items%7Ckv_curr_items_tot%7Ckv_vb_replica_curr_items%7Ckv_mem_used_bytes%7Ccouch_docs_actual_disk_size%7Ccouch_views_actual_disk_size%7Ckv_ep_db_data_size_bytes%7Ckv_vb_active_num_non_resident%7Ckv_ep_bg_fetched%60%7D+or+label_replace%28sum+by+%28bucket%2C+name%29+%28irate%28kv_ops%7Bop%3D%60get%60%7D%5B1m%5D%29%29%2C+%60name%60%2C%60cmd_get%60%2C+%60%60%2C+%60%60%29+or+label_replace%28irate%28kv_ops%7Bop%3D%60get%60%2Cresult%3D%60hit%60%7D%5B1m%5D%29%2C%60name%60%2C%60get_hits%60%2C%60%60%2C%60%60%29+or+label_replace%28sum+by+%28bucket%29+%28irate%28kv_cmd_lookup%5B1m%5D%29+or+irate%28kv_ops%7Bop%3D~%60set%7Cincr%7Cdecr%7Cdelete%7Cdel_meta%7Cget_meta%7Cset_meta%7Cset_ret_meta%7Cdel_ret_meta%60%7D%5B1m%5D%29%29%2C+%60name%60%2C+%60ops%60%2C+%60%60%2C+%60%60%29+or+sum+by+%28bucket%2C+name%29+%28%7Bname%3D~%60index_data_size%7Cindex_disk_size%7Ccouch_spatial_data_size%7Ccouch_spatial_disk_size%7Ccouch_views_data_size%60%7D%29&timeout=5s
Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
{inet,[inet],econnrefused}]}
[ns_server:error,2021-02-10T13:21:29.131Z,ns_1@10.112.210.101:<0.19443.0>:prometheus:post_async:194]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bcategory%3D%60system-processes%60%2Cname%3D~%60sysproc_mem_resident%7Csysproc_mem_size%7Csysproc_cpu_utilization%7Csysproc_major_faults_raw%60%7D&timeout=5s
Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
{inet,[inet],econnrefused}]}
[ns_server:error,2021-02-10T13:21:29.169Z,ns_1@10.112.210.101:<0.19448.0>:prometheus:post_async:194]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bcategory%3D%60system%60%2Cname%3D~%60sys_cpu_utilization_rate%7Csys_cpu_stolen_rate%7Csys_swap_total%7Csys_swap_used%7Csys_mem_total%7Csys_mem_free%7Csys_mem_limit%7Csys_cpu_cores_available%7Csys_allocstall%60%7D&timeout=5s
Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
{inet,[inet],econnrefused}]}
[ns_server:error,2021-02-10T13:21:38.564Z,ns_1@10.112.210.101:prometheus_cfg<0.349.0>:prometheus_cfg:terminate:397]Terminate: shutdown
[ns_server:error,2021-02-10T13:22:42.736Z,ns_1@127.0.0.1:prometheus_cfg<0.358.0>:prometheus_cfg:terminate:397]Terminate: shutdown
[ns_server:error,2021-02-10T13:23:02.447Z,ns_1@127.0.0.1:prometheus_cfg<0.349.0>:prometheus_cfg:terminate:397]Terminate: shutdown
[ns_server:error,2021-02-10T13:23:22.128Z,ns_1@127.0.0.1:prometheus_cfg<0.349.0>:prometheus_cfg:terminate:397]Terminate: shutdown
[ns_server:error,2021-02-10T13:23:41.783Z,ns_1@127.0.0.1:prometheus_cfg<0.347.0>:prometheus_cfg:terminate:397]Terminate: shutdown
[ns_server:error,2021-02-10T13:24:01.892Z,ns_1@127.0.0.1:prometheus_cfg<0.352.0>:prometheus_cfg:terminate:397]Terminate: shutdown