Description
With Couchbase Server 7.1 now the latest version on Docker (couchbase/server:latest), our tests have been failing consistently due to bucket flush errors, reported in the SGW logs as:
Error flushing bucket: {"_":"Flush failed with unexpected error. Check server logs for details."} Will retry. -- base.(*Collection).Flush.func1() at collection.go:631
This has happened both locally and on Jenkins. I looked at the logs and noticed many crash reports in the ns_server debug logs:
=========================CRASH REPORT=========================
crasher:
initial call: misc:turn_into_gen_server/4
pid: <15922.25127.141>
registered_name: 'capi_set_view_manager-sg_int_1_1651663419116586663'
exception throw: {file_already_opened,
"/opt/couchbase/var/lib/couchbase/data/@indexes/sg_int_1_1651663419116586663/main_72cc6e6eba2986295f83acae24e19759.view.1"}
in function couch_set_view:get_group_server/2 (/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/couch_set_view/src/couch_set_view.erl, line 437)
in call from couch_set_view:define_group/4 (/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/couch_set_view/src/couch_set_view.erl, line 143)
in call from timer:tc/3 (timer.erl, line 197)
in call from capi_set_view_manager:maybe_define_group/2 (src/capi_set_view_manager.erl, line 292)
in call from capi_set_view_manager:'-init/1-lc$^1/1-0-'/2 (src/capi_set_view_manager.erl, line 175)
in call from capi_set_view_manager:init/1 (src/capi_set_view_manager.erl, line 176)
in call from misc:turn_into_gen_server/4 (src/misc.erl, line 503)
ancestors: [<0.11095.28>,
'single_bucket_kv_sup-sg_int_1_1651663419116586663',
ns_bucket_sup,ns_bucket_worker_sup,ns_server_sup,
ns_server_nodes_sup,<0.270.0>,ns_server_cluster_sup,
root_sup,<0.145.0>]
message_queue_len: 0
messages: []
links: [<0.11095.28>,<15922.25143.141>]
dictionary: []
trap_exit: false
status: running
heap_size: 4185
stack_size: 29
reductions: 28307
neighbours:
There are also errors in the ns_server error logs:
[ns_server:error,2022-05-04T11:28:22.050Z,ns_1@127.0.0.1:<0.8103.8>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
{path,
"/pools/default/buckets/sg_int_2_1651663419116586663/controller/doFlush"},
{method,'POST'},
{type,exit},
{what,
{{{badmatch,
{error,
{failed_nodes,['ns_1@127.0.0.1']}}},
[{ns_janitor,cleanup_apply_config_body,4,
[{file,"src/ns_janitor.erl"},
{line,295}]},
{ns_janitor,
'-cleanup_apply_config/4-fun-0-',4,
[{file,"src/ns_janitor.erl"},
{line,215}]},
{async,'-async_init/4-fun-1-',3,
[{file,"src/async.erl"},{line,191}]}]},
{gen_statem,call,
[{via,leader_registry,ns_orchestrator},
{flush_bucket,
"sg_int_2_1651663419116586663"},
infinity]}}},
{trace,
[{gen,do_call,4,
[{file,"gen.erl"},{line,220}]},
{gen,do_for_proc,2,
[{file,"gen.erl"},{line,381}]},
{gen_statem,call_dirty,4,
[{file,"gen_statem.erl"},{line,684}]},
{menelaus_web_buckets,
do_handle_bucket_flush,2,
[{file,"src/menelaus_web_buckets.erl"},
{line,703}]},
{request_tracker,request,2,
[{file,"src/request_tracker.erl"},
{line,40}]},
{menelaus_util,handle_request,2,
[{file,"src/menelaus_util.erl"},
{line,221}]},
{mochiweb_http,headers,6,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
{line,153}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,226}]}]}]
[ns_server:error,2022-05-04T11:28:22.274Z,ns_1@127.0.0.1:ns_doctor<0.882.0>:ns_doctor:update_status:303]The following buckets became not ready on node 'ns_1@127.0.0.1': ["sg_int_0_1651663419116586663",
"sg_int_2_1651663419116586663"], those of them are active ["sg_int_0_1651663419116586663",
"sg_int_2_1651663419116586663"]
and warnings in the memcached logs:
WARNING (sg_int_1_1651664472706223954) CouchKVStore::unlinkCouchFile: remove error:2, vb:446, rev:42, fname:/opt/couchbase/var/lib/couchbase/data/sg_int_1_1651664472706223954/446.couch.42.
The Docker image enterprise-7.0.3 and other versions have produced similar errors in the past, but only rarely and never this consistently across all tests. We are running a single node. Sync Gateway and cbcollect logs are attached.
Could you please advise on what is going wrong, and whether this is a potential bug in CBS?
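For triage, here is a minimal sketch of how the memcached warning quoted above could be picked apart when scanning a cbcollect bundle. The function name `parse_unlink_warning` and the regular expression are hypothetical and modelled only on the single line shown in this report. The sketch also assumes the number after `remove error:` is a POSIX errno, in which case 2 would correspond to ENOENT (the file was already gone when the removal was attempted); that interpretation is not confirmed by the logs themselves.

```python
import errno
import re

# Hypothetical pattern, modelled on the one warning line quoted in this report.
# Other memcached log lines may be formatted differently.
UNLINK_WARNING = re.compile(
    r"\((?P<bucket>[^)]+)\) CouchKVStore::unlinkCouchFile: "
    r"remove error:(?P<code>\d+), vb:(?P<vb>\d+), rev:(?P<rev>\d+), "
    r"fname:(?P<fname>\S+?)\.?$"
)


def parse_unlink_warning(line):
    """Return the fields of an unlinkCouchFile warning, or None if no match."""
    match = UNLINK_WARNING.search(line.strip())
    if match is None:
        return None
    code = int(match["code"])
    return {
        "bucket": match["bucket"],
        "vb": int(match["vb"]),
        "rev": int(match["rev"]),
        "fname": match["fname"],
        "code": code,
        # Assumption: the code is a POSIX errno value.
        "errno_name": errno.errorcode.get(code, "UNKNOWN"),
    }


if __name__ == "__main__":
    sample = (
        "WARNING (sg_int_1_1651664472706223954) CouchKVStore::unlinkCouchFile: "
        "remove error:2, vb:446, rev:42, "
        "fname:/opt/couchbase/var/lib/couchbase/data/"
        "sg_int_1_1651664472706223954/446.couch.42."
    )
    print(parse_unlink_warning(sample))
```

Running this over every line of the memcached log and grouping by bucket would show whether the removal failures are confined to the buckets that failed to flush.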
Issue Links
- blocks CBG-2063 Change Jenkins to use the latest version of CBS (Resolved)