Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
Cheshire-Cat
-
Untriaged
-
Hides3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.101.zip
s3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.102.zip
s3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.103.zip
s3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.104.zip
s3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.105.zip
Enterprise Edition 7.0.0 build 5102:
s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.101.zip
s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.102.zip
s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.103.zip
s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.104.zip
s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.105.zip
Latest logs
s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.101.zip
s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.102.zip
s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.103.zip
s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.104.zip
s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.105.zipShows3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.101.zip s3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.102.zip s3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.103.zip s3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.104.zip s3://cb-customers-secure/mb-45907--build-5092/2021-05-05/collectinfo-2021-05-05t053740-ns_1@10.112.206.105.zip Enterprise Edition 7.0.0 build 5102: s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.101.zip s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.102.zip s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.103.zip s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.104.zip s3://cb-customers-secure/mb-45907--build-5102/2021-05-05/collectinfo-2021-05-05t085210-ns_1@10.112.206.105.zip Latest logs s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.101.zip s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.102.zip s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.103.zip s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.104.zip s3://cb-customers-secure/mb-45907--build-5102_latest/2021-05-05/collectinfo-2021-05-05t154607-ns_1@10.112.206.105.zip -
1
-
Unknown
Description
Build: 7.0.0-5017
Test
./testrunner -i /tmp/testexec.18313.ini -p get-cbcollect-info=True,GROUP=bucket_op,skip_log_scan=False,bucket_storage=couchstore -t eventing.eventing_rebalance_collection.EventingRebalanceCollection.test_eventing_rebalance_with_multiple_kv_nodes,doc-per-day=20,dataset=default,nodes_init=5,services_init=kv-kv-kv-eventing-index:n1ql,groups=simple,reset_services=True,GROUP=bucket_op,java_sdk_client=True,get-cbcollect-info=True,skip_log_scan=False,bucket_storage=couchstore
|
|
|
Exception: missing data in destination bucket. Current : 94233 Expected : 120960 dcp_backlog : 0 TIMERS_IN_PAST : 0 lcb_exceptions : {'172.23.122.171': {}, '172.23.122.172': {}}
|
Seeing no lcb exceptions in the stats
Attachments
- mb-45907_testlogs.log
- 579 kB
- mb-54907_stats
- 67 kB
- Test_logs.txt
- 1.02 MB
- test_output.log
- 398 kB
Issue Links
- depends on
-
MB-44324 1 Timer callback failed with LCB_ERR_BUCKET_NOT_FOUND
-
- Closed
-
Activity
Vinayaka Kamath why we are not seeing these errors in lcb stats? Do we have a regression there?
Vikas Chaudhary lcb_exception_stats are non zero
2021-04-24T21:49:37.903-07:00 [Info] Consumer::processStatsEvents [worker_sbm_0:/tmp/127.0.0.1:8091_0_2698172145.sock:107807] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:510 DCP_MUTATION:2919 V8 events: thr_map:1 latency_stats:600 curl_latency_stats:600 *lcb_exception_stats:600* log_level:1 thr_count:1 v8_init:1 v8_load:1 vb_map:2 execution_stats:600 failure_stats:600 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:49:50.075-07:00 [Info] Consumer::processStatsEvents [worker_bucket_op_0:/tmp/127.0.0.1:8091_0_3698201870.sock:107682] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:512 DCP_MUTATION:26982 V8 events: curl_latency_stats:660 log_level:1 v8_init:1 vb_map:2 execution_stats:660 failure_stats:660 latency_stats:660 thr_map:1 thr_count:1 v8_load:1 *lcb_exception_stats:660* Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:49:53.898-07:00 [Info] Consumer::processStatsEvents [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:512 DCP_MUTATION:26982 V8 events: vb_map:2 execution_stats:600 lcb_exception_stats:600 curl_latency_stats:600 log_level:1 thr_count:1 v8_init:1 v8_load:1 failure_stats:600 latency_stats:600 thr_map:1 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:50:06.010-07:00 [Info] Consumer::processStatsEvents [worker_curl_0:/tmp/127.0.0.1:8091_0_2876695861.sock:107709] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:512 DCP_MUTATION:26982 V8 events: v8_load:1 vb_map:2 execution_stats:660 failure_stats:660 latency_stats:660 curl_latency_stats:660 thr_count:1 thr_map:1 v8_init:1 lcb_exception_stats:660 log_level:1 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:50:21.942-07:00 [Info] Consumer::processStatsEvents [worker_n1ql_0:/tmp/127.0.0.1:8091_0_15129118.sock:107782] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:512 DCP_MUTATION:26982 V8 events: v8_load:1 vb_map:2 latency_stats:660 curl_latency_stats:660 thr_map:1 thr_count:1 v8_init:1 lcb_exception_stats:660 log_level:1 execution_stats:660 failure_stats:660 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:50:37.903-07:00 [Info] Consumer::processStatsEvents [worker_sbm_0:/tmp/127.0.0.1:8091_0_2698172145.sock:107807] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:510 DCP_MUTATION:2919 V8 events: vb_map:2 execution_stats:660 failure_stats:660 2021-04-24T21:49:37.903-07:00 [Info] Consumer::processStatsEvents [worker_sbm_0:/tmp/127.0.0.1:8091_0_2698172145.sock:107807] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:510 DCP_MUTATION:2919 V8 events: thr_map:1 latency_stats:600 curl_latency_stats:600 lcb_exception_stats:600 log_level:1 thr_count:1 v8_init:1 v8_load:1 vb_map:2 execution_stats:600 failure_stats:600 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:49:50.075-07:00 [Info] Consumer::processStatsEvents [worker_bucket_op_0:/tmp/127.0.0.1:8091_0_3698201870.sock:107682] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:512 DCP_MUTATION:26982 V8 events: curl_latency_stats:660 log_level:1 v8_init:1 vb_map:2 execution_stats:660 failure_stats:660 latency_stats:660 thr_map:1 thr_count:1 v8_load:1 lcb_exception_stats:660 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:49:53.898-07:00 [Info] Consumer::processStatsEvents [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:512 DCP_MUTATION:26982 V8 events: vb_map:2 execution_stats:600 lcb_exception_stats:600 curl_latency_stats:600 log_level:1 thr_count:1 v8_init:1 v8_load:1 failure_stats:600 latency_stats:600 thr_map:1 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:50:06.010-07:00 [Info] Consumer::processStatsEvents [worker_curl_0:/tmp/127.0.0.1:8091_0_2876695861.sock:107709] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:512 DCP_MUTATION:26982 V8 events: v8_load:1 vb_map:2 execution_stats:660 failure_stats:660 latency_stats:660 curl_latency_stats:660 thr_count:1 thr_map:1 v8_init:1 lcb_exception_stats:660 log_level:1 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:50:21.942-07:00 [Info] Consumer::processStatsEvents [worker_n1ql_0:/tmp/127.0.0.1:8091_0_15129118.sock:107782] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:512 DCP_MUTATION:26982 V8 events: v8_load:1 vb_map:2 latency_stats:660 curl_latency_stats:660 thr_map:1 thr_count:1 v8_init:1 lcb_exception_stats:660 log_level:1 execution_stats:660 failure_stats:660 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
2021-04-24T21:50:37.903-07:00 [Info] Consumer::processStatsEvents [worker_sbm_0:/tmp/127.0.0.1:8091_0_2698172145.sock:107807] DCP events: DCP_STREAMREQ:512 DCP_SNAPSHOT:510 DCP_MUTATION:2919 V8 events: vb_map:2 execution_stats:660 failure_stats:660 lcb_exception_stats:660 log_level:1 thr_count:1 v8_init:1 v8_load:1 thr_map:1 latency_stats:660 curl_latency_stats:660 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]log_level:1 thr_count:1 v8_init:1 v8_load:1 thr_map:1 latency_stats:660 curl_latency_stats:660 Timer events: Doc: 0, vbs owned len: 512 vbs owned: [512-1023]
|
I see the same values in Test_logs.txt file in the attachment too. For example
"event_processing_stats": {
|
"adhoc_timer_response_received": 2,
|
"agg_messages_sent_to_worker": 4370,
|
"agg_queue_memory": 0,
|
"agg_queue_memory_cap": 53477376,
|
"agg_queue_size": 0,
|
"agg_queue_size_cap": 100000,
|
"agg_timer_feedback_queue_cap": 500,
|
"dcp_mutation": 26982,
|
"dcp_snapshot": 512,
|
"dcp_stream_req_counter": 512,
|
"dcp_streamreq": 512,
|
"execution_stats": 667,
|
"failure_stats": 667,
|
"latency_stats": 667,
|
"lcb_exception_stats": 667,
|
"log_level": 1,
|
"num_processed_events": 512,
|
"processed_events_size": 49152,
|
"thr_count": 1,
|
"thr_map": 1,
|
"v8_init": 1,
|
"v8_load": 1
|
},
|
Edit: Sorry, I think I misunderstood your question there Vikas, I am looking into it right now. Will update the ticket with information. Thanks!
Vinayaka Kamath we have exception details as , it contains error codes as well
"lcb_exception_stats": {},
|
Vikas Chaudhary Can you run this test with the latest build? Please close the ticket if it's not present in the latest run. Eventing has forwarded the manifest to libcouchbase 3.1.2.
Coming back to the lcb_exception_stats, it seems like all the LCB_ERR_BUCKET_NOT_FOUND are actually logged by libcouchbase and none by eventing. Seems like they are struck in a retry loop from the logs. The latest build should have the fix for that too.
Vinayaka Kamath it passed on rerun. Why "lcb_exception_stats": {} are empty. can you confirm there is no issue with stats?
Build couchbase-server-7.0.0-5092 contains eventing commit 42dcdf8 with commit message:
MB-45907: Update LCB stats when Bucket Connect Fails
Verified with Enterprise Edition 7.0.0 build 5092
I am still seeing below exception
Exception: missing data in destination bucket. Current : 0 Expected : 40320 dcp_backlog : 0 TIMERS_IN_PAST : 0 lcb_exceptions : {'10.112.206.104': {}}
In the logs, I see that the bucket is getting deleted(or SDKs are returning that the bucket is deleted) about the same time as undeploy of n1ql and timers handlers.
2021-05-04T22:11:11.111-07:00 [Info] SuperSupervisor::SettingsChangeCallback [0] Function: n1ql current state: 1 requested status for deployment: false processing: false
|
2021-05-04T22:11:11.111-07:00 [Info] SuperSupervisor::SettingsChangeCallback [0] Function: n1ql Begin undeploy process. Current state: 1
|
2021-05-04T22:11:11.111-07:00 [Info] SuperSupervisor::SettingsChangeCallback [0] Function: n1ql enabled, settings change requesting undeployment
|
2021-05-04T22:11:11.111-07:00 [Info] SuperSupervisor::deleteFromLocallyDeployedApps [0] Function: n1ql deleting from locally deployed apps map
|
2021-05-04T22:11:11.111-07:00 [Info] SuperSupervisor::undeployFunctionCallback [0] Function: timers response from server: resp: &{200 OK 200 HTTP/1.1 1 1 map[Content-Length:[0] Date:[Wed, 05 May 2021 05:11:11 GMT] Status:[0]] 0xc00105e3c0 0 [] false false map[] 0xc0004c2e00 <nil>}
|
2021-05-04T22:11:11.121-07:00 [Info] SuperSupervisor::SettingsChangeCallback [0] Function: timers Begin undeploy process. Current state: 1
|
2021-05-04T22:11:11.121-07:00 [Info] SuperSupervisor::SettingsChangeCallback [0] Function: timers enabled, settings change requesting undeployment
|
2021-05-04T22:11:11.121-07:00 [Info] SuperSupervisor::deleteFromLocallyDeployedApps [0] Function: timers deleting from locally deployed apps map
|
2021-05-04T22:11:11.129-07:00 [Info] SuperSupervisor::deleteFromDeployedApps [0] Function: timers deleting from deployed apps map
|
2021-05-04T22:11:11.129-07:00 [Info] SuperSupervisor::SettingsChangeCallback [0] Function: timers undeployment done
|
2021-05-04T22:11:11.253-07:00 [Error] Unable to connect with bucket "src_bucket"
|
2021-05-04T22:11:11.253-07:00 [Error] Consumer::dcpEventsRemainingToProcess [worker_n1ql_0:/tmp/127.0.0.1:8091_0_3213650520.sock:19323] Failed to fetch get_all_vb_seqnos, err: No bucket named src_bucket
|
2021-05-04T22:11:11.259-07:00 [Error] Unable to connect with bucket "src_bucket"
|
2021-05-04T22:11:11.259-07:00 [Error] Consumer::dcpEventsRemainingToProcess [worker_sbm_0:/tmp/127.0.0.1:8091_0_2795806267.sock:19284] Failed to fetch get_all_vb_seqnos, err: No bucket named src_bucket
|
This is followed by a lot of connection failures to the source bucket in both gocb and lcb:
2021-05-04T22:11:11.462-07:00 [Error] Unable to connect with bucket "src_bucket"
|
2021-05-04T22:11:11.463-07:00 [Error] Consumer::dcpEventsRemainingToProcess [worker_bucket_op_0:/tmp/127.0.0.1:8091_0_1017459917.sock:19243] Failed to fetch get_all_vb_seqnos, err: No bucket named src_bucket
|
2021-05-04T22:11:11.642-07:00 [Info] serviceChangeNotifier: received PoolChangeNotification
|
2021-05-04T22:11:11.660-07:00 [Error] Unable to connect with bucket "src_bucket"
|
2021-05-04T22:11:11.660-07:00 [Error] Consumer::dcpEventsRemainingToProcess [worker_curl_0:/tmp/127.0.0.1:8091_0_31498936.sock:19304] Failed to fetch get_all_vb_seqnos, err: No bucket named src_bucket
|
2021-05-04T22:11:12.296-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:961 I:3637500671] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f16301a04c0,IX=2) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
|
2021-05-04T22:11:12.297-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:961 I:3637500671] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f163019ebb0,IX=1) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
|
2021-05-04T22:11:12.297-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:961 I:2692714925] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f16300e6620,IX=2) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
|
2021-05-04T22:11:12.305-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:961 I:2692714925] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f16300e5c40,IX=1) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
|
2021-05-04T22:11:12.306-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:961 I:3637500671] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f16301a6090,IX=0) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
|
2021-05-04T22:11:12.306-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:961 I:2692714925] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f163004ef70,IX=0) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
|
We can also see that the current connections we had opened with the buckets have also been shutdown:
2021-05-04T22:11:12.273-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:1108 I:3637500671] <10.112.206.102:11210> (CTX=0x7f1630129570,memcached,SRV=0x7f163019ebb0,IX=1) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025)
|
2021-05-04T22:11:12.273-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,connection L:485 I:3637500671] <10.112.206.102:11210> (SOCK=ebb2360296b71299) Starting. Timeout=5000000us
|
2021-05-04T22:11:12.273-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:1108 I:3637500671] <10.112.206.103:11210> (CTX=0x7f163023a7f0,memcached,SRV=0x7f16301a04c0,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025)
|
2021-05-04T22:11:12.273-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,connection L:485 I:3637500671] <10.112.206.103:11210> (SOCK=34cd6a6ec17d57c1) Starting. Timeout=5000000us
|
2021-05-04T22:11:12.274-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:1108 I:2692714925] <10.112.206.102:11210> (CTX=0x7f1630109950,memcached,SRV=0x7f16300e5c40,IX=1) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025)
|
2021-05-04T22:11:12.274-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,connection L:485 I:2692714925] <10.112.206.102:11210> (SOCK=838e9f3e985013f0) Starting. Timeout=5000000us
|
2021-05-04T22:11:12.274-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2569735366.sock:19262] [lcb,server L:1108 I:2692714925] <10.112.206.103:11210> (CTX=0x7f16301e4ef0,memcached,SRV=0x7f16300e6620,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025)
|
Concurrently, I see a shutdown in the memcached logs too
2021-05-04T22:11:40.266407-07:00 INFO Received shutdown request
|
2021-05-04T22:11:40.266440-07:00 INFO Initiating graceful shutdown.
|
2021-05-04T22:11:40.266442-07:00 INFO Stop all buckets
|
2021-05-04T22:11:40.266448-07:00 INFO Shutting down Prometheus exporter
|
2021-05-04T22:11:40.317389-07:00 INFO Shutting down audit daemon
|
2021-05-04T22:11:40.317896-07:00 INFO Shutting down client worker threads
|
2021-05-04T22:11:40.317954-07:00 INFO Stopping worker thread 0
|
2021-05-04T22:11:40.318053-07:00 INFO Stopping worker thread 1
|
2021-05-04T22:11:40.318156-07:00 INFO Stopping worker thread 2
|
2021-05-04T22:11:40.318206-07:00 INFO Stopping worker thread 3
|
2021-05-04T22:11:40.319026-07:00 INFO Releasing server sockets
|
2021-05-04T22:11:40.319077-07:00 INFO Shutting down IPv4 interface: 0.0.0.0:11210
|
2021-05-04T22:11:40.319118-07:00 INFO Shutting down IPv4 interface: 0.0.0.0:11209
|
2021-05-04T22:11:40.319192-07:00 INFO Shutting down IPv4 interface: 0.0.0.0:11207
|
2021-05-04T22:11:40.319197-07:00 INFO Shutting down IPv4 interface: 0.0.0.0:11206
|
2021-05-04T22:11:40.319201-07:00 INFO Shutting down IPv6 interface: [::]:11210
|
2021-05-04T22:11:40.319206-07:00 INFO Shutting down IPv6 interface: [::]:11209
|
2021-05-04T22:11:40.319210-07:00 INFO Shutting down IPv6 interface: [::]:11207
|
2021-05-04T22:11:40.319213-07:00 INFO Shutting down IPv6 interface: [::]:11206
|
2021-05-04T22:11:40.319219-07:00 INFO Releasing bucket resources
|
2021-05-04T22:11:40.319240-07:00 INFO Shutting down RBAC subsystem
|
2021-05-04T22:11:40.319246-07:00 INFO Releasing thread resources
|
2021-05-04T22:11:40.319289-07:00 INFO Shutting down executor pool
|
2021-05-04T22:11:40.319813-07:00 INFO Releasing signal handlers
|
2021-05-04T22:11:40.319819-07:00 INFO Shutting down external authentication service
|
2021-05-04T22:11:40.319894-07:00 INFO Release cached SSL contexts
|
2021-05-04T22:11:40.319896-07:00 INFO Shutting down SASL server
|
2021-05-04T22:11:40.319897-07:00 INFO Deinitialising tracing
|
2021-05-04T22:11:40.319983-07:00 INFO Shutting down all engines
|
2021-05-04T22:11:40.320060-07:00 INFO Removing breakpad
|
2021-05-04T22:11:40.320066-07:00 INFO Shutting down OpenSSL
|
2021-05-04T22:11:40.320073-07:00 INFO Shutting down libevent
|
2021-05-04T22:11:40.320084-07:00 INFO Shutting down logger extension
|
2021-05-04T22:11:40.320295-07:00 INFO ---------- Closing logfile
|
2021-05-04T22:11:40.353242-07:00 INFO ---------- Opening logfile: /opt/couchbase/var/lib/couchbase/logs/memcached.log.000000.txt
|
2021-05-04T22:11:40.353969-07:00 INFO Couchbase version 7.0.0-5092 starting.
|
2021-05-04T22:11:40.353983-07:00 INFO recalculate_max_connections: {"max_fds":200000,"max_connections":65000,"system_connections":5000,"engine_fds":133982}
|
2021-05-04T22:11:40.354027-07:00 INFO Breakpad enabled. Minidumps will be written to '/opt/couchbase/var/lib/couchbase/crash'
|
2021-05-04T22:11:40.354222-07:00 INFO Using SLA configuration: {"COMPACT_DB":{"slow":"1800 s"},"CREATE_BUCKET":{"slow":"5 s"},"DELETE_BUCKET":{"slow":"10 s"},"SELECT_BUCKET":{"slow":"10 ms"},"SEQNO_PERSISTENCE":{"slow":"30 s"},"comment":"Current MCBP SLA configuration","default":{"slow":"500 ms"},"version":1}
|
I see a PrepareTopologyChange after I start seeing the LCB_ERR_BUCKET_NOT_FOUND errors
2021-05-04T22:11:12.456-07:00 [Info] ServiceMgr::PrepareTopologyChange change: service.TopologyChange{ID:"a638566cbdd32a41b90cb11b7db11679", CurrentTopologyRev:service.Revision(nil), Type:"topology-change-rebalance", KeepNodes:[]struct { NodeInfo service.NodeInfo "json:\"nodeInfo\""; RecoveryType service.RecoveryType "json:\"recoveryType\"" }{}, EjectNodes:[]service.NodeInfo{service.NodeInfo{NodeID:"88674c4c66bb1af41616ae62f507d464", Priority:0, Opaque:interface {}(nil)}}}
|
2021-05-04T22:11:12.456-07:00 [Info] SuperSupervisor::RebalanceStatus [0] Rebalance status from all running applications: map[string]bool{}
|
2021-05-04T22:11:12.456-07:00 [Info] ServiceMgr::PrepareTopologyChange ejectNodeUUIDs: [88674c4c66bb1af41616ae62f507d464] keepNodeUUIDs: []
|
2021-05-04T22:11:12.456-07:00 [Info] ServiceMgr::PrepareTopologyChange m.rebalanceID: a638566cbdd32a41b90cb11b7db11679 m.servers: [88674c4c66bb1af41616ae62f507d464]
|
2021-05-04T22:11:12.456-07:00 [Error] SuperSupervisor::NotifyPrepareTopologyChange [0] 0 eventing nodes supplied as keepNodes
|
2021-05-04T22:11:12.456-07:00 [Info] ServiceMgr::PrepareTopologyChange completed, isBalanced: true
|
Chanabasappa Ghali Would it be possible to get the output log of the test attached to the ticket to get a better idea of what was going on?
Chanabasappa Ghali is it possible to get a quick rerun on build 5102, we made recent changes to LCB command retry on it.
With Enterprise Edition 7.0.0 build 5102, same exception
Exception: missing data in destination bucket. Current : 0 Expected : 40320 dcp_backlog : 0 TIMERS_IN_PAST : 0 lcb_exceptions : {'10.112.206.104': {}}
Chanabasappa Ghali, In the latest run, mutation phase of test failed with unable to access jar file
Error: Unable to access jarfile java_sdk_client/collections/target/javaclient/javaclient.jar
|
2021-05-05 10:56:40 | INFO | MainProcess | Cluster_Thread | [task.execute_for_collection] java -jar java_sdk_client/collections/target/javaclient/javaclient.jar -i 10.112.206.101 -u Administrator -p password -b src_bucket -s _default -c _default -n 40320 -pc 100 -pu 0 -pd 0 -l uniform -dsn 1 -dpx doc_ -dt Person -de 0 -ds 500 -ac False -st 0 -en 40319 -o False -sd False Error: Unable to access jarfile java_sdk_client/collections/target/javaclient/javaclient.jar [<FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 890 in _bootstrap>, <FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 932 in _bootstrap_inner>, <FrameSummary file lib/tasks/taskmanager.py, line 34 in run>, <FrameSummary file lib/tasks/task.py, line 102 in step>, <FrameSummary file lib/tasks/task.py, line 6436 in execute>, <FrameSummary file lib/tasks/task.py, line 6424 in execute_for_collection>, <FrameSummary file lib/tasks/future.py, line 265 in set_exception>] Wed May 5 10:56:40 2021 2021-05-05 10:56:43 | INFO | MainProcess | test_thread | [rest_client.rebalance_reached] rebalance reached >100% in 3.006394863128662 seconds 2021-05-05 10:57:10 | INFO | MainProcess | Cluster_Thread | [task.execute_for_collection] java -jar java_sdk_client/collections/target/javaclient/javaclient.jar -i 10.112.206.101 -u Administrator -p password -b src_bucket -s scope_1 -c coll_1 -n 40320 -pc 100 -pu 0 -pd 0 -l uniform -dsn 1 -dpx doc_ -dt Person -de 0 -ds 500 -ac False -st 0 -en 40319 -o False -sd False Error: Unable to access jarfile java_sdk_client/collections/target/javaclient/javaclient.jar [<FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 890 in _bootstrap>, <FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 932 in _bootstrap_inner>, <FrameSummary file lib/tasks/taskmanager.py, line 34 in run>, <FrameSummary file lib/tasks/task.py, line 102 in step>, <FrameSummary file lib/tasks/task.py, line 6436 in execute>, <FrameSummary file lib/tasks/task.py, line 6424 in execute_for_collection>, <FrameSummary file lib/tasks/future.py, line 265 in set_exception>]
|
Could you please re run this test after fixing that error.
Exception: missing data in destination bucket. Current : 38032 Expected : 40320 dcp_backlog : 0 TIMERS_IN_PAST : 0 lcb_exceptions : {'10.112.206.104': {}}
Looking at test log the data mismatch happened for curl test handler.
2021-05-05 21:17:00 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code dst_bucket.scope_1.coll_3 to complete bucket operations... Current : 38032 Expected : 40320 ...
|
If there is some exception while sending/receiving request then function skips writing to destination scope.
function OnUpdate(doc, meta) {
|
var request = {
|
path : 'job/test_suite_executor/api/json?tree=jobs[component]'
|
}
|
try {
|
var response = curl("GET", server, request);
|
log('response body received from server:', response.body);
|
log('response headers received from server:', response.headers);
|
log('response status received from server:', response.status);
|
var res= new Uint8Array(response.body);
|
if(response.status == 200){
|
var result= couchbase.insert(dst_bucket,meta,response.body);
|
log(result);
|
}
|
else{
|
var result= couchbase.insert(dst_bucket,meta,response.status);
|
log(result);
|
}
|
}
|
catch (e) {
|
log('error:', e);
|
}
|
}
|
|
|
|
function OnDelete(meta) {
|
var request = {
|
path : 'job/test_suite_executor/api/json?tree=jobs[component]'
|
};
|
try {
|
var response = curl("GET", server, request);
|
log('response body received from server:', response.body);
|
log('response headers received from server:', response.headers);
|
log('response status received from server:', response.status);
|
var res= new Uint8Array(response.body);
|
var doc_meta={"id":meta.id};
|
var result = couchbase.delete(dst_bucket,doc_meta);
|
log(result);
|
}
|
catch (e) {
|
log('error:', e);
|
}
|
}
|
Looking at the curl function stats eventing executed all 40320 mutations and since function is in try catch block it won't increment on_update_failure if there is some exception. From stats there are curl failure count as 2286 and curl timeout 2 which gives total of 38032 mutations written to dst collection.
"app_worker_setting_events_lost": 0,
|
"bkt_ops_cas_mismatch_count": 0,
|
"bucket_op_exception_count": 0,
|
"checkpoint_failure_count": 0,
|
"curl_failure_count": 2286,
|
"curl_max_resp_size_exceeded": 0,
|
"curl_non_200_response": 0,
|
"curl_timeout_count": 2,
|
"dcp_events_lost": 0,
|
"debugger_events_lost": 0,
|
"delete_events_lost": 0,
|
"mutation_events_lost": 0,
|
"n1ql_op_exception_count": 0,
|
"timeout_count": 0,
|
"timer_callback_missing_counter": 0,
|
"timer_context_size_exceeded_counter": 0,
|
"timer_events_lost": 0
|
Function is logging curl error to application logs.
Chanabasappa Ghali Could you please check application logs of curl function to check why these curl failure happened?
One such error is unable to resolve host.
"last_log": "<ud>\"error:\" {\"message\":\"Unable to perform the request: Couldn't resolve host name\",\"stack\":\"Error\\n at OnUpdate (curl.js:7:21)\"} </ud>"
|
I see LCB_ERR_BUCKET_NOT_FOUND errors on node 172.23.122.171
2021-04-24T21:50:47.230-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:113727] [lcb,cccp L:187 I:1872108005] <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.230-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:113727] [lcb,confmon L:211 I:1872108005] Provider 'CCCP' failed: LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.231-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:113727] [lcb,server L:961 I:1489768050] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7faed00dc870,IX=0) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
2021-04-24T21:50:47.231-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:113727] [lcb,server L:818 I:1489768050] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7faed00dc870,IX=0) Failing command (pkt=0x7faecc0238f0, opaque=12413, opcode=0x0) with error LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.236-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:113727] [lcb,server L:961 I:1872108005] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7faed0126e90,IX=0) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
2021-04-24T21:50:47.236-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:113727] [lcb,server L:818 I:1872108005] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7faed0126e90,IX=0) Failing command (pkt=0x7faec802d740, opaque=12413, opcode=0x0) with error LCB_ERR_BUCKET_NOT_FOUND (210)
and on node 172.23.122.172
2021-04-24T21:50:47.517-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,server L:961 I:2437775077] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f56280dc870,IX=0) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
2021-04-24T21:50:47.517-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,server L:818 I:2437775077] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f56280dc870,IX=0) Failing command (pkt=0x7f5620035fa0, opaque=12449, opcode=0x0) with error LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.517-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,server L:818 I:2437775077] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f56280dc870,IX=0) Failing command (pkt=0x7f5620036010, opaque=12450, opcode=0xb5) with error LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.517-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,cccp L:187 I:2437775077] <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.517-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,confmon L:211 I:2437775077] Provider 'CCCP' failed: LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.534-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,server L:961 I:3930440849] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f5628126e90,IX=0) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
2021-04-24T21:50:47.534-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,server L:818 I:3930440849] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f5628126e90,IX=0) Failing command (pkt=0x7f5618025620, opaque=12449, opcode=0x0) with error LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.534-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,server L:818 I:3930440849] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f5628126e90,IX=0) Failing command (pkt=0x7f5618025690, opaque=12450, opcode=0xb5) with error LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.534-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,cccp L:187 I:3930440849] <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_BUCKET_NOT_FOUND (210)
2021-04-24T21:50:47.534-07:00 [Info] eventing-consumer [worker_timers_0:/tmp/127.0.0.1:8091_0_2026117502.sock:107835] [lcb,confmon L:211 I:3930440849] Provider 'CCCP' failed: LCB_ERR_BUCKET_NOT_FOUND (210)
Clearly related to
MB-44324