EMPFAIL on 20MB x 30 upserts

Description

This is a recent regression: we are seeing failures when upserting docs using the REST API after some number of docs have been upserted. In this example we are upserting 30 docs, and we see the failure after a dozen or so successful upserts. When we move on to the next doc it works, but then we start seeing the same failure on all keys after that:

n_0:

172.18.0.3 - couchbase [10/Mar/2022:10:02:44 -0800] "POST /pools/default/buckets/testBucket/docs/key-0 HTTP/1.1" 200 2 - "Apache-HttpClient/4.5.13 (Java/11)" 4283
...
172.18.0.3 - couchbase [10/Mar/2022:10:03:22 -0800] "POST /pools/default/buckets/testBucket/docs/key-12 HTTP/1.1" 200 2 - "Apache-HttpClient/4.5.13 (Java/11)" 2724
172.18.0.3 - - [10/Mar/2022:10:03:25 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 1865
172.18.0.3 - - [10/Mar/2022:10:03:30 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2718
172.18.0.3 - - [10/Mar/2022:10:03:34 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2330
172.18.0.3 - - [10/Mar/2022:10:03:39 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2501
172.18.0.3 - - [10/Mar/2022:10:03:44 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2762
172.18.0.3 - - [10/Mar/2022:10:03:48 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2478
172.18.0.3 - - [10/Mar/2022:10:03:53 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2602
172.18.0.3 - - [10/Mar/2022:10:03:58 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2767
172.18.0.3 - - [10/Mar/2022:10:04:03 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2614
172.18.0.3 - - [10/Mar/2022:10:04:08 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2403
172.18.0.3 - couchbase [10/Mar/2022:10:04:10 -0800] "POST /pools/default/buckets/testBucket/docs/key-14 HTTP/1.1" 200 2 - "Apache-HttpClient/4.5.13 (Java/11)" 2315
172.18.0.3 - - [10/Mar/2022:10:04:13 -0800] "POST /pools/default/buckets/testBucket/docs/key-15 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2373
...
[ns_server:error,2022-03-10T10:22:47.348-08:00,n_0@172.18.0.3:<0.24422.2>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed", {path, "/pools/default/buckets/testBucket/docs/key-21"}, {method,'POST'}, {type,error}, {what, {case_clause, {badrpc, {'EXIT', {function_clause, [{capi_crud,handle_mutation_rv, [{mc_header,1,134,0,0,0,0,0,undefined}, {mc_entry,undefined,undefined,0,0,0, undefined,0}], [{file,"src/capi_crud.erl"}, {line,28}]}, {capi_crud,set,6,[]}]}}}}}, {trace, [{menelaus_web_crud,handle_post,4, [{file,"src/menelaus_web_crud.erl"}, {line,334}]}, {request_tracker,request,2, [{file,"src/request_tracker.erl"}, {line,40}]}, {menelaus_util,handle_request,2, [{file,"src/menelaus_util.erl"}, {line,221}]}, {mochiweb_http,headers,6, [{file, "/home/couchbase/jenkins/workspace/cbas-cbcluster-stress-oraclejdk11/couchdb/src/mochiweb/mochiweb_http.erl"}, {line,153}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,226}]}]}]
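To see at a glance which keys succeeded and how many times each failing key was retried, the access-log entries above can be tallied per key and HTTP status. The following is a minimal sketch, not part of the original report: the function name `tally_statuses` and the regular expression are illustrative assumptions, and the sample text is a short excerpt copied from the log above.

```python
import re
from collections import Counter, defaultdict

# Matches the request and status fields of one access-log entry.
# \s+ after POST tolerates an entry that was wrapped across two lines.
ENTRY = re.compile(
    r'"POST\s+/pools/default/buckets/(?P<bucket>[^/\s]+)/docs/(?P<key>\S+) '
    r'HTTP/1\.1" (?P<status>\d{3}) '
)


def tally_statuses(log_text):
    """Return {key: Counter({status: count})} for every POST entry found."""
    counts = defaultdict(Counter)
    for match in ENTRY.finditer(log_text):
        counts[match["key"]][int(match["status"])] += 1
    return counts


# Excerpt of the n_0 access log quoted in this ticket.
SAMPLE = '''
172.18.0.3 - couchbase [10/Mar/2022:10:03:22 -0800] "POST /pools/default/buckets/testBucket/docs/key-12 HTTP/1.1" 200 2 - "Apache-HttpClient/4.5.13 (Java/11)" 2724
172.18.0.3 - - [10/Mar/2022:10:03:25 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 1865
172.18.0.3 - - [10/Mar/2022:10:03:30 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2718
172.18.0.3 - couchbase [10/Mar/2022:10:04:10 -0800] "POST /pools/default/buckets/testBucket/docs/key-14 HTTP/1.1" 200 2 - "Apache-HttpClient/4.5.13 (Java/11)" 2315
'''

if __name__ == "__main__":
    for key, statuses in tally_statuses(SAMPLE).items():
        print(key, dict(statuses))
```

Running the same function over the full access log would show whether every key after the first failure returns 500, or whether successes are interleaved as with key-14 here.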

Affects versions

Fix versions

None

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Attachments

3 (each added 12 Mar 2022, 04:31 AM)

Activity

Michael Blow March 15, 2022 at 8:20 PM

Verified that on recent Neo manifests, the regression observed by Analytics is gone.

Dave Rigby March 15, 2022 at 3:31 PM

FYI this should be fixed in 7.1.0-2485.

Dave Rigby March 14, 2022 at 2:11 PM

Thanks for confirming. SHAs would also work, but given the CV job logs have wrapped it's somewhat moot now 😉

Michael Blow March 14, 2022 at 12:35 PM

>> Michael Blow Do you have any specific build numbers when this issue started to occur?

Just to close the loop: I do not have any specific build numbers, as we no longer have any 7.1 Jenkins runs from before the regression started. We are not able to keep a large number of runs due to disk limitations placed on us by the build team.

We would only be able to provide SHAs in any event, as these are all manifest-based (i.e. not installer) tests.

Dave Rigby March 14, 2022 at 10:30 AM

As per comments on https://couchbasecloud.atlassian.net/browse/MB-51408#icft=MB-51408, this does appear to be an issue triggered by changes to tlm to disable what should have been assert-only code; however, the disabled code incorrectly had side effects.

Paolo is addressing the issue via https://couchbasecloud.atlassian.net/browse/MB-51408#icft=MB-51408, so closing this as a duplicate.
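The class of bug described above (a check that is meant to be debug-only but also performs real work, so disabling it changes behaviour) can be illustrated generically. This is a hedged sketch only: it is not the tlm change or any Couchbase code, and every name in it (`make_check`, `bump`, `reserve_buggy`, `reserve_fixed`) is hypothetical.

```python
def make_check(enabled):
    """Return a debug-check function.

    When `enabled` is False the expression is never evaluated, which mimics
    a build in which assert-only code has been compiled out.
    """
    def check(expression):
        if enabled:
            assert expression(), "debug check failed"
    return check


def bump(state):
    """Side effect: record one more unit as used and return the new total."""
    state["used"] += 1
    return state["used"]


def reserve_buggy(state, check):
    # BUG: the increment lives inside the check, so it silently disappears
    # when checks are disabled.
    check(lambda: bump(state) <= state["limit"])
    return state["used"]


def reserve_fixed(state, check):
    # The side effect runs unconditionally; only the validation is optional.
    used = bump(state)
    check(lambda: used <= state["limit"])
    return used


if __name__ == "__main__":
    for enabled in (True, False):
        check = make_check(enabled)
        buggy = reserve_buggy({"used": 0, "limit": 2}, check)
        fixed = reserve_fixed({"used": 0, "limit": 2}, check)
        print(f"checks enabled={enabled}: buggy={buggy} fixed={fixed}")
```

With checks enabled both variants behave identically, which is why this kind of defect tends to surface only after the checks are turned off.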

Duplicate
Details

Assignee

Michael Blow

Reporter

Is this a Regression?

Unknown

Triage

Untriaged

Story Points

1

Sprint

Priority

Created March 12, 2022 at 4:31 AM
Updated August 31, 2024 at 11:07 AM
Resolved March 14, 2022 at 10:30 AM