Description
I am seeing failures with many Eventing workers and 25M+ docs
I create an Eventing function "test_update_2" (attached) with an alias of "bdp_vardata" to a bucket called "crondata" (Memory Quota 7.9GB) and have 64 workers with the following source code:
function OnUpdate(doc, meta) {
|
var maxattempt = 2;
|
for (var tries=1; tries<=maxattempt; tries++) {
|
try {
|
var doc = bdp_vardata[meta.id];
|
doc.random = Math.random();
|
bdp_vardata[meta.id] = doc;
|
break;
|
} catch (e) {
|
if (tries === maxattempt)
|
log("attempt "+ tries + " error occured during deletion :: ",
|
e, " for id ", meta.id);
|
}
|
}
|
}
|
The source bucket and the bucket that is updated is "crondata" in addition there is a 100MB Eventing meta data bucket "metadata"
I load 25,528,448 document into crondata with a KEYs like todelete01::100006 and data like
{
|
"type": "vbs_seed",
|
"id": 100006,
|
}
|
once the eventing function runs all documents in bucket "crondata" will be enriched with a new field called "random"
{
|
"type": "vbs_seed",
|
"id": 100006,
|
"random": 0.22187920300189878
|
}
|
The single node server
When I run Eventing on my 12 core 2.1Ghz 64 MB Xeon
uname -a
Linux couch01 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux
/opt/couchbase/bin/couchbase-server -v
Couchbase Server 6.6.0-7883 (EE)
Configured with Eventing 256 RAM, Data 7900 MB RAM no other services
The Issue
The system will process about 7.6 Million doc (mutations) and then I will get LCB_ETMPFAIL errors.
2020-07-15T18:52:46.795-07:00 [INFO] "attempt 2 error occured during deletion :: " {"message":{"code":392,"desc":"Temporary failure received from server. Try again later","name":"LCB_ETMPFAIL"},"stack":"Error\n at OnUpdate (test_update_2.js:10:35)"} " for id " "todelete22::63364"
|
A work around
In the UI simply pause then resume every 7 million rows. This proves that Eventing can process the data with 64 workers but something odd is happening where we don't honor some sort of resource constraint.
I believe I also have no issues if I set the workers down form sixty-four (64) to just three (3) workers
I have prepared a video showing exactly how it fails hopefully the video and the uploaded Eventing function will help track down the root cause.
Attachments
Issue Links
- backports to
-
MB-40518 [BP MB-40498] - Eventing is not retrying bucket ops failures like ETMPFAIL that can be retried
- Closed
For Gerrit Dashboard: MB-40498 | ||||||
---|---|---|---|---|---|---|
# | Subject | Branch | Project | Status | CR | V |
132704,5 | MB-40498 : check retriability of LCB errors properly | unstable | eventing | Status: MERGED | +2 | +1 |