
Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 6.6.1
Affects Version/s: 6.6.0
Component/s: eventing
Labels:
- approved-for-6.6.1

Triage:
Untriaged
Story Points:
1
Is this a Regression?:
Unknown

Description

I am seeing failures with many Eventing workers and 25M+ documents.

I create an Eventing function "test_update_2" (attached) that binds the alias "bdp_vardata" to a bucket called "crondata" (memory quota 7.9 GB) and runs with 64 workers, using the following source code:

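function OnUpdate(doc, meta) {
  var maxattempt = 2;
  for (var tries = 1; tries <= maxattempt; tries++) {
    try {
      // Re-read the document through the "bdp_vardata" alias,
      // add a "random" field, and write it back.
      var current = bdp_vardata[meta.id];
      current.random = Math.random();
      bdp_vardata[meta.id] = current;
      break;
    } catch (e) {
      // Log only once the final attempt has failed.
      if (tries === maxattempt) {
        log("attempt " + tries + " error occured during deletion :: ",
            e, " for id ", meta.id);
      }
    }
  }
}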
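The loop above retries on any exception. With maxattempt = 2, a temporary failure gets exactly one immediate retry. Below is a minimal sketch of a more selective retry. It assumes the thrown error has the shape shown in the LCB_ETMPFAIL log line in this report: a `message` object with `code` and `name`. The helpers `isTemporaryFailure` and `withRetries` are hypothetical names and are not part of the Eventing API. The demo at the end uses a stand-in operation, so the block runs under Node.js rather than inside Eventing.

```javascript
// Hypothetical helper: true if the thrown error looks like a temporary
// failure. The shape (e.message.name / e.message.code) is assumed from the
// LCB_ETMPFAIL log line in this report.
function isTemporaryFailure(e) {
  return Boolean(
    e && e.message &&
    (e.message.name === "LCB_ETMPFAIL" || e.message.code === 392)
  );
}

// Hypothetical helper: run op() up to maxAttempts times, retrying only
// temporary failures. Any other error, or the last failure, is rethrown.
function withRetries(op, maxAttempts) {
  for (var attempt = 1; ; attempt++) {
    try {
      return op();
    } catch (e) {
      if (!isTemporaryFailure(e) || attempt >= maxAttempts) {
        throw e;
      }
    }
  }
}

// Demo (outside Eventing only): a stand-in operation that fails once with
// an ETMPFAIL-shaped error, then succeeds.
var calls = 0;
var result = withRetries(function () {
  calls++;
  if (calls === 1) {
    var err = new Error();
    err.message = { code: 392, name: "LCB_ETMPFAIL" };
    throw err;
  }
  return "ok";
}, 3);
console.log(result, calls); // ok 2
```

Inside a handler, the body of the `try` block could be wrapped in `withRetries(function () { ... }, 5)`. Immediate retries can still fail under sustained pressure on the server. The linked issue, MB-40498, describes the actual fix as Eventing itself retrying retriable bucket-op failures such as ETMPFAIL.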

The source bucket and the bucket being updated are both "crondata". In addition, there is a 100 MB Eventing metadata bucket, "metadata".

I load 25,528,448 documents into "crondata" with keys like todelete01::100006 and bodies like:

{
  "type": "vbs_seed",
  "id": 100006
}

Once the Eventing function has run, every document in "crondata" is enriched with a new field called "random":

{
  "type": "vbs_seed",
  "id": 100006,
  "random": 0.22187920300189878
}
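The enrichment step is a per-document transformation. The sketch below makes these assumptions: `enrich` and `isEnriched` are hypothetical helpers, the random source is passed in so the result is reproducible, and the check relies only on `Math.random()` returning values in [0, 1).

```javascript
// Hypothetical helper mirroring what the handler does to each document:
// copy the body and add a "random" field produced by rng().
function enrich(doc, rng) {
  var out = Object.assign({}, doc);
  out.random = rng();
  return out;
}

// Hypothetical check: true once "random" is a number in [0, 1),
// the range Math.random() returns.
function isEnriched(doc) {
  return typeof doc.random === "number" && doc.random >= 0 && doc.random < 1;
}

var before = { type: "vbs_seed", id: 100006 };
var after = enrich(before, function () { return 0.22187920300189878; });
console.log(isEnriched(before), isEnriched(after)); // false true
```

A check like `isEnriched` could help count documents that were never updated when processing stalls part way through the bucket.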

The single-node server

I run Eventing on my 12-core, 2.1 GHz, 64 MB Xeon:

uname -a
Linux couch01 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux

/opt/couchbase/bin/couchbase-server -v
Couchbase Server 6.6.0-7883 (EE)

The node is configured with the Eventing service (256 MB RAM) and the Data service (7900 MB RAM); no other services are enabled.

The Issue

The system processes about 7.6 million documents (mutations) and then starts reporting LCB_ETMPFAIL errors:

2020-07-15T18:52:46.795-07:00 [INFO] "attempt 2 error occured during deletion :: " {"message":{"code":392,"desc":"Temporary failure received from server. Try again later","name":"LCB_ETMPFAIL"},"stack":"Error\n at OnUpdate (test_update_2.js:10:35)"} " for id " "todelete22::63364"
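When triaging many log lines like the one above, it can help to extract the error name, the code, and the handler location. This is a minimal sketch. It assumes the error object is available as the JSON text that appears in the log line; the string below is copied from that line. `summarizeLcbError` is a hypothetical helper.

```javascript
// Error object from the log line above, as JSON text
// (the "\\n" becomes "\n" inside the JSON string).
var logged = '{"message":{"code":392,"desc":"Temporary failure received from server. Try again later","name":"LCB_ETMPFAIL"},"stack":"Error\\n at OnUpdate (test_update_2.js:10:35)"}';

// Hypothetical helper: pull the LCB error name/code and the
// "file:line:column" location out of the logged error object.
function summarizeLcbError(jsonText) {
  var err = JSON.parse(jsonText);
  var m = /\(([^:()]+):(\d+):(\d+)\)/.exec(err.stack);
  return {
    name: err.message.name,
    code: err.message.code,
    location: m ? m[1] + ":" + m[2] + ":" + m[3] : null
  };
}

console.log(JSON.stringify(summarizeLcbError(logged)));
// {"name":"LCB_ETMPFAIL","code":392,"location":"test_update_2.js:10:35"}
```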

A workaround

In the UI, simply pause and then resume the function every ~7 million documents. This shows that Eventing can process the data with 64 workers, but something odd is happening: some resource constraint does not seem to be honored.
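The error text itself says "Try again later". Pausing and resuming plausibly gives the server time to catch up. A common way to express "later" in retry logic is capped exponential backoff, sketched below. `backoffDelayMs` and the 50 ms base and 1000 ms cap are illustrative assumptions. The sketch does not assume that an Eventing handler can sleep between attempts; it only computes the schedule a retrying client might follow.

```javascript
// Hypothetical capped exponential backoff: the delay before retry
// `attempt` (1-based) doubles each time, but never exceeds capMs.
function backoffDelayMs(attempt, baseMs, capMs) {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

var delays = [];
for (var a = 1; a <= 6; a++) {
  delays.push(backoffDelayMs(a, 50, 1000));
}
console.log(delays.join(", ")); // 50, 100, 200, 400, 800, 1000
```

The cap keeps a long run of failures from producing unbounded waits. Real clients often also add random jitter, so that many workers retrying at once do not hit the server in lockstep.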

I believe I also have no issues if I reduce the number of workers from sixty-four (64) to just three (3).

I have prepared a video showing exactly how it fails; hopefully the video and the uploaded Eventing function will help track down the root cause.

Attachments

Issue Links

is a backport of

MB-40498 Eventing is not retrying bucket ops failures like ETMPFAIL that can be retried

Closed


Activity

People

Assignee: Vikas Chaudhary

Reporter: Jeelan Poola

Votes: 0

Watchers: 3

Dates

Created: 16/Jul/20 10:39 PM

Updated: 17/Aug/20 10:52 PM

Resolved: 11/Aug/20 9:06 PM

Gerrit Reviews

There are no open Gerrit changes. There is 1 closed Gerrit change:

MB-40518: check retriability of LCB errors properly (Gerrit review)

[BP MB-40498] - Eventing is not retrying bucket ops failures like ETMPFAIL that can be retried
