Couchbase Server / MB-51408

KV performance degradation observed due to increased tmp_oom failures on Eventing CI

Details

    • Untriaged
    • 1
    • Yes
    • KV March-22

    Description

      Issue observed only starting from build 2440 onwards

      This issue is clearly observable in eventing's CI tests starting from build 2440 and is reproducible every time. For comparison, no KV failures are observed for the same tests on build 2434.

      Steps to reproduce:

      • Set up a single-node cluster with services: kv, eventing
      • Create 3 buckets: default (used as the source bucket for the eventing function), eventing (used to store metadata), hello-world (used as the destination bucket binding)
      • Set the memory quota of the destination bucket to 500 MB.
      • Create the following function. For each mutation on the source bucket, this function upserts six 15 MB docs via bucket ops and via N1QL to the destination bucket (hello-world).

      Code:

      eventing_function.js

      • Deploy the function and create 10-20 documents on the source bucket, which in turn should upsert 50 docs of 15 MB each to the destination bucket "hello-world".
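The attached eventing_function.js is not reproduced on this page. As a rough illustration of the bucket-op half of the steps above, a handler of this shape would write six 15 MB documents per source mutation. This is a minimal sketch and not the attached function: the alias name `dst_bucket`, the key scheme, and the payload contents are assumptions, and the N1QL upsert path is left out. The plain object stands in for the bucket binding so the sketch can run outside Eventing.

```javascript
// Stand-in for the destination bucket binding. In Eventing the alias comes
// from the function's bindings; the name `dst_bucket` is hypothetical.
const dst_bucket = {};

const DOCS_PER_MUTATION = 6;               // from the reproduction steps
const DOC_SIZE_BYTES = 15 * 1024 * 1024;   // "15 MB", assumed to mean 15 MiB

// Build the payload once and reuse it so the sketch stays cheap to run.
const payload = "x".repeat(DOC_SIZE_BYTES);

// Entry point that Eventing invokes for each create/update on the source bucket.
function OnUpdate(doc, meta) {
    for (let i = 0; i < DOCS_PER_MUTATION; i++) {
        // Bucket op: assigning through the alias upserts the document.
        dst_bucket[meta.id + "_" + i] = { data: payload };
    }
}
```

Calling `OnUpdate({}, { id: "doc1" })` against the stand-in leaves six keys, `doc1_0` through `doc1_5`, each carrying the 15 MB payload.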

      Observation based on eventing CI tests:

      • Everything works without issues on build 2434.
      • libcouchbase (the client used by eventing for upserts to the destination bucket) reports many tmpfail / tmp_oom errors starting from build 2440. We've tested up to build 2470, where similar issues are observed.
        Example:

      2022-03-11T12:20:52.020+05:30 [INFO] {"message":{"code":4,"desc":"LCB_ERR_TEMPORARY_FAILURE (207): Temporary failure","name":"LCB_ERR_TEMPORARY_FAILURE"},"stack":"Error\n    at OnUpdate (TestCacheOverflowCustom.js:13:23)"}
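To count failures like the one above across a log file, each line can be split into timestamp, level, and JSON payload. The following is a small sketch that assumes every relevant line has the same `<timestamp> [LEVEL] <json>` shape as the example; the function names are hypothetical, and lines that do not fit the shape are skipped.

```javascript
// Matches "<timestamp> [LEVEL] <json object>" as seen in the example line.
const LINE_PATTERN = /^(\S+)\s+\[(\w+)\]\s+(\{.*\})\s*$/;

// Returns { timestamp, level, errorName, stack } or null if the line does not fit.
function parseEventingLogLine(line) {
    const match = LINE_PATTERN.exec(line.trim());
    if (match === null) {
        return null;
    }
    let payload;
    try {
        payload = JSON.parse(match[3]);
    } catch (err) {
        return null; // Not valid JSON, so not a line we can interpret.
    }
    const message = payload.message || {};
    return {
        timestamp: match[1],
        level: match[2],
        errorName: message.name || null,
        stack: payload.stack || null,
    };
}

// Counts lines whose error name is LCB_ERR_TEMPORARY_FAILURE.
function countTemporaryFailures(lines) {
    return lines
        .map(parseEventingLogLine)
        .filter((entry) => entry !== null && entry.errorName === "LCB_ERR_TEMPORARY_FAILURE")
        .length;
}

// The example line from this report; String.raw keeps the "\n" escape intact.
const sampleLine = String.raw`2022-03-11T12:20:52.020+05:30 [INFO] {"message":{"code":4,"desc":"LCB_ERR_TEMPORARY_FAILURE (207): Temporary failure","name":"LCB_ERR_TEMPORARY_FAILURE"},"stack":"Error\n    at OnUpdate (TestCacheOverflowCustom.js:13:23)"}`;

console.log(parseEventingLogLine(sampleLine).errorName);
```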
      

      Attached is the cbstats output from one of the KV nodes for the destination bucket (hello-world_cbstats_n1.log), where KV reports a high number of tmp_oom errors.
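A back-of-the-envelope calculation shows how much data the test pushes at the destination bucket relative to its quota. The sketch below takes the figures from the reproduction steps as stated (500 MB quota, 15 MB documents, 6 per mutation, 50 in total) and ignores metadata, checkpoint overhead, compression, and value ejection, so it is only a rough estimate. The function name is hypothetical.

```javascript
// Rough estimate of write volume versus bucket quota, all sizes in MB.
function estimateWriteVolume({ quotaMb, docMb, docsPerMutation, totalDocs }) {
    const perMutationMb = docMb * docsPerMutation;
    const totalMb = docMb * totalDocs;
    return {
        perMutationMb,                                   // data written per source mutation
        totalMb,                                         // data written over the whole test
        docsFittingInQuota: Math.floor(quotaMb / docMb), // values that could be resident at once
        exceedsQuota: totalMb > quotaMb,
    };
}

const estimate = estimateWriteVolume({
    quotaMb: 500,
    docMb: 15,
    docsPerMutation: 6,
    totalDocs: 50,
});

console.log(estimate);
```

With these figures the test writes 90 MB per mutation and 750 MB in total, while at most 33 such values fit in 500 MB. The workload therefore presumably depends on KV freeing memory while the writes are in flight.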

      The changelog between builds 2434 and 2440 shows no changes in eventing, only a few unrelated changes in ns_server, and 2 patches in kv:

      http://changelog.build.couchbase.com/?product=couchbase-server&fromVersion=7.1.0&fromBuild=2434&toVersion=7.1.0&toBuild=2440&f_cbas=off&f_cbas-core=off&f_couchdb=off&f_indexing=off&f_kv_engine=on&f_nitro=off&f_ns_server=off&f_product-metadata=off&f_query-ui=off&f_testrunner=off&f_tlm=off

      Attachments

        1. eventing_function.js (1 kB)
        2. eventing_Setup.png (228 kB)
        3. hello-world_cbstats_n1.log (33 kB)
        4. MB-51408_neo_actual-DEV-OFF_ephe.png (306 kB)
        5. MB-51408_neo_DEV-OFF.png (372 kB)
        6. MB-51408_neo_DEV-ON_ephe.png (346 kB)
        7. MB-51408_neo_DEV-ON.png (369 kB)

          Activity

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.1.0-2485 contains kv_engine commit 9664b23 with commit message:
            MB-51408: Don't miss closing the open checkpoint at memory recovery

            drigby Dave Rigby added a comment -

            Fixed in 7.1.0-2485.

            ritam.sharma Ritam Sharma added a comment -

            Sujay Gad: Please run a full regression of the eventing suite, and also check with Bala on the epengine tests.

            sujay.gad Sujay Gad added a comment -

            Closing based on regression run on 7.1.0-2490.

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.2.0-1021 contains kv_engine commit 9664b23 with commit message:
            MB-51408: Don't miss closing the open checkpoint at memory recovery

            People

              sujay.gad Sujay Gad
              abhishek.jindal Abhishek Jindal
              Votes: 0
              Watchers: 12
