Couchbase Server MB-50627

[Magma] Possible issue in reporting active items resident ratio



    Description

      Build : 7.1.0-2147
      Test : -test tests/integration/neo/test_neo_magma_milestone4.yml -scope tests/integration/neo/scope_neo_magma.yml
      Scale : 3
      Iteration : 1st

      • In this system test, we have 4 Magma buckets with 31 collections each.
      • We run a workload that inserts JSON docs into all collections of these buckets, one bucket at a time.
      • bucket6 and bucket7 each have a 1197 MB memory quota.
      • The doc loader is designed to keep loading docs in batches while polling the active items resident ratio (RR); once the RR drops below the target (80% in this case), doc loading should stop (see the sketch after this list).
      • For bucket6, 1,050,000 docs were loaded per collection, i.e. 1,050,000 × 31 ≈ 32.55M docs for the whole bucket, before hitting 80% RR.
      • The expectation is that bucket7, having the same memory quota, should hit 80% RR at roughly the same doc count as bucket6.
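      For reference, a minimal sketch of the loader's stop condition, assuming the ratio is polled from the bucket stats REST endpoint. The stat key vb_active_resident_items_ratio and the shape of the insert callback are assumptions for illustration; the actual catapult_dgm tool may differ.

      import requests

      def active_resident_ratio(host, bucket, user, password):
          # Latest active-items resident ratio sample from the bucket stats
          # REST endpoint (stat key is an assumption; verify on this build).
          url = f"http://{host}:8091/pools/default/buckets/{bucket}/stats"
          samples = requests.get(url, auth=(user, password)).json()["op"]["samples"]
          return samples["vb_active_resident_items_ratio"][-1]

      def load_until_target_rr(host, bucket, user, password, insert_batch,
                               target_rr=80, batch_size=6000):
          # Keep inserting batches while the bucket is still above the target
          # RR; insert_batch is a caller-supplied function that loads one batch.
          loaded = 0
          while active_resident_ratio(host, bucket, user, password) >= target_rr:
              insert_batch(batch_size)
              loaded += batch_size
          return loaded

      If the stat feeding this loop is over-reported, the loop keeps loading well past the intended DGM point, which matches what we observe for bucket7.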

      For bucket7, however, by the time the active resident ratio hit 80%, 4,038,000 docs had been loaded per collection, i.e. ~125M docs for the whole bucket, roughly 4x bucket6's count.

      Could this be caused by an inaccurate computation of the RR? Because of this excessive loading, we see memcached OOM kills and general instability in the cluster due to sizing issues.
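      As a cross-check, the ratio can be recomputed from raw item counts instead of the pre-aggregated stat. A minimal sketch, assuming the standard KV stat names curr_items and vb_active_num_non_resident as reported by cbstats (worth verifying against this build):

      import subprocess

      def kv_stats(node, bucket, user, password):
          # Dump all KV-engine stats for the bucket via cbstats and parse
          # the "key: value" lines into a dict.
          out = subprocess.check_output(
              ["/opt/couchbase/bin/cbstats", f"{node}:11210",
               "-u", user, "-p", password, "-b", bucket, "all"]).decode()
          stats = {}
          for line in out.splitlines():
              if ":" in line:
                  key, value = line.split(":", 1)
                  stats[key.strip()] = value.strip()
          return stats

      def active_rr_from_counts(stats):
          # Active RR = percentage of active items that are memory-resident.
          curr = int(stats["curr_items"])
          non_resident = int(stats["vb_active_num_non_resident"])
          return 100.0 if curr == 0 else 100.0 * (1 - non_resident / curr)

      A divergence between this recomputed value and the ratio the loader polls would point at stat aggregation rather than at the loader itself.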

      Doc loading for bucket6 started at around 2022-01-25T13:28:43-08:00 and ran until 2022-01-25T14:38:21-08:00.
      Doc loading for bucket7 started at around 2022-01-25T14:38:21-08:00 and ran until 2022-01-25T20:29:01-08:00.

      From the test console:

      [2022-01-25T13:28:43-08:00, sequoiatools/catapult_dgm:d9a36a] -i 172.23.97.74 -r 80 -u Administrator -p password -n 6000 -b bucket6 -dt Hotel -pc 100 -ds 1000 -ac True --num_threads 4
      [2022-01-25T14:38:21-08:00, sequoiatools/catapult_dgm:723a04] -i 172.23.97.74 -r 80 -u Administrator -p password -n 6000 -b bucket7 -dt Hotel -pc 100 -ds 1000 -ac True --num_threads 4
      [2022-01-25T20:29:01-08:00, sequoiatools/couchbase-cli:30e82c] bucket-edit -c 172.23.97.74 -u Administrator -p password --bucket bucket7 --max-ttl 3600
      

      In between, we also see the memcached process getting OOM-killed on various KV nodes (exit status 137 = 128 + 9, i.e. SIGKILL, consistent with the kernel OOM killer):
      On 172.23.97.241

      [user:info,2022-01-25T19:00:16.581-08:00,ns_1@172.23.97.241:<0.25496.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages:
      

      On 172.23.97.74

      [user:info,2022-01-25T19:55:09.470-08:00,ns_1@172.23.97.74:<0.25013.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages:
      

      On 172.23.96.122

      [user:info,2022-01-25T20:27:25.611-08:00,ns_1@172.23.96.122:<0.25938.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages:
      

      On 172.23.96.48

      [user:info,2022-01-25T21:02:25.342-08:00,ns_1@172.23.96.48:<0.25739.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages:
      

      Logs attached were collected around 2022-01-25 21:45 PST. Let me know if you need logs from an earlier timestamp.
      Nodes with KV service: 172.23.120.73, 172.23.120.74, 172.23.120.77, 172.23.120.86, 172.23.121.77, 172.23.123.25, 172.23.123.26, 172.23.96.122, 172.23.96.14, 172.23.96.48, 172.23.97.241, 172.23.97.74
