  1. Couchbase Server
  2. MB-54374

OOM issues in Magma (but not in Couchstore)

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • backlog
    • 7.1.2
    • storage-engine
    • Untriaged
    • 1
    • Unknown

    Description

      While testing 3M docs of roughly 5K bytes each with cbc-pillowfight, and making sure the bucket was 100% resident, I saw some behaviour (perhaps a regression) where the load phase silently loses about 1/2 of my documents when using Magma.  Couchstore does not exhibit this issue.  I find this odd: Magma should support very low residency ratios, so hitting OOMs at 100% RR doesn't seem right.
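
      As a rough sanity check on the 100% residency expectation, the raw document volume can be compared with the bucket size used later in this report. This is only a sketch: it assumes an average document of 5000 bytes (the midpoint of -m 4500 / -M 5500), assumes the bucket size of 32000 is in MiB, and ignores metadata and per-document overhead. The helper name estimate_mib is made up for illustration.

```shell
#!/bin/sh
# Rough data-set size versus bucket quota, using the figures from this report.
# Assumption: metadata and per-document overhead are ignored.

# estimate_mib DOCS AVG_BYTES -> whole MiB of raw document bodies (rounded down)
estimate_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

docs=3000000        # documents loaded by cbc-pillowfight (-I)
avg_bytes=5000      # assumed average: midpoint of -m 4500 and -M 5500
quota_mib=32000     # bucket size from the test, assumed to be MiB

data_mib=$(estimate_mib "$docs" "$avg_bytes")
echo "raw data: ${data_mib} MiB of ${quota_mib} MiB quota"
```

      Under these assumptions the document bodies take up a bit under half of the bucket size.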

      I tested on a single Linux node (my dev box).  The problem should be reproducible by changing some environment variables and running:

               ./doit.sh

      System configuration:

      uname -a
      Linux couch01 4.19.0-17-amd64 #1 SMP Debian 4.19.194-1 (2021-06-10) x86_64 GNU/Linux
       
       
      cat /proc/cpuinfo | egrep 'vendor_id|cpu cores|cache size|model name' | sort -u
      cache size	: 18432 KB
      cpu cores	: 12
      model name	: Intel(R) Xeon(R) CPU D-1567 @ 2.10GHz
      vendor_id	: GenuineIntel
       
       
      free
                    total        used        free      shared  buff/cache   available
      Mem:       65862936    20666768    41480332       70712     3715836    43224320
      Swap:      67003388     1222144    65781244
       
       
      sudo fdisk -l
      Disk /dev/nvme0n1: 953.9 GiB, 1024209543168 bytes, 2000409264 sectors
      Disk model: Samsung SSD 960 PRO 1TB                 
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disklabel type: gpt
      Disk identifier: D98AE010-0DC4-4110-B0FF-641C058BBE6C
       
       
      Device              Start        End    Sectors   Size Type
      /dev/nvme0n1p1       2048    1050623    1048576   512M EFI System
      /dev/nvme0n1p2    1050624 1866401791 1865351168 889.5G Linux filesystem
      /dev/nvme0n1p3 1866401792 2000409230  134007439  63.9G Linux filesystem
       
       
      /opt/couchbase/bin/couchbase-server --version
      Couchbase Server 7.1.2-3454 (EE)

      Testing Couchstore

      sleep 60
      >>>>> DELETE BUCKET Tue 01 Nov 2022 12:47:00 PM PDT
      bucket delete bucket 'massive'
      SUCCESS: Bucket deleted
      sleep 30
       
       
      ==========
      >>>>> CREATE BUCKET Tue 01 Nov 2022 12:47:58 PM PDT
      bucket create bucket 'massive', type couchstore, size 32000 (sometimes --storage-backend=magma doesn't load all expected data?)
      SUCCESS: Bucket created
      sleep 30
       
       
      ==========
      >>>>> BENCHMARK LOAD BUCKET Tue 01 Nov 2022 12:48:29 PM PDT
      loading 3000000 docs between 4.5K and 5.5K into bucket 'massive' type couchstore
      time cbc-pillowfight -I ${DOCS} --json -m 4500 -M 5500 -t 8 -U couchbase://${CB_HOSTNAME}/massive -u ${CB_USERNAME} -P ${CB_PASSWORD} --populate-only -B 64 -Dtimeout=60
      Populating using 5860 cycles
      Running. Press Ctrl-C to terminate...
      OPS/SEC:      72307
      OPS/SEC:      69062
      Thread 7 has finished populating.
      Thread 4 has finished populating.
      Thread 5 has finished populating.
      Thread 3 has finished populating.
      Thread 0 has finished populating.
      Thread 6 has finished populating.
      Thread 2 has finished populating.
      Thread 1 has finished populating.
       
       
      real	0m43.157s
      user	1m10.673s
      sys	1m1.092s
      sleep 25
      >>>>> BENCHMARK BUCKET 66.7% READ 33.3% WRITETue 01 Nov 2022 12:49:37 PM PDT
      GET DOCS IN BUCKET 'massive' SHOULD BE 3000000 (all stats in file stats.couchstore.json)
      curl -s -u tadmin:test_magma_999 http://localhost:8091/pools/default/buckets/massive/stats | jq .op.samples.curr_items[57]
      3000000
      sleep 5
       
       
      ==========
      pillow fight to measure perf 33% write type couchstore
      time cbc-pillowfight -I 3000000 --json -m 4500 -M 5500 -t 8 -U couchbase://localhost/massive -u tadmin -P test_magma_999 --no-population -B 64 -Dtimeout=60 --num-cycles 50000
      Running. Press Ctrl-C to terminate...
      OPS/SEC:     153828
      OPS/SEC:     153433
      real	2m45.974s
      user	6m45.075s
      sys	4m19.865s
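
      The "Populating using 5860 cycles" line in the populate output is consistent with dividing the document count by threads times batch size and rounding up. This is an observation from the numbers in this log, not a statement about how cbc-pillowfight computes it internally; ceil_div is a made-up helper.

```shell
#!/bin/sh
# ceil_div A B -> smallest integer >= A / B (positive integers only)
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

docs=3000000    # -I
threads=8       # -t
batch=64        # -B

# 3000000 / (8 * 64) = 5859.375, which rounds up to the value seen in the log
echo "cycles: $(ceil_div "$docs" $(( threads * batch )))"
```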

      Testing Magma (about 1/2 way through the load there are some OOM issues and data is silently lost).  The OPS/SEC rate drops during the load (populate) portion; I was hitting a few returns.

      We should have loaded 3M (3000000) docs but only loaded 1.35M (1354436).
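
      Because the populate step reports no error, a check like the following could make the shortfall explicit right after loading. It is a minimal sketch, not part of the attached doit.sh; the function name check_item_count is hypothetical, and the two numbers are the ones from this run.

```shell
#!/bin/sh
# check_item_count EXPECTED ACTUAL
# Prints a one-line summary and returns non-zero when the counts differ.
check_item_count() {
    expected=$1
    actual=$2
    pct=$(( actual * 100 / expected ))      # integer percentage, rounded down
    missing=$(( expected - actual ))
    echo "loaded ${actual} of ${expected} (${pct}%), missing ${missing}"
    [ "$missing" -eq 0 ]
}

# Figures from the Magma run in this report.
check_item_count 3000000 1354436 || echo "WARNING: populate step lost documents"
```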

      sleep 60
      >>>>> DELETE BUCKET Tue 01 Nov 2022 12:53:28 PM PDT
      bucket delete bucket 'massive'
      SUCCESS: Bucket deleted
      sleep 30
       
       
      ==========
      >>>>> CREATE BUCKET Tue 01 Nov 2022 12:54:04 PM PDT
      bucket create bucket 'massive', type magma, size 32000 (sometimes --storage-backend=magma doesn't load all expected data?)
      SUCCESS: Bucket created
      sleep 30
       
       
      ==========
      >>>>> BENCHMARK LOAD BUCKET Tue 01 Nov 2022 12:54:35 PM PDT
      loading 3000000 docs between 4.5K and 5.5K into bucket 'massive' type magma
      time cbc-pillowfight -I ${DOCS} --json -m 4500 -M 5500 -t 8 -U couchbase://${CB_HOSTNAME}/massive -u ${CB_USERNAME} -P ${CB_PASSWORD} --populate-only -B 64 -Dtimeout=60
      Populating using 5860 cycles
      Running. Press Ctrl-C to terminate...
      OPS/SEC:      67857
      OPS/SEC:      68787
      OPS/SEC:      58999
      OPS/SEC:      50204
      OPS/SEC:      44482
      OPS/SEC:      39558
      Thread 4 has finished populating.
      Thread 3 has finished populating.
      Thread 7 has finished populating.
      Thread 1 has finished populating.
      Thread 2 has finished populating.
      Thread 5 has finished populating.
      Thread 0 has finished populating.
      Thread 6 has finished populating.
       
       
      real	1m31.538s
      user	1m22.681s
      sys	1m11.551s
      sleep 25
      >>>>> BENCHMARK BUCKET 66.7% READ 33.3% WRITETue 01 Nov 2022 12:56:32 PM PDT
      GET DOCS IN BUCKET 'massive' SHOULD BE 3000000 (all stats in file stats.magma.json)
      curl -s -u tadmin:test_magma_999 http://localhost:8091/pools/default/buckets/massive/stats | jq .op.samples.curr_items[57]
      1354436
      sleep 5
       
       
      ==========
      pillow fight to measure perf 33% write type magma
      time cbc-pillowfight -I 3000000 --json -m 4500 -M 5500 -t 8 -U couchbase://localhost/massive -u tadmin -P test_magma_999 --no-population -B 64 -Dtimeout=60 --num-cycles 50000
      Running. Press Ctrl-C to terminate...
      OPS/SEC:     156129
      OPS/SEC:     157160
      real	2m41.668s
      user	6m35.753s
      sys	4m20.330s

      During the non-populate pillowfight step (66% read, 33% write) the bucket gets more docs and will eventually reach 3M (3000000). In this second step Magma behaves on par with Couchstore.
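
      The test reads curr_items once after a fixed sleep. One way to check whether the count is still changing after the populate step is to poll it a few times. The sketch below is an assumption-laden illustration and not taken from doit.sh: wait_for_items and POLL_INTERVAL are hypothetical names, and the command that fetches the count is passed in so the loop itself stays independent of curl and jq.

```shell
#!/bin/sh
# wait_for_items EXPECTED TRIES CMD [ARGS...]
# Runs CMD until it prints EXPECTED or TRIES attempts are used up.
# Prints the last count seen; returns 0 on a match, 1 otherwise.
wait_for_items() {
    expected=$1
    tries=$2
    shift 2
    count=""
    while [ "$tries" -gt 0 ]; do
        count=$("$@")
        if [ "$count" = "$expected" ]; then
            echo "$count"
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep "${POLL_INTERVAL:-1}"     # seconds between attempts
    done
    echo "$count"
    return 1
}

# Hypothetical fetch command, reusing the stats endpoint from this report and
# taking the most recent sample rather than a fixed index:
#   fetch_count() {
#       curl -s -u "${CB_USERNAME}:${CB_PASSWORD}" \
#           http://localhost:8091/pools/default/buckets/massive/stats |
#           jq '.op.samples.curr_items[-1]'
#   }
#   wait_for_items 3000000 30 fetch_count
```

      If the count never reaches the expected value within the allowed attempts, the function returns non-zero and prints the last value it saw.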

      The following are attached:

      • image01_couchstore_v_magma.png - showing the OOM
      • doit.sh - the script used for the test
      • doit.sh.log - the screen output of the script
      • stats.couchstore.json - full bucket stats after the populate for couchstore
      • stats.magma.json - full bucket stats after the populate for magma
      • image02_magma_with_3m_docs.png - 100% RR after pillowfight for magma

      Attachments

        1. 32GB xfs test.png
          113 kB
        2. doit_load_only_magma.jpg
          90 kB
        3. doit.sh
          3 kB
        4. doit.sh.log
          5 kB
        5. fio_results.zip
          6 kB
        6. image01_couchstore_v_magma.png
          410 kB
        7. image02_magma_with_3m_docs.png
          139 kB
        8. MB-54374_cbcollect.zip
          59.68 MB
        9. stats.couchstore.json
          51 kB
        10. stats.magma.json
          52 kB

          People

            apaar.gupta Apaar Gupta
            jon.strabala Jon Strabala