Couchbase Server · MB-49877

[Magma 10TB, 1%]: memcached cannot accept new connections due to deadlock in Magma::GetStats


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Neo
    • Fix Version/s: Neo
    • Component/s: storage-engine
    • 7.1.0-1817

    Description

      1. Create a 4-node cluster.
      2. Create the required buckets and collections.
      3. Create 625,000 items sequentially.
      4. Update 625,000 RandomKey keys to create 50% fragmentation.
      5. Create 625,000 items sequentially.
      6. Update 625,000 RandomKey keys to create 50% fragmentation.
      7. Rebalance in while loading docs; abort and resume the rebalance at 20%, 40%, 60%, and 80%.
      8. Crash Magma/memcached while loading docs.
      9. Rebalance out while loading docs; abort and resume the rebalance at 20%. The rebalance failed on resumption.

      Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.8491.142>,
      {{{bad_return_value,{error,timeout}},
      {gen_server,call,
      ['ns_memcached-GleamBookUsers0',
      {delete_vbucket,66},
      360000]}},
      {gen_server,call,
      [{'janitor_agent-GleamBookUsers0',
      'ns_1@172.23.110.65'},
      {if_rebalance,<0.8373.142>,
      {wait_dcp_data_move,
      ['ns_1@172.23.110.64',
      'ns_1@172.23.110.66'],
      406}},
      infinity]}}}}}.
      Rebalance Operation Id = ab0fe02c379c617b89c2ca96499125e2
      

      QE Test

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job3.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.Hospital.Murphy.SystemTestMagma,nodes_init=3,graceful=True,skip_cleanup=True,num_items=40000000,num_buckets=1,bucket_names=GleamBook,doc_size=1536,bucket_type=membase,eviction_policy=fullEviction,iterations=5,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,assert_crashes_on_load=True,num_collections=50,maxttl=10,num_indexes=100,pc=10,index_nodes=0,cbas_nodes=0,fts_nodes=0,ops_rate=200000,ramQuota=102400,doc_ops=create:update:delete:read,mutation_perc=100,rebl_ops_rate=50000,key_type=RandomKey -m rest'
      

      Attachments

        1. gdb_session_useful_stats_dump.txt (23 kB)
        2. lsmtreestats_page1.png (2.14 MB)
        3. lsmtreestats_page2.png (2.29 MB)
        4. magma_dump_kvstore245.txt (26 kB)
        5. MB-49877_gdb_session.txt (61 kB)
        6. MB-49877_gdb_thread_apply_all_bt_full.txt (5.18 MB)
        7. Rebalance Failed during vB movement.png (202 kB)
        8. TestLogs.txt (5.22 MB)

        Activity

          ritesh.agarwal Ritesh Agarwal added a comment - - edited

          Seeing the same issue in another run where we have more nodes in the test.

          This time the test failed at the crash-recovery step, where memcached is killed on all KV nodes in the cluster and the step is repeated 10 times.
          Between iterations, the test waits for memcached to come up on all nodes and for the bucket to be in the ready state. Throughout, doc ops are running with durability=majority. The bucket got stuck in warmup on the 8th iteration; the previous 7 iterations all went fine.

          Logs:
          http://supportal.couchbase.com/snapshot/14a503ccd5f7841f1fe3c7b9bc106259::0

          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.115.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.116.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.123.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.130.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.131.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.132.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.133.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.134.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.120.170.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.124.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.126.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.127.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.128.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.129.zip
          s3://cb-customers-secure/mb-49877/2021-12-03/collectinfo-2021-12-03t143058-ns_1@172.23.121.135.zip

          rohan.suri Rohan Suri added a comment - - edited

          We seem to continuously loop in magma::LSMTree::tryLevel0BlockingCompaction().

          Every time we call tryCompactOnWrite, it returns a non-OK status with the message "Unable to find compaction candidates".

          (gdb) print status.s._M_t._M_t._M_head_impl.code
          $13 = magma::Status::NotFound

          (gdb) print status.s._M_t._M_t._M_head_impl.msg
          $14 = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7f3455bda290 "Unable to find compaction candidates"}, _M_string_length = 36, {_M_local_buf = "$\000\000\000\000\000\000\000ates\000\177\000", _M_allocated_capacity = 36}}

          This is strange, since we only call tryLevel0BlockingCompaction if L0 really needs compaction. So why don't we find any compaction candidates?

          The compaction trigger ratio for L0 is absurdly high. Since it never goes down, we loop continuously while holding the locks, which stops all other threads from making any progress.
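          For illustration, a minimal self-contained sketch (not the actual Magma source; the names and structure here are assumptions) of the looping pattern described above: the writer keeps retrying the blocking L0 compaction under the tree lock, so a tryCompactOnWrite that permanently fails with NotFound turns into a busy loop that starves every other thread.

          // Compilable illustration only; calling this with the stand-in types below
          // would spin forever, which is exactly the observed hang.
          #include <mutex>

          enum class Status { OK, NotFound };

          struct FakeLevel0 {
              // Stand-ins for the real checks: the trigger ratio never drops and no
              // compaction candidates are ever found.
              bool needsCompaction() const { return true; }
              Status tryCompactOnWrite() { return Status::NotFound; }
          };

          void tryLevel0BlockingCompactionSketch(FakeLevel0& l0, std::mutex& treeLock) {
              std::lock_guard<std::mutex> guard(treeLock);   // held for the whole loop
              while (l0.needsCompaction()) {
                  if (l0.tryCompactOnWrite() != Status::OK) {
                      // Nothing was compacted, so the level size (and hence the trigger)
                      // never changes; we retry immediately without releasing treeLock,
                      // blocking every other reader/writer on this tree.
                      continue;
                  }
              }
          }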

           

          (gdb) call getCompactionTrigger(0)
          $107 = {Ratio = 4398046511103, Reason = magma::LSMTree::CompactionTrigger::Size}

          The ratio for L0 is computed as getLevelSize(0)/4MB. The LevelSize is absurdly high too. 

          (gdb) call getLevelSize(0, false)
          18446744073708972894

          levelSize = logicalSize - compactingSize. Since logicalSize is 0 and the subtraction is unsigned, it wraps around, which is where those absurdly large values come from.

          (gdb) p currentTreeSnapshot._M_ptr.data._M_impl._M_start._M_t._M_t._M_head_impl.logicalSize
           $166 = 0
           
          (gdb) p compactionState._M_impl._M_start.compactingSize
           $161 = 578722

          Ideally logicalSize should always be greater than compactingSize. We need to understand what led to this.
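          A quick check of that arithmetic (assuming the fields are uint64_t, which the wrapped value strongly suggests): plugging in the values read from gdb reproduces both the bogus level size and the bogus trigger ratio seen above.

          #include <cassert>
          #include <cstdint>

          int main() {
              const std::uint64_t logicalSize    = 0;        // from gdb
              const std::uint64_t compactingSize = 578722;   // from gdb

              // levelSize = logicalSize - compactingSize wraps around 2^64.
              const std::uint64_t levelSize = logicalSize - compactingSize;
              // Trigger ratio = levelSize / 4 MiB.
              const std::uint64_t ratio = levelSize / (4ull * 1024 * 1024);

              assert(levelSize == 18446744073708972894ull);  // matches getLevelSize(0, false)
              assert(ratio == 4398046511103ull);             // matches getCompactionTrigger(0).Ratio
              return 0;
          }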

          Some other helpful state, such as the number of tables and the sstables in the tree snapshot, is dumped in the file gdb_session_useful_stats_dump.txt.

          Attached dump of LSMTreeStats (lsmtreestats_page1, lsmtreestats_page2).

          Attached the magma_dump for the particular kvstore being debugged in gdb (magma.29/kvstore245).

          rohan.suri Rohan Suri added a comment - - edited

          I think I've understood where the bug is. As of this commit (7.1.0-1742 and later), LSMTree::newGCFilter is allowed to return a non-OK status (common during warmup, as stated in another change). However, when that happens we return early and never restore the LSMTree::compactionState, so any tables that were accounted as compacting remain accounted as such forever. I'll add a fix and a unit test for this.
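          A hedged sketch of the fix pattern ("restore compaction state upon failure"); the names here (CompactionState, compactLevel0, a free-function newGCFilter) are illustrative stand-ins, not the actual Magma API:

          #include <cstdint>
          #include <vector>

          enum class Status { OK, Error };

          struct CompactionState {
              std::uint64_t compactingSize = 0;   // bytes currently accounted as compacting
          };

          // Stand-in for LSMTree::newGCFilter, which can legitimately fail (e.g. during warmup).
          Status newGCFilter() { return Status::Error; }

          Status compactLevel0(CompactionState& state, const std::vector<std::uint64_t>& tableSizes) {
              std::uint64_t added = 0;
              for (auto sz : tableSizes) {         // account the chosen tables as compacting
                  state.compactingSize += sz;
                  added += sz;
              }

              Status s = newGCFilter();
              if (s != Status::OK) {
                  // The missing step: roll the accounting back before the early return.
                  // Without this, compactingSize stays inflated forever, getLevelSize()
                  // underflows, and the blocking-compaction loop never exits.
                  state.compactingSize -= added;
                  return s;
              }

              // ... perform the compaction, then clear the accounting on completion ...
              state.compactingSize -= added;
              return Status::OK;
          }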


          build-team Couchbase Build Team added a comment

          Build couchbase-server-7.1.0-1844 contains magma commit 0a8b45d with commit message:
          MB-49877 magma: Restore compaction state upon failure

          build-team Couchbase Build Team added a comment

          Build couchbase-server-7.1.0-1851 contains magma commit 3d747c7 with commit message:
          MB-49877 magma: Assert compactionState <= tree snapshot state

          People

            Assignee: rohan.suri Rohan Suri
            Reporter: ritesh.agarwal Ritesh Agarwal
            Votes: 0
            Watchers: 5

