Couchbase Server / MB-41300

[Collections] Monotonic exception on PassiveDurabilityMonitor::State::updateHighPreparedSeqno


    Details

      Description

      Script to Repro

      ./testrunner -i /tmp/durability_volume.ini rerun=False -t bucket_collections.collections_network_split.CollectionsNetworkSplit.test_collections_crud_with_network_split,nodes_init=4,bucket_spec=single_bucket.buckets_all_membase_for_rebalance_tests_more_collections,override_spec_params=durability;replicas,durability=PERSIST_TO_MAJORITY,replicas=2,subsequent_action=rebalance-out

      Steps to Reproduce
      1. Create a 4 node cluster
      2020-09-04 04:01:10,806 | test | INFO | pool-2-thread-7 | [table_view:display:72] Rebalance Overview
      +----------------+----------+--------------+
      | Nodes          | Services | Status       |
      +----------------+----------+--------------+
      | 172.23.105.211 | kv       | Cluster node |
      | 172.23.105.212 | None     | <--- IN ---  |
      | 172.23.105.213 | None     | <--- IN ---  |
      | 172.23.105.215 | None     | <--- IN ---  |
      +----------------+----------+--------------+
      2. Initial data load into bucket
      2020-09-04 04:05:12,655 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      +---------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
      | Bucket  | Type      | Replicas | Durability | TTL | Items  | RAM Quota  | RAM Used  | Disk Used |
      +---------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
      | default | couchbase | 2        | none       | 0   | 500000 | 8388608000 | 508642048 | 597462790 |
      +---------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
      3. Perform a network split by blocking .212 traffic on .211 and vice versa with parallel data load

      4. Hard failover .212 with data load in parallel
      5. Rebalance out .212 with data load in parallel
      2020-09-04 04:18:06,657 | test | INFO | pool-2-thread-26 | [table_view:display:72] Rebalance Overview
      +----------------+----------+--------------+
      | Nodes          | Services | Status       |
      +----------------+----------+--------------+
      | 172.23.105.215 | kv       | Cluster node |
      | 172.23.105.212 | [u'kv']  | --- OUT ---> |
      | 172.23.105.213 | kv       | Cluster node |
      | 172.23.105.211 | kv       | Cluster node |
      +----------------+----------+--------------+
      The rebalance operation fails, with core dumps on .211.

      Backtrace from 23ee0a42-688b-4cd0-7e3764b7-7b4a649f.dmp:

      (gdb) bt full
      #0  0x00007ff1ab51a387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
              resultvar = 0
              pid = 32357
              selftid = 32670
      #1  0x00007ff1ab51ba78 in __GI_abort () at abort.c:90
              save_stage = 2
              act = {__sigaction_handler = {sa_handler = 0x7ff1ab8ac1c0 <_IO_2_1_stderr_>, sa_sigaction = 0x7ff1ab8ac1c0 <_IO_2_1_stderr_>}, sa_mask = {__val = {140675938859561, 0, 140675938378339, 140675752100568, 140675941843392, 1, 
                    140675941843523, 140675941827456, 140675938384510, 140675941843392, 10, 140674540258608, 140674967992256, 140674967992320, 140675938385779, 140675941843392}}, sa_flags = -1408806528, sa_restorer = 0x7ff1a03b82d8}
              sigs = {__val = {32, 0 <repeats 15 times>}}
      #2  0x00007ff1ac078195 in __gnu_cxx::__verbose_terminate_handler() () from /opt/couchbase/bin/../lib/libstdc++.so.6
      No symbol table info available.
      #3  0x000000000054edb2 in backtrace_terminate_handler() ()
      No symbol table info available.
      #4  0x00007ff1ac075f86 in __cxxabiv1::__terminate(void (*)()) () from /opt/couchbase/bin/../lib/libstdc++.so.6
      No symbol table info available.
      #5  0x00007ff1ac075fd1 in std::terminate() () from /opt/couchbase/bin/../lib/libstdc++.so.6
      No symbol table info available.
      #6  0x00007ff1ac076213 in __cxa_throw () from /opt/couchbase/bin/../lib/libstdc++.so.6
      No symbol table info available.
      #7  0x00007ff1af80f256 in ThrowExceptionPolicy<long>::nonMonotonic(long const&, long const&) () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #8  0x00007ff1af8999b2 in PassiveDurabilityMonitor::State::updateHighPreparedSeqno() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #9  0x00007ff1af89c1b8 in PassiveDurabilityMonitor::notifyLocalPersistence() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #10 0x00007ff1af9691f6 in VBucket::notifyPersistenceToDurabilityMonitor() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #11 0x00007ff1af8a6be8 in EPBucket::flushVBucket(Vbid) () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #12 0x00007ff1af8fe6bc in Flusher::flushVB() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #13 0x00007ff1af8ff899 in Flusher::step(GlobalTask*) () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #14 0x00007ff1af9025f3 in GlobalTask::execute() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #15 0x00007ff1af808faf in CB3ExecutorThread::run() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #16 0x00007ff1ae27c777 in platform_thread_wrap(void*) () from /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0
      No symbol table info available.
      #17 0x00007ff1ab8b9ea5 in start_thread (arg=0x7ff1717fa700) at pthread_create.c:307
              __res = <optimized out>
              pd = 0x7ff1717fa700
              now = <optimized out>
              unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140674968037120, -2338991119659410976, 0, 8392704, 0, 140674968037120, 2335355026998319584, 2335517656178907616}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
                    prev = 0x0, cleanup = 0x0, canceltype = 0}}}
              not_first_call = <optimized out>
              pagesize_m1 = <optimized out>
              sp = <optimized out>
              freesize = <optimized out>
      #18 0x00007ff1ab5e28dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
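
Frames #6 to #8 show an exception thrown by ThrowExceptionPolicy<long>::nonMonotonic from inside PassiveDurabilityMonitor::State::updateHighPreparedSeqno, and frame #5 shows std::terminate being reached, which is consistent with the exception not being caught on the flusher thread. As a rough illustration of that kind of check, here is a minimal C++ sketch. It assumes a strictly increasing invariant; the names ThrowingPolicy, MonotonicValue and assignmentRejected are hypothetical and this is not the kv_engine implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the throwing policy seen in frame #7.
template <class T>
struct ThrowingPolicy {
    static void nonMonotonic(const T& current, const T& next) {
        throw std::logic_error(
                "Monotonic invariant failed: new value (" +
                std::to_string(next) +
                ") is not greater than current value (" +
                std::to_string(current) + ")");
    }
};

// Hypothetical wrapper: the stored value may only strictly increase.
// (Assumption: equal values are rejected as well.)
template <class T, class Policy = ThrowingPolicy<T>>
class MonotonicValue {
public:
    explicit MonotonicValue(T initial) : value(initial) {
    }

    MonotonicValue& operator=(const T& next) {
        if (next <= value) {
            // Delegates to the policy, which throws in this sketch.
            Policy::nonMonotonic(value, next);
        }
        value = next;
        return *this;
    }

    T get() const {
        return value;
    }

private:
    T value;
};

// Returns true if moving a tracked seqno from `current` to `next`
// violates the invariant and is therefore rejected.
bool assignmentRejected(int64_t current, int64_t next) {
    MonotonicValue<int64_t> highPreparedSeqno(current);
    try {
        highPreparedSeqno = next;
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this sketch the caller catches the exception. If nothing on the call stack catches it, the C++ runtime calls std::terminate, which by default aborts the process, matching frames #0 to #5 above.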

      The other core dump, 74a76e88-5cdb-4946-095eaf94-a5e70a07.dmp, is similar to
      https://issues.couchbase.com/browse/MB-41235


          Activity

          paolo.cocchi Paolo Cocchi added a comment -

          Sumedh Basarkod, the backports (BPs) are in for both 6.5.2 and 6.6.2. Assigning back to you, thanks.

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-6.6.2-9419 contains kv_engine commit 0841dbc with commit message:
          MB-41300 [BP]: Reposition the HPS correctly in PDM at Prepare dedup

          ashwin.govindarajulu Ashwin Govindarajulu added a comment -

          Validated the fix on master using 7.0.0-4202-enterprise.

          Closing this ticket for 6.5.2 and 6.6.2 based on unit-tests.

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-7.0.0-4255 contains kv_engine commit 0841dbc with commit message:
          MB-41300 [BP]: Reposition the HPS correctly in PDM at Prepare dedup

          paolo.cocchi Paolo Cocchi added a comment - - edited

          Description for release notes
          Summary: Fixing a potential issue where sanity checks may trigger at Replica and cause a crash when the node receives temporary mutations for Sync Replication.


            People

            Assignee:
            ashwin.govindarajulu Ashwin Govindarajulu
            Reporter:
            sumedh.basarkod Sumedh Basarkod