Couchbase Server / MB-41300

Monotonic exception on PassiveDurabilityMonitor::State::updateHighPreparedSeqno

Details

    Description

      Script to Repro

      ./testrunner -i /tmp/durability_volume.ini rerun=False -t 'bucket_collections.collections_network_split.CollectionsNetworkSplit.test_collections_crud_with_network_split,nodes_init=4,bucket_spec=single_bucket.buckets_all_membase_for_rebalance_tests_more_collections,override_spec_params=durability;replicas,durability=PERSIST_TO_MAJORITY,replicas=2,subsequent_action=rebalance-out'

      Steps to Reproduce
      1. Create a 4-node cluster
      2020-09-04 04:01:10,806 | test | INFO | pool-2-thread-7 | [table_view:display:72] Rebalance Overview
      +----------------+----------+--------------+
      | Nodes          | Services | Status       |
      +----------------+----------+--------------+
      | 172.23.105.211 | kv       | Cluster node |
      | 172.23.105.212 | None     | <--- IN ---  |
      | 172.23.105.213 | None     | <--- IN ---  |
      | 172.23.105.215 | None     | <--- IN ---  |
      +----------------+----------+--------------+
      2. Initial data load into bucket
      2020-09-04 04:05:12,655 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      +---------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
      | Bucket  | Type      | Replicas | Durability | TTL | Items  | RAM Quota  | RAM Used  | Disk Used |
      +---------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
      | default | couchbase | 2        | none       | 0   | 500000 | 8388608000 | 508642048 | 597462790 |
      +---------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
      3. Perform a network split by blocking traffic from .212 on .211 and vice versa, with data load running in parallel
      4. Hard failover .212, with data load running in parallel
      5. Rebalance out .212, with data load running in parallel
      2020-09-04 04:18:06,657 | test | INFO | pool-2-thread-26 | [table_view:display:72] Rebalance Overview
      +----------------+----------+--------------+
      | Nodes          | Services | Status       |
      +----------------+----------+--------------+
      | 172.23.105.215 | kv       | Cluster node |
      | 172.23.105.212 | [u'kv']  | --- OUT ---> |
      | 172.23.105.213 | kv       | Cluster node |
      | 172.23.105.211 | kv       | Cluster node |
      +----------------+----------+--------------+
      The rebalance operation fails, with core dumps on .211.

      Backtrace from 23ee0a42-688b-4cd0-7e3764b7-7b4a649f.dmp:

      (gdb) bt full
      #0  0x00007ff1ab51a387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
              resultvar = 0
              pid = 32357
              selftid = 32670
      #1  0x00007ff1ab51ba78 in __GI_abort () at abort.c:90
              save_stage = 2
              act = {__sigaction_handler = {sa_handler = 0x7ff1ab8ac1c0 <_IO_2_1_stderr_>, sa_sigaction = 0x7ff1ab8ac1c0 <_IO_2_1_stderr_>}, sa_mask = {__val = {140675938859561, 0, 140675938378339, 140675752100568, 140675941843392, 1, 
                    140675941843523, 140675941827456, 140675938384510, 140675941843392, 10, 140674540258608, 140674967992256, 140674967992320, 140675938385779, 140675941843392}}, sa_flags = -1408806528, sa_restorer = 0x7ff1a03b82d8}
              sigs = {__val = {32, 0 <repeats 15 times>}}
      #2  0x00007ff1ac078195 in __gnu_cxx::__verbose_terminate_handler() () from /opt/couchbase/bin/../lib/libstdc++.so.6
      No symbol table info available.
      #3  0x000000000054edb2 in backtrace_terminate_handler() ()
      No symbol table info available.
      #4  0x00007ff1ac075f86 in __cxxabiv1::__terminate(void (*)()) () from /opt/couchbase/bin/../lib/libstdc++.so.6
      No symbol table info available.
      #5  0x00007ff1ac075fd1 in std::terminate() () from /opt/couchbase/bin/../lib/libstdc++.so.6
      No symbol table info available.
      #6  0x00007ff1ac076213 in __cxa_throw () from /opt/couchbase/bin/../lib/libstdc++.so.6
      No symbol table info available.
      #7  0x00007ff1af80f256 in ThrowExceptionPolicy<long>::nonMonotonic(long const&, long const&) () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #8  0x00007ff1af8999b2 in PassiveDurabilityMonitor::State::updateHighPreparedSeqno() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #9  0x00007ff1af89c1b8 in PassiveDurabilityMonitor::notifyLocalPersistence() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #10 0x00007ff1af9691f6 in VBucket::notifyPersistenceToDurabilityMonitor() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #11 0x00007ff1af8a6be8 in EPBucket::flushVBucket(Vbid) () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #12 0x00007ff1af8fe6bc in Flusher::flushVB() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #13 0x00007ff1af8ff899 in Flusher::step(GlobalTask*) () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #14 0x00007ff1af9025f3 in GlobalTask::execute() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #15 0x00007ff1af808faf in CB3ExecutorThread::run() () from /opt/couchbase/bin/../lib/libep.so
      No symbol table info available.
      #16 0x00007ff1ae27c777 in platform_thread_wrap(void*) () from /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0
      No symbol table info available.
      #17 0x00007ff1ab8b9ea5 in start_thread (arg=0x7ff1717fa700) at pthread_create.c:307
              __res = <optimized out>
              pd = 0x7ff1717fa700
              now = <optimized out>
              unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140674968037120, -2338991119659410976, 0, 8392704, 0, 140674968037120, 2335355026998319584, 2335517656178907616}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
                    prev = 0x0, cleanup = 0x0, canceltype = 0}}}
              not_first_call = <optimized out>
              pagesize_m1 = <optimized out>
              sp = <optimized out>
              freesize = <optimized out>
      #18 0x00007ff1ab5e28dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

      The other core dump, 74a76e88-5cdb-4946-095eaf94-a5e70a07.dmp, is similar to the one in
      https://issues.couchbase.com/browse/MB-41235
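
Frames #6 to #8 of the first backtrace suggest that a monotonicity check threw while the high prepared seqno was being updated, and frames #2 to #5 show the exception reaching std::terminate unhandled, which aborts the process. The following is a minimal C++ sketch of that kind of check, not the kv_engine implementation. MonotonicValue, ThrowOnViolation and assignmentRejected are hypothetical names, and both the strict "must increase" rule and the use of std::logic_error are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical policy: throw when an update would break monotonicity.
// (Illustrative only; not the ThrowExceptionPolicy from the backtrace.)
template <class T>
struct ThrowOnViolation {
    static void nonMonotonic(const T& current, const T& proposed) {
        throw std::logic_error("Monotonic: proposed value " +
                               std::to_string(proposed) +
                               " is not greater than current value " +
                               std::to_string(current));
    }
};

// A value that is only allowed to increase (strictly, by assumption).
template <class T, class Policy = ThrowOnViolation<T>>
class MonotonicValue {
public:
    explicit MonotonicValue(T initial) : value(initial) {}

    MonotonicValue& operator=(const T& proposed) {
        if (proposed <= value) {
            // Delegate to the policy; with ThrowOnViolation this throws
            // and the stored value is left unchanged.
            Policy::nonMonotonic(value, proposed);
            return *this;
        }
        value = proposed;
        return *this;
    }

    T get() const { return value; }

private:
    T value;
};

// Returns true if assigning `next` was rejected by the monotonic check.
// Catching here is what keeps the violation from reaching std::terminate.
bool assignmentRejected(MonotonicValue<int64_t>& seqno, int64_t next) {
    try {
        seqno = next;
        return false;
    } catch (const std::logic_error&) {
        return true;
    }
}
```

In this sketch, assigning 11 to a value holding 10 succeeds, while assigning 11 again or 5 afterwards is rejected. If the caller did not catch the exception, it would propagate out of the thread's entry function and end in std::terminate and abort, which matches the shape of frames #0 to #6 above.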

          People

            ashwin.govindarajulu Ashwin Govindarajulu
            sumedh.basarkod Sumedh Basarkod (Inactive)