Couchbase Server / MB-41089

Ephemeral HTCleaner may remove in-flight SyncWrite from the HashTable


Details

    Description

      Node 172.23.107.63 - Core dump seen: 1
      172.23.107.63 : Stack Trace of first crash: c1d8239e-ad57-4f6d-74ec5b87-4acca00c.dmp
      Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
       #0  0x00007efc2c8a8207 in raise () from /lib64/libc.so.6
       #0  0x00007efc2c8a8207 in raise () from /lib64/libc.so.6
       #1  0x00007efc2c8a98f8 in abort () from /lib64/libc.so.6
       #2  0x00007efc2d405195 in __gnu_cxx::__verbose_terminate_handler() () from /opt/couchbase/bin/../lib/libstdc++.so.6
       #3  0x000000000054ebb2 in backtrace_terminate_handler() ()
       #4  0x00007efc2d402f86 in __cxxabiv1::__terminate(void (*)()) () from /opt/couchbase/bin/../lib/libstdc++.so.6
       #5  0x00007efc2d402fd1 in std::terminate() () from /opt/couchbase/bin/../lib/libstdc++.so.6
       #6  0x00007efc2d403213 in __cxa_throw () from /opt/couchbase/bin/../lib/libstdc++.so.6
       #7  0x00007efc30c0cc60 in void ActiveDurabilityMonitor::throwException<std::logic_error>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /opt/couchbase/bin/../lib/libep.so
       #8  0x00007efc30c07546 in ActiveDurabilityMonitor::commit(DurabilityMonitor::ActiveSyncWrite const&) () from /opt/couchbase/bin/../lib/libep.so
       #9  0x00007efc30c07683 in ActiveDurabilityMonitor::processCompletedSyncWriteQueue() () from /opt/couchbase/bin/../lib/libep.so
       #10 0x00007efc30cd88c0 in VBucket::processResolvedSyncWrites() () from /opt/couchbase/bin/../lib/libep.so
       #11 0x00007efc30c0e68d in DurabilityCompletionTask::run() () from /opt/couchbase/bin/../lib/libep.so
       #12 0x00007efc30c73583 in GlobalTask::execute() () from /opt/couchbase/bin/../lib/libep.so
       #13 0x00007efc30b867df in CB3ExecutorThread::run() () from /opt/couchbase/bin/../lib/libep.so
       #14 0x00007efc2f609777 in platform_thread_wrap(void*) () from /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0
       #15 0x00007efc2cc46dd5 in start_thread () from /lib64/libpthread.so.0
       #16 0x00007efc2c96fead in clone () from /lib64/libc.so.6
      

      Memcached logs on .63:

      [root@sa1601-cnt76 ~]# grep CRITICAL /opt/couchbase/var/lib/couchbase/logs/memcached.log.00000*
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.039657-07:00 CRITICAL *** Fatal error encountered during exception handling ***
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.039712-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): ActiveDurabilityMonitor::commit vb:672 failed with status:1
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191278-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-2908). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/c1d8239e-ad57-4f6d-74ec5b87-4acca00c.dmp before terminating.
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191292-07:00 CRITICAL Stack backtrace of crashed thread:
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191470-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x13e11d]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191482-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x1541ea]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191491-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x154528]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191500-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7efc2cc3f000+0xf5d0]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191523-07:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7efc2c872000+0x36207]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191544-07:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7efc2c872000+0x378f8]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191575-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7efc2d374000+0x91195]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191585-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x14ebb2]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191601-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7efc2d374000+0x8ef86]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191616-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7efc2d374000+0x8efd1]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191632-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7efc2d374000+0x8f213]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191643-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7efc30b1c000+0xf0c60]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191652-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7efc30b1c000+0xeb546]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191658-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7efc30b1c000+0xeb683]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191668-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7efc30b1c000+0x1bc8c0]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191675-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7efc30b1c000+0xf268d]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191683-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7efc30b1c000+0x157583]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191690-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7efc30b1c000+0x6a7df]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191696-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7efc2f5f9000+0x10777]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191704-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7efc2cc3f000+0x7dd5]
      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000005.txt:2020-08-22T18:42:52.191736-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7efc2c872000+0xfdead]
      

      QE Test/Notes:

      num_items=5000,doc_size=512,transaction_timeout=30,nodes_init=2,replicas=1,bucket_size=100,GROUP=P0_Set1,durability=MAJORITY,get-cbcollect-info=False,infra_log_level=critical,log_level=error,upgrade_version=7.0.0-2908 -t Atomicity.basic_ops.basic_ops.test_basic_commit,nodes_init=2,replicas=1,transaction_commit=True,op_type=create;update,bucket_type=ephemeral,num_buckets=2,default_bucket=False,GROUP=P0_Set1
      

      Jenkins job: centos-p0-transaction-vset00-00-multi_bucket_durability_majority_6.5_P0_Set1
      http://qa.sc.couchbase.com/job/test_suite_executor-TAF/52139/consoleFull


          Activity

            owend Daniel Owen added a comment -

            Hi Ritesh Agarwal,
            Thanks for raising the bug, and including the stack trace.
            Quick question: is this a regression, and if so, do you have the last build on which the test successfully passed?

            thanks


            ritesh.agarwal Ritesh Agarwal added a comment -

            Hi Daniel Owen: The last weekly build on which this test was passing was 7.0.0-2840.
            owend Daniel Owen added a comment -

            Great, thanks Ritesh Agarwal.

            ritesh.agarwal Ritesh Agarwal added a comment - - edited

            Daniel Owen: I scanned all the logs we had on the machines and found this:

            /data/workspace/centos-p0-transaction-vset00-00-concurrency_durability_majority_6.5_P0_Set2/logs/testrunner-20-Aug-08_15-44-36/test_1/collectinfo-2020-08-09T061227-ns_1@172.23.97.64.zip
            cbcollect_info_ns_1@172.23.97.64_20200809-061227/6a8e025f-72d9-48bb-dc8fa089-e8728174.dmp
            2020-08-08T15:46:44.172332-07:00 CRITICAL *** Fatal error encountered during exception handling ***
            2020-08-08T15:46:44.172381-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): ActiveDurabilityMonitor::commit vb:828 failed with status:1
            2020-08-08T15:46:44.269613-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-2792). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/4430c65d-990a-4839-d8b935b4-06c1c24d.dmp before terminating.
            2020-08-08T15:46:44.269626-07:00 CRITICAL Stack backtrace of crashed thread:
            2020-08-08T15:46:44.269776-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x13d24d]
            2020-08-08T15:46:44.269786-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x15327a]
            2020-08-08T15:46:44.269792-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x1535b8]
            2020-08-08T15:46:44.269798-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f97aa01d000+0xf5d0]
            2020-08-08T15:46:44.269818-07:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7f97a9c50000+0x36207]
            2020-08-08T15:46:44.269834-07:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7f97a9c50000+0x378f8]
            2020-08-08T15:46:44.269863-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f97aa752000+0x91195]
            2020-08-08T15:46:44.269871-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x14dce2]
            2020-08-08T15:46:44.269886-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f97aa752000+0x8ef86]
            2020-08-08T15:46:44.269900-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f97aa752000+0x8efd1]
            2020-08-08T15:46:44.269912-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f97aa752000+0x8f213]
            2020-08-08T15:46:44.269920-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f97adec3000+0xfb580]
            2020-08-08T15:46:44.269925-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f97adec3000+0xf5e42]
            2020-08-08T15:46:44.269929-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f97adec3000+0xf5fa3]
            2020-08-08T15:46:44.269934-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f97adec3000+0x1bd220]
            2020-08-08T15:46:44.269939-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f97adec3000+0xfcf8d]
            2020-08-08T15:46:44.269944-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f97adec3000+0x161983]
            2020-08-08T15:46:44.269949-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f97adec3000+0x7386f]
            2020-08-08T15:46:44.269953-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f97ac9d7000+0x10777]
            2020-08-08T15:46:44.269958-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f97aa01d000+0x7dd5]
            2020-08-08T15:46:44.269983-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f97a9c50000+0xfdead]
            

            That means Couchbase version 7.0.0-2792 also has this issue.

            owend Daniel Owen added a comment -

            Many thanks Ritesh Agarwal for checking it out for us. Good information, as it looks like it's an intermittent issue.


            ritesh.agarwal Ritesh Agarwal added a comment -

            Couchbase version 7.0.0-2741 also has this issue.
            ritesh.agarwal Ritesh Agarwal added a comment - - edited

            Seeing it on build 3425:
            172.23.121.131 : Stack Trace of first crash: a9520cb7-3ed2-4bd4-a07da09e-a846ad0e.dmp

            Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
             #0  0x00007f32387e2387 in raise () from /lib64/libc.so.6
             #0  0x00007f32387e2387 in raise () from /lib64/libc.so.6
             #1  0x00007f32387e3a78 in abort () from /lib64/libc.so.6
             #2  0x00007f3239340195 in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/vterminate.cc:95
             #3  0x00000000005a6dd2 in backtrace_terminate_handler () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/utilities/terminate_handler.cc:86
             #4  0x00007f323933df86 in __cxxabiv1::__terminate (handler=<optimized out>) at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
             #5  0x00007f323933dfd1 in std::terminate () at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
             #6  0x00007f323933e213 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7f3164003180, tinfo=0x86fd50 <typeinfo for std::logic_error>, dest=0x428760 <_ZNSt11logic_errorD1Ev@plt>) at /tmp/deploy/gcc-7.3.0/libstdc++-v3/libsupc++/eh_throw.cc:93
             #7  0x00007f323c9a90f0 in ActiveDurabilityMonitor::throwException<std::logic_error> (this=this@entry=0x7f31d4184d20, thrower=..., error=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/durability/active_durability_monitor.cc:1552
             #8  0x00007f323c9a41de in ActiveDurabilityMonitor::commit (this=this@entry=0x7f31d4184d20, sw=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/durability/active_durability_monitor.cc:850
             #9  0x00007f323c9a4323 in ActiveDurabilityMonitor::processCompletedSyncWriteQueue (this=0x7f31d4184d20) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/durability/active_durability_monitor.cc:447
             #10 0x00007f323ca82183 in VBucket::processResolvedSyncWrites (this=0x7f31d2e12600) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/vbucket.cc:709
             #11 0x00007f323c9abd5d in DurabilityCompletionTask::run (this=0x7f31d3705050) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/durability/durability_completion_task.cc:53
             #12 0x00007f323ca1b5c3 in GlobalTask::execute (this=0x7f31d3705050) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/globaltask.cc:73
             #13 0x00007f323c91d25f in CB3ExecutorThread::run (this=0x7f323737c420) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/cb3_executorthread.cc:174
             #14 0x00007f323b342947 in run (this=0x7f322184c550) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:58
             #15 platform_thread_wrap (arg=0x7f322184c550) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:71
             #16 0x00007f3238b81ea5 in start_thread () from /lib64/libpthread.so.0
             #17 0x00007f32388aa8dd in clone () from /lib64/libc.so.6
            

            QE Test:

            guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/test_job_magma.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t volumetests.Magma.volume.SystemTestMagma,nodes_init=17,replicas=1,skip_cleanup=True,num_items=500000000,num_buckets=1,bucket_names=GleamBook,doc_size=256,bucket_type=membase,compression_mode=off,iterations=1,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=info,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=True,randomize_value=True,assert_crashes_on_load=True,maxttl=60,num_collections=20,doc_ops=expiry,durability=Majority,fragmentation=20,process_concurrency=2,crashes=10 -m rest'
            

            Logs:
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.120.170.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.115.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.116.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.123.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.124.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.126.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.127.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.128.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.129.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.130.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.131.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.132.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.133.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.134.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.135.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.136.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.139.zip
            s3://cb-customers-secure/mb-41089/2020-10-15/collectinfo-2020-10-15t154851-ns_1@172.23.121.48.zip
            CC: Dave Rigby, Ritam Sharma

            drigby Dave Rigby added a comment -

            From the log file there's an error just before the crash:

            2020-10-15T08:39:37.262358-07:00 ERROR (GleamBook) VBucket::commit (vb:609) failed as no HashTable item found with key:<ud>cid:0x11:random-00000133019</ud> prepare_seqno:11474, commit_seqno:none
            2020-10-15T08:39:37.262535-07:00 CRITICAL *** Fatal error encountered during exception handling ***
            2020-10-15T08:39:37.262591-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): ActiveDurabilityMonitor::commit vb:609 failed with status:1
            2020-10-15T08:39:38.491223-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-3425). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/a9520cb7-3ed2-4bd4-a07da09e-a846ad0e.dmp before terminating.
            2020-10-15T08:39:38.491251-07:00 CRITICAL Stack backtrace of crashed thread:
            2020-10-15T08:39:38.491470-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x19747d]
            ...
            

            Ritesh Agarwal I would suggest also scanning for any messages at level "ERROR", as those are often relevant too (and can precede the actual crash).
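
            To make the failure sequence above concrete, here is a minimal standalone C++ model (illustrative names only, not the actual kv_engine source): the VBucket::commit analogue finds no prepared item in the hash table and returns a not-found status (status:1), and the ActiveDurabilityMonitor analogue turns any non-success status into an unhandled std::logic_error, which brings the process down just as in the logs.

            #include <map>
            #include <stdexcept>
            #include <string>

            enum class Status { Success = 0, NoSuchKey = 1 };

            struct HashTable {
                std::map<std::string, int> prepared; // key -> prepare_seqno

                // VBucket::commit analogue: the prepared item must still be resident.
                Status commit(const std::string& key) {
                    return prepared.erase(key) ? Status::Success : Status::NoSuchKey;
                }
            };

            // ActiveDurabilityMonitor::commit analogue: any non-success status is
            // treated as a logic error; left unhandled, it terminates memcached.
            void processCompletedSyncWrite(HashTable& ht, const std::string& key) {
                if (ht.commit(key) != Status::Success) {
                    throw std::logic_error("commit failed with status:1");
                }
            }

            int main() {
                HashTable ht;
                ht.prepared["test_docs-772"] = 1;
                ht.prepared.erase("test_docs-772"); // stand-in for the premature HT removal
                processCompletedSyncWrite(ht, "test_docs-772"); // throws -> std::terminate
            }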


            richard.demellow Richard deMellow added a comment -

            Not 100% sure, but it looks like this MB could be related to MB-40286.
            paolo.cocchi Paolo Cocchi added a comment -

            Hi Ritesh Agarwal,
            I'm looking at the first set of logs, as it is smaller and thus more manageable when trying to reproduce.
            Could you please check the test description for that run? It states "bucket_type=ephemeral", but the logs show that both couchstore-0 and couchstore-1 are couchdb buckets, so I'm not sure that's really the command that produced the uploaded logs.
            Could you also provide the full command + ini file, please? Thanks.

            paolo.cocchi Paolo Cocchi added a comment - - edited

            The confusion comes from the logs capturing multiple unrelated tests besides the one of interest here.
            The relevant couchstore-1 definition is:

            2020-08-22T18:41:48.621233-07:00 INFO 50 Create bucket [couchstore-1]
            2020-08-22T18:41:48.622142-07:00 INFO (couchstore-1) EPEngine::initialize: using configuration:"max_size=976224256;dbname=/data/couchstore-1;backend=couchdb;couch_bucket=couchstore-1;max_vbuckets=1024;alog_path=/data/couchstore-1/access.log;data_traffic_enabled=false;max_num_workers=3;uuid=599ffa630991645649a2c6fdfa0eada9;conflict_resolution_type=seqno;bucket_type=ephemeral;durability_min_level=none;magma_fragmentation_percentage=50;ephemeral_full_policy=fail_new_data;ephemeral_metadata_purge_age=86400;max_ttl=0;ht_locks=47;compression_mode=active;failpartialwarmup=false"
            

            paolo.cocchi Paolo Cocchi added a comment -

            Same errors on both .59 and .63:

            2020-08-22T18:42:48.622298-07:00 ERROR 59: (couchstore-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.107.63->ns_1@172.23.107.59:couchstore-1 - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"keylen":0,"magic":"ClientResponse","opaque":428,"opcode":"DCP_COMMIT","status":"Not found"}
            2020-08-22T18:42:48.622330-07:00 ERROR 59: (couchstore-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.107.63->ns_1@172.23.107.59:couchstore-1 - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"keylen":0,"magic":"ClientResponse","opaque":236,"opcode":"DCP_COMMIT","status":"Not found"}
            2020-08-22T18:42:52.039480-07:00 ERROR (couchstore-1) VBucket::commit (vb:672) failed as no HashTable item found with key:<ud>cid:0x0:test_docs-772</ud> prepare_seqno:1, commit_seqno:none
            2020-08-22T18:42:52.188605-07:00 ERROR 59: (couchstore-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.107.63->ns_1@172.23.107.59:couchstore-1 - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"keylen":0,"magic":"ClientResponse","opaque":236,"opcode":"DCP_COMMIT","status":"Not found"}
            

            2020-08-22T18:42:48.620128-07:00 ERROR (couchstore-1) VBucket::commit (vb:937) failed as no HashTable item found with key:<ud>cid:0x0:test_docs-771</ud> prepare_seqno:1, commit_seqno:2
            2020-08-22T18:42:48.620180-07:00 ERROR (couchstore-1) VBucket::commit (vb:745) failed as no HashTable item found with key:<ud>cid:0x0:test_docs-3029</ud> prepare_seqno:1, commit_seqno:2
            2020-08-22T18:42:51.949153-07:00 ERROR (couchstore-1) VBucket::commit (vb:745) failed as no HashTable item found with key:<ud>cid:0x0:test_docs-3029</ud> prepare_seqno:1, commit_seqno:2
            2020-08-22T18:42:51.986225-07:00 ERROR (couchstore-1) VBucket::commit (vb:937) failed as no HashTable item found with key:<ud>cid:0x0:test_docs-771</ud> prepare_seqno:1, commit_seqno:2
            

            paolo.cocchi Paolo Cocchi added a comment -

            Ritesh Agarwal
            I've tried to build a command from what I see in the description, but it doesn't run:

            Paolos-MacBook-Pro:TAF paolococchi$ ./gradlew --refresh-dependencies testrunner -P jython=/Users/paolococchi/dev/jython2.7.2/bin/jython -P transaction_version=1.1.0 -P client_version=3.0.8 -P 'args=-i /Users/paolococchi/dev/testrunner_config/nodes.ini -t Atomicity.basic_ops.basic_ops.test_basic_commit,nodes_init=2,replicas=1,transaction_commit=True,op_type=create;update,bucket_type=ephemeral,num_buckets=2,default_bucket=False,num_items=5000,doc_size=512,transaction_timeout=30,bucket_size=100,GROUP=P0_Set1,durability=MAJORITY,get-cbcollect-info=False,infra_log_level=critical,log_level=error,upgrade_version=7.0.0-2908 -m rest'
            ..
            > Task :testrunner
            Traceback (most recent call last):
              File "testrunner.py", line 27, in <module>
                from sdk_client3 import SDKClient
              File "lib/sdk_client3.py", line 67, in <module>
                from couchbase_helper.durability_helper import DurabilityHelper
              File "lib/couchbase_helper/durability_helper.py", line 6, in <module>
                from BucketLib.BucketOperations import BucketHelper
              File "lib/BucketLib/BucketOperations.py", line 13, in <module>
                from BucketOperations_Rest import BucketHelper as bucketlib
              File "lib/BucketLib/BucketOperations_Rest.py", line 16, in <module>
                from Rest_Connection import RestConnection
              File "connections/Rest_Connection.py", line 18, in <module>
                import requests
            ImportError: No module named requests
             
            > Task :testrunner FAILED
            

            richard.demellow Richard deMellow added a comment - - edited

            Paolo Cocchi, I think you might need to use the following command to install the Python package:

            jython -m pip install requests
             
            jython -m pip install <dependency>
            

            paolo.cocchi Paolo Cocchi added a comment -

            Thanks Richard deMellow, 'pip/pip3 install ..' didn't work, while 'jython -m pip install ..' did.

            Ritesh Agarwal
            Now it fails with the following; it sounds like the test is not at origin/master (note that I'm running the latest TAF checkout)?

            > Task :testrunner
            Global Test input params:
            {'cluster_name': 'nodes',
             'ini': '/Users/paolococchi/dev/testrunner_config/nodes.ini',
             'num_nodes': 2}
            Logs will be stored at /Users/paolococchi/dev/couchbaselabs/TAF/logs/testrunner-20-Oct-19_17-22-02/test_1
             
            ./testrunner -i /Users/paolococchi/dev/testrunner_config/nodes.ini  -t Atomicity.basic_ops.basic_ops.test_basic_commit,nodes_init=2,replicas=1,transaction_commit=True,op_type=create;update,bucket_type=ephemeral,num_buckets=2,default_bucket=False,num_items=5000,doc_size=512,transaction_timeout=30,bucket_size=100,GROUP=P0_Set1,durability=MAJORITY,get-cbcollect-info=False,infra_log_level=critical,log_level=error,upgrade_version=7.0.0-2908
             
            Test Input params:
            {'doc_size': '512', 'cluster_name': 'nodes', 'bucket_type': 'ephemeral', 'get-cbcollect-info': 'False', 'ini': '/Users/paolococchi/dev/testrunner_config/nodes.ini', 'replicas': '1', 'transaction_timeout': '30', 'num_buckets': '2', 'durability': 'MAJORITY', 'log_level': 'error', 'upgrade_version': '7.0.0-2908', 'default_bucket': 'False', 'num_nodes': 2, 'logs_folder': '/Users/paolococchi/dev/couchbaselabs/TAF/logs/testrunner-20-Oct-19_17-22-02/test_1', 'rerun': True, 'nodes_init': '2', 'GROUP': 'P0_Set1', 'num_items': '5000', 'transaction_commit': 'True', 'case_number': 1, 'op_type': 'create;update', 'bucket_size': '100', 'infra_log_level': 'critical'}
            WARNING: Failed to import concurrent.futures module
            Test Atomicity.basic_ops.basic_ops.test_basic_commit was not found: 'module' object has no attribute 'basic_ops'
            During the test, Remote Connections: 0, Disconnections: 0
            SDK Connections: 0, Disconnections: 0
            summary so far suite Atomicity.basic_ops.basic_ops , pass 0 , fail 1
            

            ritesh.agarwal Ritesh Agarwal added a comment - - edited

            Hi Paolo Cocchi, you should be able to hit this issue through these steps:

            Step 1: jython -m pip install -r requirements.txt

            Step 2: Create an ini file:

            [global]
            username:root
            password:couchbase
            index_port:9102
            n1ql_port:8093
            data_path:/data
            cli:/Users/riteshagarwal/CB/source/build/kv_engine  # Change this path accordingly and remove this comment.
             
            [membase]
            rest_username:Administrator
            rest_password:password
             
            [servers]
            1:_1
            2:_2
            3:_3
            4:_4
             
            [_1]
            ip:127.0.0.1
            port:9000
             
            [_2]
            ip:127.0.0.1
            port:9001
             
            [_3]
            ip:127.0.0.1
            port:9002
             
            [_4]
            ip:127.0.0.1
            port:9003
            

            Step 3: Run the test.

            guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/test_job_magma.ini -t magma.magma_crash_recovery.MagmaCrashTests.test_crash_during_ops,num_items=5000000,infra_log_level=debug,log_level=debug,rerun=False,skip_cleanup=true,doc_size=1024,randomize_value=False,nodes_init=20,num_crashes=20,sdk_timeout=60,bucket_storage=couchstore,replicas=1,vbuckets=1024,graceful=False,doc_ops=create:update:delete:expiry,wait_warmup=False,maxttl=10,get-cbcollect-info=True,multiplier=100,process_concurrency=5,durability=Majority -m rest'
            

            Provide the right jython binary path and ini path in the command above. I just tried the same test and it worked for me on cluster_run.

            With the test mentioned in this comment, I saw this issue on build 3449 on a 20-node cluster: http://qa.sc.couchbase.com/job/test_job_magma/428/console
            You can search for `ActiveDurabilityMonitor`.

            paolo.cocchi Paolo Cocchi added a comment -

            Thanks Ritesh Agarwal.
            I have run the test provided in the previous comment locally, but was unable to reproduce the issue.
            Note that the test is quite heavy and takes long to run. As mentioned in my previous comments, I am particularly interested in the very first test reported in the description, as that test is much lighter and the logs show that we fail very quickly (prepare_seqno:1). The next step is to run the lighter test again.

            paolo.cocchi Paolo Cocchi added a comment - - edited

            The "lighter" test runs and reproduces the issue locally, thanks Ritesh Agarwal for providing command line and details.

            ashwin.govindarajulu Ashwin Govindarajulu added a comment - - edited

            Seeing a similar crash in a durability replica-update scenario on an Ephemeral bucket.

            Build - 7.0.0-3583

            bucket_param_tests.bucket_param_update.BucketParamTest.test_replica_update,nodes_init=5,replicas=0,doc_ops=update;create;delete,bucket_type=ephemeral,durability=MAJORITY,sdk_timeout=60,num_items=500000

            Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
            #0 0x00007fc62873a5f7 in raise () from /lib64/libc.so.6
            #0 0x00007fc62873a5f7 in raise () from /lib64/libc.so.6
            #1 0x00007fc62873bce8 in abort () from /lib64/libc.so.6
            #2 0x00007fc62928c195 in __gnu_cxx::__verbose_terminate_handler() () from /opt/couchbase/bin/../lib/libstdc++.so.6
            #3 0x00000000005aa282 in backtrace_terminate_handler() ()
            #4 0x00007fc629289f86 in __cxxabiv1::__terminate(void (*)()) () from /opt/couchbase/bin/../lib/libstdc++.so.6
            #5 0x00007fc629289fd1 in std::terminate() () from /opt/couchbase/bin/../lib/libstdc++.so.6
            #6 0x00007fc62928a213 in __cxa_throw () from /opt/couchbase/bin/../lib/libstdc++.so.6
            #7 0x00007fc62c8f9230 in void ActiveDurabilityMonitor::throwException<std::logic_error>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /opt/couchbase/bin/../lib/libep.so
            #8 0x00007fc62c8f431e in ActiveDurabilityMonitor::commit(DurabilityMonitor::ActiveSyncWrite const&) () from /opt/couchbase/bin/../lib/libep.so
            #9 0x00007fc62c8f4463 in ActiveDurabilityMonitor::processCompletedSyncWriteQueue() () from /opt/couchbase/bin/../lib/libep.so
            #10 0x00007fc62c9d2943 in VBucket::processResolvedSyncWrites() () from /opt/couchbase/bin/../lib/libep.so
            #11 0x00007fc62c8fbe9d in DurabilityCompletionTask::run() () from /opt/couchbase/bin/../lib/libep.so
            #12 0x00007fc62c96bac3 in GlobalTask::execute() () from /opt/couchbase/bin/../lib/libep.so
            #13 0x00007fc62c86cf4f in CB3ExecutorThread::run() () from /opt/couchbase/bin/../lib/libep.so
            #14 0x00007fc62b28e947 in platform_thread_wrap(void*) () from /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0
            #15 0x00007fc628acddc5 in start_thread () from /lib64/libpthread.so.0
            #16 0x00007fc6287fb21d in clone () from /lib64/libc.so.6

            Jenkins job: centos-p0-durability-vset00-00-replica_update_6.5_ephemeral_P0_majority

            http://qa.sc.couchbase.com/job/test_suite_executor-TAF/67163

            Cbcollect_info: http://qa.sc.couchbase.com/job/test_suite_executor-TAF/67163/artifact/job_logs/testrunner-20-Oct-31_04-28-11/test_2/

            ritam.sharma Ritam Sharma added a comment -

            Daniel Owen - The bug appears with simple test cases as well: a simple replica update with an ephemeral bucket at durability level = Majority. I looked at the 6.5 must-pass set of tests, and replica update tests were must-pass. I would prefer to leave this defect as 70mustpass.

            drigby Dave Rigby added a comment -

            Ritam Sharma

            The bug appears with simple test cases as well: a simple replica update with an ephemeral bucket at durability level = Majority. I looked at the 6.5 must-pass set of tests, and replica update tests were must-pass. I would prefer to leave this defect as 70mustpass.

            Could you raise a separate defect for this simplified test case? (If you have the test command line, that would be great.)

            paolo.cocchi Paolo Cocchi added a comment - - edited

            Dave Rigby The simplified test is just a TX load on a 2-node cluster with one Ephemeral bucket and one replica; no crash/failure scenarios are involved. I'm currently reproducing and debugging it with a simple cluster_run plus a Java SDK client program.
            I haven't got to the root of the issue yet, but this is likely a longstanding issue with SyncDelete that we are only seeing now because, as of 6.6.0, TX uses SyncDelete heavily.
            We have already fixed something like this for persistent buckets (MB-40480), but it is similar only in the symptom, as there the bug was in the persistence callback.

            paolo.cocchi Paolo Cocchi added a comment - - edited

            A StoredValue with CommittedState::pending (StoredValue::committed == 0x2) is removed from the HashTable by the EphTombstoneHTCleaner.
            Note at the bottom that this is the SV of a TX-staged doc (a SyncDelete carrying the txn user XAttr).

            $ lldb -p $(ps aux | grep -v "grep" | grep "paolo.*bin/memcached.*n_0" | tr -s ' ' | cut -d ' ' -f2)
            (lldb) process attach --pid 19375
            Process 19375 stopped
            * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
                frame #0: 0x00007fff7241378e libsystem_kernel.dylib` kevent  + 10
            libsystem_kernel.dylib`kevent:
            ->  0x7fff7241378e <+10>: jae    0x7fff72413798            ; <+20>
                0x7fff72413790 <+12>: movq   %rax, %rdi
                0x7fff72413793 <+15>: jmp    0x7fff7240d457            ; cerror_nocancel
                0x7fff72413798 <+20>: retq   
            Target 0: (memcached) stopped.
             
            Executable module set to "/Users/paolococchi/dev/source/install/bin/memcached".
            Architecture set to: x86_64h-apple-macosx-.
            (lldb) br s -n HashTable::unlocked_release -c 'valueToRelease->committed == 0x2 || valueToRelease->committed == 0x3'
            Breakpoint 1: where = libep.dylib`HashTable::unlocked_release(HashTable::HashBucketLock const&, StoredValue*) + 30 at hash_table.cc:981:10, address = 0x00000001116ab52e
            (lldb) c
            Process 19375 resuming
            Process 19375 stopped
            * thread #60, name = 'mc:NonIO_0', stop reason = breakpoint 1.1
                frame #0: 0x00000001116ab52e libep.dylib` HashTable::unlocked_release(this=0x00000001199e9820, hbl=0x00007000046d67f0, valueToRelease=0x000000011a52bc40)  + 30 at hash_table.cc:981
               978 	StoredValue::UniquePtr HashTable::unlocked_release(
               979 	        const HashBucketLock& hbl,
               980 	        StoredValue* valueToRelease) {
            -> 981 	    if (!hbl.getHTLock()) {
               982 	        throw std::invalid_argument(
               983 	                "HashTable::unlocked_release_base: htLock not held");
               984 	    }
            Target 0: (memcached) stopped.
            (lldb) p valueToRelease
            (StoredValue *) $26 = 0x000000011a52bc40
            (lldb) p *valueToRelease
            (StoredValue) $27 = {
              value = {
                value = (raw = 1125904631569216)
              }
              chain_next_or_replacement = {
                __ptr_ = {
                  std::__1::__compressed_pair_elem<TaggedPtr<StoredValue>, 0, false> = {
                    __value_ = (raw = 4731530880)
                  }
                }
              }
              cas = 1604666825097019392
              bySeqno = {
                value = 1511
              }
              lock_expiry_or_delete_or_complete_time = (lock_expiry = 0, delete_or_complete_time = 0)
              exptime = 0
              flags = 0
              revSeqno = {
                counter = (__elems_ = unsigned char [6] @ 0x00007fa96353fb5c)
              }
              datatype = '\x04'
              bits = {
                data_ = {
                  __elems_ = {
                    [0] = 'K'
                  }
                }
              }
              deletionSource = '\0'
              committed = '\x02'
            }
            (lldb) bt
            * thread #60, name = 'mc:NonIO_0', stop reason = breakpoint 1.1
              * frame #0: 0x00000001116ab52e libep.dylib` HashTable::unlocked_release(this=0x00000001199e9820, hbl=0x00007000046d67f0, valueToRelease=0x000000011a52bc40)  + 30 at hash_table.cc:981
                frame #1: 0x0000000111629af1 libep.dylib` EphemeralVBucket::HTTombstonePurger::visit(this=0x000000011a4e7cc0, hbl=0x00007000046d67f0, v=0x000000011a52bc40)  + 209 at ephemeral_tombstone_purger.cc:59
                frame #2: 0x00000001116ae447 libep.dylib` HashTable::pauseResumeVisit(this=0x00000001199e9820, visitor=0x000000011a4e7cc0, start_pos=0x00007000046d68e0)  + 1111 at hash_table.cc:1222
                frame #3: 0x00000001117e0164 libep.dylib` PauseResumeVBAdapter::visit(this=0x0000000118f70820, vb=0x00000001199e9800)  + 212 at vb_visitors.cc:55
                frame #4: 0x0000000111718380 libep.dylib` KVBucket::pauseResumeVisit(this=0x0000000118f97000, visitor=0x0000000118f70820, start_pos=0x0000000118fa3a78)  + 176 at kv_bucket.cc:2318
                frame #5: 0x000000011162a329 libep.dylib` EphTombstoneHTCleaner::run(this=0x0000000118fa3a18)  + 665 at ephemeral_tombstone_purger.cc:118
                frame #6: 0x00000001116a4214 libep.dylib` GlobalTask::execute(this=0x0000000118fa3a18)  + 52 at globaltask.cc:73
                frame #7: 0x0000000111319835 libep.dylib` CB3ExecutorThread::run(this=0x00000001175a0f80)  + 1653 at cb3_executorthread.cc:174
                frame #8: 0x0000000111318e0d libep.dylib` launch_executor_thread(arg=0x00000001175a0f80)  + 29 at cb3_executorthread.cc:34
                frame #9: 0x0000000115b3c431 libplatform_so.0.1.0.dylib` CouchbaseThread::run(this=0x000000011757e470)  + 113 at cb_pthreads.cc:58
                frame #10: 0x0000000115b3b95e libplatform_so.0.1.0.dylib` platform_thread_wrap(arg=0x000000011757e470)  + 46 at cb_pthreads.cc:71
                frame #11: 0x00007fff724cc2eb libsystem_pthread.dylib` _pthread_body  + 126
                frame #12: 0x00007fff724cf249 libsystem_pthread.dylib` _pthread_start  + 66
                frame #13: 0x00007fff724cb40d libsystem_pthread.dylib` thread_start  + 13
            (lldb) memory read --count `valueToRelease->value.value->valueSize()` `&valueToRelease->value.value->data`        
            0x1199d9b49: 00 00 01 08 00 00 01 04 74 78 6e 00 7b 22 69 64  ........txn.{"id
            0x1199d9b59: 22 3a 7b 22 74 78 6e 22 3a 22 31 39 64 35 31 34  ":{"txn":"19d514
            0x1199d9b69: 36 62 2d 30 36 35 35 2d 34 39 32 32 2d 61 39 37  6b-0655-4922-a97
            0x1199d9b79: 31 2d 38 63 63 30 39 38 38 39 63 65 34 33 22 2c  1-8cc09889ce43",
            0x1199d9b89: 22 61 74 6d 70 74 22 3a 22 31 37 31 63 32 63 61  "atmpt":"171c2ca
            0x1199d9b99: 37 2d 66 62 39 65 2d 34 34 39 38 2d 61 30 31 39  7-fb9e-4498-a019
            0x1199d9ba9: 2d 33 32 39 30 34 30 34 33 36 63 39 61 22 7d 2c  -329040436c9a"},
            0x1199d9bb9: 22 61 74 72 22 3a 7b 22 69 64 22 3a 22 5f 74 78  "atr":{"id":"_tx
            0x1199d9bc9: 6e 3a 61 74 72 2d 31 38 34 2d 23 33 63 39 22 2c  n:atr-184-#3c9",
            0x1199d9bd9: 22 62 6b 74 22 3a 22 65 78 61 6d 70 6c 65 22 2c  "bkt":"example",
            0x1199d9be9: 22 73 63 70 22 3a 22 5f 64 65 66 61 75 6c 74 22  "scp":"_default"
            0x1199d9bf9: 2c 22 63 6f 6c 6c 22 3a 22 5f 64 65 66 61 75 6c  ,"coll":"_defaul
            0x1199d9c09: 74 22 7d 2c 22 6f 70 22 3a 7b 22 73 74 67 64 22  t"},"op":{"stgd"
            0x1199d9c19: 3a 7b 22 62 6f 64 79 5f 6b 22 3a 22 62 6f 64 79  :{"body_k":"body
            0x1199d9c29: 5f 76 22 7d 2c 22 74 79 70 65 22 3a 22 69 6e 73  _v"},"type":"ins
            0x1199d9c39: 65 72 74 22 2c 22 63 72 63 33 32 22 3a 22 30 78  ert","crc32":"0x
            0x1199d9c49: 30 30 30 30 30 30 30 30 22 7d 7d 00              00000000"}}.
            (lldb) p valueToRelease->isDeleted()
            (bool) $31 = true
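            For reference, with datatype '\x04' (XATTR) the value bytes dumped above are an XATTR blob: a 4-byte network-order total length (00 00 01 08 = 264), then per pair a 4-byte pair length (00 00 01 04 = 260), a NUL-terminated key ("txn"), and a NUL-terminated value (the staged-transaction JSON). The helper below is a minimal stand-alone decoder sketch based on that layout, not kv_engine code; readU32 and dumpXattrs are illustrative names made up here.

            #include <cstdint>
            #include <iostream>
            #include <string>
            #include <vector>

            // Read a 4-byte network-order (big-endian) length field.
            static uint32_t readU32(const uint8_t* p) {
                return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
                       (uint32_t(p[2]) << 8) | uint32_t(p[3]);
            }

            // Walk the XATTR blob at the start of the value and print each pair.
            static void dumpXattrs(const std::vector<uint8_t>& value) {
                const uint8_t* p = value.data();
                const uint32_t total = readU32(p);       // 0x00000108 == 264 in the dump
                const uint8_t* end = p + 4 + total;
                p += 4;
                while (p < end) {
                    const uint32_t pairLen = readU32(p); // 0x00000104 == 260 in the dump
                    p += 4;
                    const std::string key(reinterpret_cast<const char*>(p));            // "txn"
                    const std::string val(reinterpret_cast<const char*>(p) + key.size() + 1);
                    std::cout << key << " = " << val << '\n'; // val is the txn JSON
                    p += pairLen; // pairLen covers key + NUL + value + NUL
                }
            }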
            

            paolo.cocchi Paolo Cocchi added a comment - - edited

            http://review.couchbase.org/c/kv_engine/+/139724 fixes the bug in EphTombstoneHTCleaner, but my local cluster_run now crashes in other paths; I'm following up on those.

            UPDATE
            The new crashes are caused by a ThrowExceptionUnderflowPolicy. Tracked in MB-42568.
            With dev-asserts disabled, all local tests for this MB complete successfully on top of http://review.couchbase.org/c/kv_engine/+/139724.
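            For anyone following along, the shape of the fix in that review, per its eventual commit message ("HTTombstonePurger skips Pending stored-values"), is roughly the sketch below. This is a simplified stand-alone model, not the actual kv_engine patch; the types, the chain layout, and the purge-age check are stand-ins.

            #include <ctime>
            #include <memory>
            #include <vector>

            // Toy model of the purger fix: never release a Pending (in-flight
            // SyncWrite) stored-value, even if it is deleted and old enough to
            // look like a purgeable tombstone.
            enum class CommittedState { CommittedViaMutation, CommittedViaPrepare, Pending };

            struct StoredValue {
                CommittedState committed = CommittedState::CommittedViaMutation;
                bool deleted = false;
                time_t deleteTime = 0;
                bool isPending() const { return committed == CommittedState::Pending; }
            };

            void purgeTombstones(std::vector<std::unique_ptr<StoredValue>>& chain,
                                 time_t now, time_t purgeAge) {
                for (auto it = chain.begin(); it != chain.end();) {
                    const StoredValue& v = **it;
                    // The fix: a deleted Pending SV (e.g. a TX-staged SyncDelete) is
                    // an in-flight SyncWrite still tracked by the DurabilityMonitor,
                    // so it must be skipped rather than treated as a tombstone.
                    if (v.isPending()) {
                        ++it;
                        continue;
                    }
                    if (v.deleted && (now - v.deleteTime) >= purgeAge) {
                        it = chain.erase(it); // only completed items are released
                    } else {
                        ++it;
                    }
                }
            }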

            paolo.cocchi Paolo Cocchi added a comment -

            Ashwin Govindarajulu I've reproduced the issue in 6.6.0 with the same simple test that I mentioned in a previous comment (i.e., a TX load on a 2-node cluster, 1 Ephemeral bucket, 1 replica). FYI, I'm loading 100k docs via TX in batches of 1000 items per TX (i.e., 100 TXs of 1000 docs each) and always hitting the issue.


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9180 contains kv_engine commit 67cd140 with commit message:
            MB-41089: HTTombstonePurger skips Pending stored-values
            paolo.cocchi Paolo Cocchi added a comment - - edited

            Daniel Owen Ashwin Govindarajulu Ritesh Agarwal The fix is in for 6.6.1.
            I'm keeping the MB open as it also tracks the fix for CC.

            paolo.cocchi Paolo Cocchi added a comment -

            Resolved in 6.6.1.
            ETA for the fix to be merged into CC: Wednesday, Nov 11th.


            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            Validated the fix using Enterprise Edition 6.6.1 build 9182.

            Closing this ticket.

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-3701 contains kv_engine commit 80cc532 with commit message:
            MB-41089: HTTombstonePurger skips Pending stored-values

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-3860 contains kv_engine commit 67cd140 with commit message:
            MB-41089: HTTombstonePurger skips Pending stored-values
            drigby Dave Rigby added a comment -

            Paolo Cocchi Could you please update the title of this MB to describe the actual bug? Additionally, I think this should be included in the 6.6.1 release notes, so could you also add an appropriate comment here; see https://hub.internal.couchbase.com/confluence/display/PM/Release+Notes+-+How+to for details.

            paolo.cocchi Paolo Cocchi added a comment - - edited

            Description for release notes
            Summary: The HTCleaner in Ephemeral buckets is responsible for purging tombstones and also Completed (Committed/Aborted) SyncWrites. A bug in that component could remove in-flight SyncWrites from internal data structures, which would cause the node to crash when/if it tries to complete the SyncWrite.

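            The summary above can be condensed into a toy model of the crash sequence. This is purely illustrative; the map and strings are simplified stand-ins for the kv_engine components named in the backtraces.

            #include <map>
            #include <stdexcept>
            #include <string>

            int main() {
                // vBucket hash table holding an in-flight (Pending) SyncWrite.
                std::map<std::string, std::string> hashTable{{"key", "Pending"}};

                // Buggy EphTombstoneHTCleaner pass: the deleted Pending item looks
                // like a tombstone and gets purged.
                hashTable.erase("key");

                // Later, DurabilityCompletionTask resolves the SyncWrite and the
                // ActiveDurabilityMonitor tries to commit it, but the prepare is
                // gone from the hash table.
                if (hashTable.find("key") == hashTable.end()) {
                    // In memcached this logic_error is unhandled, so std::terminate()
                    // runs and the process aborts, matching the
                    // ActiveDurabilityMonitor::commit -> throwException frames in the
                    // backtraces above.
                    throw std::logic_error("ActiveDurabilityMonitor::commit failed");
                }
            }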

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              ritesh.agarwal Ritesh Agarwal
              Votes: 0
              Watchers: 13
