Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-37745

Magma crashed during rollback.

    XMLWordPrintable

Details

    • Untriaged
    • Unknown

    Description

      1. Create a 2 node(220,221) cluster
      2. Create a magma bucket
      3. Load 1M items
      4. Stop persistence on node 220. Load 100k items on nodeA so that they are replicated and presisted on node 221
      5. Kill memcached on Node 220 which will trigger rollback on Node 221
      6. Repeat steps 4,5 multiple times.
      7. Magma crashed -> memcached crashed in a loop on node .221

      /opt/couchbase/var/lib/couchbase/logs/memcached.log.000007.txt:2020-01-30T19:42:00.334178-08:00 CRITICAL WAL::Attempting to start new txn while already in txn
      

      2020-01-30T19:51:54.202851-08:00 INFO 49117: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.105.220->ns_1@172.23.105.221:default - (vb:7) Received rollback request. Rolling back to seqno:0
      2020-01-30T19:51:54.202937-08:00 CRITICAL *** Fatal error encountered during exception handling ***
      2020-01-30T19:51:54.203005-08:00 CRITICAL Caught unhandled std::exception-derived exception. what(): kvstore- Seqno (5) should always increase monotonically between commit batches (1184)
      

      2020-01-30T19:51:54.471100-08:00 CRITICAL Breakpad caught a crash (Couchbase version 1006.5.1-1039). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/3a324307-d0e4-62fc-322ba0fd-6832eaa4.dmp before terminating.
      2020-01-30T19:51:54.471145-08:00 CRITICAL Stack backtrace of crashed thread:
      2020-01-30T19:51:54.471364-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x1324cd]
      2020-01-30T19:51:54.471376-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x14a50e]
      2020-01-30T19:51:54.471385-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x14a824]
      2020-01-30T19:51:54.471392-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7ff34787e000+0xf5d0]
      2020-01-30T19:51:54.471417-08:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7ff3474b1000+0x36207]
      2020-01-30T19:51:54.471437-08:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7ff3474b1000+0x378f8]
      2020-01-30T19:51:54.471468-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7ff347fb3000+0x91195]
      2020-01-30T19:51:54.471477-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x145e02]
      2020-01-30T19:51:54.471493-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7ff347fb3000+0x8ef86]
      2020-01-30T19:51:54.471508-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7ff347fb3000+0x8efd1]
      2020-01-30T19:51:54.471522-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7ff347fb3000+0x8f213]
      2020-01-30T19:51:54.471530-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/../lib/libmagma.so(_ZN5magma5Magma11CommitBatch11updateSeqnoEm+0x252) [0x7ff34250b000+0x6e102]
      2020-01-30T19:51:54.471535-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/../lib/libmagma.so(_ZN5magma5Magma11CommitBatch3SetERKNS_5SliceES4_S4_PbS5_+0x101) [0x7ff34250b000+0x6e791]
      2020-01-30T19:51:54.471546-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x1fe136]
      2020-01-30T19:51:54.471547-08:00 INFO 49118: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default - (vb:594) Creating stream with start seqno 978 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
      2020-01-30T19:51:54.471553-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x200f45]
      2020-01-30T19:51:54.471555-08:00 INFO 49118: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default - (vb:594) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default" backfill:true, seqno:979
      2020-01-30T19:51:54.471559-08:00 INFO 49118: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default - (vb:594) ActiveStream::transitionState: Transitioning from backfilling to in-memory
      2020-01-30T19:51:54.471560-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0xdb9f2]
      2020-01-30T19:51:54.471567-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0xe0513]
      2020-01-30T19:51:54.471575-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x131ba0]
      2020-01-30T19:51:54.471581-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x132a71]
      2020-01-30T19:51:54.471587-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x12c614]
      2020-01-30T19:51:54.471589-08:00 INFO 49118: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default - (vb:595) Creating stream with start seqno 988 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
      2020-01-30T19:51:54.471591-08:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7ff349e5c000+0x8ee7]
      

      Finally, WAL corruption:

      2020-01-30T20:07:08.362788-08:00 CRITICAL WALIterator corruption error:Corruption: 24:42041344
      2020-01-30T20:07:09.050033-08:00 INFO Magma recovered successfully with tailOffset:24:43114496 and flushOffset:24:43114496
      2020-01-30T20:07:09.052035-08:00 INFO Magma::SetMemoryQuota quota:37329308 lastBFSize:530541 lastIndexSize:70632524
      2020-01-30T20:07:09.052595-08:00 CRITICAL *** Fatal error encountered during exception handling ***
      2020-01-30T20:07:09.052649-08:00 CRITICAL Caught unhandled std::exception-derived exception. what(): MagmaKVStore vbstate vbid:17 not found. Status:NotFound: MagmaKVStore::readLocalDoc(vbid:vb:17 key:_vbstate not found.
      2020-01-30T20:07:09.139794-08:00 CRITICAL Breakpad caught a crash (Couchbase version 1006.5.1-1039). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/394c39cc-a20d-063d-203a826e-208fc08c.dmp before terminating.
      2020-01-30T20:07:09.139827-08:00 CRITICAL Stack backtrace of crashed thread:
      2020-01-30T20:07:09.139999-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x1324cd]
      2020-01-30T20:07:09.140011-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x14a50e]
      2020-01-30T20:07:09.140020-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x14a824]
      2020-01-30T20:07:09.140029-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f93867d0000+0xf5d0]
      2020-01-30T20:07:09.140052-08:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7f9386403000+0x36207]
      2020-01-30T20:07:09.140073-08:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7f9386403000+0x378f8]
      2020-01-30T20:07:09.140103-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f9386f05000+0x91195]
      2020-01-30T20:07:09.140112-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x145e02]
      2020-01-30T20:07:09.140129-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f9386f05000+0x8ef86]
      2020-01-30T20:07:09.140143-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f9386f05000+0x8efd1]
      2020-01-30T20:07:09.140157-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f9386f05000+0x8f213]
      2020-01-30T20:07:09.140168-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x1fb7fc]
      2020-01-30T20:07:09.140176-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x1466a7]
      2020-01-30T20:07:09.140183-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x160b54]
      2020-01-30T20:07:09.140190-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x1a3f93]
      2020-01-30T20:07:09.140196-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x1518ae]
      2020-01-30T20:07:09.140202-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0xd9596]
      2020-01-30T20:07:09.140208-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0xef189]
      2020-01-30T20:07:09.140213-08:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x10254f]
      2020-01-30T20:07:09.140223-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x24118]
      2020-01-30T20:07:09.140230-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x245d9]
      2020-01-30T20:07:09.140236-08:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0(_ZN9Couchbase6Thread12thread_entryEv+0xf) [0x7f9388dae000+0x1477f]
      2020-01-30T20:07:09.140240-08:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f9388dae000+0x8ee7]
      2020-01-30T20:07:09.140246-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f93867d0000+0x7dd5]
      2020-01-30T20:07:09.140276-08:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f9386403000+0xfdead]
      

      Initial logs when first crash dump is observed:
      [^collectinfo-2020-01-31T035045-ns_1@172.23.105.220.zip] [^collectinfo-2020-01-31T035406-ns_1@172.23.105.221.zip]

      Logs collected just a little late as they are rolling over super fast as memcached is crashing in a loop:
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-37355/collectinfo-2020-01-31T040838-ns_1%40172.23.105.220.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-37355/collectinfo-2020-01-31T040838-ns_1%40172.23.105.221.zip

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            apaar.gupta Apaar Gupta
            ritesh.agarwal Ritesh Agarwal
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty