Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Cheshire-Cat
-
1006.5.1-1039
-
Untriaged
-
Unknown
Description
1. Create a 2 node(220,221) cluster
2. Create a magma bucket
3. Load 1M items
4. Stop persistence on node 220. Load 100k items on nodeA so that they are replicated and presisted on node 221
5. Kill memcached on Node 220 which will trigger rollback on Node 221
6. Repeat steps 4,5 multiple times.
7. Magma crashed -> memcached crashed in a loop on node .221
/opt/couchbase/var/lib/couchbase/logs/memcached.log.000007.txt:2020-01-30T19:42:00.334178-08:00 CRITICAL WAL::Attempting to start new txn while already in txn
|
2020-01-30T19:51:54.202851-08:00 INFO 49117: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.105.220->ns_1@172.23.105.221:default - (vb:7) Received rollback request. Rolling back to seqno:0
|
2020-01-30T19:51:54.202937-08:00 CRITICAL *** Fatal error encountered during exception handling ***
|
2020-01-30T19:51:54.203005-08:00 CRITICAL Caught unhandled std::exception-derived exception. what(): kvstore- Seqno (5) should always increase monotonically between commit batches (1184)
|
2020-01-30T19:51:54.471100-08:00 CRITICAL Breakpad caught a crash (Couchbase version 1006.5.1-1039). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/3a324307-d0e4-62fc-322ba0fd-6832eaa4.dmp before terminating.
|
2020-01-30T19:51:54.471145-08:00 CRITICAL Stack backtrace of crashed thread:
|
2020-01-30T19:51:54.471364-08:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x1324cd]
|
2020-01-30T19:51:54.471376-08:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x14a50e]
|
2020-01-30T19:51:54.471385-08:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x14a824]
|
2020-01-30T19:51:54.471392-08:00 CRITICAL /lib64/libpthread.so.0() [0x7ff34787e000+0xf5d0]
|
2020-01-30T19:51:54.471417-08:00 CRITICAL /lib64/libc.so.6(gsignal+0x37) [0x7ff3474b1000+0x36207]
|
2020-01-30T19:51:54.471437-08:00 CRITICAL /lib64/libc.so.6(abort+0x148) [0x7ff3474b1000+0x378f8]
|
2020-01-30T19:51:54.471468-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7ff347fb3000+0x91195]
|
2020-01-30T19:51:54.471477-08:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x145e02]
|
2020-01-30T19:51:54.471493-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7ff347fb3000+0x8ef86]
|
2020-01-30T19:51:54.471508-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7ff347fb3000+0x8efd1]
|
2020-01-30T19:51:54.471522-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7ff347fb3000+0x8f213]
|
2020-01-30T19:51:54.471530-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/../lib/libmagma.so(_ZN5magma5Magma11CommitBatch11updateSeqnoEm+0x252) [0x7ff34250b000+0x6e102]
|
2020-01-30T19:51:54.471535-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/../lib/libmagma.so(_ZN5magma5Magma11CommitBatch3SetERKNS_5SliceES4_S4_PbS5_+0x101) [0x7ff34250b000+0x6e791]
|
2020-01-30T19:51:54.471546-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x1fe136]
|
2020-01-30T19:51:54.471547-08:00 INFO 49118: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default - (vb:594) Creating stream with start seqno 978 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
|
2020-01-30T19:51:54.471553-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x200f45]
|
2020-01-30T19:51:54.471555-08:00 INFO 49118: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default - (vb:594) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default" backfill:true, seqno:979
|
2020-01-30T19:51:54.471559-08:00 INFO 49118: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default - (vb:594) ActiveStream::transitionState: Transitioning from backfilling to in-memory
|
2020-01-30T19:51:54.471560-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0xdb9f2]
|
2020-01-30T19:51:54.471567-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0xe0513]
|
2020-01-30T19:51:54.471575-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x131ba0]
|
2020-01-30T19:51:54.471581-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x132a71]
|
2020-01-30T19:51:54.471587-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7ff342a8a000+0x12c614]
|
2020-01-30T19:51:54.471589-08:00 INFO 49118: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.221->ns_1@172.23.105.220:default - (vb:595) Creating stream with start seqno 988 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
|
2020-01-30T19:51:54.471591-08:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7ff349e5c000+0x8ee7]
|
Finally, WAL corruption:
2020-01-30T20:07:08.362788-08:00 CRITICAL WALIterator corruption error:Corruption: 24:42041344
|
2020-01-30T20:07:09.050033-08:00 INFO Magma recovered successfully with tailOffset:24:43114496 and flushOffset:24:43114496
|
2020-01-30T20:07:09.052035-08:00 INFO Magma::SetMemoryQuota quota:37329308 lastBFSize:530541 lastIndexSize:70632524
|
2020-01-30T20:07:09.052595-08:00 CRITICAL *** Fatal error encountered during exception handling ***
|
2020-01-30T20:07:09.052649-08:00 CRITICAL Caught unhandled std::exception-derived exception. what(): MagmaKVStore vbstate vbid:17 not found. Status:NotFound: MagmaKVStore::readLocalDoc(vbid:vb:17 key:_vbstate not found.
|
2020-01-30T20:07:09.139794-08:00 CRITICAL Breakpad caught a crash (Couchbase version 1006.5.1-1039). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/394c39cc-a20d-063d-203a826e-208fc08c.dmp before terminating.
|
2020-01-30T20:07:09.139827-08:00 CRITICAL Stack backtrace of crashed thread:
|
2020-01-30T20:07:09.139999-08:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x1324cd]
|
2020-01-30T20:07:09.140011-08:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x14a50e]
|
2020-01-30T20:07:09.140020-08:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x14a824]
|
2020-01-30T20:07:09.140029-08:00 CRITICAL /lib64/libpthread.so.0() [0x7f93867d0000+0xf5d0]
|
2020-01-30T20:07:09.140052-08:00 CRITICAL /lib64/libc.so.6(gsignal+0x37) [0x7f9386403000+0x36207]
|
2020-01-30T20:07:09.140073-08:00 CRITICAL /lib64/libc.so.6(abort+0x148) [0x7f9386403000+0x378f8]
|
2020-01-30T20:07:09.140103-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f9386f05000+0x91195]
|
2020-01-30T20:07:09.140112-08:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x145e02]
|
2020-01-30T20:07:09.140129-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f9386f05000+0x8ef86]
|
2020-01-30T20:07:09.140143-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f9386f05000+0x8efd1]
|
2020-01-30T20:07:09.140157-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f9386f05000+0x8f213]
|
2020-01-30T20:07:09.140168-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x1fb7fc]
|
2020-01-30T20:07:09.140176-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x1466a7]
|
2020-01-30T20:07:09.140183-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x160b54]
|
2020-01-30T20:07:09.140190-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x1a3f93]
|
2020-01-30T20:07:09.140196-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x1518ae]
|
2020-01-30T20:07:09.140202-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0xd9596]
|
2020-01-30T20:07:09.140208-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0xef189]
|
2020-01-30T20:07:09.140213-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f938188a000+0x10254f]
|
2020-01-30T20:07:09.140223-08:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x24118]
|
2020-01-30T20:07:09.140230-08:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x245d9]
|
2020-01-30T20:07:09.140236-08:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0(_ZN9Couchbase6Thread12thread_entryEv+0xf) [0x7f9388dae000+0x1477f]
|
2020-01-30T20:07:09.140240-08:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f9388dae000+0x8ee7]
|
2020-01-30T20:07:09.140246-08:00 CRITICAL /lib64/libpthread.so.0() [0x7f93867d0000+0x7dd5]
|
2020-01-30T20:07:09.140276-08:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f9386403000+0xfdead]
|
Initial logs when first crash dump is observed:
[^collectinfo-2020-01-31T035045-ns_1@172.23.105.220.zip] [^collectinfo-2020-01-31T035406-ns_1@172.23.105.221.zip]
Logs collected just a little late as they are rolling over super fast as memcached is crashing in a loop:
https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-37355/collectinfo-2020-01-31T040838-ns_1%40172.23.105.220.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-37355/collectinfo-2020-01-31T040838-ns_1%40172.23.105.221.zip