Couchbase Server / MB-35192

KV-Engine failing expectation after SIGKILL & restart: lastCommittedSeqno <= highPreparedSeqno

Details

    • Untriaged
    • No
    • KV-Engine MH Beta part 2

    Description

      One of our Jepsen tests sends a SIGKILL to memcached while performing PersistToMajority durable writes. When memcached comes back up, we start seeing errors in the log followed by a memcached crash.

      2019-07-19T04:19:11.218023-07:00 ERROR 50: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 172.28.128.139:33179 - 172.28.128.201:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007f40a4a5bc10","ewouldblock":false,"packet":{"bodylen":8,"cas":0,"datatype":"raw","extlen":8,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":1,"opcode":"DCP_SEQNO_ACKNOWLEDGED","vbucket":48},"refcount":1}] - closing connection ([ 172.28.128.139:33179 - 172.28.128.201:11209 (<ud>@ns_server</ud>) ]): GSL: Postcondition failure at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/durability/active_durability_monitor.cc: 959
      ...
      2019-07-19T04:19:12.245213-07:00 ERROR 52: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 172.28.128.135:41259 - 172.28.128.201:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007f40a4b9ec10","ewouldblock":false,"packet":{"bodylen":8,"cas":0,"datatype":"raw","extlen":8,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":10,"opcode":"DCP_SEQNO_ACKNOWLEDGED","vbucket":55},"refcount":1}] - closing connection ([ 172.28.128.135:41259 - 172.28.128.201:11209 (<ud>@ns_server</ud>) ]): GSL: Postcondition failure at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/durability/active_durability_monitor.cc: 959
      ...
      2019-07-19T04:19:12.249106-07:00 CRITICAL *** Fatal error encountered during exception handling ***
      2019-07-19T04:19:12.249171-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): GSL: Postcondition failure at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/durability/active_durability_monitor.cc: 959
      2019-07-19T04:19:12.421307-07:00 CRITICAL Breakpad caught a crash (Couchbase version 6.5.0-3814). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/650d46d7-5635-8322-15e09307-40ac6b65.dmp before terminating.
      2019-07-19T04:19:12.421354-07:00 CRITICAL Stack backtrace of crashed thread:
      2019-07-19T04:19:12.421583-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x133e60]
      2019-07-19T04:19:12.421627-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x14b39e]
      2019-07-19T04:19:12.421665-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x14b6b4]
      2019-07-19T04:19:12.421692-07:00 CRITICAL     /lib/x86_64-linux-gnu/libpthread.so.0() [0x7f40b14f2000+0x11390]
      2019-07-19T04:19:12.421739-07:00 CRITICAL     /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7f40b1128000+0x35428]
      2019-07-19T04:19:12.421797-07:00 CRITICAL     /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f40b1128000+0x3702a]
      2019-07-19T04:19:12.421877-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f40b1c2e000+0x90d25]
      2019-07-19T04:19:12.421917-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x146e7d]
      2019-07-19T04:19:12.422056-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f40b1c2e000+0x8eb16]
      2019-07-19T04:19:12.422127-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f40b1c2e000+0x8eb61]
      2019-07-19T04:19:12.422187-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f40b1c2e000+0x8eda3]
      2019-07-19T04:19:12.422229-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f40ac970000+0xdced8]
      2019-07-19T04:19:12.422265-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f40ac970000+0xdb877]
      2019-07-19T04:19:12.422307-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f40ac970000+0xdc372]
      2019-07-19T04:19:12.422344-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f40ac970000+0xdcb50]
      2019-07-19T04:19:12.422380-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f40ac970000+0x19416e]
      2019-07-19T04:19:12.422418-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f40ac970000+0x15a034]
      2019-07-19T04:19:12.422454-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f40ac970000+0x1353ef]
      2019-07-19T04:19:12.422478-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f40b3ac3000+0x8e77]
      2019-07-19T04:19:12.422491-07:00 CRITICAL     /lib/x86_64-linux-gnu/libpthread.so.0() [0x7f40b14f2000+0x76ba]
      2019-07-19T04:19:12.422549-07:00 CRITICAL     /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f40b1128000+0x10741d]
      2019-07-19T04:19:12.426015-07:00 INFO ---------- Closing logfile
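
For readers unfamiliar with the expectation named in the title, the following is a toy model of the condition `lastCommittedSeqno <= highPreparedSeqno`. It is only a sketch: the class `SeqnoState`, the exception `PostconditionFailure`, and the example seqno values are hypothetical and are not taken from `active_durability_monitor.cc`. It makes no claim about how the state came to violate the expectation after the restart.

```python
from dataclasses import dataclass


class PostconditionFailure(Exception):
    """Hypothetical stand-in for the GSL postcondition failure in the log."""


@dataclass
class SeqnoState:
    # Hypothetical container; the field names mirror the ticket title only.
    high_prepared_seqno: int
    last_committed_seqno: int

    def check(self) -> None:
        # Expectation from the title: lastCommittedSeqno <= highPreparedSeqno.
        if not self.last_committed_seqno <= self.high_prepared_seqno:
            raise PostconditionFailure(
                f"lastCommittedSeqno ({self.last_committed_seqno}) > "
                f"highPreparedSeqno ({self.high_prepared_seqno})"
            )


# Illustrative values only: the first state satisfies the expectation.
SeqnoState(high_prepared_seqno=5, last_committed_seqno=3).check()

# The second state violates it, so check() raises.
try:
    SeqnoState(high_prepared_seqno=5, last_committed_seqno=7).check()
except PostconditionFailure as exc:
    print(f"expectation violated: {exc}")
```
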
      

      This appears to leave some keys in the affected vBuckets (for example 48 and 55 in the errors above) stuck in the pending state.
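
To list which vBuckets hit the failure, the "exception occurred in runloop" lines can be scanned for the packet's opcode and vbucket. The sketch below assumes every such line has the same shape as the two shown above (a JSON array after "Cookie info: ", followed by " - closing connection"). The helper name `failing_packets` is hypothetical, and `SAMPLE` is the first error line from this ticket.

```python
import json

MARKER = "Cookie info: "
TAIL = " - closing connection"

# First error line from this ticket, split across string literals for width.
SAMPLE = (
    '2019-07-19T04:19:11.218023-07:00 ERROR 50: exception occurred in runloop '
    'during packet execution. Cookie info: [{"aiostat":"success",'
    '"connection":"[ 172.28.128.139:33179 - 172.28.128.201:11209 '
    '(<ud>@ns_server</ud>) ]","engine_storage":"0x00007f40a4a5bc10",'
    '"ewouldblock":false,"packet":{"bodylen":8,"cas":0,"datatype":"raw",'
    '"extlen":8,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest",'
    '"opaque":1,"opcode":"DCP_SEQNO_ACKNOWLEDGED","vbucket":48},"refcount":1}]'
    ' - closing connection ([ 172.28.128.139:33179 - 172.28.128.201:11209 '
    '(<ud>@ns_server</ud>) ]): GSL: Postcondition failure at '
    '/home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/'
    'engines/ep/src/durability/active_durability_monitor.cc: 959'
)


def failing_packets(lines):
    """Yield (opcode, vbucket) for each runloop exception line."""
    for line in lines:
        if MARKER not in line or TAIL not in line:
            continue  # not a runloop exception line of the assumed shape
        # The cookie info is the JSON array between the two markers.
        payload = line.split(MARKER, 1)[1].split(TAIL, 1)[0]
        for cookie in json.loads(payload):
            packet = cookie["packet"]
            yield packet["opcode"], packet["vbucket"]


print(sorted(set(failing_packets([SAMPLE]))))
# prints [('DCP_SEQNO_ACKNOWLEDGED', 48)]
```

Run over the full memcached log from the zip, the same function would give the complete set of vBuckets whose DCP_SEQNO_ACKNOWLEDGED handling failed, provided the line format assumption holds.
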

      Zip with full logs is at http://172.23.120.13:8080/files/Couchbase-jenkins-kv-engine-jepsen-nightly-185/20190719T041050.000-0700.zip (crashed node is 172.28.128.201).

          People

            sven.signer Sven Signer (Inactive)