  1. Couchbase Server
  2. MB-47004

memcached crashed during eventing + kv rebalance.

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Cannot Reproduce
    • 6.6.3
    • 6.6.3
    • couchbase-bucket
    • 6.6.3-9756

    Description

      QE Test

      ./testrunner -i /tmp/testexec.17980.ini -p get-cbcollect-info=True,GROUP=failover_curl,java_sdk_client=True,get-cbcollect-info=False -t eventing.eventing_rebalance.EventingRebalance.test_kv_eventing_failover_and_kv_eventing_rebalance_simultaneously,doc-per-day=20,GROUP=failover_curl,services_in=kv,server_failed_over=1,reset_services=True,dataset=default,host=http://qa.sc.couchbase.com/,groups=simple,curl=True,java_sdk_client=True,handler_code=timer_op_curl_jenkins,services_init=kv-kv-eventing-eventing-eventing-index:n1ql,nodes_init=6,get-cbcollect-info=False,server_out=4
      

      Error seen in memcached logs for .10

      172.23.105.10 : Found message in /opt/couchbase/var/lib/couchbase/logs/memcached.log.000014.txt
      2021-06-16T22:37:44.726381-07:00 CRITICAL Breakpad caught a crash (Couchbase version 6.6.3-9756). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/5ebda1fd-fd9d-8aa9-0edc78be-0fdd4f68.dmp before terminating.
      2021-06-16T22:37:44.726406-07:00 CRITICAL Stack backtrace of crashed thread:
      2021-06-16T22:37:44.726568-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x13735d]
      2021-06-16T22:37:44.726577-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x14fe8e]
      2021-06-16T22:37:44.726583-07:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x1501a4]
      2021-06-16T22:37:44.726589-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7fb6e0a86000+0xf5d0]
      2021-06-16T22:37:44.726596-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x1f30cc]
      2021-06-16T22:37:44.726601-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x1f3942]
      2021-06-16T22:37:44.726607-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x1f4a48]
      2021-06-16T22:37:44.726610-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x1f4fc8]
      2021-06-16T22:37:44.726615-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x14a6f9]
      2021-06-16T22:37:44.726620-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x1681b7]
      2021-06-16T22:37:44.726624-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x1afb73]
      2021-06-16T22:37:44.726628-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x15636e]
      2021-06-16T22:37:44.726632-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0xda846]
      2021-06-16T22:37:44.726636-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0xf0d19]
      2021-06-16T22:37:44.726640-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x1043b6]
      2021-06-16T22:37:44.726646-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x241d8]
      2021-06-16T22:37:44.726652-07:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x24699]
      2021-06-16T22:37:44.726656-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0(_ZN9Couchbase6Thread12thread_entryEv+0xf) [0x7fb6e3271000+0x147bf]
      2021-06-16T22:37:44.726659-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7fb6e3271000+0x8f17]
      2021-06-16T22:37:44.726664-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7fb6e0a86000+0x7dd5]
      

      Stack trace of the crash seen on .10:

      172.23.105.10 : Looking for crash dump files
      ['6.6.3-9756\n']
      Node 172.23.105.10 - Core dump seen: 1
      172.23.105.10 : Stack Trace of first crash: 5ebda1fd-fd9d-8aa9-0edc78be-0fdd4f68.dmp
      Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
       #0  0x00007fb6dbe8e0cc in CouchKVStore::populateFileNameMap(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<Vbid, std::allocator<Vbid> >*) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #1  0x00007fb6dbe8e942 in CouchKVStore::initialize() () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #2  0x00007fb6dbe8fa48 in CouchKVStore::CouchKVStore(KVStoreConfig&, FileOpsInterface&, bool, std::shared_ptr<folly::Synchronized<std::vector<AtomicMonotonic<unsigned long, ThrowExceptionPolicy, cb::greater>, std::allocator<AtomicMonotonic<unsigned long, ThrowExceptionPolicy, cb::greater> > >, folly::SharedMutexImpl<false, void, std::atomic, false, false> > >) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #3  0x00007fb6dbe8ffc8 in CouchKVStore::CouchKVStore(KVStoreConfig&) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #4  0x00007fb6dbde56f9 in KVStoreFactory::create(KVStoreConfig&) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #5  0x00007fb6dbe031b7 in KVShard::KVShard(unsigned short, unsigned short, Configuration&) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #6  0x00007fb6dbe4ab73 in VBucketMap::VBucketMap(Configuration&, KVBucket&) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #7  0x00007fb6dbdf136e in KVBucket::KVBucket(EventuallyPersistentEngine&) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #8  0x00007fb6dbd75846 in EPBucket::EPBucket(EventuallyPersistentEngine&) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #9  0x00007fb6dbd8bd19 in EventuallyPersistentEngine::makeBucket(Configuration&) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #10 0x00007fb6dbd9f3b6 in EventuallyPersistentEngine::initialize(char const*) () from /opt/couchbase/lib/ep.so
       No symbol table info available.
       #11 0x00000000004241d8 in CreateBucketThread::create() ()
       No symbol table info available.
       #12 0x0000000000424699 in CreateBucketThread::run() ()
       No symbol table info available.
       #13 0x00007fb6e32857bf in Couchbase::Thread::thread_entry() () from /opt/couchbase/lib/libplatform_so.so.0.1.0
       No symbol table info available.
       #14 0x00007fb6e3279f17 in platform_thread_wrap(void*) () from /opt/couchbase/lib/libplatform_so.so.0.1.0
       No symbol table info available.
       #15 0x00007fb6e0a8ddd5 in start_thread (arg=0x7fb6d63ff700) at pthread_create.c:307
               __res = <optimized out>
               pd = 0x7fb6d63ff700
               now = <optimized out>
               unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140423255291648, 3098079429588024574, 0, 8392704, 0, 140423255291648, -3057035211598856962, -3057003505832952578}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
               not_first_call = <optimized out>
               pagesize_m1 = <optimized out>
               sp = <optimized out>
               freesize = <optimized out>
       #16 0x00007fb6e07b6ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
       No locals.
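
      The Breakpad backtrace in memcached.log prints each frame as a module base plus an offset, while gdb prints absolute addresses. As a minimal sketch (the helper name `absolute_address` is hypothetical, not part of any Couchbase tooling), the two can be matched by adding base and offset:

```python
import re

# Matches the trailing "[0xBASE+0xOFFSET]" of a Breakpad backtrace line.
FRAME_RE = re.compile(r"\[0x([0-9a-fA-F]+)\+0x([0-9a-fA-F]+)\]\s*$")


def absolute_address(line):
    """Return base + offset for one backtrace line, or None if no frame is present."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    base, offset = (int(group, 16) for group in match.groups())
    return base + offset


# First ep.so frame copied from the memcached.log excerpt above.
line = ("2021-06-16T22:37:44.726596-07:00 CRITICAL     "
        "/opt/couchbase/bin/../lib/../lib/ep.so() [0x7fb6dbc9b000+0x1f30cc]")
print(hex(absolute_address(line)))  # 0x7fb6dbe8e0cc, the address of gdb frame #0
```

      The result equals the address gdb reports for CouchKVStore::populateFileNameMap, so the unsymbolized ep.so frames in the log correspond one-to-one to the gdb frames.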
      

      Jenkins Run details: http://qa.sc.couchbase.com/job/test_suite_executor/355498/consoleFull

      The test itself crashes because the exception is not handled properly on failure, so no logs were collected.

      ======================================================================
      ERROR: test_kv_eventing_failover_and_kv_eventing_rebalance_simultaneously (eventing.eventing_rebalance.EventingRebalance)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "pytests/eventing/eventing_rebalance.py", line 1450, in test_kv_eventing_failover_and_kv_eventing_rebalance_simultaneously
          task.result()
        File "lib/tasks/future.py", line 160, in result
          return self.__get_result()
        File "lib/tasks/future.py", line 112, in __get_result
          raise self._exception
      MemcachedError: Memcached error #1 'Not found'
       
      ----------------------------------------------------------------------
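
      One way the test could still collect logs when `task.result()` raises is sketched below. This is only an illustration under stated assumptions: `result_with_log_collection`, `collect_logs`, and `FailingTask` are hypothetical names, not testrunner APIs.

```python
def result_with_log_collection(task, collect_logs):
    """Return task.result(); on failure call collect_logs() and re-raise."""
    try:
        return task.result()
    except Exception:
        # Hypothetical hook: gather diagnostics before the failure propagates.
        collect_logs()
        raise


class FailingTask:
    """Stand-in for a rebalance task whose result() raises."""

    def result(self):
        raise RuntimeError("Memcached error #1 'Not found'")


collected = []
try:
    result_with_log_collection(FailingTask(), lambda: collected.append("cbcollect"))
except RuntimeError as error:
    print("test failed after collecting logs:", error)
```

      The original exception still propagates, so the test is reported as failed, but the collection hook has run first.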
      

      CC: Chanabasappa Ghali, Sujay Gad, Ritam Sharma


        Activity

          owend Daniel Owen added a comment -

          Thanks Ritesh Agarwal. Dave Rigby, I will leave it to you how to proceed, but it looks like we might need to close this as cannot-reproduce.

          drigby Dave Rigby added a comment -

          I was thinking if we are confident about the issue and the fix, we can take it in and should run regression again to see that nothing breaks; else we can drop it. Thoughts?

          We don't really know what the cause of the issue is. From the limited analysis I managed to do from the information we do have, it looks like some kind of corruption to the logger; but I don't know what actually caused that.

          I think it's still worth trying to reproduce a few more times (I don't know how long the test takes), given that, while it seems intermittent, it's a serious problem if it occurs.

          owend Daniel Owen added a comment -

          Assigning to QE to run a few (min 3?) more times - then if not seen please close as "cannot reproduce"

          thanks

          ritam.sharma Ritam Sharma added a comment -

          Sujay Gad and Ritesh Agarwal - Please close this out by running the test in a loop to see whether the issue is reproducible. Chanabasappa Ghali

          sujay.gad Sujay Gad added a comment -

          Issue not observed after running the test 5 times. Hence closing this issue.
          Test run - http://qa.sc.couchbase.com/job/dev_testbed_blr3/758/


          People

            ritesh.agarwal Ritesh Agarwal
            ritesh.agarwal Ritesh Agarwal
            Votes: 0
            Watchers: 9
