Couchbase Server / MB-6036

memcached core dump due to TapProducer::completeBGFetchJob

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Affects Version/s: 2.0-beta
    • Fix Version/s: 2.0-beta
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels: None
    • Environment: CentOS 6.2 64-bit

    Description

      Installed Couchbase Server 2.0.0-1492 on 12 CentOS 6.2 64-bit nodes to test a large, long-running (longevity) cluster.
      Loaded 72 million items into the default bucket.
      Rebalanced nodes in and out of the cluster as follows:
      Removed nodes 26 and 28.
      Added nodes 26 and 28 back and removed nodes 24 and 25 (swap rebalance).
      Rebooted the CentOS server on node 14.
      Added node 24 and started a rebalance. After a few minutes, stopped the rebalance, added node 25, removed node 13, and rebalanced again.
      The rebalance failed.
      After the rebalance failed, I took no further action; only the load kept running.
      memcached on node 14 then crashed, so I reopened bug 6020.
      Later, memcached on node 20 crashed as well; its stack trace is attached below.

      Link to diagnostics: https://s3.amazonaws.com/packages.couchbase/diag-logs/large_cluster_2_0/12-nodes-memcached-crashed-20120726.tgz

      Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1
      Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
      Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
      Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libnss_files.so.2
      Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
      Program terminated with signal 6, Aborted.
      #0 0x0000003971e32885 in raise () from /lib64/libc.so.6

      Thread 13 (Thread 0x7f84dfbc0700 (LWP 9462)):
      #0 0x0000003971ee62c3 in epoll_wait () from /lib64/libc.so.6
      #1 0x00007f84e3451576 in epoll_dispatch (base=0x5324780, tv=<value optimized out>) at epoll.c:404
      #2 0x00007f84e343ce44 in event_base_loop (base=0x5324780, flags=<value optimized out>) at event.c:1558
      #3 0x00000000004144c4 in worker_libevent (arg=0xec08e0) at daemon/thread.c:301
      #4 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #5 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 12 (Thread 0x7f84e2fd1700 (LWP 9457)):
      #0 0x0000003971ed89cd in read () from /lib64/libc.so.6
      #1 0x0000003971e71128 in _IO_new_file_underflow () from /lib64/libc.so.6
      #2 0x0000003971e72c2e in _IO_default_uflow_internal () from /lib64/libc.so.6
      #3 0x0000003971e6e11b in getc () from /lib64/libc.so.6
      #4 0x00007f84e2fd2879 in check_stdin_thread (arg=0x403420) at extensions/daemon/stdin_check.c:19
      #5 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #6 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 11 (Thread 0x7f84e05c1700 (LWP 9461)):
      #0 0x0000003971ee62c3 in epoll_wait () from /lib64/libc.so.6
      #1 0x00007f84e3451576 in epoll_dispatch (base=0x5324a00, tv=<value optimized out>) at epoll.c:404
      #2 0x00007f84e343ce44 in event_base_loop (base=0x5324a00, flags=<value optimized out>) at event.c:1558
      #3 0x00000000004144c4 in worker_libevent (arg=0xec07e8) at daemon/thread.c:301
      #4 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #5 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 10 (Thread 0x7f84deb9d700 (LWP 9463)):
      #0 0x0000003971eab15d in nanosleep () from /lib64/libc.so.6
      #1 0x0000003971edf124 in usleep () from /lib64/libc.so.6
      #2 0x00007f84df063c72 in updateStatsThread (arg=0xebe4c0) at memory_tracker.cc:31
      #3 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #4 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 9 (Thread 0x7f84ddf83700 (LWP 9464)):
      #0 0x000000397260b75b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00007f84df0248d0 in wait (this=0x538a000, d=...) at syncobject.hh:47
      #2 IdleTask::run (this=0x538a000, d=...) at dispatcher.cc:341
      #3 0x00007f84df026f34 in Dispatcher::run (this=0x536ec40) at dispatcher.cc:169
      #4 0x00007f84df02774b in launch_dispatcher_thread (arg=0x536ec40) at dispatcher.cc:28
      #5 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #6 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 8 (Thread 0x7f84e23c4700 (LWP 9458)):
      #0 0x000000397260dff4 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1 0x0000003972609328 in _L_lock_854 () from /lib64/libpthread.so.0
      #2 0x00000039726091f7 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3 0x00007f84df064baa in Mutex::acquire (this=0x5372240) at mutex.cc:69
      #4 0x00007f84df082664 in lock (this=0x5372240, cookie=0x52cb340) at locks.hh:48
      #5 LockHolder (this=0x5372240, cookie=0x52cb340) at locks.hh:26
      #6 TapConnMap::newConsumer (this=0x5372240, cookie=0x52cb340) at tapconnmap.cc:220
      #7 0x00007f84df04f86d in EventuallyPersistentEngine::tapNotify (this=0x533ca00, cookie=0x52cb340, engine_specific=0x16ea95020, nengine=4, tap_flags=1, tap_event=TAP_OPAQUE, tap_seqno=2, key=0x16ea95024, nkey=0, flags=0, exptime=0, cas=0, data=0x16ea95024, ndata=0, vbucket=0) at ep_engine.cc:1897
      #8 0x00007f84df050258 in EvpTapNotify (handle=0x533ca00, cookie=0x52cb340, engine_specific=0x16ea95020, nengine=4, ttl=254 '\376', tap_flags=1, tap_event=TAP_OPAQUE, tap_seqno=2, key=0x16ea95024, nkey=0, flags=0, exptime=0, cas=0, data=0x16ea95024, ndata=0, vbucket=0) at ep_engine.cc:1012
      #9 0x00007f84e23cb2c4 in bucket_tap_notify (handle=<value optimized out>, cookie=0x52cb340, engine_specific=0x16ea95020, nengine=4, ttl=254 '\376', tap_flags=<value optimized out>, tap_event=TAP_OPAQUE, tap_seqno=2, key=0x16ea95024, nkey=0, flags=0, exptime=0, cas=0, data=0x16ea95024, ndata=0, vbucket=0) at bucket_engine.c:1941
      #10 0x000000000040cb52 in process_bin_tap_packet (event=TAP_OPAQUE, c=0x52cb340) at daemon/memcached.c:3040
      #11 0x000000000041189b in process_bin_packet (c=0x52cb340) at daemon/memcached.c:3156
      #12 complete_nread_binary (c=0x52cb340) at daemon/memcached.c:3744
      #13 complete_nread (c=0x52cb340) at daemon/memcached.c:3826
      #14 conn_nread (c=0x52cb340) at daemon/memcached.c:5675
      #15 0x0000000000405e55 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x52cb340) at daemon/memcached.c:5938
      #16 0x00007f84e343cf3c in event_process_active_single_queue (base=0x5324500, flags=<value optimized out>) at event.c:1308
      #17 event_process_active (base=0x5324500, flags=<value optimized out>) at event.c:1375
      #18 event_base_loop (base=0x5324500, flags=<value optimized out>) at event.c:1572
      #19 0x00000000004144c4 in worker_libevent (arg=0xec0500) at daemon/thread.c:301
      #20 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #21 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 7 (Thread 0x7f84e19c3700 (LWP 9459)):
      #0 0x000000397260dff4 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1 0x0000003972609328 in _L_lock_854 () from /lib64/libpthread.so.0
      #2 0x00000039726091f7 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3 0x00007f84df064baa in Mutex::acquire (this=0x2054b0c50) at mutex.cc:69
      #4 0x00007f84df077ad2 in lock (this=0x2054b0c00, event=TAP_OPAQUE, vbucket=0) at locks.hh:48
      #5 LockHolder (this=0x2054b0c00, event=TAP_OPAQUE, vbucket=0) at locks.hh:26
      #6 TapProducer::requestAck (this=0x2054b0c00, event=TAP_OPAQUE, vbucket=0) at tapconnection.cc:562
      #7 0x00007f84df048373 in EventuallyPersistentEngine::walkTapQueue (this=0x533ca00, cookie=0x52a4dc0, itm=0x7f84e19c2cf8, es=0x7f84e19c2cf0, nes=0x7f84e19c2d0c, ttl=0x7f84e19c2d0f "\377", flags=0x7f84e19c2d0a, seqno=0x7f84e19c2d04, vbucket=0x7f84e19c2d08) at ep_engine.cc:1730
      #8 0x00007f84df04846d in EvpTapIterator (handle=<value optimized out>, cookie=0x52a4dc0, itm=0x7f84e19c2cf8, es=0x7f84e19c2cf0, nes=<value optimized out>, ttl=<value optimized out>, flags=0x7f84e19c2d0a, seqno=0x7f84e19c2d04, vbucket=0x7f84e19c2d08) at ep_engine.cc:1023
      #9 0x00007f84e23ca0b9 in bucket_tap_iterator_shim (handle=0x7f84e25d0480, cookie=0x52a4dc0, itm=0x7f84e19c2cf8, engine_specific=0x7f84e19c2cf0, nengine_specific=<value optimized out>, ttl=<value optimized out>, flags=0x7f84e19c2d0a, seqno=0x7f84e19c2d04, vbucket=0x7f84e19c2d08) at bucket_engine.c:1970
      #10 0x000000000040adc6 in ship_tap_log (c=0x52a4dc0) at daemon/memcached.c:2623
      #11 0x00000000004139d7 in conn_ship_log (c=0x52a4dc0) at daemon/memcached.c:5525
      #12 0x0000000000405e55 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x52a4dc0) at daemon/memcached.c:5938
      #13 0x00007f84e343cf3c in event_process_active_single_queue (base=0x5324280, flags=<value optimized out>) at event.c:1308
      #14 event_process_active (base=0x5324280, flags=<value optimized out>) at event.c:1375
      #15 event_base_loop (base=0x5324280, flags=<value optimized out>) at event.c:1572
      #16 0x00000000004144c4 in worker_libevent (arg=0xec05f8) at daemon/thread.c:301
      #17 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #18 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 6 (Thread 0x7f84db77f700 (LWP 9468)):
      #0 0x000000397260dff4 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1 0x0000003972610dbd in _L_cond_lock_886 () from /lib64/libpthread.so.0
      #2 0x0000003972610c97 in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
      #3 0x000000397260b815 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #4 0x00007f84df0443ff in wait (this=0x533ca00) at syncobject.hh:47
      #5 wait (this=0x533ca00) at syncobject.hh:63
      #6 wait (this=0x533ca00) at tapconnmap.hh:169
      #7 EventuallyPersistentEngine::notifyPendingConnections (this=0x533ca00) at ep_engine.cc:3445
      #8 0x00007f84df0444e3 in EvpNotifyPendingConns (arg=0x533ca00) at ep_engine.cc:1114
      #9 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #10 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 5 (Thread 0x7f84dd582700 (LWP 9465)):
      #0 0x000000397260b75b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00007f84df0248d0 in wait (this=0x538a100, d=...) at syncobject.hh:47
      #2 IdleTask::run (this=0x538a100, d=...) at dispatcher.cc:341
      #3 0x00007f84df026f34 in Dispatcher::run (this=0x536ea80) at dispatcher.cc:169
      #4 0x00007f84df02774b in launch_dispatcher_thread (arg=0x536ea80) at dispatcher.cc:28
      #5 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #6 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 4 (Thread 0x7f84e0fc2700 (LWP 9460)):
      #0 0x0000003971ee62c3 in epoll_wait () from /lib64/libc.so.6
      #1 0x00007f84e3451576 in epoll_dispatch (base=0x5324c80, tv=<value optimized out>) at epoll.c:404
      #2 0x00007f84e343ce44 in event_base_loop (base=0x5324c80, flags=<value optimized out>) at event.c:1558
      #3 0x00000000004144c4 in worker_libevent (arg=0xec06f0) at daemon/thread.c:301
      #4 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #5 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 3 (Thread 0x7f84dc180700 (LWP 9467)):
      #0 0x000000397260dff4 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1 0x0000003972609328 in _L_lock_854 () from /lib64/libpthread.so.0
      #2 0x00000039726091f7 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3 0x00007f84df064baa in Mutex::acquire (this=0x5372240) at mutex.cc:69
      #4 0x00007f84df0802c7 in lock (this=0x5372240, name="eq_tapq:replication_ns_1@10.3.121.16") at locks.hh:48
      #5 LockHolder (this=0x5372240, name="eq_tapq:replication_ns_1@10.3.121.16") at locks.hh:26
      #6 TapConnMap::checkConnectivity (this=0x5372240, name="eq_tapq:replication_ns_1@10.3.121.16") at tapconnmap.cc:322
      #7 0x00007f84df01584c in BackFillVisitor::checkValidity (this=0xa756b540) at backfill.cc:212
      #8 0x00007f84df06f5ca in HashTable::visit (this=0x5664808, visitor=...) at stored-value.cc:408
      #9 0x00007f84df02b693 in VBCBAdaptor::callback (this=0x1cfaf0000, d=..., t=std::tr1::shared_ptr (count 3) 0x1193a5b80) at ep.cc:2633
      #10 0x00007f84df02800f in Task::run (this=<value optimized out>, d=<value optimized out>, t=<value optimized out>) at dispatcher.hh:142
      #11 0x00007f84df026f34 in Dispatcher::run (this=0x536f6c0) at dispatcher.cc:169
      #12 0x00007f84df02774b in launch_dispatcher_thread (arg=0x536f6c0) at dispatcher.cc:28
      #13 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #14 0x0000003971ee5ccd in clone () from /lib64/libc.so.6

      Thread 2 (Thread 0x7f84e31d4720 (LWP 9448)):
      #0 0x0000003971ee62c3 in epoll_wait () from /lib64/libc.so.6
      #1 0x00007f84e3451576 in epoll_dispatch (base=0x5324000, tv=<value optimized out>) at epoll.c:404
      #2 0x00007f84e343ce44 in event_base_loop (base=0x5324000, flags=<value optimized out>) at event.c:1558
      #3 0x0000000000409746 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7920

      Thread 1 (Thread 0x7f84dcb81700 (LWP 9466)):
      #0 0x0000003971e32885 in raise () from /lib64/libc.so.6
      #1 0x0000003971e34065 in abort () from /lib64/libc.so.6
      #2 0x0000003971e2b9fe in __assert_fail_base () from /lib64/libc.so.6
      #3 0x0000003971e2bac0 in __assert_fail () from /lib64/libc.so.6
      #4 0x00007f84df074123 in TapProducer::completeBGFetchJob (this=0x2054b0c00, itm=0x18ed99040, vbid=450, implicitEnqueue=false) at tapconnection.cc:1204
      #5 0x00007f84df07d139 in TapConnMap::performTapOp<Item*> (this=<value optimized out>, name=<value optimized out>, tapop=..., arg=0x18ed99040) at tapconnmap.hh:117
      #6 0x00007f84df07e115 in TapBGFetchCallback::callback(Dispatcher&, std::tr1::shared_ptr<Task>) () from /opt/couchbase/lib/memcached/ep.so
      #7 0x00007f84df02800f in Task::run (this=<value optimized out>, d=<value optimized out>, t=<value optimized out>) at dispatcher.hh:142
      #8 0x00007f84df026f34 in Dispatcher::run (this=0x536f880) at dispatcher.cc:169
      #9 0x00007f84df02774b in launch_dispatcher_thread (arg=0x536f880) at dispatcher.cc:28
      #10 0x00000039726077f1 in start_thread () from /lib64/libpthread.so.0
      #11 0x0000003971ee5ccd in clone () from /lib64/libc.so.6
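Frames #0 to #3 of Thread 1 are glibc's standard failed-assertion path, entered from TapProducer::completeBGFetchJob (frame #4, tapconnection.cc:1204): __assert_fail prints the message and calls abort(), which raises SIGABRT, the "signal 6, Aborted" reported by gdb. The sketch below reproduces only that mechanism in isolation; it does not reproduce the ep-engine condition checked at tapconnection.cc:1204, which this report does not show. The helper name child_aborts_on_failed_assert is hypothetical, and the sketch assumes a POSIX system with assertions enabled (no NDEBUG).

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of memcached or ep-engine): forks a child
// that fails an assertion and reports whether the child was terminated by
// SIGABRT, the same "signal 6, Aborted" seen in the core above.
bool child_aborts_on_failed_assert() {
    pid_t pid = fork();
    if (pid < 0) {
        return false;  // fork failed, nothing to observe
    }
    if (pid == 0) {
        // Child: a false condition makes assert() call __assert_fail(),
        // which prints "Assertion ... failed" to stderr and calls abort().
        volatile int invariant_holds = 0;
        assert(invariant_holds != 0);
        _exit(0);  // reached only if assertions are compiled out (NDEBUG)
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return false;
    }
    // abort() raises SIGABRT, so the parent sees a signal-terminated child.
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

On Linux with assertions enabled, calling child_aborts_on_failed_assert() is expected to return true. Forking keeps the abort confined to the child so the caller can inspect the termination signal.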
      --------------------------------------------------------------------------------
      Module information:
      /opt/couchbase/lib/memcached/libmemcached_utilities.so.0:
      lrwxrwxrwx. 1 bin bin 31 Jul 24 23:05 /opt/couchbase/lib/memcached/libmemcached_utilities.so.0 -> libmemcached_utilities.so.0.0.0
      8816230bf4814d8b0d82c3c7e9ca07d4 /opt/couchbase/lib/memcached/libmemcached_utilities.so.0
      /opt/couchbase/lib/libevent-2.0.so.5:
      lrwxrwxrwx. 1 bin bin 21 Jul 24 23:05 /opt/couchbase/lib/libevent-2.0.so.5 -> libevent-2.0.so.5.1.0
      f0dc63a3e615aaf24088f10273a0a3ed /opt/couchbase/lib/libevent-2.0.so.5
      /lib64/libdl.so.2:
      lrwxrwxrwx. 1 root root 13 May 22 14:11 /lib64/libdl.so.2 -> libdl-2.12.so
      11652c5d0ba3bf86eb3580c27e5d52c3 /lib64/libdl.so.2
      /lib64/libm.so.6:
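Threads 3, 7 and 8 are blocked in the same chain of frames: a LockHolder constructor (locks.hh:26) calls lock (locks.hh:48), which calls Mutex::acquire (mutex.cc:69) and then pthread_mutex_lock. Threads 3 and 8 wait on the TapConnMap at 0x5372240; Thread 7 waits on a mutex at 0x2054b0c50, 0x50 bytes into the TapProducer at 0x2054b0c00, which is the same producer whose completeBGFetchJob asserts in Thread 1. A minimal sketch of that scoped-lock pattern follows, assuming a thin wrapper over pthread_mutex_t. Only the class and method names come from the backtrace; the bodies, the held() flag, and the helper guard_releases_on_scope_exit are hypothetical illustrations, not the ep-engine source.

```cpp
#include <pthread.h>
#include <stdexcept>

// Hypothetical stand-in for the Mutex seen in the backtrace
// (Mutex::acquire at mutex.cc:69): a thin wrapper over pthread_mutex_t.
class Mutex {
public:
    Mutex() { pthread_mutex_init(&mutex_, nullptr); }
    ~Mutex() { pthread_mutex_destroy(&mutex_); }
    Mutex(const Mutex&) = delete;
    Mutex& operator=(const Mutex&) = delete;

    // Blocks inside pthread_mutex_lock (the __lll_lock_wait frames above)
    // until the mutex is free.
    void acquire() {
        if (pthread_mutex_lock(&mutex_) != 0) {
            throw std::runtime_error("pthread_mutex_lock failed");
        }
        held_ = true;
    }
    void release() {
        held_ = false;
        pthread_mutex_unlock(&mutex_);
    }
    // Demonstration-only flag; read it only from the owning thread.
    bool held() const { return held_; }

private:
    pthread_mutex_t mutex_;
    bool held_ = false;
};

// Hypothetical RAII guard mirroring LockHolder: the constructor calls
// lock(), which calls Mutex::acquire(); the destructor releases the mutex.
class LockHolder {
public:
    explicit LockHolder(Mutex& m) : mutex_(m) { lock(); }
    ~LockHolder() { unlock(); }
    LockHolder(const LockHolder&) = delete;
    LockHolder& operator=(const LockHolder&) = delete;

    void lock() {
        mutex_.acquire();
        locked_ = true;
    }
    void unlock() {
        if (locked_) {  // guard against a double release
            mutex_.release();
            locked_ = false;
        }
    }

private:
    Mutex& mutex_;
    bool locked_ = false;
};

// Usage sketch: the guard holds the mutex for exactly its scope.
bool guard_releases_on_scope_exit() {
    Mutex m;
    {
        LockHolder guard(m);
        if (!m.held()) {
            return false;
        }
    }
    return !m.held();
}
```

Because every waiter in Threads 3, 7 and 8 enters pthread_mutex_lock through this kind of guard, the backtrace alone shows which object each thread is waiting on, but not which thread currently owns it.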

          People

            Assignee: Thuan Nguyen
            Reporter: Thuan Nguyen
            Votes: 0
            Watchers: 0
