Server becomes unresponsive after a handful of tests

Details

    • Bug
    • Resolution: Cannot Reproduce
    • mbz M1-1
    • 1.6.0 beta4
    • couchbase-bucket
    • None
    • Operating System: All
      Platform: X86

    Description

      build: zmemcached-1.5.3_rc1_4_gf277b90-1.x86_64.rpm

      The basic acceptance tests hang part-way through. Telnet to localhost:11211 connects, but any command, such as version, hangs indefinitely.
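The symptom described above (the port accepts a connection but `version` never answers) can be checked without an interactive telnet session. The following is a minimal sketch, not part of the acceptance tests: the function name `probe_version` and the 2-second timeout are assumptions chosen for illustration. It relies only on the memcached text protocol, in which `version\r\n` is answered with a single `VERSION ...` line.

```python
import socket


def probe_version(host="localhost", port=11211, timeout=2.0):
    """Send `version` and return the reply line.

    Returns None if the server accepts the connection but does not answer
    within `timeout` seconds (the hang described in this report) or closes
    the connection without replying. A refused connection still raises,
    so a dead server is not mistaken for a hung one.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"version\r\n")
            data = b""
            # Read until the end of the first reply line.
            while not data.endswith(b"\r\n"):
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
            return data.decode("ascii", "replace").strip() or None
    except socket.timeout:
        return None


if __name__ == "__main__":
    reply = probe_version()
    print("hung or silent" if reply is None else reply)
```

Running a probe like this between test cases would narrow down which test leaves the server wedged.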

      A server stack trace:

      (gdb) thread apply all bt

      Thread 8 (Thread 1082132800 (LWP 27662)):
      #0 0x00002b5e3c31cbe8 in __lll_mutex_lock_wait () from /lib64/libc.so.6
      #1 0x00002b5e3c2b50e9 in _L_lock_14295 () from /lib64/libc.so.6
      #2 0x00002b5e3c2b3e71 in free () from /lib64/libc.so.6
      #3 0x00002b5e3c7bc35c in EventuallyPersistentStore::beginFlush (this=0x64e4a0)
      at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:94
      #4 0x00002b5e3c7d00a8 in Flusher::doFlush (this=0x649380) at flusher.cc:173
      #5 0x00002b5e3c7d0a9a in Flusher::step (this=0x2b5e3c58e9a0, d=@0x0, tid=
      {_M_ptr = 0x407fffa0, _M_refcount = {_M_pi = 0xffffffffffffffff}}) at flusher.cc:130
      #6 0x00002b5e3c7d133e in FlusherStepper::callback (this=0x659b90, d=@0x649250, t=<value optimized out>)
      at flusher.cc:6
      #7 0x00002b5e3c7b69f3 in Dispatcher::run (this=0x649250) at dispatcher.hh:65
      #8 0x00002b5e3c7b7393 in launch_dispatcher_thread (arg=0x2b5e3c58e9a0) at dispatcher.cc:10
      #9 0x00002b5e3c02a2f7 in start_thread () from /lib64/libpthread.so.0
      #10 0x00002b5e3c310e3d in clone () from /lib64/libc.so.6

      Thread 7 (Thread 1090525504 (LWP 27663)):
      #0 0x00002b5e3c02e496 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00002b5e3c7cc5e6 in EventuallyPersistentEngine::notifyTapIoThread (this=0x636770) at syncobject.hh:31
      #2 0x00002b5e3c7c4cf9 in EvpNotifyTapIo (arg=0x63694c) at ep_engine.cc:416
      #3 0x00002b5e3c02a2f7 in start_thread () from /lib64/libpthread.so.0
      #4 0x00002b5e3c310e3d in clone () from /lib64/libc.so.6

      Thread 6 (Thread 1098918208 (LWP 27664)):
      #0 0x00002b5e3c31cbe8 in __lll_mutex_lock_wait () from /lib64/libc.so.6
      #1 0x00002b5e3c2b50e9 in _L_lock_14295 () from /lib64/libc.so.6
      #2 0x00002b5e3c2b3e71 in free () from /lib64/libc.so.6
      #3 0x00002b5e3cb48c1a in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string ()
      from /usr/lib64/libstdc++.so.6
      #4 0x00002b5e3c7c6c12 in std::tr1::_Sp_counted_base_impl<std::string*, std::tr1::_Sp_deleter<std::string> >::dispose (this=<value optimized out>)
      at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/boost_shared_ptr.h:93
      #5 0x00002b5e3c7d4db7 in HashTable::clear (this=0x64e4d0)
      at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/boost_shared_ptr.h:153
      #6 0x00002b5e3c7baf6a in EventuallyPersistentStore::reset (this=0x64e4a0) at ep.cc:245
      #7 0x00002b5e3c7c4a97 in EvpFlush (handle=0x636770, cookie=0x0, when=46912529243664) at ep_engine.h:754
      #8 0x000000000040d0d2 in process_command ()
      #9 0x000000000040d762 in conn_parse_cmd ()
      #10 0x00000000004046fd in event_handler ()
      #11 0x000000000041408a in event_base_loop ()
      #12 0x0000000000410c94 in worker_libevent ()
      #13 0x00002b5e3c02a2f7 in start_thread () from /lib64/libpthread.so.0
      #14 0x00002b5e3c310e3d in clone () from /lib64/libc.so.6

      Thread 5 (Thread 1107310912 (LWP 27665)):
      #0 0x00002b5e3c31cbe8 in __lll_mutex_lock_wait () from /lib64/libc.so.6
      #1 0x00002b5e3c2b50e9 in _L_lock_14295 () from /lib64/libc.so.6
      #2 0x00002b5e3c2b3e71 in free () from /lib64/libc.so.6
      #3 0x00002b5e3c7d4db7 in HashTable::clear (this=0x64e4d0)
      at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/boost_shared_ptr.h:153
      #4 0x00002b5e3c7baf6a in EventuallyPersistentStore::reset (this=0x64e4a0) at ep.cc:245
      #5 0x00002b5e3c7c4a97 in EvpFlush (handle=0x636770, cookie=0x0, when=46912790266144) at ep_engine.h:754
      #6 0x000000000040d0d2 in process_command ()
      #7 0x000000000040d762 in conn_parse_cmd ()
      #8 0x00000000004046fd in event_handler ()
      #9 0x000000000041408a in event_base_loop ()
      #10 0x0000000000410c94 in worker_libevent ()
      #11 0x00002b5e3c02a2f7 in start_thread () from /lib64/libpthread.so.0
      #12 0x00002b5e3c310e3d in clone () from /lib64/libc.so.6

      Thread 4 (Thread 1115703616 (LWP 27666)):
      #0 0x00002b5e3c31cbe8 in __lll_mutex_lock_wait () from /lib64/libc.so.6
      #1 0x00002b5e3c2b5120 in _L_lock_14769 () from /lib64/libc.so.6
      #2 0x00002b5e3c2b4187 in realloc () from /lib64/libc.so.6
      #3 0x000000000041127c in cache_free ()
      #4 0x000000000040c60e in conn_mwrite ()
      #5 0x00000000004046fd in event_handler ()
      #6 0x000000000041408a in event_base_loop ()
      #7 0x0000000000410c94 in worker_libevent ()
      #8 0x00002b5e3c02a2f7 in start_thread () from /lib64/libpthread.so.0
      #9 0x00002b5e3c310e3d in clone () from /lib64/libc.so.6

      Thread 3 (Thread 1124096320 (LWP 27667)):
      #0 0x00002b5e3c31cbe8 in __lll_mutex_lock_wait () from /lib64/libc.so.6
      #1 0x00002b5e3c2b50e9 in _L_lock_14295 () from /lib64/libc.so.6
      #2 0x00002b5e3c2b3e71 in free () from /lib64/libc.so.6
      #3 0x00002b5e3b356209 in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2
      #4 0x00002b5e3b35acbd in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
      #5 0x00002b5e3b356ea6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
      #6 0x00002b5e3b35a65c in _dl_open () from /lib64/ld-linux-x86-64.so.2
      #7 0x00002b5e3c345500 in do_dlopen () from /lib64/libc.so.6
      #8 0x00002b5e3b356ea6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
      #9 0x00002b5e3c345667 in __libc_dlopen_mode () from /lib64/libc.so.6
      #10 0x00002b5e3c322dda in init () from /lib64/libc.so.6
      #11 0x00002b5e3c02f49d in pthread_once () from /lib64/libpthread.so.0
      #12 0x00002b5e3c322e77 in backtrace () from /lib64/libc.so.6
      #13 0x00002b5e3c2a939f in __libc_message () from /lib64/libc.so.6
      #14 0x00002b5e3c2b0834 in _int_free () from /lib64/libc.so.6
      #15 0x00002b5e3c2b3e7c in free () from /lib64/libc.so.6
      #16 0x00002b5e3c7d4e03 in HashTable::clear (this=0x64e4d0)
      at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/basic_string.h:233
      #17 0x00002b5e3c7baf6a in EventuallyPersistentStore::reset (this=0x64e4a0) at ep.cc:245
      #18 0x00002b5e3c7c4a97 in EvpFlush (handle=0x636770, cookie=0x0, when=6513088) at ep_engine.h:754
      #19 0x000000000040d0d2 in process_command ()
      #20 0x000000000040d762 in conn_parse_cmd ()
      #21 0x00000000004046fd in event_handler ()
      #22 0x000000000041408a in event_base_loop ()
      #23 0x0000000000410c94 in worker_libevent ()
      #24 0x00002b5e3c02a2f7 in start_thread () from /lib64/libpthread.so.0
      #25 0x00002b5e3c310e3d in clone () from /lib64/libc.so.6

      Thread 2 (Thread 1132489024 (LWP 27668)):
      #0 0x00002b5e3c311228 in epoll_wait () from /lib64/libc.so.6
      #1 0x0000000000415a90 in epoll_dispatch ()
      #2 0x0000000000413fc1 in event_base_loop ()
      #3 0x0000000000410c94 in worker_libevent ()
      #4 0x00002b5e3c02a2f7 in start_thread () from /lib64/libpthread.so.0
      #5 0x00002b5e3c310e3d in clone () from /lib64/libc.so.6

      Thread 1 (Thread 47683739400944 (LWP 27656)):
      #0 0x00002b5e3c311228 in epoll_wait () from /lib64/libc.so.6
      #1 0x0000000000415a90 in epoll_dispatch ()
      #2 0x0000000000413fc1 in event_base_loop ()
      #3 0x0000000000408c1a in main ()
      #0 0x00002b5e3c311228 in epoll_wait () from /lib64/libc.so.6
      (gdb)

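      The trace shows threads 3, 5, and 6 all inside EvpFlush(), each clearing the same HashTable (this=0x64e4d0) and each waiting on a lock in free(). The flusher (thread 8) and a worker in realloc() (thread 4) are also waiting on an allocator lock. Thread 3 stands out: its first free() reached _int_free() and then __libc_message(), which is how glibc reports a heap error it has detected, and the backtrace() call made from there ended up in free() again, where it is now waiting.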
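The observation about threads in EvpFlush() waiting on a lock can be checked mechanically against the gdb output. This is a small sketch under stated assumptions: it expects the text produced by `thread apply all bt` in the format shown above, and the helper names `summarize` and `blocked_in` are made up for this example.

```python
import re

# "Thread 8 (Thread 1082132800 (LWP 27662)):"
THREAD_RE = re.compile(r"^\s*Thread (\d+) \(")
# "#3 0x00002b5e3c7bc35c in EventuallyPersistentStore::beginFlush (this=...)"
FRAME_RE = re.compile(r"^\s*#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?) \(")


def summarize(trace):
    """Map each gdb thread number to its frame function names, innermost first."""
    threads, current = {}, None
    for line in trace.splitlines():
        m = THREAD_RE.match(line)
        if m:
            current = threads.setdefault(int(m.group(1)), [])
            continue
        m = FRAME_RE.match(line)
        # Accept only the next expected frame number; this skips pager
        # prompts, "at file:line" continuations, and the stray "#0" line
        # gdb prints for the selected frame after the last thread.
        if m and current is not None and int(m.group(1)) == len(current):
            current.append(m.group(2))
    return threads


def blocked_in(threads, waiter="__lll_mutex_lock_wait", caller="EvpFlush"):
    """Thread numbers whose innermost frame is `waiter` and that came through `caller`."""
    return sorted(t for t, frames in threads.items()
                  if frames and frames[0] == waiter and caller in frames)
```

Applied to the trace in this report, `blocked_in(summarize(trace))` should single out threads 3, 5, and 6, and `summarize(trace)[3]` shows the full call chain of the thread that re-entered free().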

      The EC2 instance running this server is at ec2-67-202-30-128.compute-1.amazonaws.com. I will leave it running and deadlocked if you want to poke at it.

          People

            dustin@sallings.org Dustin Sallings (Inactive)
            rwygand@gmail.com Rob Wygand