Couchbase Server / MB-3777

bucket_engine.c:1413 core dump while possibly retrieving TAP stats during rebalance tests

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.7 alpha 2
    • Fix Version/s: 1.7.0
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
      None
    • Environment:
      basestar-223-g75d647b

      Description

      I will look at the test logs later to find out during which test this crash happened.
      I think it is related to some new calls I added to the tests that fetch TAP stats more often during test runs.

      Thread 1 (Thread 15489):
      #0 0x00c87402 in __kernel_vsyscall ()
      #1 0x001efdf0 in raise () from /lib/libc.so.6
      #2 0x001f1701 in abort () from /lib/libc.so.6
      #3 0x001e926b in __assert_fail () from /lib/libc.so.6
      #4 0x004e24de in must_lock (mutex=<value optimized out>) at bucket_engine.c:305
      #5 0x004e2963 in bucket_get_stats_struct (handle=0x4e7200, cookie=0x9d26278) at bucket_engine.c:1413
      #6 0x08055bd7 in transmit (c=0x9d26278) at daemon/memcached.c:6334
      #7 conn_mwrite (c=0x9d26278) at daemon/memcached.c:5375
      #8 0x08055ed5 in conn_write (c=0x9d26278) at daemon/memcached.c:5362
      #9 0x0805b606 in thread_libevent_process (fd=10, which=2, arg=0x9cb3ff0) at daemon/thread.c:381
      #10 0x0019b473 in event_process_active_single_queue (base=0x9cb42f0, flags=0) at event.c:1308
      #11 event_process_active (base=0x9cb42f0, flags=0) at event.c:1375
      #12 event_base_loop (base=0x9cb42f0, flags=0) at event.c:1572
      #13 0x0805be97 in worker_libevent (arg=0x9cb3ff0) at daemon/thread.c:304
      #14 0x00b3e5ab in start_thread () from /lib/libpthread.so.0
      #15 0x00298cfe in clone () from /lib/libc.so.6
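Frames #2 to #4 show that must_lock() at bucket_engine.c:305 failed an assert(), after which __assert_fail() called abort(). The real must_lock() source is not part of this issue, so the C below is only a hypothetical reconstruction of that assert-on-failure pattern (must_lock_sketch and checked_lock are illustrative names), shown next to a variant that hands the pthread_mutex_lock() error code back to the caller instead of terminating the process.

```c
#define _XOPEN_SOURCE 700 /* exposes PTHREAD_MUTEX_ERRORCHECK on strict builds */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical assert-based helper in the style of must_lock():
 * any non-zero return from pthread_mutex_lock() aborts the whole process. */
static void must_lock_sketch(pthread_mutex_t *mutex) {
    int rv = pthread_mutex_lock(mutex);
    assert(rv == 0);
    (void)rv; /* avoids an unused-variable warning when NDEBUG is defined */
}

/* Alternative: return 0 on success or the errno value on failure,
 * so the caller can decide what to do (e.g. drop the connection). */
static int checked_lock(pthread_mutex_t *mutex) {
    return pthread_mutex_lock(mutex);
}
```

With an error-checking mutex, relocking from the owning thread makes checked_lock() return EDEADLK, a value the caller can act on; an assert-based helper would abort on the same condition.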

      1. mycore.memcached.15486.log
        8 kB
        Farshid Ghods
      2. mycore.memcached.27383.log
        11 kB
        Farshid Ghods

        Activity

        Farshid Ghods (Inactive) added a comment -

        Seeing this crash again in a different run on a different machine.

        Thread 1 (Thread 27387):
        #0 0xb77a7430 in __kernel_vsyscall ()
        #1 0xb75f64d1 in raise () from /lib/tls/i686/cmov/libc.so.6
        #2 0xb75f9932 in abort () from /lib/tls/i686/cmov/libc.so.6
        #3 0xb75ef648 in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
        #4 0xb65c055e in must_lock (mutex=<value optimized out>) at bucket_engine.c:305
        #5 0xb65c1173 in bucket_get_stats_struct (handle=0xb65c70e0, cookie=0x967eef0) at bucket_engine.c:1413
        #6 0x080583e7 in get_independent_stats (c=0x967eef0) at daemon/memcached.c:6334
        #7 get_thread_stats (c=0x967eef0) at daemon/memcached.c:6344
        #8 transmit (c=0x967eef0) at daemon/memcached.c:4978
        #9 conn_mwrite (c=0x967eef0) at daemon/memcached.c:5375
        #10 0x0805c55e in thread_libevent_process (fd=15, which=2, arg=0x87640b0) at daemon/thread.c:381
        #11 0xb77719af in event_process_active_single_queue (base=0x8764888, flags=<value optimized out>) at event.c:1308
        #12 event_process_active (base=0x8764888, flags=<value optimized out>) at event.c:1375
        #13 event_base_loop (base=0x8764888, flags=<value optimized out>) at event.c:1572
        #14 0x0805cec7 in worker_libevent (arg=0x87640b0) at daemon/thread.c:304
        #15 0xb771680e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
        #16 0xb7698a0e in clone () from /lib/tls/i686/cmov/libc.so.6

        Farshid Ghods (Inactive) added a comment -

        The test code currently tries to get 'tap' stats from moxi, which does not support them.

        I need to fix the test to get 'tap' stats directly from memcached; once the test is fixed we should no longer see this crash.
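For reference, memcached's text protocol answers a stats command such as `stats tap` with a series of `STAT <name> <value>` lines terminated by `END`. Below is a minimal sketch of the reply-parsing half of such a test helper, written in C to match the server code in the traces (the actual test suite may use another language). find_stat is a hypothetical helper name, not part of the real tests, and the stat names in the check are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look up one statistic in a memcached text-protocol
 * "stats" reply, i.e. "STAT <name> <value>\r\n" lines followed by "END\r\n".
 * Copies the value (NUL-terminated, truncated to fit) into out and returns 1
 * if the statistic is present, 0 otherwise. */
static int find_stat(const char *reply, const char *name, char *out, size_t outlen) {
    const char *line = reply;
    size_t namelen;

    assert(reply != NULL && name != NULL && out != NULL && outlen > 0);
    namelen = strlen(name);
    while (line != NULL && *line != '\0') {
        if (strncmp(line, "END\r\n", 5) == 0) {
            break; /* end of the stats block */
        }
        if (strncmp(line, "STAT ", 5) == 0 &&
            strncmp(line + 5, name, namelen) == 0 &&
            line[5 + namelen] == ' ') {
            const char *value = line + 5 + namelen + 1;
            const char *eol = strstr(value, "\r\n");
            size_t len = eol != NULL ? (size_t)(eol - value) : strlen(value);
            if (len >= outlen) {
                len = outlen - 1; /* truncate to the caller's buffer */
            }
            memcpy(out, value, len);
            out[len] = '\0';
            return 1;
        }
        line = strstr(line, "\r\n"); /* advance to the next line */
        if (line != NULL) {
            line += 2;
        }
    }
    return 0;
}
```

A test would send `stats tap\r\n` to memcached itself rather than to moxi and run the reply through a helper like this.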

        Trond Norbye added a comment -

        The problem here is the design of the get_stats_struct function that someone added to the engine API: it provides no way to return an error message or to kick out the connection if the bucket is no longer valid.
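One way to read this comment: in the traces, bucket_get_stats_struct() receives only handle and cookie, and bucket_engine appears to have no option other than aborting inside its lock helpers when the bucket is gone. A minimal C sketch of the alternative shape, in which the lookup reports failure and the caller closes the connection. Every type and function name here (bucket_sketch, get_stats_struct_checked, on_transmit) is hypothetical and not the actual engine API, and locking is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for a stats lookup that is allowed to fail. */
typedef enum { STATS_OK, STATS_BUCKET_GONE } stats_status;

/* What the worker thread should do with the connection next. */
typedef enum { ACTION_UPDATE_STATS, ACTION_CLOSE_CONNECTION } conn_action;

/* Hypothetical per-bucket state; the real bucket_engine structures differ. */
typedef struct {
    int valid;          /* cleared once the bucket starts being deleted */
    void *thread_stats; /* per-bucket statistics storage */
} bucket_sketch;

/* Returns the stats struct through *out, or reports that the bucket is gone
 * instead of aborting. No locking here: this only shows the API shape. */
static stats_status get_stats_struct_checked(const bucket_sketch *b, void **out) {
    assert(out != NULL);
    if (b == NULL || !b->valid) {
        *out = NULL;
        return STATS_BUCKET_GONE;
    }
    *out = b->thread_stats;
    return STATS_OK;
}

/* Caller side: an error result lets the daemon drop the connection. */
static conn_action on_transmit(const bucket_sketch *b) {
    void *stats = NULL;
    if (get_stats_struct_checked(b, &stats) != STATS_OK) {
        return ACTION_CLOSE_CONNECTION;
    }
    return ACTION_UPDATE_STATS;
}
```

The key design point is that the failure becomes a return value the daemon can act on, rather than an assert() deep inside the engine.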

        Farshid Ghods (Inactive) added a comment -

        Still seeing this crash in today's run after Trond's recent changes to bucket_engine.

        #0 0x00000033f7cd21d8 in epoll_wait () from /lib64/libc.so.6
        #1 0x00002b846ead9c28 in epoll_dispatch (base=0x1e62fb10, tv=<value optimized out>) at epoll.c:404
        #2 0x00002b846eac8a4c in event_base_loop (base=0x1e62fb10, flags=0) at event.c:1558
        #3 0x0000000000413394 in worker_libevent (arg=0x1e62ded0) at daemon/thread.c:304
        #4 0x00000033f8806307 in start_thread () from /lib64/libpthread.so.0
        #5 0x00000033f7cd1ded in clone () from /lib64/libc.so.6

        Thread 3 (Thread 27609):
        #0 0x00000033f7cd21d8 in epoll_wait () from /lib64/libc.so.6
        #1 0x00002b846ead9c28 in epoll_dispatch (base=0x1e6304b0, tv=<value optimized out>) at epoll.c:404
        #2 0x00002b846eac8a4c in event_base_loop (base=0x1e6304b0, flags=0) at event.c:1558
        #3 0x0000000000413394 in worker_libevent (arg=0x1e62dfd0) at daemon/thread.c:304
        #4 0x00000033f8806307 in start_thread () from /lib64/libpthread.so.0
        #5 0x00000033f7cd1ded in clone () from /lib64/libc.so.6

        Thread 2 (Thread 27631):
        #0 0x00000033f6c13ff7 in munmap () from /lib64/ld-linux-x86-64.so.2
        #1 0x00000033f6c120db in _dl_close_worker () from /lib64/ld-linux-x86-64.so.2
        #2 0x00000033f6c127dc in _dl_close () from /lib64/ld-linux-x86-64.so.2
        #3 0x00000033f6c0ce56 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
        #4 0x00000033f840150d in _dlerror_run () from /lib64/libdl.so.2
        #5 0x00000033f840104f in dlclose () from /lib64/libdl.so.2
        #6 0x00002aaaaaaaeb09 in uninit_engine_handle (arg=0x1e6aac50) at bucket_engine.c:722
        #7 free_engine_handle (arg=0x1e6aac50) at bucket_engine.c:731
        #8 engine_shutdown_thread (arg=0x1e6aac50) at bucket_engine.c:1472
        #9 0x00000033f8806307 in start_thread () from /lib64/libpthread.so.0
        #10 0x00000033f7cd1ded in clone () from /lib64/libc.so.6

        Thread 1 (Thread 27605):
        #0 0x00000033f7c30155 in raise () from /lib64/libc.so.6
        #1 0x00000033f7c31bf0 in abort () from /lib64/libc.so.6
        #2 0x00002aaaaaaade4d in must_unlock (handle=<value optimized out>, cookie=<value optimized out>) at bucket_engine.c:345
        #3 bucket_get_stats_struct (handle=<value optimized out>, cookie=<value optimized out>) at bucket_engine.c:1791
        #4 0x000000000040d841 in transmit (c=0x1e6aa6d8) at daemon/memcached.c:6334
        #5 conn_mwrite (c=0x1e6aa6d8) at daemon/memcached.c:5375
        #6 0x000000000040db3f in conn_write (c=0x1e6aa6d8) at daemon/memcached.c:5362
        #7 0x0000000000412bc9 in thread_libevent_process (fd=<value optimized out>, which=<value optimized out>, arg=<value optimized out>) at daemon/thread.c:381
        #8 0x00002b846eac8df9 in event_process_active_single_queue (base=0x1e62e130, flags=0) at event.c:1308
        #9 event_process_active (base=0x1e62e130, flags=0) at event.c:1375
        #10 event_base_loop (base=0x1e62e130, flags=0) at event.c:1572
        #11 0x0000000000413394 in worker_libevent (arg=0x1e62dbd0) at daemon/thread.c:304
        #12 0x00000033f8806307 in start_thread () from /lib64/libpthread.so.0
        #13 0x00000033f7cd1ded in clone () from /lib64/libc.so.6
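In this trace, Thread 2 is inside engine_shutdown_thread() calling dlclose() while Thread 1 aborts in must_unlock() inside bucket_get_stats_struct(). That suggests (an interpretation of the backtrace, not a confirmed diagnosis) that the bucket's engine was being torn down while a worker connection was still using it. One common way to close that kind of race is reference counting: workers take a reference before entering the engine, and shutdown waits for the count to reach zero before unloading. The sketch below assumes that approach; engine_ref and its functions are hypothetical and not the bucket_engine API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical reference-counted engine handle (not the bucket_engine API). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t idle; /* signalled when refcount drops to zero */
    int refcount;        /* worker threads currently inside the engine */
    int shutting_down;   /* set once bucket deletion has started */
} engine_ref;

static void engine_ref_init(engine_ref *e) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->idle, NULL);
    e->refcount = 0;
    e->shutting_down = 0;
}

/* Worker side: take a reference before calling into the engine.
 * Returns 0 if the bucket is going away, so the caller can drop the connection. */
static int engine_acquire(engine_ref *e) {
    int ok;
    pthread_mutex_lock(&e->lock);
    ok = !e->shutting_down;
    if (ok) {
        e->refcount++;
    }
    pthread_mutex_unlock(&e->lock);
    return ok;
}

/* Worker side: drop the reference and wake a waiting shutdown thread. */
static void engine_release(engine_ref *e) {
    pthread_mutex_lock(&e->lock);
    assert(e->refcount > 0);
    if (--e->refcount == 0) {
        pthread_cond_broadcast(&e->idle);
    }
    pthread_mutex_unlock(&e->lock);
}

/* Shutdown side: refuse new users, then wait until current users are gone.
 * Only after this returns would it be safe to destroy the engine and dlclose() it. */
static void engine_shutdown_wait(engine_ref *e) {
    pthread_mutex_lock(&e->lock);
    e->shutting_down = 1;
    while (e->refcount > 0) {
        pthread_cond_wait(&e->idle, &e->lock);
    }
    pthread_mutex_unlock(&e->lock);
}
```

The while loop around pthread_cond_wait() guards against spurious wakeups, and setting shutting_down under the same mutex ensures no new worker can slip in after the wait has begun.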

        Trond Norbye added a comment -

        http://review.couchbase.org/#change,6633

          People

          • Assignee:
            Trond Norbye
          • Reporter:
            Farshid Ghods (Inactive)
          • Votes:
            0
          • Watchers:
            0
