Couchbase Server / MB-7448

memcached crashed in EvpNotifyPendingConns -> Rebalance exited with reason {pre_rebalance_config_synchronization_failed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.1.0
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
      None
    • Environment:
      CentOS release 5.4

      Description

      version=2.0.1-107-rel
      http://qa.hq.northscale.net/job/centos-32-2.0-swaprebalance-tests/241/consoleFull
      ./testrunner -i /tmp/swaprebalance-cent-32.ini get-logs=True,GROUP=P0 -t swaprebalance.SwapRebalanceFailedTests.test_failed_swap_rebalance,replica=1,num-buckets=1,num-swap=3,GROUP=P0

      host 10.3.2.148

      [root@cen-0728 tmp]# gdb /opt/couchbase/bin/memcached core.memcached.2522
      GNU gdb (GDB) CentOS (7.0.1-42.el5.centos.1)
      Copyright (C) 2009 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law. Type "show copying"
      and "show warranty" for details.
      This GDB was configured as "i386-redhat-linux-gnu".
      For bug reporting instructions, please see:
      <http://www.gnu.org/software/gdb/bugs/>...
      Reading symbols from /opt/couchbase/bin/memcached...done.

      warning: exec file is newer than core file.
      [New Thread 2565]
      [New Thread 2564]
      [New Thread 2563]
      [New Thread 2562]
      [New Thread 2561]
      [New Thread 2543]
      [New Thread 2542]
      [New Thread 2541]
      [New Thread 2540]
      [New Thread 2539]
      [New Thread 2538]
      [New Thread 2537]
      [New Thread 2536]
      [New Thread 2535]
      [New Thread 2534]
      [New Thread 2533]
      [New Thread 2532]
      [New Thread 2531]
      [New Thread 2522]

      warning: .dynamic section for "/lib/libpthread.so.0" is not at the expected address

      warning: difference appears to be caused by prelink, adjusting expectations
      Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done.
      Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0
      Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done.
      Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5
      Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib/libdl.so.2
      Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
      Loaded symbols for /lib/libm.so.6
      Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib/librt.so.1
      Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done.
      Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4
      Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
      [Thread debugging using libthread_db enabled]
      Loaded symbols for /lib/libpthread.so.0
      Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
      Loaded symbols for /lib/libc.so.6
      Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib/ld-linux.so.2
      Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
      Loaded symbols for /usr/lib/libstdc++.so.6
      Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib/libgcc_s.so.1
      Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
      Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
      Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
      Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/ep.so
      Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done.
      Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1
      Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
      Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
      Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib/libnss_files.so.2
      Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
      Program terminated with signal 11, Segmentation fault.
      #0 0x0322f76d in __dynamic_cast () from /usr/lib/libstdc++.so.6
      (gdb) t a a bt

      Thread 19 (Thread 0xb7feb6d0 (LWP 2522)):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00481ae6 in epoll_wait () from /lib/libc.so.6
      #2 0x00ef8f97 in epoll_dispatch (base=0xb954000, tv=0xbf985554) at epoll.c:404
      #3 0x00ee5463 in event_base_loop (base=0xb954000, flags=0) at event.c:1558
      #4 0x08051671 in main (argc=19, argv=0xbf986b84) at daemon/memcached.c:7918

      Thread 18 (Thread 2531):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x0047201b in read () from /lib/libc.so.6
      #2 0x00411f68 in _IO_file_read_internal () from /lib/libc.so.6
      #3 0x004132c0 in _IO_new_file_underflow () from /lib/libc.so.6
      #4 0x004139bb in _IO_default_uflow_internal () from /lib/libc.so.6
      #5 0x00414d2d in __uflow () from /lib/libc.so.6
      #6 0x004086b6 in _IO_getline_info_internal () from /lib/libc.so.6
      #7 0x00408601 in _IO_getline_internal () from /lib/libc.so.6
      #8 0x0040757a in fgets () from /lib/libc.so.6
      #9 0x00c727b7 in check_stdin_thread (arg=0x804a790) at extensions/daemon/stdin_check.c:37
      #10 0x00115832 in start_thread () from /lib/libpthread.so.0
      #11 0x0048146e in clone () from /lib/libc.so.6

      Thread 17 (Thread 2532):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00119ef2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
      #2 0x00e6c9f5 in logger_thead_main (arg=0x8864040) at extensions/loggers/file_logger.c:368
      #3 0x00115832 in start_thread () from /lib/libpthread.so.0
      #4 0x0048146e in clone () from /lib/libc.so.6

      Thread 16 (Thread 2533):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00481ae6 in epoll_wait () from /lib/libc.so.6
      #2 0x00ef8f97 in epoll_dispatch (base=0xb954180, tv=0x0) at epoll.c:404
      #3 0x00ee5463 in event_base_loop (base=0xb954180, flags=0) at event.c:1558
      #4 0x0805cd47 in worker_libevent (arg=0x8866dc0) at daemon/thread.c:301
      #5 0x00115832 in start_thread () from /lib/libpthread.so.0
      #6 0x0048146e in clone () from /lib/libc.so.6

      Thread 15 (Thread 2534):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00481ae6 in epoll_wait () from /lib/libc.so.6
      #2 0x00ef8f97 in epoll_dispatch (base=0xb954600, tv=0x0) at epoll.c:404
      #3 0x00ee5463 in event_base_loop (base=0xb954600, flags=0) at event.c:1558
      #4 0x0805cd47 in worker_libevent (arg=0x8866e4c) at daemon/thread.c:301
      #5 0x00115832 in start_thread () from /lib/libpthread.so.0
      #6 0x0048146e in clone () from /lib/libc.so.6

      Thread 14 (Thread 2535):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00481ae6 in epoll_wait () from /lib/libc.so.6
      #2 0x00ef8f97 in epoll_dispatch (base=0xb954d80, tv=0x0) at epoll.c:404
      #3 0x00ee5463 in event_base_loop (base=0xb954d80, flags=0) at event.c:1558
      #4 0x0805cd47 in worker_libevent (arg=0x8866ed8) at daemon/thread.c:301
      #5 0x00115832 in start_thread () from /lib/libpthread.so.0
      #6 0x0048146e in clone () from /lib/libc.so.6

      Thread 13 (Thread 2536):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00481ae6 in epoll_wait () from /lib/libc.so.6
      #2 0x00ef8f97 in epoll_dispatch (base=0xb954a80, tv=0x0) at epoll.c:404
      #3 0x00ee5463 in event_base_loop (base=0xb954a80, flags=0) at event.c:1558
      #4 0x0805cd47 in worker_libevent (arg=0x8866f64) at daemon/thread.c:301
      #5 0x00115832 in start_thread () from /lib/libpthread.so.0
      #6 0x0048146e in clone () from /lib/libc.so.6

      Thread 12 (Thread 2537):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00481ae6 in epoll_wait () from /lib/libc.so.6
      #2 0x00ef8f97 in epoll_dispatch (base=0xb955500, tv=0x0) at epoll.c:404
      #3 0x00ee5463 in event_base_loop (base=0xb955500, flags=0) at event.c:1558
      #4 0x0805cd47 in worker_libevent (arg=0x8866ff0) at daemon/thread.c:301
      #5 0x00115832 in start_thread () from /lib/libpthread.so.0
      #6 0x0048146e in clone () from /lib/libc.so.6

      Thread 11 (Thread 2538):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00440d26 in nanosleep () from /lib/libc.so.6
      #2 0x0047a89c in usleep () from /lib/libc.so.6
      #3 0x007481ef in updateStatsThread (arg=0x885e180) at src/memory_tracker.cc:31
      #4 0x00115832 in start_thread () from /lib/libpthread.so.0
      #5 0x0048146e in clone () from /lib/libc.so.6

      Thread 10 (Thread 2539):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00119ef2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
      #2 0x00701889 in wait (this=0xb93caa0, d=...) at src/syncobject.hh:58
      #3 IdleTask::run (this=0xb93caa0, d=...) at src/dispatcher.cc:336
      #4 0x00703d29 in Dispatcher::run (this=0xb965100) at src/dispatcher.cc:173
      #5 0x00704715 in launch_dispatcher_thread (arg=0xb965100) at src/dispatcher.cc:28
      #6 0x00115832 in start_thread () from /lib/libpthread.so.0
      #7 0x0048146e in clone () from /lib/libc.so.6

      Thread 9 (Thread 2540):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00119ef2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
      #2 0x00701889 in wait (this=0xb93ca50, d=...) at src/syncobject.hh:58
      #3 IdleTask::run (this=0xb93ca50, d=...) at src/dispatcher.cc:336
      #4 0x00703d29 in Dispatcher::run (this=0xb965000) at src/dispatcher.cc:173
      #5 0x00704715 in launch_dispatcher_thread (arg=0xb965000) at src/dispatcher.cc:28
      #6 0x00115832 in start_thread () from /lib/libpthread.so.0
      #7 0x0048146e in clone () from /lib/libc.so.6

      Thread 8 (Thread 2541):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00119ef2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
      #2 0x00701889 in wait (this=0xb93cb90, d=...) at src/syncobject.hh:58
      #3 IdleTask::run (this=0xb93cb90, d=...) at src/dispatcher.cc:336
      #4 0x00703d29 in Dispatcher::run (this=0xb964f00) at src/dispatcher.cc:173
      #5 0x00704715 in launch_dispatcher_thread (arg=0xb964f00) at src/dispatcher.cc:28
      #6 0x00115832 in start_thread () from /lib/libpthread.so.0
      #7 0x0048146e in clone () from /lib/libc.so.6

      Thread 7 (Thread 2542):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00119ef2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
      #2 0x00701889 in wait (this=0xb93cb40, d=...) at src/syncobject.hh:58
      #3 IdleTask::run (this=0xb93cb40, d=...) at src/dispatcher.cc:336
      #4 0x00703d29 in Dispatcher::run (this=0xb965500) at src/dispatcher.cc:173
      #5 0x00704715 in launch_dispatcher_thread (arg=0xb965500) at src/dispatcher.cc:28
      #6 0x00115832 in start_thread () from /lib/libpthread.so.0
      #7 0x0048146e in clone () from /lib/libc.so.6

      Thread 6 (Thread 2543):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00119ef2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
      #2 0x007234ee in wait (this=0xb9a0000) at src/syncobject.hh:58
      #3 wait (this=0xb9a0000) at src/syncobject.hh:74
      #4 wait (this=0xb9a0000) at src/tapconnmap.hh:169
      #5 EventuallyPersistentEngine::notifyPendingConnections (this=0xb9a0000) at src/ep_engine.cc:3406
      #6 0x00723602 in EvpNotifyPendingConns (arg=0xb9a0000) at src/ep_engine.cc:1142
      #7 0x00115832 in start_thread () from /lib/libpthread.so.0
      #8 0x0048146e in clone () from /lib/libc.so.6

      Thread 5 (Thread 2561):
      #0 0x00118e8e in pthread_mutex_unlock () from /lib/libpthread.so.0
      #1 0x00748f5c in Mutex::release (this=0xc7578e4) at src/mutex.cc:95
      #2 0x00708ccc in unlock (this=0x8867340) at src/locks.hh:58
      #3 ~LockHolder (this=0x8867340) at src/locks.hh:41
      #4 getBackfillSize (this=0x8867340) at src/vbucket.hh:190
      #5 EventuallyPersistentStore::incomingQueueSize (this=0x8867340) at src/ep.cc:2133
      #6 0x00710393 in EventuallyPersistentStore::beginFlush (this=0x8867340) at src/ep.cc:1876
      #7 0x00741cf6 in Flusher::doFlush (this=0x886d180) at src/flusher.cc:233
      #8 0x007433d0 in Flusher::step (this=0x886d180, d=..., tid=...) at src/flusher.cc:158
      #9 0x007050e2 in Task::run (this=0xba781e0, d=..., t=...) at src/dispatcher.hh:136
      #10 0x00703d29 in Dispatcher::run (this=0xb965d00) at src/dispatcher.cc:173
      #11 0x00704715 in launch_dispatcher_thread (arg=0xb965d00) at src/dispatcher.cc:28
      #12 0x00115832 in start_thread () from /lib/libpthread.so.0
      #13 0x0048146e in clone () from /lib/libc.so.6

      Thread 4 (Thread 2562):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00119ef2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
      #2 0x00701889 in wait (this=0xc0ad270, d=...) at src/syncobject.hh:58
      #3 IdleTask::run (this=0xc0ad270, d=...) at src/dispatcher.cc:336
      #4 0x00703d29 in Dispatcher::run (this=0xb965c00) at src/dispatcher.cc:173
      #5 0x00704715 in launch_dispatcher_thread (arg=0xb965c00) at src/dispatcher.cc:28
      #6 0x00115832 in start_thread () from /lib/libpthread.so.0
      #7 0x0048146e in clone () from /lib/libc.so.6

      Thread 3 (Thread 2563):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00119ef2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
      #2 0x00701889 in wait (this=0xc0acff0, d=...) at src/syncobject.hh:58
      #3 IdleTask::run (this=0xc0acff0, d=...) at src/dispatcher.cc:336
      #4 0x00703d29 in Dispatcher::run (this=0xb965f00) at src/dispatcher.cc:173
      #5 0x00704715 in launch_dispatcher_thread (arg=0xb965f00) at src/dispatcher.cc:28
      #6 0x00115832 in start_thread () from /lib/libpthread.so.0
      #7 0x0048146e in clone () from /lib/libc.so.6

      Thread 2 (Thread 2564):
      #0 0x00540402 in __kernel_vsyscall ()
      #1 0x00119ef2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
      #2 0x00701889 in wait (this=0xc0ad1d0, d=...) at src/syncobject.hh:58
      #3 IdleTask::run (this=0xc0ad1d0, d=...) at src/dispatcher.cc:336
      #4 0x00703d29 in Dispatcher::run (this=0xb965e00) at src/dispatcher.cc:173
      #5 0x00704715 in launch_dispatcher_thread (arg=0xb965e00) at src/dispatcher.cc:28
      #6 0x00115832 in start_thread () from /lib/libpthread.so.0
      #7 0x0048146e in clone () from /lib/libc.so.6

      Thread 1 (Thread 0xad5d9b90 (LWP 2565)):
      #0 0x0322f76d in __dynamic_cast () from /usr/lib/libstdc++.so.6
      #1 0x007653c9 in TapConnMap::notifyIOThreadMain (this=0x886d040) at src/tapconnmap.cc:467
      #2 0x00723426 in EventuallyPersistentEngine::notifyPendingConnections (this=0xb9a0500) at src/ep_engine.cc:3399
      #3 0x00723602 in EvpNotifyPendingConns (arg=0xb9a0500) at src/ep_engine.cc:1142
      #4 0x00115832 in start_thread () from /lib/libpthread.so.0
      #5 0x0048146e in clone () from /lib/libc.so.6

      2012-12-18 23:57:20.304 ns_orchestrator:2:info:message(ns_1@10.3.2.145) - Rebalance exited with reason {pre_rebalance_config_synchronization_failed, ['ns_1@10.3.2.148']}

      Please note that time was not synchronized on the VMs (will fix it):

      10.3.2.145
      Wed Dec 19 03:10:12 PST 2012

      10.3.2.152
      Wed Dec 19 03:07:36 PST 2012

      10.3.2.149
      Wed Dec 19 03:07:36 PST 2012

      10.3.2.146
      Wed Dec 19 03:07:36 PST 2012

      10.3.2.147
      Wed Dec 19 03:07:36 PST 2012


        Activity

        Jin Lim added a comment:

        Liang, thanks very much for your help on this. Can you please take a look and verify whether it is an environment issue (e.g. RTTI somehow being disabled) or a memory issue? It is not a blocker, though.

        Liang Guo (Inactive) added a comment:

        Andrei,

        Is there a place where I can pick up the core file? I didn't find anything helpful in the logs on node 10.3.2.148.

        Liang Guo (Inactive) added a comment:

        If gdb is giving me correct information, __dynamic_cast segfaults at this instruction, where both registers hold inaccessible addresses:

        0x322f76d <__dynamic_cast+61>: mov (%edx),%ecx

        (gdb) p/x $eax
        $13 = 0xad5d919c
        (gdb) x $eax
        0xad5d919c: Cannot access memory at address 0xad5d919c
        (gdb) p/x $edx
        $14 = 0x7e462c00
        (gdb) x $edx
        0x7e462c00: Cannot access memory at address 0x7e462c00

        In the caller, the instructions are:

        0x7653b9 <TapConnMap::notifyIOThreadMain()+233>: mov %eax,(%esp)
        0x7653bc <TapConnMap::notifyIOThreadMain()+236>: mov %edx,0x8(%esp)
        0x7653c0 <TapConnMap::notifyIOThreadMain()+240>: mov %ecx,0x4(%esp)
        0x7653c4 <TapConnMap::notifyIOThreadMain()+244>: call 0x6ee358 <__dynamic_cast@plt>
        warning: (Internal error: pc 0x6ee358 in read in psymtab, but not in symtab.)
        0x7653c9 <TapConnMap::notifyIOThreadMain()+249>: test %eax,%eax

        (gdb) p/x $eax
        $18 = 0xad5d919c
        (gdb) p/x $ecx
        $19 = 0x7e4638
        (gdb) p/x $edx
        $20 = 0x7e462c00

        (gdb) x/20 $esp
        0xad5d91c0: 0x0c42dc70 0x007e4638 0x007e4620 0x00000000
        0xad5d91d0: 0x00000000 0x00000000 0x00000000 0x00000001
        0xad5d91e0: 0x00000000 0x00000000 0x00000000 0xad5d9304

        I still need to check whether the memory dump above looks like a valid object. Of course, neither this nor the logs tell me anything about the possible cause.

        Liang Guo (Inactive) added a comment:

        An existing cluster of nodes [146, 152, 145] was rebalancing while nodes 147, 149 and 148 came up. After that rebalance completed, the master (145) added the new nodes to its cluster and started a new rebalance. The rebalance exited immediately because memcached on 148 crashed with a segmentation fault. Memcached on 148 was new, since Couchbase Server had only just started on that node, so it should not yet have been involved in any TAP replication. However, it apparently picked up garbage from its TAP connection map. How this happened is unclear. I don't think we need to initialize this std::map explicitly, because it starts empty.

        Farshid Ghods (Inactive) added a comment:

        Please reopen if you see this crash again.

        Maria McDuff (Inactive) added a comment:

        Andrei, please check whether this crash still happens in the current 2.0.2 build, and close the issue if it no longer occurs. Thanks.

        Maria McDuff (Inactive) added a comment:

        Not verifiable by QE.


          People

          • Assignee:
            Andrei Baranouski
            Reporter:
            Andrei Baranouski
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes