Details
Task
Resolution: Duplicate
Major
2.0.1
Security Level: Public
None
CentOS5.7 x64
Description
http://qa.hq.northscale.net/job/centos-64-2.0-new-rebalance-mixed-cluster/51/consoleFull
./testrunner -i /tmp/rebalance_in.ini get-logs=True,wait_timeout=180,GROUP=P0,EXCLUDE_GROUP=FROM_2_0,get-cbcollect-info=True -t rebalance.rebalanceout.RebalanceOutTests.rebalance_out_with_warming_up,nodes_out=3,items=500000,replicas=2,max_verify=100000,GROUP=OUT;P0
rebalance_out_with_warming_up (rebalance.rebalanceout.RebalanceOutTests) ...
1.8.1-937-rel nodes
[10.3.3.92]
[10.3.3.93]
[10.3.3.94]
2.0.1-170 nodes
[10.3.3.99]
[10.3.3.82]
[10.3.3.91]
[10.3.3.97]
test logs & UI logs:
[2013-03-03 14:41:45,119] - [rest_client:804] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.3.3.91%2Cns_1%4010.3.3.97%2Cns_1%4010.3.3.94&user=Administrator&knownNodes=ns_1%4010.3.3.91%2Cns_1%4010.3.3.92%2Cns_1%4010.3.3.94%2Cns_1%4010.3.3.82%2Cns_1%4010.3.3.93%2Cns_1%4010.3.3.99%2Cns_1%4010.3.3.97
[2013-03-03 14:41:45,148] - [rest_client:808] INFO - rebalance operation started
[2013-03-03 14:41:45,190] - [rest_client:905] INFO - rebalance percentage : 0 %
[2013-03-03 14:41:55,201] - [rest_client:905] INFO - rebalance percentage : 3.18635171224 %
[2013-03-03 14:42:05,214] - [rest_client:905] INFO - rebalance percentage : 7.26072098483 %
[2013-03-03 14:42:15,228] - [rest_client:905] INFO - rebalance percentage : 11.2492449683 %
[2013-03-03 14:42:25,264] - [rest_client:905] INFO - rebalance percentage : 15.1684030474 %
[2013-03-03 14:42:35,291] - [rest_client:888] ERROR -
- rebalance failed
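The `rebalance params` line logged by rest_client:804 above is URL-encoded, which makes it hard to see which nodes stay in the cluster. A minimal Python sketch for decoding it follows; the parameter string is copied verbatim from that log line (credentials included, as logged), and the variable names are illustrative only, not part of testrunner.

```python
from urllib.parse import parse_qs

# Parameter string copied verbatim from the rest_client:804 log line.
params = (
    "password=password"
    "&ejectedNodes=ns_1%4010.3.3.91%2Cns_1%4010.3.3.97%2Cns_1%4010.3.3.94"
    "&user=Administrator"
    "&knownNodes=ns_1%4010.3.3.91%2Cns_1%4010.3.3.92%2Cns_1%4010.3.3.94"
    "%2Cns_1%4010.3.3.82%2Cns_1%4010.3.3.93%2Cns_1%4010.3.3.99%2Cns_1%4010.3.3.97"
)

# parse_qs percent-decodes the values (%40 -> '@', %2C -> ',').
decoded = parse_qs(params)
known = decoded["knownNodes"][0].split(",")
ejected = decoded["ejectedNodes"][0].split(",")

# Nodes that remain in the cluster after the rebalance-out.
kept = [node for node in known if node not in ejected]

print("known  :", known)
print("ejected:", ejected)
print("kept   :", kept)
```

The kept list (10.3.3.92, 10.3.3.82, 10.3.3.93, 10.3.3.99) agrees with the KeepNodes reported by ns_orchestrator in the UI logs. Note that 10.3.3.99, where the memcached control connection dropped, is one of the nodes being kept.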
[2013-03-03 14:42:35,292] - [rest_client:889] INFO - Latest logs from UI:
[2013-03-03 14:42:35,362] - [rest_client:890] ERROR -
[2013-03-03 14:42:35,362] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.91', u'code': 2, u'text': u"Rebalance exited with reason badmatch,{error,closed,\n {gen_server,call,\n [{'ns_memcached-default','ns_1@10.3.3.99'},\n {set_vbucket,791,replica},\n 180000]}}\n", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1362350785366.0, u'type': u'info'}
[2013-03-03 14:42:35,363] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.99', u'code': 4, u'text': u"Control connection to memcached on 'ns_1@10.3.3.99' disconnected: {badmatch,\n {error,\n closed}}", u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1362350785325.0, u'type': u'info'}
[2013-03-03 14:42:35,363] - [rest_client:890] ERROR -
[2013-03-03 14:42:35,364] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.91', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@10.3.3.92','ns_1@10.3.3.82',\n 'ns_1@10.3.3.93','ns_1@10.3.3.99'], EjectNodes = ['ns_1@10.3.3.91',\n 'ns_1@10.3.3.97',\n 'ns_1@10.3.3.94']\n (repeated 1 times)", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1362350764434.0, u'type': u'info'}
[2013-03-03 14:42:35,364] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.91', u'code': 0, u'text': u'Bucket "default" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'module': u'ns_vbucket_mover', u'tstamp': 1362350742736.0, u'type': u'info'}
[2013-03-03 14:42:35,365] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.91', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'module': u'ns_rebalancer', u'tstamp': 1362350742057.0, u'type': u'info'}
[2013-03-03 14:42:35,365] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.82', u'code': 1, u'text': u'Bucket "default" loaded on node \'ns_1@10.3.3.82\' in 2 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1362350739122.0, u'type': u'info'}
[2013-03-03 14:42:35,366] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.82', u'code': 1, u'text': u"Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.82'.", u'shortText': u'web start ok', u'module': u'menelaus_sup', u'tstamp': 1362350734988.0, u'type': u'info'}
[2013-03-03 14:42:35,366] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.97', u'code': 4, u'text': u"Node 'ns_1@10.3.3.97' saw that node 'ns_1@10.3.3.82' came up.", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1362350734608.0, u'type': u'info'}
andrey@baranouski:~/repository/testrunner$ ssh rooot@10.3.3.99
rooot@10.3.3.99's password:
Permission denied, please try again.
rooot@10.3.3.99's password:
Permission denied, please try again.
rooot@10.3.3.99's password:
andrey@baranouski:~/repository/testrunner$ ssh root@10.3.3.99
root@10.3.3.99's password:
Permission denied, please try again.
root@10.3.3.99's password:
Last login: Tue Feb 5 07:46:07 2013 from 10.32.26.65
[root@caper-007 ~]# cd /tmp/
[root@caper-007 tmp]# sudo gdb /opt/couchbase/bin/memcached core.memcached.17695
GNU gdb (GDB) CentOS (7.0.1-45.el5.centos)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/couchbase/bin/memcached...done.
[New Thread 17718]
[New Thread 17720]
[New Thread 17719]
[New Thread 17717]
[New Thread 17716]
[New Thread 17715]
[New Thread 17713]
[New Thread 17712]
[New Thread 17711]
[New Thread 17710]
[New Thread 17709]
[New Thread 17704]
[New Thread 17703]
[New Thread 17695]
Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done.
Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0
Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done.
Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done.
Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/ep.so
Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done.
Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1
Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff819fd000
Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 11, Segmentation fault.
#0 add_conn_to_pending_io_list (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:722
722 daemon/thread.c: No such file or directory.
in daemon/thread.c
(gdb) t a a bt
Thread 14 (Thread 0x2b9d8dcb6240 (LWP 17695)):
#0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546000, tv=<value optimized out>) at epoll.c:404
#2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546000, flags=<value optimized out>) at event.c:1558
#3 0x0000000000409742 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7918
Thread 13 (Thread 17703):
#0 0x00002b9d8d51445b in read () from /lib64/libc.so.6
#1 0x00002b9d8d4ba677 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x00002b9d8d4bb03e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x00002b9d8d4b0124 in _IO_getline_info_internal () from /lib64/libc.so.6
#4 0x00002b9d8d4aefc9 in fgets () from /lib64/libc.so.6
#5 0x00002b9d8dcb7939 in check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37
#6 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#7 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 12 (Thread 17704):
#0 0x00002b9d8d23f1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaaae4d6 in logger_thead_main (arg=0x11cde040) at extensions/loggers/file_logger.c:368
#2 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#3 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 11 (Thread 17709):
#0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546500, tv=<value optimized out>) at epoll.c:404
#2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546500, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414504 in worker_libevent (arg=0x11ce1900) at daemon/thread.c:301
#4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 10 (Thread 17710):
#0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546280, tv=<value optimized out>) at epoll.c:404
#2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546280, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414504 in worker_libevent (arg=0x11ce19f8) at daemon/thread.c:301
#4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 9 (Thread 17711):
#0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546c80, tv=<value optimized out>) at epoll.c:404
#2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546c80, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414504 in worker_libevent (arg=0x11ce1af0) at daemon/thread.c:301
#4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 8 (Thread 17712):
#0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546a00, tv=<value optimized out>) at epoll.c:404
#2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546a00, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414504 in worker_libevent (arg=0x11ce1be8) at daemon/thread.c:301
#4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 7 (Thread 17713):
#0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546780, tv=<value optimized out>) at epoll.c:404
#2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546780, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414504 in worker_libevent (arg=0x11ce1ce0) at daemon/thread.c:301
#4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 6 (Thread 17715):
#0 0x00002b9d8d4e8221 in nanosleep () from /lib64/libc.so.6
#1 0x00002b9d8d51bba4 in usleep () from /lib64/libc.so.6
#2 0x00002aaaaaf31945 in updateStatsThread (arg=0x11cde4c0) at src/memory_tracker.cc:31
#3 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#4 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 5 (Thread 17716):
#0 0x00002b9d8d005ed9 in (anonymous namespace)::GetSizeWithCallback(void const*, unsigned long (void const*)) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#1 0x00002b9d8d006669 in TCMallocImplementation::GetAllocatedSize(void const*) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#2 0x00002b9d8d0172e8 in MallocExtension_GetAllocatedSize () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#3 0x00002aaaaaf316b4 in NewHook (ptr=0x1a6fa3c0) at src/memory_tracker.cc:48
#4 0x00002b9d8d0138e5 in MallocHook::InvokeNewHookSlow(void const*, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#5 0x00002b9d8d006ca7 in MallocHook::InvokeNewHook(void const*, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#6 0x00002b9d8d0198a4 in tc_new () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#7 0x00002aaaaaf6f638 in CouchKVStore::set (this=0x165b0000, itm=..., cb=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:343
#8 0x00002aaaaaefd234 in EventuallyPersistentStore::flushOneDelOrSet (this=0x1653e480, qi=..., rejectQueue=std::queue wrapping: std::deque with 0 elements, vb=...) at src/ep.cc:2420
#9 0x00002aaaaaefd4fb in EventuallyPersistentStore::flushOne (this=0x1653e480, queue=<value optimized out>, rejectQueue=std::queue wrapping: std::deque with 0 elements, vb=...) at src/ep.cc:2468
#10 0x00002aaaaaf00ff5 in EventuallyPersistentStore::flushVBQueue (this=0x1653e480, vb=..., vb_queue=std::queue wrapping: std::deque with 250 elements =
, vbid=565, data_age=0) at src/ep.cc:2022
#11 0x00002aaaaaf0224c in EventuallyPersistentStore::flushOutgoingQueue (this=0x1653e480, flushQueue=0x1653e748, flushPhase=@0x1653c570, nextVbid=@0x1653c578) at src/ep.cc:1964
#12 0x00002aaaaaf2b9cc in Flusher::doFlush (this=0x1653c480) at src/flusher.cc:245
#13 0x00002aaaaaf2c805 in Flusher::step (this=0x1653c480, d=..., tid=...) at src/flusher.cc:158
#14 0x00002aaaaaef473a in Dispatcher::run (this=0x16582c40) at src/dispatcher.cc:173
#15 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x16582c40) at src/dispatcher.cc:28
#16 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#17 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 4 (Thread 17717):
#0 0x00002b9d8d23f1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaef2078 in wait (this=0x165c6090, d=...) at src/syncobject.hh:58
#2 IdleTask::run (this=0x165c6090, d=...) at src/dispatcher.cc:336
#3 0x00002aaaaaef473a in Dispatcher::run (this=0x16582a80) at src/dispatcher.cc:173
#4 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x16582a80) at src/dispatcher.cc:28
#5 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#6 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 3 (Thread 17719):
#0 get (mem=48) at src/atomic.hh:86
#1 operator bool (mem=48) at src/atomic.hh:95
#2 ObjectRegistry::memoryAllocated (mem=48) at src/objectregistry.cc:137
#3 0x00002b9d8d0138e5 in MallocHook::InvokeNewHookSlow(void const*, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#4 0x00002b9d8d006ca7 in MallocHook::InvokeNewHook(void const*, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#5 0x00002b9d8d0198a4 in tc_new () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
#6 0x00002b9d8d842861 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) ()
from /usr/lib64/libstdc++.so.6
#7 0x00002b9d8d843365 in ?? () from /usr/lib64/libstdc++.so.6
#8 0x00002b9d8d84345a in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&) () from /usr/lib64/libstdc++.so.6
#9 0x00002aaaaaee3dd9 in getKey (this=0x16d5fe00, v=<value optimized out>) at src/stored-value.hh:195
#10 BackFillVisitor::visit (this=0x16d5fe00, v=<value optimized out>) at src/backfill.cc:143
#11 0x00002aaaaaf36c75 in HashTable::visit (this=0x171bbc08, visitor=...) at src/stored-value.cc:404
#12 0x00002aaaaaef8bc2 in VBCBAdaptor::callback (this=0x1e5197a0, d=..., t=...) at src/ep.cc:2850
#13 0x00002aaaaaef473a in Dispatcher::run (this=0x165836c0) at src/dispatcher.cc:173
#14 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x165836c0) at src/dispatcher.cc:28
#15 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#16 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 2 (Thread 17720):
#0 0x00002b9d8d23f1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf10eaf in wait (this=0x16542000) at src/syncobject.hh:58
#2 wait (this=0x16542000) at src/syncobject.hh:74
#3 wait (this=0x16542000) at src/tapconnmap.hh:169
#4 EventuallyPersistentEngine::notifyPendingConnections (this=0x16542000) at src/ep_engine.cc:3423
#5 0x00002aaaaaf10f93 in EvpNotifyPendingConns (arg=0x16542000) at src/ep_engine.cc:1145
#6 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#7 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x47718940 (LWP 17718)):
#0 add_conn_to_pending_io_list (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:722
#1 notify_io_complete (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:488
#2 0x00002aaaaaf4a4fd in notifyIOComplete (this=<value optimized out>, tc=0x16d13400) at src/ep_engine.h:439
#3 TapConnMap::notifyPausedConnection_UNLOCKED (this=<value optimized out>, tc=0x16d13400) at src/tapconnmap.cc:347
#4 0x00002aaaaaee4901 in performTapOp<void*> (this=0x173d3f80, d=<value optimized out>, t=<value optimized out>) at src/tapconnmap.hh:119
#5 BackfillDiskLoad::callback (this=0x173d3f80, d=<value optimized out>, t=<value optimized out>) at src/backfill.cc:78
#6 0x00002aaaaaef473a in Dispatcher::run (this=0x16583880) at src/dispatcher.cc:173
#7 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x16583880) at src/dispatcher.cc:28
#8 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
#9 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
(gdb)
(gdb) Quit
(gdb) quit
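To make a long `t a a bt` dump like the one above easier to scan, the top frame of each thread can be pulled out with a small script. The following is only a rough Python sketch: the helper name `top_frames` and both regular expressions are assumptions made for illustration, the embedded excerpt is a few lines copied from the backtrace above, and the frame pattern handles only simple function names (it will not parse C++ frames such as `(anonymous namespace)::...`).

```python
import re

# A few lines copied from the backtrace above, used as sample input.
excerpt = """\
Thread 2 (Thread 17720):
#0 0x00002b9d8d23f1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf10eaf in wait (this=0x16542000) at src/syncobject.hh:58
Thread 1 (Thread 0x47718940 (LWP 17718)):
#0 add_conn_to_pending_io_list (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:722
#1 notify_io_complete (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:488
"""

# "Thread N (" starts a new thread section.
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# "#K [0xADDR in ]function (" is a stack frame; the address is optional.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def top_frames(text):
    """Map each gdb thread number to the function name in its frame #0."""
    result = {}
    current = None
    for line in text.splitlines():
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and frame.group(1) == "0":
            result[current] = frame.group(2)
    return result


for thread_id, function in sorted(top_frames(excerpt).items()):
    print(thread_id, function)
```

In this core, frame #0 of Thread 1 (`add_conn_to_pending_io_list` at daemon/thread.c:722) is the same frame gdb printed right after reporting the segmentation fault, so that thread is the one to look at first.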