Details
Bug
Resolution: Duplicate
Critical
None
6.5.0
6.5.0-4908
Triaged
Centos 64-bit
Yes
Description
Script to Repro
./testrunner -i /tmp/testexec.6198.ini -p get-cbcollect-info=False,bucket_type=ephemeral,GROUP=P1_Set2,get-cbcollect-info=True -t rebalance.rebalanceinout.RebalanceInOutTests.test_incremental_rebalance_in_out_with_mutation_and_expiration,items=100000,value_size=512,max_verify=100000,zone=2,GROUP=IN_OUT;P1;P1_Set2
Test to repro
Rebalances nodes into and out of the cluster while doing mutations and expirations. Use the 'zone' param with zone > 1 to divide the nodes into server groups.

This test begins by loading a given number of items into the cluster. It then adds one node, rebalances that node into the cluster, and then rebalances it back out. During the rebalancing we update half of the items in the cluster and expire the other half. Once the node has been removed and added back, we recreate the expired items, wait for the disk queues to drain, and then verify that there has been no data loss, i.e. that sum(curr_items) matches curr_items_total. We then remove and add back two nodes at a time, and so on, until we are adding back and removing at least half of the nodes.
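The testrunner harness is written in Python. The schedule and the final consistency check described above can be sketched in that language. This is only an illustration under stated assumptions. `incremental_steps` and `no_data_loss` are hypothetical helpers, not functions from the testrunner code. The schedule assumes the number of nodes swapped grows by one per iteration and stops at the first step that touches at least half of the nodes.

```python
import math
from typing import Dict, List


def incremental_steps(num_servers: int) -> List[int]:
    """Hypothetical helper: nodes rebalanced in/out at each iteration.

    Assumes steps of 1, 2, 3, ... nodes, stopping at the first step that
    covers at least half of the servers.
    """
    return list(range(1, math.ceil(num_servers / 2) + 1))


def no_data_loss(per_node_curr_items: Dict[str, int],
                 curr_items_total: int) -> bool:
    """Hypothetical check: the per-node curr_items values must add up to
    curr_items_total once the disk queues have drained."""
    return sum(per_node_curr_items.values()) == curr_items_total


if __name__ == "__main__":
    print(incremental_steps(4))  # [1, 2]
    print(incremental_steps(5))  # [1, 2, 3]
    print(no_data_loss({"node_a": 50000, "node_b": 50000}, 100000))  # True
```

With items=100000 as in the repro command, the total checked after the expired items are recreated would be 100000 under these assumptions.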
Rebalance failure
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed
2019-12-01 22:08:23 | INFO | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] Latest logs from UI on 172.23.104.211:
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.216', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@172.23.104.216\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:20.343Z', u'module': u'ns_memcached', u'tstamp': 1575266900343, u'type': u'info'}
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.216', u'code': 0, u'text': u"Control connection to memcached on 'ns_1@172.23.104.216' disconnected. Check logs for details.", u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:19.303Z', u'module': u'ns_memcached', u'tstamp': 1575266899303, u'type': u'info'}
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.216', u'code': 0, u'text': u"Service 'memcached' exited with status 134. Restarting. Messages:\n2019-12-01T22:08:19.277234-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f6273aac000+0x8f213]\n2019-12-01T22:08:19.277277-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x74098]\n2019-12-01T22:08:19.277296-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x77434]\n2019-12-01T22:08:19.277314-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x77843]\n2019-12-01T22:08:19.277334-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x77924]\n2019-12-01T22:08:19.277352-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x809f9]\n2019-12-01T22:08:19.277373-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x12f964]\n2019-12-01T22:08:19.277385-08:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f6275955000+0x8ee7]\n2019-12-01T22:08:19.277401-08:00 CRITICAL /lib64/libpthread.so.0() [0x7f6273377000+0x7dd5]\n2019-12-01T22:08:19.277475-08:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f6272faa000+0xfdead]", u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:19.297Z', u'module': u'ns_log', u'tstamp': 1575266899297, u'type': u'info'}
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.211', u'code': 0, u'text': u'auto-reprovision is disabled as maximum number of nodes (1) that can be auto-reprovisioned has been reached.', u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:18.669Z', u'module': u'auto_reprovision', u'tstamp': 1575266898669, u'type': u'info'}
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.211', u'code': 0, u'text': u'Bucket "default" has been reprovisioned on following nodes: [\'ns_1@172.23.104.220\']. Nodes on which the data service restarted: [\'ns_1@172.23.104.220\',\n \'ns_1@172.23.104.243\'].', u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:18.668Z', u'module': u'auto_reprovision', u'tstamp': 1575266898668, u'type': u'info'}
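The UI log reports that memcached "exited with status 134". This assumes the common convention that a status above 128 means "killed by signal (status - 128)". Under that convention, 134 - 128 = 6, which is SIGABRT on Linux. That fits the CRITICAL backtrace memcached printed before it was restarted. A minimal Python sketch of the decoding; `signal_from_exit_status` is a hypothetical helper name:

```python
import signal
from typing import Optional


def signal_from_exit_status(status: int) -> Optional[str]:
    """Decode an exit status using the 128 + signal-number convention.

    Returns the signal name when status > 128, otherwise None.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


print(signal_from_exit_status(134))  # SIGABRT (on Linux)
```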
Backtrace from gdb
(gdb) bt
#0 0x00007f6272fe0207 in __gconv_transform_internal_ucs2reverse () from /usr/lib64/libc-2.17.so
#1 0x0000000000000006 in ?? ()
#2 0x00007f6273025dc3 in wprintf () from /usr/lib64/libc-2.17.so
#3 0x0000000000000001 in ?? ()
#4 0x0000000a3affb1f0 in ?? ()
#5 0x000000020000000e in ?? ()
#6 0x00007f623affd600 in ?? ()
#7 0x00007f623affb190 in ?? ()
#8 0x00007f6271b5f400 in ?? ()
#9 0x0000000000000068 in ?? ()
#10 0x000000003affd600 in ?? ()
#11 0x00007f623affb230 in ?? ()
#12 0x00007f623affbe20 in ?? ()
#13 0x0000000000000068 in ?? ()
#14 0x00007f6272a00980 in ?? ()
#15 0x00007f6274e5fd58 in tcache_alloc_small (slow_path=false, zero=false, binind=10, size=0, tcache=0x7f62730258ce <putwc_unlocked+30>, arena=<optimized out>, tsd=<optimized out>) at include/jemalloc/internal/tcache_inlines.h:60
#16 arena_malloc (slow_path=false, tcache=0x7f62730258ce <putwc_unlocked+30>, zero=false, ind=10, size=0, arena=0x0, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:165
#17 iallocztm (slow_path=false, arena=0x0, is_internal=false, tcache=0x7f62730258ce <putwc_unlocked+30>, zero=false, ind=10, size=0, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:53
#18 imalloc_no_sample (ind=10, usize=0, size=0, tsd=0x7f627336d3a0 <_IO_obstack_jumps+128>, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:1949
#19 imalloc_body (tsd=0x7f627336d3a0 <_IO_obstack_jumps+128>, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2123
#20 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2258
#21 je_malloc_default (size=<optimized out>) at src/jemalloc.c:2289
#22 0x00007f627596043c in cb_malloc (size=0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_malloc.cc:51
#23 0x00007f6276a000b9 in operator new (count=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/global_new_replacement.cc:71
#24 0x00007f626e4faf71 in MutationResponse (sid=..., enableExpiryOut=Yes, includeCollectionID=(unknown: 32), includeDeleteTime=(unknown: 162), includeXattrs=Yes, includeVal=Yes, opaque=2, item=..., this=0x7f6238814c10)
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/response.h:429
#25 make_unique<MutationResponse, SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> > const&, unsigned int const&, IncludeValue const&, IncludeXattrs const&, IncludeDeleteTime const&, DocKeyEncodesCollectionId const&, EnableExpiryOutput const&, cb::mcbp::DcpStreamId const&> () at /usr/local/include/c++/7.3.0/bits/unique_ptr.h:825
#26 ActiveStream::makeResponseFromItem (this=<optimized out>, item=..., sendCommitSyncWriteAs=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream.cc:1029
#27 0x00007f626e4ff434 in ActiveStream::processItems (this=0x7f623affb3b0, this@entry=0x7f6238814c10, outstandingItemsResult=..., streamMutex=...)
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream.cc:1101
#28 0x00007f626e4ff843 in ActiveStream::nextCheckpointItemTask (this=this@entry=0x7f6238814c10, streamMutex=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream.cc:868
#29 0x00007f626e4ff924 in ActiveStream::nextCheckpointItemTask (this=0x7f6238814c10) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream.cc:858
#30 0x00007f626e5089f9 in ActiveStreamCheckpointProcessorTask::run (this=0x7f6238819110) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream_checkpoint_processor_task.cc:56
#31 0x00007f626e5b7964 in ExecutorThread::run (this=0x7f6271b97960) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/executorthread.cc:187
#32 0x00007f627595dee7 in run (this=0x7f6271a6e670) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:58
#33 platform_thread_wrap (arg=0x7f6271a6e670) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:71
#34 0x00007f627337edd5 in start_thread () from /usr/lib64/libpthread-2.17.so
#35 0x00007f62730a7ead in tdestroy_recurse () from /usr/lib64/libc-2.17.so
#36 0x0000000000000000 in ?? ()
(gdb)
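The CRITICAL lines memcached logged before exiting give each frame as [module base + offset]. Adding the two reproduces the absolute addresses gdb shows. For example, ep.so 0x7f626e488000 + 0x77434 = 0x7f626e4ff434, which is frame #27 (ActiveStream::processItems). The ep.so offsets 0x77843, 0x77924, 0x809f9 and 0x12f964 likewise land on frames #28 to #31. The libplatform, libpthread and libc entries match frames #32, #34 and #35. This suggests both traces come from the same crashed process. The offset is what one would typically pass to a symbolizer such as addr2line against ep.so. Below is a small Python sketch that parses one such line. The regex and the `parse_frame` helper are hypothetical, written only from the log format shown above.

```python
import re
from typing import Optional, Tuple

# Assumed format (taken from the log excerpt above), e.g.
#   ".../lib/../lib/ep.so() [0x7f626e488000+0x77434]"
#   "/lib64/libc.so.6(clone+0x6d) [0x7f6272faa000+0xfdead]"
FRAME_RE = re.compile(
    r"(?P<module>\S+?)\((?P<sym>[^)]*)\)\s*"
    r"\[(?P<base>0x[0-9a-fA-F]+)\+(?P<off>0x[0-9a-fA-F]+)\]"
)


def parse_frame(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (module path, offset within module, absolute address),
    or None if the line has no [base+offset] frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    base = int(m.group("base"), 16)
    off = int(m.group("off"), 16)
    return m.group("module"), off, base + off


line = ("2019-12-01T22:08:19.277296-08:00 CRITICAL "
        "/opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x77434]")
module, off, addr = parse_frame(line)
print(module.rsplit("/", 1)[-1], hex(off), hex(addr))
# ep.so 0x77434 0x7f626e4ff434  (frame #27 in the gdb backtrace)
```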
cbcollect_info attached.
Last successful run was on 6.5.0-4897.
Issue Links
- duplicates MB-37103 [System test]: Disk Checkpoint does not have an initialised HCS (Closed)