Details
- Bug
- Status: Closed
- Critical
- Resolution: Duplicate
- 6.5.0
- None
- 6.5.0-4908
- Triaged
- Centos 64-bit
- Yes
Description
Script to Repro
./testrunner -i /tmp/testexec.6198.ini -p get-cbcollect-info=False,bucket_type=ephemeral,GROUP=P1_Set2,get-cbcollect-info=True -t rebalance.rebalanceinout.RebalanceInOutTests.test_incremental_rebalance_in_out_with_mutation_and_expiration,items=100000,value_size=512,max_verify=100000,zone=2,GROUP=IN_OUT;P1;P1_Set2
Test to repro
Rebalances nodes into and out of the cluster while doing mutations and
expirations. Use the 'zone' param to divide nodes into server groups
by setting zone > 1.

This test begins by loading a given number of items into the cluster.
It then adds one node, rebalances that node into the cluster, and then
rebalances it back out. During the rebalancing we update half of the
items in the cluster and expire the other half. Once the node has been
removed and added back we recreate the expired items, wait for the
disk queues to drain, and then verify that there has been no data loss,
i.e. that sum(curr_items) matches curr_items_total. We then remove and
add back two nodes at a time and so on until we have reached the point
where we are adding back and removing at least half of the nodes.
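The flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the testrunner implementation: `FakeCluster`, `incremental_rebalance_in_out` and the stopping rule (stop once the batch size reaches half of all nodes) are assumptions made for the sketch. The mutations are applied sequentially here, whereas the real test runs them while the rebalance is in progress.

```python
class FakeCluster:
    """Hypothetical stand-in for a cluster; it only tracks nodes and items."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.items = {}  # key -> value

    def rebalance(self, add=(), remove=()):
        # Membership change only; no data movement is modelled.
        for node in add:
            self.nodes.append(node)
        for node in remove:
            self.nodes.remove(node)

    def curr_items_per_node(self):
        # Spread the items round-robin across nodes, for illustration only.
        counts = [0] * len(self.nodes)
        for i, _key in enumerate(self.items):
            counts[i % len(self.nodes)] += 1
        return counts


def incremental_rebalance_in_out(cluster, spare_nodes, num_items):
    """Rebalance 1, 2, ... spare nodes in and out; return the number of rounds."""
    keys = [f"key-{i}" for i in range(num_items)]
    for key in keys:
        cluster.items[key] = "v0"
    to_update = keys[: num_items // 2]
    to_expire = keys[num_items // 2 :]
    total_nodes = len(cluster.nodes) + len(spare_nodes)

    rounds = 0
    step = 1
    while step <= len(spare_nodes):
        batch = spare_nodes[:step]
        cluster.rebalance(add=batch)
        for key in to_update:          # update half of the items
            cluster.items[key] = f"v{step}"
        for key in to_expire:          # expire the other half
            cluster.items.pop(key, None)
        cluster.rebalance(remove=batch)
        for key in to_expire:          # recreate the expired items
            cluster.items[key] = "recreated"
        # Stand-in for the sum(curr_items) vs curr_items_total check.
        assert sum(cluster.curr_items_per_node()) == num_items
        rounds += 1
        if step >= total_nodes / 2:    # assumed reading of "at least half"
            break
        step += 1
    return rounds
```

With two initial nodes and two spare nodes, the sketch runs one round with a single node and one round with two nodes, then stops.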
Rebalance failure
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed
2019-12-01 22:08:23 | INFO | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] Latest logs from UI on 172.23.104.211:
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.216', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@172.23.104.216\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:20.343Z', u'module': u'ns_memcached', u'tstamp': 1575266900343, u'type': u'info'}
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.216', u'code': 0, u'text': u"Control connection to memcached on 'ns_1@172.23.104.216' disconnected. Check logs for details.", u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:19.303Z', u'module': u'ns_memcached', u'tstamp': 1575266899303, u'type': u'info'}
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.216', u'code': 0, u'text': u"Service 'memcached' exited with status 134. Restarting. Messages:\n2019-12-01T22:08:19.277234-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f6273aac000+0x8f213]\n2019-12-01T22:08:19.277277-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x74098]\n2019-12-01T22:08:19.277296-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x77434]\n2019-12-01T22:08:19.277314-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x77843]\n2019-12-01T22:08:19.277334-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x77924]\n2019-12-01T22:08:19.277352-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x809f9]\n2019-12-01T22:08:19.277373-08:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x12f964]\n2019-12-01T22:08:19.277385-08:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f6275955000+0x8ee7]\n2019-12-01T22:08:19.277401-08:00 CRITICAL /lib64/libpthread.so.0() [0x7f6273377000+0x7dd5]\n2019-12-01T22:08:19.277475-08:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f6272faa000+0xfdead]", u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:19.297Z', u'module': u'ns_log', u'tstamp': 1575266899297, u'type': u'info'}
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.211', u'code': 0, u'text': u'auto-reprovision is disabled as maximum number of nodes (1) that can be auto-reprovisioned has been reached.', u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:18.669Z', u'module': u'auto_reprovision', u'tstamp': 1575266898669, u'type': u'info'}
2019-12-01 22:08:23 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.104.211', u'code': 0, u'text': u'Bucket "default" has been reprovisioned on following nodes: [\'ns_1@172.23.104.220\']. Nodes on which the data service restarted: [\'ns_1@172.23.104.220\',\n \'ns_1@172.23.104.243\'].', u'shortText': u'message', u'serverTime': u'2019-12-01T22:08:18.668Z', u'module': u'auto_reprovision', u'tstamp': 1575266898668, u'type': u'info'}
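For triage, the "exited with status 134" message can be decoded under the common shell convention that a status above 128 means the process was killed by signal (status - 128). Whether this ns_server message follows that convention is an assumption, and `describe_exit_status` is a hypothetical helper, not part of testrunner. Under that reading, 134 - 128 = 6, which is SIGABRT on Linux; that would be consistent with the libstdc++ frame at the top of the CRITICAL stack.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status using the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"
```

Calling `describe_exit_status(134)` on a Linux host reports termination by SIGABRT.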
Backtrace from gdb
(gdb) bt
#0 0x00007f6272fe0207 in __gconv_transform_internal_ucs2reverse () from /usr/lib64/libc-2.17.so
#1 0x0000000000000006 in ?? ()
#2 0x00007f6273025dc3 in wprintf () from /usr/lib64/libc-2.17.so
#3 0x0000000000000001 in ?? ()
#4 0x0000000a3affb1f0 in ?? ()
#5 0x000000020000000e in ?? ()
#6 0x00007f623affd600 in ?? ()
#7 0x00007f623affb190 in ?? ()
#8 0x00007f6271b5f400 in ?? ()
#9 0x0000000000000068 in ?? ()
#10 0x000000003affd600 in ?? ()
#11 0x00007f623affb230 in ?? ()
#12 0x00007f623affbe20 in ?? ()
#13 0x0000000000000068 in ?? ()
#14 0x00007f6272a00980 in ?? ()
#15 0x00007f6274e5fd58 in tcache_alloc_small (slow_path=false, zero=false, binind=10, size=0, tcache=0x7f62730258ce <putwc_unlocked+30>, arena=<optimized out>, tsd=<optimized out>) at include/jemalloc/internal/tcache_inlines.h:60
#16 arena_malloc (slow_path=false, tcache=0x7f62730258ce <putwc_unlocked+30>, zero=false, ind=10, size=0, arena=0x0, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:165
#17 iallocztm (slow_path=false, arena=0x0, is_internal=false, tcache=0x7f62730258ce <putwc_unlocked+30>, zero=false, ind=10, size=0, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:53
#18 imalloc_no_sample (ind=10, usize=0, size=0, tsd=0x7f627336d3a0 <_IO_obstack_jumps+128>, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:1949
#19 imalloc_body (tsd=0x7f627336d3a0 <_IO_obstack_jumps+128>, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2123
#20 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2258
#21 je_malloc_default (size=<optimized out>) at src/jemalloc.c:2289
#22 0x00007f627596043c in cb_malloc (size=0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_malloc.cc:51
#23 0x00007f6276a000b9 in operator new (count=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/global_new_replacement.cc:71
#24 0x00007f626e4faf71 in MutationResponse (sid=..., enableExpiryOut=Yes, includeCollectionID=(unknown: 32), includeDeleteTime=(unknown: 162), includeXattrs=Yes, includeVal=Yes, opaque=2, item=..., this=0x7f6238814c10)
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/response.h:429
#25 make_unique<MutationResponse, SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> > const&, unsigned int const&, IncludeValue const&, IncludeXattrs const&, IncludeDeleteTime const&, DocKeyEncodesCollectionId const&, EnableExpiryOutput const&, cb::mcbp::DcpStreamId const&> () at /usr/local/include/c++/7.3.0/bits/unique_ptr.h:825
#26 ActiveStream::makeResponseFromItem (this=<optimized out>, item=..., sendCommitSyncWriteAs=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream.cc:1029
#27 0x00007f626e4ff434 in ActiveStream::processItems (this=0x7f623affb3b0, this@entry=0x7f6238814c10, outstandingItemsResult=..., streamMutex=...)
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream.cc:1101
#28 0x00007f626e4ff843 in ActiveStream::nextCheckpointItemTask (this=this@entry=0x7f6238814c10, streamMutex=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream.cc:868
#29 0x00007f626e4ff924 in ActiveStream::nextCheckpointItemTask (this=0x7f6238814c10) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream.cc:858
#30 0x00007f626e5089f9 in ActiveStreamCheckpointProcessorTask::run (this=0x7f6238819110) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/active_stream_checkpoint_processor_task.cc:56
#31 0x00007f626e5b7964 in ExecutorThread::run (this=0x7f6271b97960) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/executorthread.cc:187
#32 0x00007f627595dee7 in run (this=0x7f6271a6e670) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:58
#33 platform_thread_wrap (arg=0x7f6271a6e670) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:71
#34 0x00007f627337edd5 in start_thread () from /usr/lib64/libpthread-2.17.so
#35 0x00007f62730a7ead in tdestroy_recurse () from /usr/lib64/libc-2.17.so
#36 0x0000000000000000 in ?? ()
(gdb)
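The CRITICAL frames in the rebalance failure log are printed as `[base+offset]`, and adding the two gives an absolute address that can be compared with the gdb frames. The sketch below shows that arithmetic; `FRAME_RE` and `absolute_addresses` are hypothetical names, and the pattern assumes the frame format seen in this ticket. For example, 0x7f626e488000 + 0x77434 = 0x7f626e4ff434, the address of frame #27 (ActiveStream::processItems), and 0x7f6272faa000 + 0xfdead = 0x7f62730a7ead, the address of frame #35.

```python
import re

# Matches frames such as "/lib64/libc.so.6(clone+0x6d) [0x7f6272faa000+0xfdead]".
FRAME_RE = re.compile(r"([^\s(]+)\([^)]*\) \[(0x[0-9a-f]+)\+(0x[0-9a-f]+)\]")


def absolute_addresses(log_text):
    """Return (module, base + offset) for every '[base+offset]' frame found."""
    return [
        (m.group(1), int(m.group(2), 16) + int(m.group(3), 16))
        for m in FRAME_RE.finditer(log_text)
    ]


# Two frames copied from the log in this ticket.
sample = (
    "CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f626e488000+0x77434]\n"
    "CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f6272faa000+0xfdead]"
)

for module, address in absolute_addresses(sample):
    print(f"{address:#018x} {module}")
```

The log names the libc frame clone+0x6d, while gdb labels the same address tdestroy_recurse. Together with the unresolved frames #1 to #14, this may indicate that gdb was not using matching libc symbols, so the top of the gdb backtrace should be read with caution.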
cbcollect_info attached.
Last successful run was on 6.5.0-4897.
Attachments
Issue Links
- duplicates: MB-37103 [System test]: Disk Checkpoint does not have an initialised HCS (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Is this a Regression? | Unknown [ 10452 ] | Yes [ 10450 ] |
Description | Same text as the Description section, without the final sentence | Appended "Last successful run was on 6.5.0-4897." |
Assignee | Daniel Owen [ owend ] | Dave Rigby [ drigby ] |
Component/s | | memcached [ 11621 ] |
Fix Version/s | Mad-Hatter [ 15037 ] | |
Triage | Untriaged [ 10351 ] | Triaged [ 10350 ] |
Assignee | Dave Rigby [ drigby ] | Balakumaran Gopal [ balakumaran.gopal ] |
Resolution | | Duplicate [ 3 ] |
Status | Open [ 1 ] | Resolved [ 5 ] |
Summary | Rebalance fails and memcached crashes seen in rebalance in out tests | Rebalance fails and memcached crashes seen in Ephemeral rebalance in out tests |
Status | Resolved [ 5 ] | Closed [ 6 ] |