Details
Description
steps:
1. 3 nodes in the cluster, 4 buckets. Run the data loader for more than a day.
2. Set up replication from the SRC cluster to the DEST cluster for all buckets.
3. Rebalance in at the SRC cluster.
   Rebalance in at the DEST cluster.
4. Graceful Fail Over (rebalance) for a node in the SRC cluster, add back (Delta Recovery).
5. Click Failover: Hard Fail Over for a node in SRC cluster A, add back (Full Recovery), and rebalance.
6. Remove a node in the SRC cluster, stop the rebalance. Cancel removing the node and rebalance.
7. Rebalance out 1 node on the SRC cluster.
8. Rebalance out 1 node on the DEST cluster.
9. Rebalance in 2 nodes on the SRC cluster.
Please note that MB-14983 and MB-14984 have been filed.
Without any further action on the cluster, memcached crashed on node 172.23.105.157 (root@172.23.105.157). The node is then in the pending state.
I know that the gdb stack trace is not useful here, but here it is anyway:
gdb /opt/couchbase/bin/memcached -c core.memcached.6391
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/couchbase/bin/memcached...(no debugging symbols found)...done.
BFD: Warning: /tmp/core.memcached.6391 is truncated: expected core file size >= 3021598720, found: 172032.
[New Thread 7536]
[New Thread 6391]
[New Thread 6405]
[New Thread 6400]
[New Thread 6404]
[New Thread 6402]
[New Thread 7537]
[New Thread 7526]
[New Thread 7528]
[New Thread 6407]
[New Thread 7529]
[New Thread 6401]
[New Thread 6403]
[New Thread 6408]
[New Thread 7531]
[New Thread 7534]
[New Thread 6406]
[New Thread 7530]
[New Thread 7532]
[New Thread 7533]
[New Thread 6409]
[New Thread 7535]
Cannot access memory at address 0x7fe10914f168
Cannot access memory at address 0x7fe10914f168
Cannot access memory at address 0x7fe10914f168
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Failed to read a valid object file image from memory.
Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
Program terminated with signal 6, Aborted.
#0 0x00007fe10616e8a5 in ?? ()
Missing separate debuginfos, use: debuginfo-install couchbase-server-4.0.0-2093.x86_64
(gdb) bt
#0 0x00007fe10616e8a5 in ?? ()
Cannot access memory at address 0x7fe0f54336e8
(gdb) t a a bt
Thread 22 (Thread 7535):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe0f5e321c0
Thread 21 (Thread 6409):
#0 0x00007fe106224f03 in ?? ()
Cannot access memory at address 0x7fe0fdde9c80
Thread 20 (Thread 7533):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe0f72341c0
Thread 19 (Thread 7532):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe0f7c351c0
Thread 18 (Thread 7530):
#0 0x00007fe1070867bb in ?? ()
Cannot access memory at address 0x7fe0f9039ac0
Thread 17 (Thread 6406):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe0ffbeaeb0
Thread 16 (Thread 7534):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe0f68331c0
Thread 15 (Thread 7531):
#0 0x00007fe1070867bb in ?? ()
Cannot access memory at address 0x7fe0f8638ac0
Thread 14 (Thread 6408):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe0fe7e8eb0
Thread 13 (Thread 6403):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe1019edeb0
Thread 12 (Thread 6401):
#0 0x00007fe10621766d in ?? ()
Cannot access memory at address 0x7fe102ffec40
Thread 11 (Thread 7529):
#0 0x00007fe1070867bb in ?? ()
Cannot access memory at address 0x7fe0f9a3aac0
Thread 10 (Thread 6407):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe0ff1e9eb0
Thread 9 (Thread 7528):
#0 0x00007fe1070867bb in ?? ()
Cannot access memory at address 0x7fe0fa43bac0
Thread 8 (Thread 7526):
#0 0x00007fe1061e8b8d in ?? ()
Cannot access memory at address 0x7fe0fd3e8d50
Thread 7 (Thread 7537):
#0 0x00007fe107089054 in ?? ()
Cannot access memory at address 0x7fe0f4a32a50
Thread 6 (Thread 6402):
#0 0x00007fe1070867bb in ?? ()
Cannot access memory at address 0x7fe1025fdcc0
--Type <return> to continue, or q <return> to quit--
Thread 5 (Thread 6404):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe100feceb0
Thread 4 (Thread 6400):
#0 0x00007fe10621760d in ?? ()
Cannot access memory at address 0x7fe1041fdc50
Thread 3 (Thread 6405):
#0 0x00007fe10708643c in ?? ()
Cannot access memory at address 0x7fe1005eb6a0
Thread 2 (Thread 6391):
#0 0x00007fe106224f03 in ?? ()
Cannot access memory at address 0x7fff3919ee20
Thread 1 (Thread 7536):
#0 0x00007fe10616e8a5 in ?? ()
Cannot access memory at address 0x7fe0f54336e8
(gdb)
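The backtrace above is unusable mainly because the core file is truncated: gdb reports 172032 bytes (168 KiB) found where at least 3021598720 bytes (about 2.8 GiB) were expected. A minimal Python sketch for pulling those two numbers out of a saved gdb log follows; the function name `core_truncation` is hypothetical, and the regular expression assumes the BFD warning is worded exactly as in the log above.

```python
import re

# Matches the size figures in the BFD warning that gdb prints for a short core file.
# Assumption: the wording is the same as in the gdb session pasted in this ticket.
TRUNCATED_RE = re.compile(r"expected core file size >= (\d+), found: (\d+)")


def core_truncation(gdb_output):
    """Return (expected_bytes, found_bytes, fraction_present), or None if no warning is found."""
    match = TRUNCATED_RE.search(gdb_output)
    if match is None:
        return None
    expected = int(match.group(1))
    found = int(match.group(2))
    return expected, found, found / expected


# The warning line copied from the gdb session above.
warning = (
    "BFD: Warning: /tmp/core.memcached.6391 is truncated: "
    "expected core file size >= 3021598720, found: 172032."
)

expected, found, fraction = core_truncation(warning)
print(f"{found} of {expected} bytes present ({fraction:.4%} of the core)")
```

Less than 0.01% of the core image is present, which is consistent with every frame showing `?? ()` and "Cannot access memory". Truncation like this is commonly caused by a core size limit or a full disk on the node, but the report does not say which applies here.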
After step 9 (rebalance in 2 nodes on the SRC cluster):
SRC cluster: 172.23.105.22, 172.23.105.156, 172.23.105.157, 172.23.105.158, 172.23.105.207
DEST cluster: 172.23.105.159, 172.23.105.160, 172.23.105.206
One more comment: data is not in sync between the clusters (I can create a separate ticket if required).
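To make the out-of-sync observation easier to reproduce, per-bucket item counts on the two clusters can be compared. The sketch below is only an illustration: it assumes each cluster's bucket list has already been fetched and parsed into a list of dicts with a `name` and a `basicStats.itemCount` field (treat those field names as an assumption to verify against the server version), and the helper names and sample numbers are made up.

```python
def bucket_item_counts(buckets):
    """Map bucket name to item count for one cluster's parsed bucket list."""
    # Assumption: each entry looks like {"name": ..., "basicStats": {"itemCount": ...}}.
    return {b["name"]: b["basicStats"]["itemCount"] for b in buckets}


def out_of_sync(src_buckets, dest_buckets):
    """Return {bucket: (src_count, dest_count)} for buckets whose counts differ."""
    src = bucket_item_counts(src_buckets)
    dest = bucket_item_counts(dest_buckets)
    return {
        name: (src.get(name), dest.get(name))
        for name in sorted(set(src) | set(dest))
        if src.get(name) != dest.get(name)
    }


# Made-up sample data, not taken from the clusters in this ticket.
src_sample = [
    {"name": "bucket-a", "basicStats": {"itemCount": 100}},
    {"name": "bucket-b", "basicStats": {"itemCount": 250}},
]
dest_sample = [
    {"name": "bucket-a", "basicStats": {"itemCount": 100}},
    {"name": "bucket-b", "basicStats": {"itemCount": 200}},
]

for name, (src_count, dest_count) in out_of_sync(src_sample, dest_sample).items():
    print(f"{name}: SRC={src_count} DEST={dest_count}")
```

Matching counts do not prove the data is identical, and counts can differ briefly while the loader is still running, so this is only a first check after the load has stopped and replication has caught up.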
List of all crashes in the clusters: https://friendpaste.com/3relYTEwZZX44kxwOl2ezi