Couchbase Server / MB-5314

Auto-failover may fail over two nodes automatically within 1 minute if the master node is failed over

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Affects Version/s: 1.8.1
    • Fix Version/s: 1.8.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels: None
    • Environment: Large cluster (14 nodes) in DGM
      Red Hat Linux
      1024 vbuckets, 1 bucket

    Description

      Repro Steps
      -----------------
      Set up a large cluster (14 nodes) with node 206 as the master
      Load 32M data
      Continue mutating items in the loaded data
      Enable auto-failover
      Reboot the master node (206)
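
The "Enable auto-failover" step can also be scripted. The following is a minimal sketch, assuming the cluster exposes the Couchbase REST endpoint `/settings/autoFailover` on port 8091. The credentials and the 30-second timeout are placeholders and are not taken from this report. The function only builds the request; nothing is sent until it is passed to `urllib.request.urlopen()`.

```python
import base64
import urllib.parse
import urllib.request


def build_autofailover_request(host, user, password, timeout_s=30):
    """Build (but do not send) a POST that enables auto-failover."""
    # Form-encoded body: enabled=true&timeout=<seconds>
    body = urllib.parse.urlencode(
        {"enabled": "true", "timeout": timeout_s}
    ).encode("ascii")
    req = urllib.request.Request(
        "http://%s:8091/settings/autoFailover" % host,
        data=body,
        method="POST",
    )
    # HTTP basic authentication with the cluster admin credentials.
    token = base64.b64encode(
        ("%s:%s" % (user, password)).encode("utf-8")
    ).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


# Placeholder credentials (assumption); the host is the master node from the report.
request = build_autofailover_request("10.3.121.206", "Administrator", "password")
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a live cluster.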

      Output
      -----------
      Could not auto-failover node ('ns_1@10.3.121.206'). There was at least another node down.
      Could not auto-failover node ('ns_1@10.3.121.214'). There was at least another node down.
      Could not auto-failover node ('ns_1@10.3.121.228'). There was at least another node down.
      Could not auto-failover node ('ns_1@10.3.121.232'). There was at least another node down.
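
The messages above are consistent with a guard that refuses to auto-failover any node while more than one node is reported down. The sketch below illustrates that rule and extracts the node names from the log text. It is not the ns_server implementation: `nodes_refused` and `auto_failover_candidate` are hypothetical names introduced for illustration only.

```python
import re

# Matches the message format shown in the Output section.
LOG_RE = re.compile(r"Could not auto-failover node \('([^']+)'\)")


def nodes_refused(log_text):
    """Return the node names mentioned in 'Could not auto-failover' lines."""
    return LOG_RE.findall(log_text)


def auto_failover_candidate(down_nodes):
    """Return the single down node, or None when the guard refuses.

    Assumed rule: fail over only when exactly one node is down.
    """
    down = sorted(set(down_nodes))
    if len(down) == 1:
        return down[0]
    return None


log = """
Could not auto-failover node ('ns_1@10.3.121.206'). There was at least another node down.
Could not auto-failover node ('ns_1@10.3.121.214'). There was at least another node down.
Could not auto-failover node ('ns_1@10.3.121.228'). There was at least another node down.
Could not auto-failover node ('ns_1@10.3.121.232'). There was at least another node down.
"""
refused = nodes_refused(log)
candidate = auto_failover_candidate(refused)  # four nodes down, so no candidate
```

Under this assumed rule, four nodes reported down at once yields no failover candidate, which matches the output captured in this report.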

      Diags from 206 and 224 are larger than 20M and have been copied to:
      https://s3.amazonaws.com/bugdb/jira/MB-5314/dgm_04.tar
      https://s3.amazonaws.com/bugdb/jira/MB-5314/224.out
      https://s3.amazonaws.com/bugdb/jira/MB-5314/206.out

      Sample output from 224 (also attached as a .out file with the diags):

      • ps -e -o f,s,pid,uid,ppid,pgid,sid,size,stackp,sz,vsz,rss,maj_flt,psr,time,args --forest
        F S PID UID PPID PGID SID SZ STACKP SZ VSZ RSS MAJFL PSR TIME COMMAND
        1 S 2 0 0 0 0 0 00000000 0 0 0 0 0 00:00:00 [kthreadd]
        1 S 3 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [migration/0]
        1 S 4 0 2 0 0 0 00000000 0 0 0 0 0 00:00:01 _ [ksoftirqd/0]
        1 S 5 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [migration/0]
        5 S 6 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [watchdog/0]
        1 S 7 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [migration/1]
        1 S 8 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [migration/1]
        1 S 9 0 2 0 0 0 00000000 0 0 0 0 1 00:00:01 _ [ksoftirqd/1]
        5 S 10 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [watchdog/1]
        1 S 11 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [migration/2]
        1 S 12 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [migration/2]
        1 S 13 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [ksoftirqd/2]
        5 S 14 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [watchdog/2]
        1 S 15 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [migration/3]
        1 S 16 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [migration/3]
        1 S 17 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [ksoftirqd/3]
        5 S 18 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [watchdog/3]
        5 S 19 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [events/0]
        1 S 20 0 2 0 0 0 00000000 0 0 0 0 1 00:03:30 _ [events/1]
        1 S 21 0 2 0 0 0 00000000 0 0 0 0 2 00:00:10 _ [events/2]
        1 S 22 0 2 0 0 0 00000000 0 0 0 0 3 00:00:01 _ [events/3]
        1 S 23 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [cpuset]
        1 S 24 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [khelper]
        1 S 25 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [netns]
        1 S 26 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [async/mgr]
        1 S 27 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [pm]
        1 S 28 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [sync_supers]
        1 S 29 0 2 0 0 0 00000000 0 0 0 0 2 00:00:06 _ [bdi-default]
        1 S 30 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [kintegrityd/0]
        1 S 31 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [kintegrityd/1]
        1 S 32 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [kintegrityd/2]
        1 S 33 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [kintegrityd/3]
        1 S 34 0 2 0 0 0 00000000 0 0 0 0 0 00:00:05 _ [kblockd/0]
        1 S 35 0 2 0 0 0 00000000 0 0 0 0 1 00:00:01 _ [kblockd/1]
        1 S 36 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [kblockd/2]
        1 S 37 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [kblockd/3]
        1 S 38 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [kacpid]
        1 S 39 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [kacpi_notify]
        1 S 40 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [kacpi_hotplug]
        1 S 41 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [ata/0]
        1 S 42 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [ata/1]
        1 S 43 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [ata/2]
        1 S 44 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [ata/3]
        1 S 45 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [ata_aux]
        1 S 46 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [ksuspend_usbd]
        1 S 47 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [khubd]
        5 S 48 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [kseriod]
        1 S 49 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [md/0]
        1 S 50 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [md/1]
        1 S 51 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [md/2]
        1 S 52 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [md/3]
        1 S 53 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [md_misc/0]
        1 S 54 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [md_misc/1]
        1 S 55 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [md_misc/2]
        1 S 56 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [md_misc/3]
        1 S 57 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [khungtaskd]
        1 S 58 0 2 0 0 0 00000000 0 0 0 0 1 00:00:31 _ [kswapd0]
        1 S 59 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [ksmd]
        1 S 60 0 2 0 0 0 00000000 0 0 0 0 3 00:00:48 _ [khugepaged]
        1 S 61 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [aio/0]
        1 S 62 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [aio/1]
        1 S 63 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [aio/2]
        1 S 64 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [aio/3]
        1 S 65 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [crypto/0]
        1 S 66 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [crypto/1]
        1 S 67 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [crypto/2]
        1 S 68 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [crypto/3]
        1 S 73 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [kthrotld/0]
        1 S 74 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [kthrotld/1]
        1 S 75 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [kthrotld/2]
        1 S 76 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [kthrotld/3]
        1 S 78 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [kpsmoused]
        1 S 79 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [usbhid_resumer]
        1 S 110 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [kstriped]
        1 S 249 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [scsi_eh_0]
        1 S 251 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [scsi_eh_1]
        1 S 379 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [virtio-blk]
        1 S 419 0 2 0 0 0 00000000 0 0 0 0 0 00:00:05 _ [kdmflush]
        1 S 421 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [kdmflush]
        1 S 440 0 2 0 0 0 00000000 0 0 0 0 0 00:01:21 _ [kjournald]
        1 S 933 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [kjournald]
        1 S 980 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [kauditd]
        1 S 1335 0 2 0 0 0 00000000 0 0 0 0 0 00:00:00 _ [rpciod/0]
        1 S 1336 0 2 0 0 0 00000000 0 0 0 0 1 00:00:00 _ [rpciod/1]
        1 S 1337 0 2 0 0 0 00000000 0 0 0 0 2 00:00:00 _ [rpciod/2]
        1 S 1338 0 2 0 0 0 00000000 0 0 0 0 3 00:00:00 _ [rpciod/3]
        1 S 1429 0 2 0 0 0 00000000 0 0 0 0 0 00:00:23 _ [flush-253:0]
        1 S 1899 0 2 0 0 0 00000000 0 0 0 0 0 00:00:02 _ [kdmflush]
        1 S 1910 0 2 0 0 0 00000000 0 0 0 0 1 00:00:41 _ [kjournald]
        1 S 23314 0 2 0 0 0 00000000 0 0 0 0 0 00:00:09 _ [flush-253:2]
        4 S 1 0 0 1 1 412 a09ceff0 4849 19396 948 82 1 00:00:01 /sbin/init
        5 S 530 0 1 530 530 980 1a0193a0 2824 11296 252 0 1 00:00:00 /sbin/udevd -d
        5 S 1782 0 530 530 530 2376 1a0193a0 3173 12692 344 0 2 00:00:00 _ /sbin/udevd -d
        5 S 1783 0 530 530 530 2032 1a0193a0 3087 12348 320 0 3 00:00:00 _ /sbin/udevd -d
        1 S 1193 0 1 1193 1193 604 cd5620b0 2293 9172 620 101 0 00:00:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhc
        5 S 1246 0 1 1243 981 227824 89cf8e40 63771 255084 1148 67 0 00:00:00 /sbin/rsyslogd -i /var/run/syslogd.pid -c 4
        5 S 1275 0 1 1275 1275 304 00788710 2301 9204 484 1 3 00:01:05 irqbalance
        5 S 1289 32 1 1289 1289 304 63c844a0 4756 19024 532 0 0 00:00:00 rpcbind
        5 S 1307 29 1 1307 1307 328 d0038530 5800 23200 668 0 0 00:00:00 rpc.statd
        1 S 1342 0 1 1342 1342 316 68fe0870 7377 29508 236 0 1 00:00:00 rpc.idmapd
        5 S 1434 81 1 1434 1434 76204 cd61c050 24337 97348 836 25 2 00:00:00 dbus-daemon --system
        4 S 1446 0 1 1446 1446 712 22cab0f0 47286 189144 860 0 0 00:00:00 cupsd -C /etc/cups/cupsd.conf
        1 S 1471 0 1 1471 1471 268 e9642320 1033 4132 452 0 2 00:00:00 /usr/sbin/acpid
        5 S 1480 68 1 1480 1480 828 cd3f96b0 6301 25204 1444 20 0 00:00:02 hald
        0 S 1481 0 1480 1480 1480 296 72b23df0 4540 18160 632 1 0 00:00:00 _ hald-runner
        0 S 1509 0 1481 1480 1480 292 61b4e090 5069 20276 592 2 2 00:00:00 _ hald-addon-input: Listening on /dev/in
        4 S 1520 68 1481 1480 1480 296 7c75f9e0 4465 17860 680 2 0 00:00:00 _ hald-addon-acpi: listening on acpid so
        5 S 1540 0 1 1540 1540 350120 57554320 96439 385756 868 14 0 00:00:02 automount --pid-file /var/run/autofs.pid
        1 S 1556 0 1 1556 1556 784 16fe7df0 1704 6816 268 0 1 00:00:00 /usr/sbin/mcelog --daemon
        5 S 1567 0 1 1567 1567 608 e7cc5e50 16017 64068 500 73 1 00:00:00 /usr/sbin/sshd
        4 S 11996 0 1567 11996 11996 792 828b03a0 24454 97816 3892 35 0 00:00:00 _ sshd: root@pts/0
        4 S 12001 0 11996 12001 12001 424 01d909b0 27098 108392 1784 5 0 00:00:00 _ -bash
        4 R 12022 0 12001 12022 12001 1196 e88c9780 27074 108296 1016 2 0 00:00:00 _ ps -e -o f,s,pid,uid,ppid,pgid,sid
        4 S 1643 0 1 1643 1643 596 7db64d30 19669 78676 1032 4 1 00:00:05 /usr/libexec/postfix/master
        4 S 1654 89 1643 1643 1643 704 d6e5ecb0 19732 78928 996 5 3 00:00:00 _ qmgr -l -t fifo -u
        4 S 12008 89 1643 1643 1643 600 bb935ea0 19689 78756 3220 16 0 00:00:00 _ pickup -l -t fifo -u
        1 S 1667 0 1 1667 1667 292 2da10740 29710 118840 704 0 1 00:00:00 /usr/sbin/abrtd
        0 S 1675 0 1 1675 1675 268 bd122ba0 2304 9216 564 20 1 00:00:00 abrt-dump-oops -d /var/spool/abrt -rwx /var/lo
        1 S 1686 498 1 1686 1686 379996 d4d02bd0 121031 484124 1728 13 0 00:00:44 /usr/sbin/qpidd --data-dir /var/lib/qpidd --da
        1 S 1721 0 1 1721 1721 1424 424170b0 29312 117248 788 0 2 00:00:03 crond
        5 S 1732 0 1 1732 1732 480 c5996e90 5373 21492 300 11 1 00:00:00 /usr/sbin/atd
        1 S 1748 0 1 1748 1748 268 573eb390 1028 4112 280 9 0 00:00:00 /usr/bin/rhsmcertd 240 1440
        1 S 1750 0 1748 1748 1748 268 573eb390 1028 4112 276 11 0 00:00:00 _ /usr/bin/rhsmcertd 240 1440
        4 S 1765 0 1 1765 1765 700 241f1bc0 19284 77136 1024 6 0 00:00:00 login -- root
        4 S 1856 0 1765 1856 1856 420 41ae36c0 27097 108388 1064 0 0 00:00:00 _ -bash
        4 S 1767 0 1 1767 1767 268 ab3917a0 1029 4116 448 0 1 00:00:00 /sbin/mingetty /dev/tty2
        4 S 1769 0 1 1769 1769 268 98afe3b0 1029 4116 448 0 1 00:00:00 /sbin/mingetty /dev/tty3
        4 S 1771 0 1 1771 1771 268 209c7660 1029 4116 448 0 3 00:00:00 /sbin/mingetty /dev/tty4
        4 S 1773 0 1 1773 1773 268 9a2c5520 1029 4116 448 0 1 00:00:00 /sbin/mingetty /dev/tty5
        4 S 1775 0 1 1775 1775 268 f2c56be0 1029 4116 448 0 1 00:00:00 /sbin/mingetty /dev/tty6
        4 S 1790 0 1 1434 1434 4078148 002e28e0 1028479 4113916 1268 45 0 00:00:00 /usr/sbin/console-kit-daemon --no-daemon
        1 S 18820 497 1 18819 18819 300 cbdfa630 2720 10880 336 4 1 00:00:01 /opt/couchbase/lib/erlang/erts-5.8.4/bin/epmd
        0 S 18835 497 1 18834 18834 1562896 be2e2b70 396073 1584292 145148 359 1 06:14:06 /opt/couchbase/lib/erlang/erts-5.8.4/bin/beam.
        0 S 18862 497 18835 18862 18862 292 5dd8e480 26539 106156 1204 0 1 00:00:05 _ sh -s disksup
        0 S 18864 497 18835 18864 18864 264 ecea1a60 1027 4108 524 0 0 00:00:08 _ /opt/couchbase/lib/erlang/lib/os_mon-2.2.6
        0 S 18865 497 18835 18865 18865 264 047534e0 1026 4104 328 1 1 00:00:00 _ /opt/couchbase/lib/erlang/lib/os_mon-2.2.6
        0 S 18866 497 18835 18866 18866 268 5a5f00d0 2711 10844 452 0 0 00:00:16 _ inet_gethost 4
        1 S 18867 497 18866 18866 18866 268 5a5f00d0 2711 10844 344 0 1 00:00:00 | _ inet_gethost 4
        1 S 19235 497 18866 18866 18866 268 5a5f00d0 2711 10844 328 0 1 00:00:10 | _ inet_gethost 4
        0 S 18868 497 18835 18868 18868 410156 f2410b60 105389 421556 9524 3 3 00:02:54 _ /opt/couchbase/bin/moxi -Z port_listen=112
        0 S 18869 497 18835 18869 18869 4935360 786e8e60 1242109 4968436 4808440 2858 1 03:04:28 _ /opt/couchbase/bin/memcached -X /opt/co
        0 S 18870 497 18835 18870 18870 272 03a37550 1044 4176 488 1 0 00:03:59 _ portsigar for ns_1@10.3.121.224
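
To see which processes dominate memory in a listing like the one above, the rows can be parsed and ranked by RSS. This is a rough sketch that assumes the 16-column layout produced by the `ps -o` list in this report. The first `SZ` column (from the `size` specifier) is renamed `SIZE` here only to keep the keys unique, and the three sample rows are copied from the listing for node 224.

```python
COLUMNS = ["F", "S", "PID", "UID", "PPID", "PGID", "SID", "SIZE", "STACKP",
           "SZ", "VSZ", "RSS", "MAJFL", "PSR", "TIME", "COMMAND"]


def parse_ps_line(line):
    """Split one ps row into a dict; COMMAND keeps its embedded spaces."""
    fields = line.split(None, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError("unexpected column count: %r" % line)
    row = dict(zip(COLUMNS, fields))
    # Drop the --forest tree markers in front of the command.
    row["COMMAND"] = row["COMMAND"].lstrip("|\\_ ")
    row["PID"] = int(row["PID"])
    row["RSS"] = int(row["RSS"])  # resident set size as reported by ps (KiB)
    return row


def top_by_rss(lines, n=3):
    """Return the n rows with the largest RSS, largest first."""
    rows = [parse_ps_line(l) for l in lines if l.strip()]
    return sorted(rows, key=lambda r: r["RSS"], reverse=True)[:n]


sample = [
    "0 S 18835 497 1 18834 18834 1562896 be2e2b70 396073 1584292 145148 359 1 06:14:06 /opt/couchbase/lib/erlang/erts-5.8.4/bin/beam.",
    "0 S 18868 497 18835 18868 18868 410156 f2410b60 105389 421556 9524 3 3 00:02:54 _ /opt/couchbase/bin/moxi -Z port_listen=112",
    "0 S 18869 497 18835 18869 18869 4935360 786e8e60 1242109 4968436 4808440 2858 1 03:04:28 _ /opt/couchbase/bin/memcached -X /opt/co",
]
top = top_by_rss(sample)
for row in top:
    print(row["PID"], row["RSS"], row["COMMAND"])
```

On these sample rows, memcached (PID 18869) ranks first by a wide margin, ahead of the Erlang VM and moxi.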

          People

            Assignee: Ketaki Gangal (Inactive)
            Reporter: Ketaki Gangal (Inactive)