Couchbase Server / MB-14983

[system tests] many goxdcr crashes on src and dest clusters


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 4.0.0
    • 4.0.0
    • XDCR
    • Security Level: Public
    • None
    • 4.0.0-2093
    • Untriaged
    • Centos 64-bit
    • Unknown

    Description

      Steps:
      1. 3 nodes in the cluster, 4 buckets; run the data loader for more than a day.
      2. Set up replication from cluster SRC to cluster DEST for all buckets.
      3. Rebalance in on the SRC cluster; rebalance in on the DEST cluster.
      4. Graceful failover (rebalance) of a node in the SRC cluster; add it back (delta recovery).
      5. Click Failover: hard failover of a node in SRC cluster A; add it back (full recovery) and rebalance.
      6. Remove a node in the SRC cluster and stop the rebalance; cancel the node removal and rebalance.
      7. Rebalance out 1 node on the SRC cluster.
      8. Rebalance out 1 node on the DEST cluster.
      9. Rebalance in 2 nodes on the SRC cluster.
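The topology operations in steps 2 and 4-6 can be scripted for a rerun. Below is a minimal Python sketch that only builds the REST requests; it does not send them. The endpoint paths and form parameters come from the public Couchbase Server REST API and should be checked against the 4.0 documentation. The following are hypothetical: the helper names, the remote cluster reference name, and which node gets failed over (the ticket does not record it). A real driver would also wait for each rebalance or graceful failover to finish before issuing the next call, for example by polling `/pools/default/rebalanceProgress`.

```python
from urllib.parse import urlencode

# Hypothetical helpers: each returns (method, path, form body) for one
# Couchbase Server REST call. Nothing is sent over the network here.


def _post(path, params=None):
    return ("POST", path, urlencode(params or {}))


def create_replication(bucket, remote_name):
    # Step 2: continuous XDCR from `bucket` to the same-named bucket on DEST.
    # Assumes a remote cluster reference called `remote_name` already exists.
    return _post("/controller/createReplication", {
        "fromBucket": bucket,
        "toCluster": remote_name,
        "toBucket": bucket,
        "replicationType": "continuous",
    })


def rebalance(known_nodes, ejected_nodes=()):
    # knownNodes lists every node in the cluster; ejectedNodes the ones to remove.
    return _post("/controller/rebalance", {
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    })


def failover_and_recover(known_nodes, node, graceful, recovery_type):
    # Steps 4 and 5: fail `node` over (graceful or hard), mark it for
    # "delta" or "full" recovery, then rebalance it back in.
    path = "/controller/startGracefulFailover" if graceful else "/controller/failOver"
    return [
        _post(path, {"otpNode": node}),
        _post("/controller/setRecoveryType",
              {"otpNode": node, "recoveryType": recovery_type}),
        rebalance(known_nodes),
    ]


def remove_then_cancel(known_nodes, node):
    # Step 6: start a rebalance that ejects `node`, stop it, then rebalance
    # again with nothing ejected.
    return [
        rebalance(known_nodes, [node]),
        _post("/controller/stopRebalance"),
        rebalance(known_nodes),
    ]
```

For example, `failover_and_recover(nodes, "ns_1@172.23.105.158", graceful=True, recovery_type="delta")` yields the three calls for step 4. Here `nodes` is the full list of SRC `otpNode` names, and `.158` is only an illustrative pick.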

      Throughout all scenarios, goxdcr crashed and dumped core many times on both clusters:

      Andreis-MacBook-Pro:testrunner andrei$ python scripts/ssh.py -i viber.ini "ls -la /tmp/"
      172.23.105.159
      total 83140
      drwxrwxrwt. 4 root root 4096 May 14 03:14 .
      dr-xr-xr-x. 23 root root 4096 Nov 23 08:20 ..
      drwx------. 2 root root 4096 May 14 00:00 atop.d
      -rw-r--r--. 1 root root 85112309 May 10 12:27 couchbase-server-enterprise-4.0.0-2093-centos6.x86_64.rpm
      drwxrwxrwt. 2 root root 4096 Nov 23 08:20 .ICE-unix

      172.23.105.206
      total 83136
      drwxrwxrwt. 4 root root 4096 May 14 03:46 .
      dr-xr-xr-x. 23 root root 4096 Nov 10 2014 ..
      drwx------. 2 root root 4096 May 14 00:00 atop.d
      -rw-r--r--. 1 root root 85112309 May 10 12:27 couchbase-server-enterprise-4.0.0-2093-centos6.x86_64.rpm
      drwxrwxrwt. 2 root root 4096 Nov 10 2014 .ICE-unix

      172.23.105.160
      total 83140
      drwxrwxrwt. 4 root root 4096 May 14 03:39 .
      dr-xr-xr-x. 23 root root 4096 Nov 23 08:54 ..
      drwx------. 2 root root 4096 May 14 00:00 atop.d
      -rw-r--r--. 1 root root 85112309 May 10 12:27 couchbase-server-enterprise-4.0.0-2093-centos6.x86_64.rpm
      drwxrwxrwt. 2 root root 4096 Nov 23 08:54 .ICE-unix

      172.23.105.158
      total 1978144
      drwxrwxrwt. 5 root root 12288 May 14 08:06 .
      dr-xr-xr-x. 23 root root 4096 Apr 21 10:11 ..
      drwx------. 2 root root 4096 May 14 00:00 atop.d
      -rw-------. 1 couchbase couchbase 458010624 May 14 07:55 core.goxdcr.1324
      -rw-------. 1 couchbase couchbase 1706594304 May 14 08:06 core.goxdcr.3970
      -rw-------. 1 couchbase couchbase 385552384 May 14 06:39 core.goxdcr.4327
      -rw-r--r--. 1 root root 85112309 May 10 12:27 couchbase-server-enterprise-4.0.0-2093-centos6.x86_64.rpm
      drwxr-xr-x. 2 root root 4096 Nov 10 2014 hsperfdata_root
      drwxrwxrwt. 2 root root 4096 Nov 10 2014 .ICE-unix

      172.23.105.207
      total 330796
      drwxrwxrwt. 4 root root 4096 May 14 09:37 .
      dr-xr-xr-x. 23 root root 4096 Nov 10 2014 ..
      drwx------. 2 root root 4096 May 14 00:00 atop.d
      -rw-------. 1 couchbase couchbase 107376640 May 14 09:32 core.goxdcr.28229
      -rw-------. 1 couchbase couchbase 108490752 May 14 09:33 core.goxdcr.28382
      -rw-------. 1 couchbase couchbase 116559872 May 14 09:34 core.goxdcr.28398
      -rw-------. 1 couchbase couchbase 112099328 May 14 09:35 core.goxdcr.28497
      -rw-------. 1 couchbase couchbase 108490752 May 14 09:35 core.goxdcr.28513
      -rw-------. 1 couchbase couchbase 99110912 May 14 09:35 core.goxdcr.28530
      -rw-------. 1 couchbase couchbase 116285440 May 14 09:35 core.goxdcr.28546
      -rw-------. 1 couchbase couchbase 98975744 May 14 09:35 core.goxdcr.28595
      -rw-------. 1 couchbase couchbase 104947712 May 14 09:36 core.goxdcr.28610
      -rw-------. 1 couchbase couchbase 123703296 May 14 09:36 core.goxdcr.28625
      -rw-------. 1 couchbase couchbase 100622336 May 14 09:36 core.goxdcr.28642
      -rw-------. 1 couchbase couchbase 122454016 May 14 09:36 core.goxdcr.28658
      -rw-------. 1 couchbase couchbase 124952576 May 14 09:36 core.goxdcr.28675
      -rw-------. 1 couchbase couchbase 99110912 May 14 09:36 core.goxdcr.28739
      -rw-------. 1 couchbase couchbase 110706688 May 14 09:36 core.goxdcr.28754
      -rw-------. 1 couchbase couchbase 109604864 May 14 09:36 core.goxdcr.28770
      -rw-------. 1 couchbase couchbase 110718976 May 14 09:37 core.goxdcr.28786
      -rw-------. 1 couchbase couchbase 108216320 May 14 09:37 core.goxdcr.28802
      -rw-r--r--. 1 root root 85112309 May 10 12:27 couchbase-server-enterprise-4.0.0-2093-centos6.x86_64.rpm
      drwxrwxrwt. 2 root root 4096 Nov 10 2014 .ICE-unix

      172.23.105.156
      total 456952
      drwxrwxrwt. 6 root root 4096 May 14 09:59 .
      dr-xr-xr-x. 23 root root 4096 Dec 4 08:34 ..
      drwx------ 2 root root 4096 May 14 00:00 atop.d
      -rw------- 1 couchbase couchbase 389009408 May 14 06:37 core.goxdcr.14984
      -rw------- 1 couchbase couchbase 311730176 May 14 06:40 core.goxdcr.20026
      -rw-r--r-- 1 root root 85112309 May 10 12:27 couchbase-server-enterprise-4.0.0-2093-centos6.x86_64.rpm
      drwxr-xr-x 2 root root 4096 Dec 4 08:34 hsperfdata_root
      drwxrwxrwt 2 root root 4096 Dec 4 08:34 .ICE-unix
      drwx------ 2 root root 4096 May 14 10:00 tmpCLs0Mm

      172.23.105.22
      total 1018880
      drwxrwxrwt. 5 root root 4096 May 14 09:38 .
      dr-xr-xr-x. 23 root root 4096 Nov 10 2014 ..
      drwx------. 2 root root 4096 May 14 00:00 atop.d
      -rw-------. 1 couchbase couchbase 130392064 May 14 09:36 core.goxdcr.10010
      -rw-------. 1 couchbase couchbase 107180032 May 14 09:36 core.goxdcr.10062
      -rw-------. 1 couchbase couchbase 101605376 May 14 09:36 core.goxdcr.10077
      -rw-------. 1 couchbase couchbase 121208832 May 14 09:37 core.goxdcr.10092
      -rw-------. 1 couchbase couchbase 122454016 May 14 09:37 core.goxdcr.10110
      -rw-------. 1 couchbase couchbase 113213440 May 14 09:37 core.goxdcr.10128
      -rw-------. 1 couchbase couchbase 122589184 May 14 09:37 core.goxdcr.10144
      -rw-------. 1 couchbase couchbase 110981120 May 14 09:37 core.goxdcr.10207
      -rw-------. 1 couchbase couchbase 110854144 May 14 09:37 core.goxdcr.10223
      -rw-------. 1 couchbase couchbase 106065920 May 14 09:38 core.goxdcr.10239
      -rw-------. 1 couchbase couchbase 98975744 May 14 09:38 core.goxdcr.10254
      -rw-------. 1 couchbase couchbase 111960064 May 14 09:38 core.goxdcr.10270
      -rw-------. 1 couchbase couchbase 100352000 May 14 09:38 core.goxdcr.10287
      -rw-------. 1 couchbase couchbase 107044864 May 14 09:38 core.goxdcr.10302
      -rw-------. 1 couchbase couchbase 110841856 May 14 09:38 core.goxdcr.10350
      -rw-------. 1 couchbase couchbase 790585344 May 14 08:06 core.goxdcr.1751
      -rw-------. 1 couchbase couchbase 107511808 May 14 09:32 core.goxdcr.9691
      -rw-------. 1 couchbase couchbase 130256896 May 14 09:35 core.goxdcr.9831
      -rw-------. 1 couchbase couchbase 95764480 May 14 09:35 core.goxdcr.9933
      -rw-------. 1 couchbase couchbase 112099328 May 14 09:36 core.goxdcr.9994
      -rw-r--r--. 1 root root 85112309 May 10 12:27 couchbase-server-enterprise-4.0.0-2093-centos6.x86_64.rpm
      drwxrwxrwt. 2 root root 4096 Nov 10 2014 .ICE-unix
      drwx------. 2 root root 4096 Jul 1 2014 tmptcWvo9

      172.23.105.157
      total 3000516
      drwxrwxrwt. 6 root root 4096 May 14 09:59 .
      dr-xr-xr-x. 23 root root 4096 May 13 15:52 ..
      drwx------. 2 root root 4096 May 14 00:00 atop.d
      -rw-------. 1 couchbase couchbase 340205568 May 14 06:39 core.goxdcr.11501
      -rw-------. 1 couchbase couchbase 262119424 May 14 06:42 core.goxdcr.11685
      -rw-------. 1 couchbase couchbase 174669824 May 14 06:42 core.goxdcr.11786
      -rw-------. 1 couchbase couchbase 114851840 May 14 06:42 core.goxdcr.11805
      -rw-------. 1 couchbase couchbase 738746368 May 14 08:02 core.goxdcr.11824
      -rw-------. 1 couchbase couchbase 865300480 May 14 08:06 core.goxdcr.15187
      -rw-------. 1 couchbase couchbase 350224384 May 14 09:58 core.goxdcr.19677
      -rw-------. 1 couchbase couchbase 293244928 May 14 09:58 core.goxdcr.19945
      -rw-------. 1 couchbase couchbase 293249024 May 14 09:59 core.goxdcr.20007
      -rw-------. 1 couchbase couchbase 275992576 May 14 09:59 core.goxdcr.20025
      -rw-------. 1 couchbase couchbase 187527168 May 14 09:59 core.goxdcr.20054
      -rw-------. 1 couchbase couchbase 399122432 May 14 06:35 core.goxdcr.6918
      -rw-r--r--. 1 root root 85112309 May 10 12:27 couchbase-server-enterprise-4.0.0-2093-centos6.x86_64.rpm
      drwxr-xr-x. 2 root root 4096 Nov 10 2014 hsperfdata_root
      drwxrwxrwt. 2 root root 4096 Nov 10 2014 .ICE-unix
      drwx------. 2 root root 4096 May 7 13:38 tmpUdUiK7
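To see where the crashes concentrate, the listing above can be tallied per host. The following is a minimal Python sketch. It assumes the input is the plain `ls -la /tmp/` output pasted above, with each host IP on its own line followed by that host's listing. `summarize_cores` is a hypothetical helper, not part of testrunner.

```python
import re

# A host header is a bare IPv4 address on its own line, e.g. "172.23.105.158".
HOST_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def summarize_cores(listing):
    """Return {host: (core_count, total_bytes)} for core.goxdcr.* files."""
    counts = {}
    host = None
    for raw in listing.splitlines():
        line = raw.strip()
        if HOST_RE.match(line):
            host = line
            counts.setdefault(host, [0, 0])  # keep hosts that have no cores
            continue
        fields = line.split()
        # ls -la columns: perms, links, owner, group, size, month, day,
        # time-or-year, name. Only the size and the name are used, so the
        # permission column does not need to be well-formed.
        if host and len(fields) >= 9 and fields[-1].startswith("core.goxdcr."):
            counts[host][0] += 1
            counts[host][1] += int(fields[4])
    return {h: tuple(v) for h, v in counts.items()}
```

Run over the listing above, it reports the following core file counts: 3 on 172.23.105.158, 18 on .207, 2 on .156, 20 on .22, 12 on .157, and none on .159, .206 or .160.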

      Logs on the SRC cluster:
      Port server goxdcr on node 'babysitter_of_ns_1@127.0.0.1' exited with status 1. Restarting. Messages: runtime.goexit()
      /usr/local/go/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208d14fd8 sp=0xc208d14fd0
      created by github.com/couchbase/gomemcached/client.(*UprFeed).StartFeed
      /home/couchbase/jenkins/workspace/sherlock-unix/godeps/src/github.com/couchbase/gomemcached/client/upr_feed.go:328 +0x90
      [goport] 2015/05/14 10:09:26 /opt/couchbase/bin/goxdcr terminated: signal: aborted (core dumped) ns_log000 ns_1@172.23.105.157 10:09:26 - Thu May 14, 2015
      Replication 59329612e4ee4f8af5d349937762f53b/UserInfo/UserInfo started running. xdcr000 ns_1@172.23.105.157 10:09:25 - Thu May 14, 2015
      Replication 59329612e4ee4f8af5d349937762f53b/UserInfo/UserInfo started running. (repeated 4 times) xdcr000 ns_1@172.23.105.157 10:09:21 - Thu May 14, 2015
      Replication 59329612e4ee4f8af5d349937762f53b/UserInfo/UserInfo failed. err=map[xmem_59329612e4ee4f8af5d349937762f53b/UserInfo/UserInfo_172.23.105.159:11210_0:Received non-recoverable error from memcached in target cluster] (repeated 3 times) xdcr000 ns_1@172.23.105.157 10:09:21 - Thu May 14, 2015
      Replication 59329612e4ee4f8af5d349937762f53b/AbRegNums/AbRegNums started running. (repeated 4 times) xdcr000 ns_1@172.23.105.157 10:09:21 - Thu May 14, 2015
      Replication 59329612e4ee4f8af5d349937762f53b/AbRegNums/AbRegNums failed. err=map[xmem_59329612e4ee4f8af5d349937762f53b/AbRegNums/AbRegNums_172.23.105.160:11210_0:Received non-recoverable error from memcached in target cluster] (repeated 3 times) xdcr000 ns_1@172.23.105.157 10:09:21 - Thu May 14, 2015
      Replication 59329612e4ee4f8af5d349937762f53b/AbRegNums/AbRegNums failed. err=map[xmem_59329612e4ee4f8af5d349937762f53b/AbRegNums/AbRegNums_172.23.105.159:11210_1:Received non-recoverable error from memcached in target cluster] xdcr000 ns_1@172.23.105.156 10:09:09 - Thu May 14, 2015
      Replication 59329612e4ee4f8af5d349937762f53b/UserInfo/UserInfo failed. err=map[xmem_59329612e4ee4f8af5d349937762f53b/UserInfo/UserInfo_172.23.105.160:11210_0:Received non-recoverable error from memcached in target cluster] xdcr000 ns_1@172.23.105.156 10:09:07 - Thu May 14, 2015
      Replication 59329612e4ee4f8af5d349937762f53b/AbRegNums/AbRegNums failed. err=map[xmem_59329612e4ee4f8af5d349937762f53b/AbRegNums/AbRegNums_172.23.105.206:11210_1:Received non-recoverable error from memcached in target cluster] xdcr000 ns_1@172.23.105.158 10:09:00 - Thu May 14, 2015
      Replication 59329612e4ee4f8af5d349937762f53b/AbRegNums/AbRegNums failed. err=map[xmem_59329612e4ee4f8af5d349937762f53b/AbRegNums/AbRegNums_172.23.105.159:11210_0:Received non-recoverable error from memcached in target cluster] xdcr000 ns_1@172.23.105.156 10:08:57 - Thu May 14, 2015
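The failure entries above share one shape, so they can be grouped by source bucket and target memcached endpoint. The following is a minimal Python sketch. The regular expression is inferred from these log lines only, not from the goxdcr source: a replication ID `<uuid>/<source bucket>/<target bucket>` and an xmem part ID ending in `_<host>:<port>_<n>`. The function names are hypothetical. The `(repeated N times)` suffix is kept as a separate field rather than used as a weight, because the log does not say whether N includes the first occurrence.

```python
import re
from collections import Counter

# Inferred from the ns_server log lines in this ticket, e.g.
# "Replication <uuid>/<src>/<dst> failed. err=map[xmem_<uuid>/<src>/<dst>_<host>:<port>_<n>:<msg>]"
FAIL_RE = re.compile(
    r"Replication (?P<uuid>[0-9a-f]+)/(?P<src>[^/\s]+)/(?P<dst>[^/\s]+) failed\. "
    r"err=map\[xmem_\S+?_(?P<target>\d{1,3}(?:\.\d{1,3}){3}:\d+)_\d+:(?P<msg>[^\]]+)\]"
    r"(?: \(repeated (?P<times>\d+) times\))?"
    r".*?ns_1@(?P<reporter>\d{1,3}(?:\.\d{1,3}){3})"
)


def parse_failures(log_text):
    """Yield one dict per 'Replication ... failed' entry; other lines are skipped."""
    for line in log_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            d = m.groupdict()
            d["times"] = int(d["times"]) if d["times"] else None
            yield d


def failures_by_target(log_text):
    """Count failure entries per (source bucket, target host:port)."""
    return Counter((d["src"], d["target"]) for d in parse_failures(log_text))
```

On the six `failed.` entries above, every target endpoint is one of 172.23.105.159, 172.23.105.160 or 172.23.105.206, all on port 11210.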

      Will provide cbcollect_info logs soon.

      Attachments

        1. xdcr_157.tar.gz (11.24 MB)
        2. xdcr_207.tar.gz (11.86 MB)
        3. xdcr_22.tar.gz (10.25 MB)


            People

              xiaomei Xiaomei Zhang (Inactive)
              andreibaranouski Andrei Baranouski