Go-XDCR: Checkpointing fails after source-node rebalance-out

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Critical
    • Fix Version/s: 4.0.0
    • Affects Version/s: 4.0.0
    • Component/s: XDCR
    • Security Level: Public
    • Environment: CentOS 6.x

    Description

      Build
      ------
      3.5.0-1270

      Testcase
      ------------

      ./testrunner -i INI_FILE.ini -p enable_goxdcr=True,checkpoint_interval=60 -t xdcr.checkpointXDCR.XDCRCheckpointUnitTest.test_rebalance,rdirection=unidirection,topology=chain,replication_type=xmem,rebalance=source

      Steps
      -------
      1. Set up unidirectional XDCR: C1 [.170,.171] --> C2 [.120,.169]
      2. Add 3 keys to C1 at 70s intervals; checkpoints happen as expected.
      3. Rebalance out .170 from C1.
      4. After 70s, add another key on C1.
      5. Verify that checkpointing happens. It fails at this step.
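
For reference, the rebalance-out in step 3 is driven by a form-encoded request whose parameters appear in the log below (`knownNodes`, `ejectedNodes`, `user`, `password`). A minimal sketch of how such a body could be built is shown here; the helper name `build_rebalance_params` is hypothetical and is not taken from testrunner.

```python
from urllib.parse import urlencode


def build_rebalance_params(known_nodes, ejected_nodes, user, password):
    """Return a form-encoded rebalance body (hypothetical helper).

    known_nodes lists every node currently in the cluster;
    ejected_nodes lists the subset being rebalanced out.
    """
    return urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
        "user": user,
        "password": password,
    })


# Values copied from the "rebalance params" log line of this run.
body = build_rebalance_params(
    known_nodes=["ns_1@172.23.107.171", "ns_1@172.23.107.170"],
    ejected_nodes=["ns_1@172.23.107.170"],
    user="Administrator",
    password="password",
)
print(body)
```

Note that the ejected node stays in `knownNodes`; only `ejectedNodes` marks it for removal, which matches the parameters logged by the test.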

      [2015-02-17 17:49:11,961] - [checkpointXDCR:310] INFO - Node ip:172.23.107.170 port:8091 ssh_username:root contains active vb0
      [2015-02-17 17:49:11,962] - [xdcrnewbasetests:1722] INFO - Starting rebalance-out nodes:[ip:172.23.107.170 port:8091 ssh_username:root] at C1 cluster 172.23.107.170
      [2015-02-17 17:49:12,444] - [rest_client:1153] INFO - rebalance params : password=password&ejectedNodes=ns_1%40172.23.107.170&user=Administrator&knownNodes=ns_1%40172.23.107.171%2Cns_1%40172.23.107.170
      [2015-02-17 17:49:12,457] - [rest_client:1157] INFO - rebalance operation started
      [2015-02-17 17:49:12,462] - [rest_client:1275] INFO - rebalance percentage : 0.00 %
      [2015-02-17 17:49:22,481] - [rest_client:1275] INFO - rebalance percentage : 37.11 %
      [2015-02-17 17:49:32,604] - [rest_client:1275] INFO - rebalance percentage : 74.15 %
      [2015-02-17 17:49:42,621] - [rest_client:1275] INFO - rebalance percentage : 100.00 %
      [2015-02-17 17:49:50,978] - [rest_client:715] ERROR - socket error while connecting to http://172.23.107.170:8091/pools error [Errno 111] Connection refused
      [2015-02-17 17:49:51,980] - [rest_client:715] ERROR - socket error while connecting to http://172.23.107.170:8091/pools error [Errno 111] Connection refused
      [2015-02-17 17:49:52,983] - [rest_client:715] ERROR - socket error while connecting to http://172.23.107.170:8091/pools error [Errno 111] Connection refused
      [2015-02-17 17:49:53,987] - [rest_client:715] ERROR - socket error while connecting to http://172.23.107.170:8091/pools error [Errno 111] Connection refused
      [2015-02-17 17:49:54,990] - [rest_client:715] ERROR - socket error while connecting to http://172.23.107.170:8091/pools error [Errno 111] Connection refused
      [2015-02-17 17:49:55,992] - [rest_client:715] ERROR - socket error while connecting to http://172.23.107.170:8091/pools error [Errno 111] Connection refused
      [2015-02-17 17:49:56,995] - [rest_client:715] ERROR - socket error while connecting to http://172.23.107.170:8091/pools error [Errno 111] Connection refused
      [2015-02-17 17:49:58,006] - [task:439] INFO - rebalancing was completed with progress: 100% in 45.5480399132 sec
      [2015-02-17 17:49:58,082] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:49:58,230] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:49:58,315] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:49:58,489] - [checkpointXDCR:318] INFO - Remote uuid before rebalance :259452465926078, after rebalance : 259452465926078
      [2015-02-17 17:49:58,520] - [rest_client:1169] INFO - /diag/eval status on 172.23.107.171:8091: True content: '{ok, BC} = ns_bucket:get_bucket(default), ns_bucket:replication_type(BC).' command: '{ok, BC} = ns_bucket:get_bucket(default), ns_bucket:replication_type(BC).'
      [2015-02-17 17:49:58,520] - [checkpointXDCR:324] INFO - Current internal replication = UPR,hence vb_uuid did not change, subsequent _commit_for_checkpoints are expected to pass
      [2015-02-17 17:49:58,590] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:49:58,664] - [xdcrnewbasetests:2818] INFO - sleep for 70 secs. ...
      [2015-02-17 17:51:08,816] - [data_helper:295] INFO - creating direct client 172.23.107.120:11210 default
      [2015-02-17 17:51:08,907] - [data_helper:295] INFO - creating direct client 172.23.107.169:11210 default
      [2015-02-17 17:51:08,988] - [data_helper:295] INFO - creating direct client 172.23.107.120:11210 default
      [2015-02-17 17:51:09,255] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:51:09,374] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:51:09,563] - [checkpointXDCR:251] INFO - Local failover log: [259452465926078, 2]
      [2015-02-17 17:51:09,563] - [checkpointXDCR:252] INFO - Remote failover log: [14054679954478, 2]
      [2015-02-17 17:51:09,564] - [checkpointXDCR:253] INFO - ################ New mutation:3 ##################
      [2015-02-17 17:51:09,567] - [checkpointXDCR:236] INFO - Loaded key pymc2329 onto vb0 in 172.23.107.171
      [2015-02-17 17:51:09,568] - [checkpointXDCR:237] INFO - deleted, flags, exp, rev_id, cas for key pymc2329 = (0, 0, 0, 1, 1424224269567918080)
      [2015-02-17 17:51:09,647] - [data_helper:295] INFO - creating direct client 172.23.107.120:11210 default
      [2015-02-17 17:51:09,760] - [data_helper:295] INFO - creating direct client 172.23.107.169:11210 default
      [2015-02-17 17:51:09,825] - [remote_util:155] INFO - connecting to 172.23.107.120 with username : root password : couchbase ssh_key:
      [2015-02-17 17:51:09,977] - [remote_util:188] INFO - Connected to 172.23.107.120
      [2015-02-17 17:51:10,194] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: sudo cat /proc/cpuinfo
      [2015-02-17 17:51:10,275] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:51:10,276] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: df -Th
      [2015-02-17 17:51:10,369] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:51:10,370] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: sudo cat /proc/meminfo
      [2015-02-17 17:51:10,453] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:51:10,453] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: hostname
      [2015-02-17 17:51:10,541] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:51:10,541] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: hostname -d
      [2015-02-17 17:51:10,632] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:51:10,634] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: grep "POST /_commit_for_checkpoint" "/opt/couchbase/var/lib/couchbase/logs/couchdb.log" | wc -l
      [2015-02-17 17:51:10,727] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:51:10,728] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: grep "POST /_commit_for_checkpoint 200" "/opt/couchbase/var/lib/couchbase/logs/couchdb.log" | wc -l
      [2015-02-17 17:51:10,822] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:51:10,824] - [checkpointXDCR:168] INFO - 9
      [2015-02-17 17:51:10,825] - [checkpointXDCR:171] INFO - Checkpoint on this node (this run): 1
      [2015-02-17 17:51:10,925] - [checkpointXDCR:263] INFO - Checkpointing failed - may not be an error if vb_uuid changed
      FAIL
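
The check that fails above is based on counting `POST /_commit_for_checkpoint 200` lines in couchdb.log on the destination node, as the two grep commands in the log show. The sketch below reproduces that counting in Python for anyone who wants to verify it by hand. It is an approximation under stated assumptions: the function names and sample log lines are hypothetical, and the exact pass/fail rule used by checkpointXDCR is not shown in this ticket, so the "count must increase" comparison is an assumption.

```python
import re

# Same pattern the test greps for in couchdb.log.
CHECKPOINT_OK = re.compile(r"POST /_commit_for_checkpoint 200")


def count_successful_checkpoints(log_lines):
    """Count lines recording a successful checkpoint commit (like grep | wc -l)."""
    return sum(1 for line in log_lines if CHECKPOINT_OK.search(line))


def checkpoint_happened(count_before, count_after):
    """Assumed rule: a new checkpoint means the success count increased."""
    return count_after > count_before


# Hypothetical sample lines, for illustration only.
sample = [
    "172.23.107.171 - - POST /_commit_for_checkpoint 200",
    "172.23.107.171 - - POST /_commit_for_checkpoint 400",
    "172.23.107.171 - - POST /_pre_replicate 200",
    "172.23.107.171 - - POST /_commit_for_checkpoint 200",
]
successes = count_successful_checkpoints(sample)
print(successes, checkpoint_happened(successes, successes))
```

If the count on the destination node does not grow after the post-rebalance mutation, no new checkpoint was committed, which is the symptom reported here.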

            People

              apiravi Aruna Piravi (Inactive)
              apiravi Aruna Piravi (Inactive)