Couchbase Server / MB-13552

Go-XDCR: Checkpointing fails after source node failover

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 4.0.0
    • 4.0.0
    • XDCR
    • Security Level: Public
    • centOS 6.x

    Description

      Build
      ------
      3.5.0-1270

      Testcase
      ------------

      ./testrunner -i INI_FILE.ini -p enable_goxdcr=True,checkpoint_interval=60 -t xdcr.checkpointXDCR.XDCRCheckpointUnitTest.test_failover,rdirection=unidirection,topology=chain,replication_type=xmem,failover=source

      Steps
      -------
      1. C1 [.170,.171] --> C2 [.120,.169]
      2. Add 3 keys to C1 at 70s intervals; checkpoints happen as expected.
      3. Failover and rebalance out .170 from C1.
      4. Add another key on C1 after 70s.
      5. Verify that checkpointing happens. This is the step that fails.
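
The timing in the steps above can be checked with a short sketch. It assumes, as a simplification not confirmed by this report, that checkpoints fire at fixed multiples of `checkpoint_interval` from the start of replication. The helper name `checkpoints_between` is hypothetical. Under that assumption, keys spaced 70s apart with a 60s interval should each be followed by at least one checkpoint before the next key arrives.

```python
def checkpoints_between(t_prev, t_next, interval):
    """Count checkpoint ticks in (t_prev, t_next].

    Assumption (not from the report): ticks occur at exact
    multiples of `interval` seconds.
    """
    return t_next // interval - t_prev // interval


CHECKPOINT_INTERVAL = 60  # from checkpoint_interval=60 in the test command
KEY_SPACING = 70          # from the 70s sleep between keys in the steps

# Times (in seconds) at which four keys would be written.
key_times = [KEY_SPACING * i for i in range(1, 5)]

# Number of checkpoint ticks between each pair of consecutive keys.
ticks = [
    checkpoints_between(prev, nxt, CHECKPOINT_INTERVAL)
    for prev, nxt in zip(key_times, key_times[1:])
]
print(ticks)
```

Because the key spacing is larger than the checkpoint interval, every gap contains at least one tick, which is why the test expects a new checkpoint after each key.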

      [2015-02-17 17:58:22,313] - [checkpointXDCR:357] INFO - Node ip:172.23.107.170 port:8091 ssh_username:root contains active vb0
      [2015-02-17 17:58:22,313] - [xdcrnewbasetests:1840] INFO - Starting failover for nodes:[ip:172.23.107.170 port:8091 ssh_username:root] at C1 cluster 172.23.107.170
      [2015-02-17 17:58:22,516] - [task:2678] INFO - Failing over 172.23.107.170:8091 with graceful=False
      [2015-02-17 17:58:23,926] - [rest_client:1098] INFO - fail_over node ns_1@172.23.107.170 successful
      [2015-02-17 17:58:23,926] - [task:2658] INFO - 0 seconds sleep after failover, for nodes to go pending....
      [2015-02-17 17:58:24,947] - [rest_client:1153] INFO - rebalance params : password=password&ejectedNodes=ns_1%40172.23.107.170&user=Administrator&knownNodes=ns_1%40172.23.107.171%2Cns_1%40172.23.107.170
      [2015-02-17 17:58:24,957] - [rest_client:1157] INFO - rebalance operation started
      [2015-02-17 17:58:24,965] - [rest_client:1275] INFO - rebalance percentage : 0.00 %
      [2015-02-17 17:58:34,996] - [task:439] INFO - rebalancing was completed with progress: 100% in 10.0387940407 sec
      [2015-02-17 17:58:35,082] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:58:35,254] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:58:35,344] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:58:35,537] - [checkpointXDCR:373] INFO - Remote uuid before failover :247473979359006, after failover : 117678140481495
      [2015-02-17 17:58:35,635] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:58:35,714] - [xdcrnewbasetests:2818] INFO - sleep for 70 secs. ...
      [2015-02-17 17:59:45,858] - [data_helper:295] INFO - creating direct client 172.23.107.120:11210 default
      [2015-02-17 17:59:45,963] - [data_helper:295] INFO - creating direct client 172.23.107.169:11210 default
      [2015-02-17 17:59:46,048] - [data_helper:295] INFO - creating direct client 172.23.107.120:11210 default
      [2015-02-17 17:59:46,309] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:59:46,413] - [data_helper:295] INFO - creating direct client 172.23.107.171:11210 default
      [2015-02-17 17:59:46,614] - [checkpointXDCR:251] INFO - Local failover log: [117678140481495, 2]
      [2015-02-17 17:59:46,615] - [checkpointXDCR:252] INFO - Remote failover log: [170831141775234, 2]
      [2015-02-17 17:59:46,615] - [checkpointXDCR:253] INFO - ################ New mutation:3 ##################
      [2015-02-17 17:59:46,618] - [checkpointXDCR:236] INFO - Loaded key pymc2329 onto vb0 in 172.23.107.171
      [2015-02-17 17:59:46,619] - [checkpointXDCR:237] INFO - deleted, flags, exp, rev_id, cas for key pymc2329 = (0, 0, 0, 1, 1424224786618515456)
      [2015-02-17 17:59:46,679] - [data_helper:295] INFO - creating direct client 172.23.107.120:11210 default
      [2015-02-17 17:59:46,768] - [data_helper:295] INFO - creating direct client 172.23.107.169:11210 default
      [2015-02-17 17:59:46,855] - [remote_util:155] INFO - connecting to 172.23.107.120 with username : root password : couchbase ssh_key:
      [2015-02-17 17:59:46,990] - [remote_util:188] INFO - Connected to 172.23.107.120
      [2015-02-17 17:59:47,211] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: sudo cat /proc/cpuinfo
      [2015-02-17 17:59:47,295] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:59:47,296] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: df -Th
      [2015-02-17 17:59:47,382] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:59:47,383] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: sudo cat /proc/meminfo
      [2015-02-17 17:59:47,490] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:59:47,491] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: hostname
      [2015-02-17 17:59:47,552] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:59:47,553] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: hostname -d
      [2015-02-17 17:59:47,633] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:59:47,635] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: grep "POST /_commit_for_checkpoint" "/opt/couchbase/var/lib/couchbase/logs/couchdb.log" | wc -l
      [2015-02-17 17:59:47,669] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:59:47,670] - [remote_util:1800] INFO - running command.raw on 172.23.107.120: grep "POST /_commit_for_checkpoint 200" "/opt/couchbase/var/lib/couchbase/logs/couchdb.log" | wc -l
      [2015-02-17 17:59:47,743] - [remote_util:1837] INFO - command executed successfully
      [2015-02-17 17:59:47,744] - [checkpointXDCR:168] INFO - 11
      [2015-02-17 17:59:47,745] - [checkpointXDCR:171] INFO - Checkpoint on this node (this run): 1
      [2015-02-17 17:59:47,746] - [checkpointXDCR:263] INFO - Checkpointing failed - may not be an error if vb_uuid changed
      FAIL
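
The log above shows the test counting checkpoint commits by grepping `couchdb.log` for `POST /_commit_for_checkpoint` and `POST /_commit_for_checkpoint 200`. A minimal Python sketch of the same counting follows. The function name `count_checkpoint_commits` and the sample log lines are hypothetical; only the two search strings come from the log, and the real `couchdb.log` line layout may differ.

```python
COMMIT_REQUEST = "POST /_commit_for_checkpoint"
COMMIT_SUCCESS = "POST /_commit_for_checkpoint 200"


def count_checkpoint_commits(log_lines):
    """Return (total, successful) checkpoint commit request counts.

    Mirrors the two `grep ... | wc -l` commands in the test log:
    one match per line containing the search string.
    """
    total = 0
    successful = 0
    for line in log_lines:
        if COMMIT_REQUEST in line:
            total += 1
            if COMMIT_SUCCESS in line:
                successful += 1
    return total, successful


# Hypothetical sample lines, for illustration only.
sample = [
    "[info] 172.23.107.171 - - POST /_commit_for_checkpoint 200",
    "[info] 172.23.107.171 - - POST /_commit_for_checkpoint 400",
    "[info] 172.23.107.171 - - GET /_pre_replicate 200",
]
total, successful = count_checkpoint_commits(sample)
print(total, successful)
```

Comparing the successful count before and after a mutation gives the number of new checkpoints in that window. As the final log line notes, a missing checkpoint is not necessarily an error if the vb_uuid changed.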

            People

              apiravi Aruna Piravi (Inactive)