Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-13359

Keys stuck in intracluster-replication queue when checkpoint_interval is set to 60s

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 3.0.3
    • 3.0.3
    • couchbase-bucket
    • Security Level: Public
    • None
    • Untriaged
    • No
    • Mar 9 - Mar 27

    Description

      Build
      -------
      3.5.0-1217

      Seeing this only if I set checkpoint_interval=60s, also only keys in replica (vbuckets: 2, 281, 448) are missing as you can see below. Dcp is getting stuck on .188

      Pls also see screenshots. You can see replication dcp stream idle with the 3 keys under "ep_dcp_replica_items_remaining"

      Testcase
      ------------
      ./testrunner -i bixdcr.ini -p get-cbcollect-info=True,get-logs=True,enable_goxdcr=True,checkpoint_interval=60 -t xdcr.filterXDCR.XDCRFilterTests.test_xdcr_with_filter,items=10000,rdirection=bidirection,ctopology=chain,replication_type=xmem,default@C1=filter_expression:C1-key-1,timeout_percentage:60,default@C2=filter_expression:C2-key-1

      2015-02-09 15:36:28 | INFO | MainProcess | Cluster_Thread | [task.check] Saw curr_items 11111 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091',default bucket
      2015-02-09 15:36:28 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 default
      2015-02-09 15:36:28 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 default
      2015-02-09 15:36:28 | INFO | MainProcess | Cluster_Thread | [task.check] Saw vb_active_curr_items 11111 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091',default bucket
      2015-02-09 15:36:29 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 default
      2015-02-09 15:36:29 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 default
      2015-02-09 15:36:29 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:36:35 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:36:40 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:36:45 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:36:50 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:36:55 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:37:00 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:37:05 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:37:10 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:37:15 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:37:20 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      2015-02-09 15:37:25 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 11108 == 11111 expected on '10.3.4.188:8091''10.3.4.189:8091', default bucket
      [('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py', 781, '__bootstrap', 'self.__bootstrap_inner()'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py', 808, '__bootstrap_inner', 'self.run()'), ('./testrunner.py', 262, 'run', '**self._Thread__kwargs)'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/runner.py', 151, 'run', 'test(result)'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py', 395, '__call__', 'return self.run(*args, **kwds)'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py', 331, 'run', 'testMethod()'), ('pytests/xdcr/filterXDCR.py', 81, 'test_xdcr_with_filter', 'self.verify_results()'), ('pytests/xdcr/xdcrnewbasetests.py', 2485, 'verify_results', 'dest_key_count_ok = dest_cluster.verify_items_count()'), ('pytests/xdcr/xdcrnewbasetests.py', 1747, 'verify_items_count', 'task.result(timeout)'), ('lib/tasks/future.py', 162, 'result', 'self.set_exception(TimeoutError())'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
      Mon Feb 9 15:37:28 2015
      2015-02-09 15:37:28 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.4.188 with username : root password : couchbase ssh_key:
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.4.188
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.4.188: sudo cat /proc/cpuinfo
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.4.188: df -Th
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.4.188: sudo cat /proc/meminfo
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.4.188: hostname
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
      2015-02-09 15:37:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.4.188: hostname -d
      2015-02-09 15:37:31 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
      2015-02-09 15:37:31 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.4.188: /opt/couchbase/bin/cbvdiff -b default 10.3.4.188:11210,10.3.4.189:11210
      2015-02-09 15:37:34 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
      2015-02-09 15:37:34 | INFO | MainProcess | test_thread | [remote_util.log_command_output] VBucket 2: active count 2 != 1 replica count
      2015-02-09 15:37:34 | INFO | MainProcess | test_thread | [remote_util.log_command_output]
      2015-02-09 15:37:34 | INFO | MainProcess | test_thread | [remote_util.log_command_output] VBucket 281: active count 2 != 1 replica count
      2015-02-09 15:37:34 | INFO | MainProcess | test_thread | [remote_util.log_command_output]
      2015-02-09 15:37:34 | INFO | MainProcess | test_thread | [remote_util.log_command_output] VBucket 448: active count 2 != 1 replica count
      2015-02-09 15:37:34 | INFO | MainProcess | test_thread | [remote_util.log_command_output]
      2015-02-09 15:37:34 | INFO | MainProcess | test_thread | [remote_util.log_command_output] Active item count = 11111

      C1 : .186, .187
      C2 : .188, .189
      Pls find data files and logs attached.

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              apiravi Aruna Piravi (Inactive)
              apiravi Aruna Piravi (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty