Couchbase Server / MB-11778

upr replica is unable to detect death of upr producer (was: Some replica items not deleted)

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 3.0
    • 3.0-Beta
    • couchbase-bucket
    • Security Level: Public
    • None
    • centOS 6.x

    Description

      I'm seeing a bug similar to MB-11573 on build 991. vb_replica_curr_items reports 3400 where 2800 is expected, so 600 replica items haven't been deleted. However, curr_items and vb_active_curr_items are correct (2800 each).

      2014-07-21 18:18:44 | INFO | MainProcess | Cluster_Thread | [task.check] Saw curr_items 2800 == 2800 expected on '172.23.106.47:8091''172.23.106.48:8091',default bucket
      2014-07-21 18:18:45 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.47:11210 default
      2014-07-21 18:18:45 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.48:11210 default
      2014-07-21 18:18:45 | INFO | MainProcess | Cluster_Thread | [task.check] Saw vb_active_curr_items 2800 == 2800 expected on '172.23.106.47:8091''172.23.106.48:8091',default bucket
      2014-07-21 18:18:45 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.47:11210 default
      2014-07-21 18:18:45 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.48:11210 default
      2014-07-21 18:18:45 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 3400 == 2800 expected on '172.23.106.47:8091''172.23.106.48:8091', default bucket
      2014-07-21 18:18:48 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 3400 == 2800 expected on '172.23.106.47:8091''172.23.106.48:8091', sasl_bucket_1 bucket
      2014-07-21 18:18:49 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 3400 == 2800 expected on '172.23.106.47:8091''172.23.106.48:8091', standard_bucket_1 bucket
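
The mismatch in the log above can be pulled out mechanically. The following is a minimal sketch, assuming the `[task.check]` lines always follow the shape quoted here; the names `CHECK_RE` and `parse_check` are hypothetical helpers for illustration and are not part of testrunner.

```python
import re

# Pattern for the "[task.check]" lines quoted above. It captures the stat
# name, the observed value, the expected value and the bucket name.
# Assumption: the host list contains no commas, as in the quoted log.
CHECK_RE = re.compile(
    r"\[task\.check\] (?:Saw|Not Ready:) (?P<stat>\w+) "
    r"(?P<seen>\d+) == (?P<expected>\d+) expected on .*?,\s*(?P<bucket>\S+) bucket"
)


def parse_check(line):
    """Return the parsed fields of one check line, or None if it does not match."""
    match = CHECK_RE.search(line)
    if match is None:
        return None
    seen = int(match.group("seen"))
    expected = int(match.group("expected"))
    return {
        "stat": match.group("stat"),
        "bucket": match.group("bucket"),
        "seen": seen,
        "expected": expected,
        "surplus": seen - expected,  # positive means extra items are present
    }


if __name__ == "__main__":
    sample = (
        "2014-07-21 18:18:45 | WARNING | MainProcess | Cluster_Thread | "
        "[task.check] Not Ready: vb_replica_curr_items 3400 == 2800 expected on "
        "'172.23.106.47:8091''172.23.106.48:8091', default bucket"
    )
    print(parse_check(sample))
```

Run against the three WARNING lines, this reports a surplus of 600 for vb_replica_curr_items on each bucket, and a surplus of 0 for the two "Saw" lines.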

      testcase:
      ./testrunner -i sanity.ini -t xdcr.pauseResumeXDCR.PauseResumeTest.replication_with_pause_and_resume,reboot=dest_node,items=2000,rdirection=bidirection,replication_type=xmem,standard_buckets=1,sasl_buckets=1,pause=source-destination,doc-ops=update-delete,doc-ops-dest=update-delete

      What the test does:

      Two 3-node clusters with bidirectional XDCR on 3 buckets.
      1. Load 2k items on both clusters. Pause all XDCR (all items had been replicated by this time).
      2. Reboot one destination node (.48).
      3. After warmup, resume replication on all buckets, on both clusters.
      4. Update 30% and delete 30% of items on both sides. No expiration set.
      5. Verify item count, value and rev-ids.
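
The expected count of 2800 in the log can be reproduced from the steps above with simple arithmetic. This is a sketch under stated assumptions that the ticket does not confirm: the two clusters load disjoint key sets, each side deletes 30% of the 2000 items it loaded itself, and updates do not change the item count. The function name `expected_counts` is hypothetical.

```python
def expected_counts(items_per_cluster, delete_percent, clusters=2):
    """Return (items after XDCR, deletes per cluster, items left after deletes).

    Assumptions (not confirmed by the ticket): key sets are disjoint across
    clusters, and each cluster deletes delete_percent of its own items.
    """
    total_after_xdcr = items_per_cluster * clusters
    deleted_per_cluster = items_per_cluster * delete_percent // 100
    remaining = total_after_xdcr - deleted_per_cluster * clusters
    return total_after_xdcr, deleted_per_cluster, remaining


if __name__ == "__main__":
    total, deleted, remaining = expected_counts(items_per_cluster=2000, delete_percent=30)
    print(total, deleted, remaining)  # 4000 600 2800
```

Under these assumptions each bucket should hold 2800 items per cluster after step 4, which matches the expected value in the log. The observed replica surplus of 600 is the same size as one cluster's delete set, though the report does not state which deletes were missed.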

      The cluster is available for debugging until tomorrow morning. Thanks.

          People

            apiravi Aruna Piravi (Inactive)