Couchbase Server / MB-6498

backfill from new master even after reliable replica building procedure


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Affects Version/s: 2.0-beta
    • Fix Version/s: 2.0-beta
    • Component/s: couchbase-bucket, ns_server
    • Security Level: Public
    • Labels: None

    Description

      Seeing this in the logs of MB-4673 (reopening). It was the root cause of the rebalance failure. The main bug is in ns_server (because we expect backfills from replicas in the replica count > 1 case).

      Still, I think we should investigate a possible issue in ep-engine, or maybe an incorrectness in how ns_server ensures replicas are built.
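
      For context, a minimal sketch of the chained-replication expectation mentioned above, assuming the chain is just the vbucket map row for one vbucket; the module and function names here are hypothetical, not ns_server's actual code:

        -module(chain_sketch).
        -export([replication_pairs/1]).

        %% Chain is the vbucket map row for one vbucket, e.g.
        %% [Master, Replica1, Replica2]. With replica count > 1 each replica
        %% pulls from the node just before it in the chain, so the second
        %% replica's upstream is the first replica, not the master.
        replication_pairs([_LastNode]) ->
            [];
        replication_pairs([Src, Dst | Rest]) ->
            [{Src, Dst} | replication_pairs([Dst | Rest])].

      For the chain ['a@h1','b@h2','c@h3'] this yields [{'a@h1','b@h2'},{'b@h2','c@h3'}], which is why a backfill on the second replica reflects the first replica's state rather than the master's.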

      This happens after the movement of vbucket 285, when we establish the new replication chain following the move. The second replica sees this:

      [rebalance:info,2012-08-30T8:24:01.533,ns_1@10.3.121.94:<0.21104.17>:ebucketmigrator_srv:init:485]Some vbuckets were not yet ready to replicate from:
      [285]

      which means the open checkpoint for this vbucket was either missing or 0 on the first replica (see the readiness sketch after the log excerpt below). Looking at the first replica, we indeed see this:

      [ns_server:debug,2012-08-30T8:24:01.296,ns_1@10.3.121.98:<0.26188.21>:ebucketmigrator_srv:init:536]Reusing old upstream:
      [{vbuckets,[17,18,19,20,21,22,23,24,270,271,272,273,274,275,276,277,278,279,
                  280,281,282,283,284,285,321,322,323,324,325,329,330,331,332,333,
                  334,335,336,480,481,482,483,484,485,486,487,513,514,515,516,517,
                  518,519,520,521,626,627,628,629,630,631,632,633,975,976,977,978,
                  979,980,981,982]},
       {name,<<"replication_ns_1@10.3.121.98">>},
       {takeover,false}]
      [rebalance:debug,2012-08-30T8:24:01.300,ns_1@10.3.121.98:<0.26188.21>:ebucketmigrator_srv:init:555]upstream_sender pid: <0.26189.21>
      [rebalance:info,2012-08-30T8:24:01.301,ns_1@10.3.121.98:<0.26188.21>:ebucketmigrator_srv:process_upstream:880]Initial stream for vbucket 285
      [ns_server:debug,2012-08-30T8:24:01.311,ns_1@10.3.121.98:<0.28617.6>:mc_connection:do_delete_vbucket:118]Notifying mc_couch_events of vbucket deletion: bucket-1/285
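
      For reference, a hedged sketch of the readiness filter behind the "Some vbuckets were not yet ready to replicate from" message, assuming it is driven by the upstream node's checkpoint stats. The stat name "vb_N:open_checkpoint_id" is an assumption based on ep-engine's checkpoint stats group, and the module and function names are illustrative:

        -module(readiness_sketch).
        -export([not_ready/2]).

        %% Stats is a map of binary stat name to binary value, as one would
        %% collect from a `stats checkpoint` call against the upstream node.
        %% A vbucket whose open checkpoint id is missing or 0 is treated as
        %% not yet ready to replicate from.
        not_ready(VBuckets, Stats) ->
            [VB || VB <- VBuckets, open_checkpoint_id(VB, Stats) =:= 0].

        open_checkpoint_id(VB, Stats) ->
            Key = iolist_to_binary(["vb_", integer_to_list(VB),
                                    ":open_checkpoint_id"]),
            case maps:get(Key, Stats, missing) of
                missing -> 0;    %% a missing stat counts as not ready
                Val -> binary_to_integer(Val)
            end.

      Under this reading, vbucket 285 showing up in the "[285]" list means the first replica reported no usable open checkpoint for it at that moment.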

      Because the first replica replicates from the new master, it cannot be ahead of it on the open checkpoint; and because we are doing reliable replica building, we expect 285 on the first replica to be up to date by the time the new replication chain is built. In fact, the logs indicate that the first replica had closed checkpoint 4 when the vbucket filter was changed.
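
      A small sketch of the invariant this reasoning relies on; the module, function, and argument names are illustrative, not actual ns_server or ep-engine code:

        -module(invariant_sketch).
        -export([check/2]).

        %% After reliable replica building, the first replica's open
        %% checkpoint id for the moved vbucket should be positive and no
        %% greater than the new master's, so the downstream replica can
        %% resume replication without a backfill.
        check(MasterOpenCkptId, ReplicaOpenCkptId)
          when is_integer(MasterOpenCkptId), is_integer(ReplicaOpenCkptId) ->
            if
                ReplicaOpenCkptId =:= 0 ->
                    {violation, open_checkpoint_missing_or_zero};
                ReplicaOpenCkptId > MasterOpenCkptId ->
                    {violation, replica_ahead_of_master};
                true ->
                    ok
            end.

      The vbucket deletion logged for bucket-1/285 above is the kind of event that would leave the first replica with no open checkpoint, violating this invariant.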


          People

            Assignee: Chiyoung Seo (chiyoung) (Inactive)
            Reporter: Aleksey Kondratenko (alkondratenko) (Inactive)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
