  1. Couchbase Server
  2. MB-14973

GoXDCR: It takes 10 mins of no replication to detect xmem is stuck


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: XDCR
    • Security Level: Public
    • Environment: CentOS 6.x

    Description

      Build: 4.0.0-2109

      Found during manual testing.

      1. C1 [.186] --> C2 [.188], with replication between the existing default buckets.
      2. Rebalance-in .189 on C2.
      3. In parallel, start load on C1.
      Replication stops with error NOT_MY_VBUCKET

      Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default failed. err=map[xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_1:Fatal error when receiving responses from memcached in target cluster.]
      

      This error was seen at 16:16:47 - Wed May 13, 2015

      The pipeline was then constructed:
      Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default started running. xdcr000 ns_1@127.0.0.1 16:17:00 - Wed May 13, 2015

      4. Rebalance completed at 16:19:06 - Wed May 13, 2015

      Rebalance completed successfully.
      ns_orchestrator001	ns_1@10.3.4.188	16:19:06 - Wed May 13, 2015
      

      5. Although the pipeline was constructed at 16:17:00, there was no replication between .186 and .188 for the next 10 minutes, until the next error message was reported on .186 at 16:27:04 - Wed May 13, 2015

      Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default failed. err=map[xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_1:Xmem is stuck]	xdcr000	ns_1@127.0.0.1	16:27:04 - Wed May 13, 2015
      

      6. Replication then restarts on .186:
      Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default started running. xdcr000 ns_1@127.0.0.1 16:27:17 - Wed May 13, 2015
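
The two failure lines quoted in the steps above share one shape, so they can be pulled apart mechanically when scanning the UI log. The following is a minimal sketch that assumes the format is exactly as quoted in this report; the function name `parse_failure` and the field names are hypothetical and are not part of XDCR.

```python
import re

# Pattern inferred only from the two failure lines quoted in this report.
# The group names are illustrative labels, not XDCR terminology.
FAILURE_RE = re.compile(
    r"Replication (?P<replication>\S+) failed\. "
    r"err=map\[(?P<part>xmem_\S+?_(?P<target>\d+\.\d+\.\d+\.\d+:\d+)_\d+)"
    r":(?P<message>[^\]]*)\]"
)


def parse_failure(line):
    """Return the fields of a replication failure line, or None if no match."""
    match = FAILURE_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default failed. "
        "err=map[xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_"
        "10.3.4.188:11210_1:Xmem is stuck]"
    )
    fields = parse_failure(sample)
    print(fields["target"], "|", fields["message"])
```

Both failures in this report name the same target node, 10.3.4.188:11210; only the message differs.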

      Questions
      ---------
      1. Why does C1 report that xmem is stuck even though the rebalance on the target cluster completed 8 minutes earlier?
      2. Why does it take 10 minutes to report that xmem is stuck?
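
The intervals behind these two questions can be checked directly from the timestamps quoted in the steps. This is a minimal sketch that assumes all timestamps come from clocks in the same time zone; the event labels are hypothetical names chosen for illustration.

```python
from datetime import datetime

# Timestamp layout used by the UI log lines quoted in this report.
FMT = "%H:%M:%S - %a %b %d, %Y"

# Timestamps copied from the report; the keys are illustrative labels only.
EVENTS = {
    "fatal_error": "16:16:47 - Wed May 13, 2015",
    "pipeline_started": "16:17:00 - Wed May 13, 2015",
    "rebalance_completed": "16:19:06 - Wed May 13, 2015",
    "xmem_stuck_reported": "16:27:04 - Wed May 13, 2015",
    "pipeline_restarted": "16:27:17 - Wed May 13, 2015",
}


def gap(start, end):
    """Return the time elapsed between two labelled events as a timedelta."""
    return datetime.strptime(EVENTS[end], FMT) - datetime.strptime(EVENTS[start], FMT)


if __name__ == "__main__":
    # Time with no replication before "Xmem is stuck" was reported.
    print("pipeline start -> stuck reported:", gap("pipeline_started", "xmem_stuck_reported"))
    # Time between the target rebalance finishing and the stuck report.
    print("rebalance done -> stuck reported:", gap("rebalance_completed", "xmem_stuck_reported"))
```

The first gap is 10 minutes 4 seconds and the second is 7 minutes 58 seconds, which matches the "10 mins" and "8 mins" figures in the questions.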

      Attaching cbcollect from .186 and .188


          People

            apiravi Aruna Piravi (Inactive)
            apiravi Aruna Piravi (Inactive)