Couchbase Server / MB-4828

Rebalancing multiple nodes can hang if a bucket has fewer than 100k items, due to a race condition in TAP takeover


Details

    Description

      This was observed by one of our users, who had 10 buckets. Some buckets had fewer than 10k items, and TAP takeover got stuck.

      TAP stats:

      6179: vb_1014:cursor_checkpoint_id:eq_tapq:rebalance_1014: 1
      97312: eq_tapq:rebalance_1014:ack_log_size: 0
      97313: eq_tapq:rebalance_1014:ack_playback_size: 0
      97314: eq_tapq:rebalance_1014:ack_seqno: 10
      97315: eq_tapq:rebalance_1014:ack_window_full: false
      97316: eq_tapq:rebalance_1014:backfill_completed: false
      97317: eq_tapq:rebalance_1014:bg_backlog_size: 0
      97318: eq_tapq:rebalance_1014:bg_jobs_completed: 0
      97319: eq_tapq:rebalance_1014:bg_jobs_issued: 0
      97320: eq_tapq:rebalance_1014:bg_queued: 0
      97321: eq_tapq:rebalance_1014:bg_result_size: 0
      97322: eq_tapq:rebalance_1014:bg_results: 0
      97323: eq_tapq:rebalance_1014:bg_wait_for_results: false
      97324: eq_tapq:rebalance_1014:complete: false
      97325: eq_tapq:rebalance_1014:connected: true
      97326: eq_tapq:rebalance_1014:created: 1272317
      97327: eq_tapq:rebalance_1014:empty: false
      97328: eq_tapq:rebalance_1014:flags: 93 (ack,backfill,vblist,takeover,checkpoints)
      97329: eq_tapq:rebalance_1014:has_item: false
      97330: eq_tapq:rebalance_1014:has_queued_item: true
      97331: eq_tapq:rebalance_1014:idle: false
      97332: eq_tapq:rebalance_1014:num_tap_nack: 0
      97333: eq_tapq:rebalance_1014:num_tap_tmpfail_survivors: 0
      97334: eq_tapq:rebalance_1014:paused: 1
      97335: eq_tapq:rebalance_1014:pending_backfill: false
      97336: eq_tapq:rebalance_1014:pending_disconnect: false
      97337: eq_tapq:rebalance_1014:pending_disk_backfill: false
      97338: eq_tapq:rebalance_1014:qlen: 0
      97339: eq_tapq:rebalance_1014:qlen_high_pri: 0
      97340: eq_tapq:rebalance_1014:qlen_low_pri: 1
      97341: eq_tapq:rebalance_1014:queue_backfillremaining: 0
      97342: eq_tapq:rebalance_1014:queue_backoff: 0
      97343: eq_tapq:rebalance_1014:queue_drain: 0
      97344: eq_tapq:rebalance_1014:queue_fill: 0
      97345: eq_tapq:rebalance_1014:queue_itemondisk: 0
      97346: eq_tapq:rebalance_1014:queue_memory: 0
      97347: eq_tapq:rebalance_1014:rec_fetched: 5
      97348: eq_tapq:rebalance_1014:recv_ack_seqno: 8
      97349: eq_tapq:rebalance_1014:reserved: 1
      97350: eq_tapq:rebalance_1014:seqno_ack_requested: 9
      97351: eq_tapq:rebalance_1014:supports_ack: true
      97352: eq_tapq:rebalance_1014:suspended: false
      97353: eq_tapq:rebalance_1014:total_backlog_size: 10
      97354: eq_tapq:rebalance_1014:total_noops: 20036
      97355: eq_tapq:rebalance_1014:type: producer
      97356: eq_tapq:rebalance_1014:vb_filter: { 1014 }
      97357: eq_tapq:rebalance_1014:vb_filters: 1
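
      A minimal sketch, in Python, of how a dump like the one above could be parsed to pull out the fields that look relevant to the hang (`backfill_completed`, `complete`, `paused`, `qlen`, and the ack sequence numbers). The helper names `parse_tap_stats`, `takeover_summary`, and `looks_stalled` are hypothetical, and the stall heuristic is an assumption made for illustration only; it is not the server's own logic and does not identify the race itself.

```python
import re

# Matches lines such as "97316: eq_tapq:rebalance_1014:backfill_completed: false".
# The leading "NNNN:" line-number prefix seen in the dump is treated as optional.
STAT_LINE = re.compile(r"^\s*(?:\d+:\s+)?(?P<key>\S+):\s*(?P<value>.*)$")


def parse_tap_stats(text):
    """Return {stat_name: raw_string_value} for every line that parses."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group("key")] = match.group("value").strip()
    return stats


def takeover_summary(stats, conn="eq_tapq:rebalance_1014"):
    """Collect the handful of fields used below for one TAP connection."""
    def get(name, default="0"):
        return stats.get(f"{conn}:{name}", default)

    return {
        "backfill_completed": get("backfill_completed", "false") == "true",
        "complete": get("complete", "false") == "true",
        "paused": get("paused") == "1",
        "qlen": int(get("qlen")),
        # Acks requested by the producer that have not been received yet.
        "unacked": int(get("seqno_ack_requested")) - int(get("recv_ack_seqno")),
    }


def looks_stalled(summary):
    """ASSUMPTION: illustrative heuristic only, not the server's definition.

    Flags a connection that is paused with an empty queue while neither
    backfill nor the takeover itself is marked complete.
    """
    return (
        summary["paused"]
        and summary["qlen"] == 0
        and not summary["backfill_completed"]
        and not summary["complete"]
    )
```

      Applied to the stats in this report, `takeover_summary(parse_tap_stats(dump))` would report the connection as paused with `qlen` 0, backfill and takeover both incomplete, and one requested ack outstanding (`seqno_ack_requested` 9 against `recv_ack_seqno` 8).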

          People

            chiyoung Chiyoung Seo (Inactive)
            farshid Farshid Ghods (Inactive)