Couchbase Server / MB-4367

rebalance gets stuck even though ack_seqno is correct, has_queued_item is true, and total_backlog_size > 1000

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.7.2
    • Fix Version/s: 1.8.0, 2.0-beta
    • Component/s: None
    • Security Level: Public
    • Labels: None
    Attachments

    • diag.txt.gz (1.55 MB, attached by Farshid Ghods)

      Issue Links

      duplicates MB-4517

        Activity

        Farshid Ghods (Inactive) added a comment -

        root@ip-10-194-21-140:~# /opt/membase/bin/mbstats 10.82.243.173:11210 tap | grep rebalance
        eq_tapq:rebalance_398:ack_log_size: 0
        eq_tapq:rebalance_398:ack_playback_size: 0
        eq_tapq:rebalance_398:ack_seqno: 2646
        eq_tapq:rebalance_398:ack_window_full: false
        eq_tapq:rebalance_398:backfill_completed: false
        eq_tapq:rebalance_398:bg_backlog_size: 0
        eq_tapq:rebalance_398:bg_jobs_completed: 242
        eq_tapq:rebalance_398:bg_jobs_issued: 242
        eq_tapq:rebalance_398:bg_queue_size: 0
        eq_tapq:rebalance_398:bg_queued: 242
        eq_tapq:rebalance_398:bg_result_size: 0
        eq_tapq:rebalance_398:bg_results: 0
        eq_tapq:rebalance_398:bg_wait_for_results: false
        eq_tapq:rebalance_398:complete: false
        eq_tapq:rebalance_398:connected: true
        eq_tapq:rebalance_398:created: 1177
        eq_tapq:rebalance_398:empty: false
        eq_tapq:rebalance_398:flags: 93 (ack,backfill,vblist,takeover,checkpoints)
        eq_tapq:rebalance_398:has_item: false
        eq_tapq:rebalance_398:has_queued_item: true
        eq_tapq:rebalance_398:idle: false
        eq_tapq:rebalance_398:num_tap_nack: 0
        eq_tapq:rebalance_398:num_tap_tmpfail_survivors: 0
        eq_tapq:rebalance_398:paused: 1
        eq_tapq:rebalance_398:pending_backfill: false
        eq_tapq:rebalance_398:pending_disconnect: false
        eq_tapq:rebalance_398:pending_disk_backfill: false
        eq_tapq:rebalance_398:qlen: 0
        eq_tapq:rebalance_398:qlen_high_pri: 0
        eq_tapq:rebalance_398:qlen_low_pri: 1
        eq_tapq:rebalance_398:queue_backfillremaining: 0
        eq_tapq:rebalance_398:queue_backoff: 0
        eq_tapq:rebalance_398:queue_drain: 2637
        eq_tapq:rebalance_398:queue_fill: 0
        eq_tapq:rebalance_398:queue_itemondisk: 0
        eq_tapq:rebalance_398:queue_memory: 0
        eq_tapq:rebalance_398:rec_fetched: 2641
        eq_tapq:rebalance_398:recv_ack_seqno: 2645
        eq_tapq:rebalance_398:reserved: 1
        eq_tapq:rebalance_398:supports_ack: true
        eq_tapq:rebalance_398:suspended: false
        eq_tapq:rebalance_398:total_backlog_size: 1248
        eq_tapq:rebalance_398:total_noops: 37
        eq_tapq:rebalance_398:type: producer
        eq_tapq:rebalance_398:vb_filter: { 398 }

        eq_tapq:rebalance_398:vb_filters: 1
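
        For reference, the dump above matches the stuck signature in the summary: the stream is paused (paused: 1) with an item still queued (has_queued_item: true, qlen_low_pri: 1) and total_backlog_size at 1248, yet backfill_completed stays false. A minimal shell sketch for spotting this state, assuming a single rebalance_* stream on the node and the mbstats path and host:port shown above:

        #!/bin/sh
        # Minimal sketch: flag a TAP rebalance stream showing the stuck
        # signature from this report. Assumes one rebalance_* stream on the
        # node; stat names are taken verbatim from the dump above.
        HOST=${1:-10.82.243.173:11210}
        stats=$(/opt/membase/bin/mbstats "$HOST" tap | grep rebalance)
        paused=$(echo "$stats" | awk -F': ' '/:paused:/ {print $2}')
        queued=$(echo "$stats" | awk -F': ' '/:has_queued_item:/ {print $2}')
        backlog=$(echo "$stats" | awk -F': ' '/:total_backlog_size:/ {print $2}')
        if [ "$paused" = "1" ] && [ "$queued" = "true" ] && [ "$backlog" -gt 1000 ]; then
            echo "stuck: paused=$paused has_queued_item=$queued total_backlog_size=$backlog"
        fi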

        Farshid Ghods (Inactive) added a comment -

        The same thing happened again while removing 10 nodes and adding 10 nodes at the same time, after 90 percent of the vbuckets had been moved.

        Every 2.0s: /opt/membase/bin/mbstats 10.76.61.117:11210 tap | grep rebalance Thu Oct 20 06:21:06 2011

        eq_tapq:rebalance_870:ack_log_size: 9219
        eq_tapq:rebalance_870:ack_playback_size: 9219
        eq_tapq:rebalance_870:ack_seqno: 9244
        eq_tapq:rebalance_870:ack_window_full: false
        eq_tapq:rebalance_870:backfill_completed: false
        eq_tapq:rebalance_870:bg_backlog_size: 0
        eq_tapq:rebalance_870:bg_jobs_completed: 0
        eq_tapq:rebalance_870:bg_jobs_issued: 0
        eq_tapq:rebalance_870:bg_queue_size: 0
        eq_tapq:rebalance_870:bg_queued: 0
        eq_tapq:rebalance_870:bg_result_size: 0
        eq_tapq:rebalance_870:bg_results: 0
        eq_tapq:rebalance_870:bg_wait_for_results: false
        eq_tapq:rebalance_870:complete: false
        eq_tapq:rebalance_870:connected: true
        eq_tapq:rebalance_870:created: 23270
        eq_tapq:rebalance_870:empty: false
        eq_tapq:rebalance_870:flags: 93 (ack,backfill,vblist,takeover,checkpoints)
        eq_tapq:rebalance_870:has_item: false
        eq_tapq:rebalance_870:has_queued_item: true
        eq_tapq:rebalance_870:idle: false
        eq_tapq:rebalance_870:num_tap_nack: 20
        eq_tapq:rebalance_870:num_tap_tmpfail_survivors: 20
        eq_tapq:rebalance_870:paused: 0
        eq_tapq:rebalance_870:pending_backfill: false
        eq_tapq:rebalance_870:pending_disconnect: false
        eq_tapq:rebalance_870:pending_disk_backfill: false
        eq_tapq:rebalance_870:qlen: 156241
        eq_tapq:rebalance_870:qlen_high_pri: 0
        eq_tapq:rebalance_870:qlen_low_pri: 1
        eq_tapq:rebalance_870:queue_backfillremaining: 156241
        eq_tapq:rebalance_870:queue_backoff: 20
        eq_tapq:rebalance_870:queue_drain: 9239
        eq_tapq:rebalance_870:queue_fill: 0
        eq_tapq:rebalance_870:queue_itemondisk: 0
        eq_tapq:rebalance_870:queue_memory: 0
        eq_tapq:rebalance_870:rec_fetched: 9243
        eq_tapq:rebalance_870:recv_ack_seqno: 24
        eq_tapq:rebalance_870:reserved: 1
        eq_tapq:rebalance_870:supports_ack: true
        eq_tapq:rebalance_870:suspended: false
        eq_tapq:rebalance_870:total_backlog_size: 156607
        eq_tapq:rebalance_870:total_noops: 1
        eq_tapq:rebalance_870:type: producer
        eq_tapq:rebalance_870:vb_filter: { 870 }

        eq_tapq:rebalance_870:vb_filters: 1
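
        Here the notable part is the ack gap rather than a paused stream: ack_seqno is 9244 while recv_ack_seqno is only 24, which suggests the consumer's acks are far behind; 9219 entries sit unacknowledged in the ack log (ack_log_size equals ack_playback_size) and queue_backfillremaining is still 156241. A rough one-liner to quantify that lag, again assuming a single rebalance_* stream and the mbstats path and host:port from the watch header above:

        /opt/membase/bin/mbstats 10.76.61.117:11210 tap | grep rebalance | awk -F': ' '
            /:ack_seqno:/      { sent = $2 }    # producer-side ack sequence number
            /:recv_ack_seqno:/ { acked = $2 }   # last seqno acked by the consumer
            /:ack_log_size:/   { unacked = $2 } # entries awaiting acknowledgement
            END { printf "sent=%s acked=%s lag=%d ack_log=%s\n", sent, acked, sent - acked, unacked }'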

        Farshid Ghods (Inactive) added a comment -

        This is the giant cluster with 170 million items.

        Dipti Borkar added a comment -

        Duplicate of MB-4517

        Karan Kumar (Inactive) added a comment -

        Duplicate.


          People

          • Assignee: Mike Wiederhold
          • Reporter: Farshid Ghods (Inactive)
          • Votes: 0
          • Watchers: 1

            Dates

            • Created:
            • Updated:
            • Resolved:

              Gerrit Reviews

              There are no open Gerrit changes