Couchbase Server / MB-34319

[5.5.4-MP1] Online upgrade with swap rebalance takes an unusually long time


Details

    • Bug
    • Resolution: User Error
    • Major
    • 5.5.4
    • 5.5.4
    • couchbase-bucket
    • None
    • 5.5.4-4338 -> 5.5.4-4340

    Description

      Script to Repro

      ./testrunner -i /tmp/win10-bucket-ops.ini -p upgrade_version=5.5.4-4340 -t newupgradetests.MultiNodesUpgradeTests.online_upgrade_swap_rebalance_with_high_doc_ops,initial_version=5.5.4-4338,items=1000000,nodes_init=3,run_with_views=False,flusher_batch_split_trigger=3
      

      Steps to Repro
      1) Create a 1-node cluster (5.5.4-4338) with a bucket named default.
      2) Run the following command, then restart memcached:

      curl -i -u Administrator:password --data 'ns_bucket:update_bucket_props("default", [{extra_config_string, "flusher_batch_split_trigger=3"}]).' http://host:8091/diag/eval
      

      3) Start a data load and leave it in progress.
      4) Rebalance in one node (5.5.4-4338).
      5) Rebalance in another node (5.5.4-4338).
      6) Check the following stats condition. It is expected not to hold because of MB-34173:

      last_persisted_snap_start <= last_persisted_seqno <=  last_persisted_snap_end
      

      7) Start the data load again.
      8) Swap rebalance one 5.5.4-4338 node with a 5.5.4-4340 node.
      9) Repeat steps 7) and 8) until the last 5.5.4-4338 node has been swap rebalanced out.
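
      The stats condition in step 6 can be sketched in Python as below. This is only an illustration: the three stat names come from this report, but the function name, the dictionary shape, and the sample values are hypothetical, and how the stats are fetched from the node is left out.

```python
from typing import Mapping, Union


def snapshot_invariant_holds(stats: Mapping[str, Union[int, str]]) -> bool:
    """Return True when snap_start <= seqno <= snap_end for one vBucket."""
    # Accept strings as well as ints, in case the stats arrive as text.
    snap_start = int(stats["last_persisted_snap_start"])
    seqno = int(stats["last_persisted_seqno"])
    snap_end = int(stats["last_persisted_snap_end"])
    return snap_start <= seqno <= snap_end


# Hypothetical sample values, for illustration only.
consistent = {
    "last_persisted_snap_start": "100",
    "last_persisted_seqno": "105",
    "last_persisted_snap_end": "110",
}
inconsistent = {
    "last_persisted_snap_start": "100",
    "last_persisted_seqno": "120",
    "last_persisted_snap_end": "110",
}

print(snapshot_invariant_holds(consistent))
print(snapshot_invariant_holds(inconsistent))
```

      A vBucket affected by MB-34173 would be expected to fail this check, as in the second sample.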

      The first two swap rebalances take around 9 minutes and 18 minutes respectively. The final swap rebalance, however, takes an unusually long time (close to 2.5 hours).

      In the final swap rebalance, the node being rebalanced in is 172.23.121.10 (5.5.4-4340) and the node being rebalanced out is 172.23.120.201 (5.5.4-4338).

      I see the following entries in the logs (on 172.23.121.10):

      [rebalance:debug,2019-05-23T07:30:34.618-07:00,ns_1@172.23.121.10:<0.16083.1>:janitor_agent:do_wait_seqno_persisted:982]Got etmpfail while waiting for sequence number 20547 to persist for vBucket:997. Will retry.
      [rebalance:debug,2019-05-23T07:30:35.064-07:00,ns_1@172.23.121.10:<0.16092.1>:janitor_agent:do_wait_seqno_persisted:982]Got etmpfail while waiting for sequence number 20528 to persist for vBucket:671. Will retry.
      [rebalance:debug,2019-05-23T07:30:35.070-07:00,ns_1@172.23.121.10:<0.16098.1>:janitor_agent:do_wait_seqno_persisted:982]Got etmpfail while waiting for sequence number 20636 to persist for vBucket:996. Will retry.
      [rebalance:debug,2019-05-23T07:30:35.368-07:00,ns_1@172.23.121.10:<0.16141.1>:janitor_agent:do_wait_seqno_persisted:982]Got etmpfail while waiting for sequence number 20760 to persist for vBucket:995. Will retry.
      [rebalance:debug,2019-05-23T07:30:35.380-07:00,ns_1@172.23.121.10:<0.16147.1>:janitor_agent:do_wait_seqno_persisted:982]Got etmpfail while waiting for sequence number 20887 to persist for vBucket:994. Will retry.
      [rebalance:debug,2019-05-23T07:30:35.544-07:00,ns_1@172.23.121.10:<0.16153.1>:janitor_agent:do_wait_seqno_persisted:982]Got etmpfail while waiting for sequence number 20655 to persist for vBucket:670. Will retry.
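
      To see how often each vBucket hits this retry, log entries like the ones above could be tallied with a short Python sketch. The message format is taken from the lines in this report; the function name and regular expression are illustrative assumptions, not part of any Couchbase tool.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the janitor_agent retry message as it appears in this report.
ETMPFAIL_RE = re.compile(
    r"Got etmpfail while waiting for sequence number (\d+) "
    r"to persist for vBucket:(\d+)"
)


def count_etmpfail_retries(lines: Iterable[str]) -> Counter:
    """Count etmpfail retry messages per vBucket id."""
    retries: Counter = Counter()
    for line in lines:
        match = ETMPFAIL_RE.search(line)
        if match:
            # Group 2 is the vBucket id; group 1 is the awaited seqno.
            retries[int(match.group(2))] += 1
    return retries


# Two of the log lines quoted in this report.
sample = [
    "[rebalance:debug,2019-05-23T07:30:34.618-07:00,ns_1@172.23.121.10:<0.16083.1>:janitor_agent:do_wait_seqno_persisted:982]Got etmpfail while waiting for sequence number 20547 to persist for vBucket:997. Will retry.",
    "[rebalance:debug,2019-05-23T07:30:35.064-07:00,ns_1@172.23.121.10:<0.16092.1>:janitor_agent:do_wait_seqno_persisted:982]Got etmpfail while waiting for sequence number 20528 to persist for vBucket:671. Will retry.",
]

print(count_etmpfail_retries(sample))
```

      Run over the full log file, a tally like this would show whether the retries are concentrated on a few vBuckets or spread across all of them.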
      

      cbcollect_info logs attached.


          People

            owend Daniel Owen
            Balakumaran.Gopal Balakumaran Gopal