Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-12268

Changes in 3.0 causes up to 200ms delay after first nmv is received during a rebalance

    XMLWordPrintable

Details

    • Bug
    • Resolution: Won't Fix
    • Critical
    • None
    • 3.0, 4.0.0, 4.1.0
    • couchbase-bucket
    • Security Level: Public
    • None
    • Untriaged
    • Yes

    Description

      Doing some load testing with 3.0 exposed a potentially serious performance issue during rebalance.

      When a client (in this case libcouchbase via pillowfight) receives the first "not my vbucket", it took nearly 200ms before it was able to get a config via carrier publication that had the correct vbucket in it.

      From the application's perspective, 2.5.1 had only a few 'ms' delay on each missed operation during rebalance, and with 3.0 it is receiving up to 200ms. We need further investigation, but it seems like a regression.

      Logs from a pillowfight run:

      1 15408ms [I0] {3376} [INFO] (confmon - L:169) Setting new configuration. Received via CCCP
         2 15408ms [I0] {3376} [INFO] (bootstrap - L:135) Got async step callback..
         3 15408ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f4318067b90 to retry queue. Try count=1
         4 15408ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 14 ms
         5 15408ms [I0] {3376} [DEBUG] (bootstrap - L:38) Instance configured!
         6 15408ms [I0] {3376} [INFO] (newconfig - L:59) Config Diff: [ vBuckets Modified=1 ], [Sequence Changed=0]
         7 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-204-236-170-174.us-west-1.compute.amazonaws.com:11210 (0xc51100). OldIndex=0. NewIndex=0
         8 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210 (0xc52090). OldIndex=1. NewIndex=1
         9 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-171-142.us-west-1.compute.amazonaws.com:11210 (0xc530a0). OldIndex=2. NewIndex=2
        10 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-213-125.us-west-1.compute.amazonaws.com:11210 (0xc540b0). OldIndex=3. NewIndex=3
        11 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-177-26-252.us-west-1.compute.amazonaws.com:11210 (0xc550c0). OldIndex=4. NewIndex=4
        12 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-219-88-7.us-west-1.compute.amazonaws.com:11210 (0xc560d0). OldIndex=5. NewIndex=5
        13 15424ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f4318067b90 (S=31359). VBID=550
        14 15426ms [I0] {3376} [INFO] (confmon - L:161) Not applying configuration received via CCCP. No changes detected
        15 15426ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f4318070370 to retry queue. Try count=2
        16 15426ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 29 ms
        17 15460ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f4318070370 (S=31359). VBID=550
        18 15461ms [I0] {3376} [INFO] (confmon - L:169) Setting new configuration. Received via CCCP
        19 15461ms [I0] {3376} [INFO] (bootstrap - L:135) Got async step callback..
        20 15461ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f43180f4c20 to retry queue. Try count=3
        21 15461ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 44 ms
        22 15461ms [I0] {3376} [DEBUG] (bootstrap - L:38) Instance configured!
        23 15461ms [I0] {3376} [INFO] (newconfig - L:59) Config Diff: [ vBuckets Modified=1 ], [Sequence Changed=0]
        24 15461ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-204-236-170-174.us-west-1.compute.amazonaws.com:11210 (0xc51100). OldIndex=0. NewIndex=0
        25 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210 (0xc52090). OldIndex=1. NewIndex=1
        26 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-171-142.us-west-1.compute.amazonaws.com:11210 (0xc530a0). OldIndex=2. NewIndex=2
        27 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-213-125.us-west-1.compute.amazonaws.com:11210 (0xc540b0). OldIndex=3. NewIndex=3
        28 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-177-26-252.us-west-1.compute.amazonaws.com:11210 (0xc550c0). OldIndex=4. NewIndex=4
        29 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-219-88-7.us-west-1.compute.amazonaws.com:11210 (0xc560d0). OldIndex=5. NewIndex=5
        30 15507ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f43180f4c20 (S=31359). VBID=550
        31 15509ms [I0] {3376} [INFO] (confmon - L:161) Not applying configuration received via CCCP. No changes detected
        32 15509ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f4318067c10 to retry queue. Try count=4
        33 15509ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 59 ms
        34 15570ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f4318067c10 (S=31359). VBID=550
        35 15572ms [I0] {3376} [INFO] (confmon - L:169) Setting new configuration. Received via CCCP
        36 15572ms [I0] {3376} [INFO] (bootstrap - L:135) Got async step callback..
        37 15572ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f43180f50d0 to retry queue. Try count=5
        38 15572ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 74 ms
        39 15572ms [I0] {3376} [DEBUG] (bootstrap - L:38) Instance configured!
        40 15572ms [I0] {3376} [INFO] (newconfig - L:59) Config Diff: [ vBuckets Modified=1 ], [Sequence Changed=0]
        41 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-204-236-170-174.us-west-1.compute.amazonaws.com:11210 (0xc51100). OldIndex=0. NewIndex=0
        42 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210 (0xc52090). OldIndex=1. NewIndex=1
        43 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-171-142.us-west-1.compute.amazonaws.com:11210 (0xc530a0). OldIndex=2. NewIndex=2
        44 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-213-125.us-west-1.compute.amazonaws.com:11210 (0xc540b0). OldIndex=3. NewIndex=3
        45 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-177-26-252.us-west-1.compute.amazonaws.com:11210 (0xc550c0). OldIndex=4. NewIndex=4
        46 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-219-88-7.us-west-1.compute.amazonaws.com:11210 (0xc560d0). OldIndex=5. NewIndex=5
        47 15648ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f43180f50d0 (S=31359). VBID=550
        48 15649ms [I0] {3376} [INFO] (confmon - L:169) Setting new configuration. Received via CCCP
        49 15649ms [I0] {3376} [INFO] (bootstrap - L:135) Got async step callback..
        50 15649ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f43180704c0 to retry queue. Try count=6
        51 15649ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 89 ms
        52 15649ms [I0] {3376} [DEBUG] (bootstrap - L:38) Instance configured!
        53 15649ms [I0] {3376} [INFO] (newconfig - L:59) Config Diff: [ vBuckets Modified=1 ], [Sequence Changed=0]
        54 15649ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-204-236-170-174.us-west-1.compute.amazonaws.com:11210 (0xc51100). OldIndex=0. NewIndex=0
        55 15649ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210 (0xc52090). OldIndex=1. NewIndex=1
        56 15649ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-171-142.us-west-1.compute.amazonaws.com:11210 (0xc530a0). OldIndex=2. NewIndex=2
        57 15650ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-213-125.us-west-1.compute.amazonaws.com:11210 (0xc540b0). OldIndex=3. NewIndex=3
        58 15650ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-177-26-252.us-west-1.compute.amazonaws.com:11210 (0xc550c0). OldIndex=4. NewIndex=4
        59 15650ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-219-88-7.us-west-1.compute.amazonaws.com:11210 (0xc560d0). OldIndex=5. NewIndex=5
      

      Attachments

        Issue Links

          For Gerrit Dashboard: MB-12268
          # Subject Branch Project Status CR V

          Activity

            People

              drigby Dave Rigby (Inactive)
              perry Perry Krug
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty