Couchbase Server: MB-36971

Rebalance stuck after decrementing the bucket's replica count.


Details

    Description

      Steps to Reproduce:

      1. Create a 3-node cluster:

        +----------------+----------+--------------+
        | Nodes          | Services | Status       |
        +----------------+----------+--------------+
        | 172.23.105.168 | kv       | Cluster node |
        | 172.23.106.82  | None     | <--- IN ---  |
        | 172.23.106.83  | None     | <--- IN ---  |
        +----------------+----------+--------------+
         

      2. Create a bucket with replicas=1, compression off, and eviction policy valueOnly.
      3. Load 100 docs into the bucket using transactions with durability level MAJORITY. Bucket stats after this step:

        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | GleamBookUsers | membase | 1        | 0   | 121   | 59885223936 | 68191848 | 16983306  |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
         

      4. Rebalance in 1 node (172.23.106.86) while performing another 40 creates and 20 updates in parallel, using transactions with durability=MAJORITY. Bucket stats after this step:

        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | GleamBookUsers | membase | 1        | 0   | 169   | 79846965248 | 72661672 | 35408247  |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        

      5. Rebalance out 1 node (172.23.106.83) while performing another 40 creates, 20 updates, and 40 deletes in parallel, using transactions with durability=MAJORITY. Bucket stats after this step:

        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | GleamBookUsers | membase | 1        | 0   | 177   | 59885223936 | 69478976 | 41532152  |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        

      6. Rebalance in 2 nodes (172.23.106.85, 172.23.106.83) and rebalance out 1 node (172.23.106.82) while performing another 40 creates, 20 updates, and 40 deletes in parallel, using transactions with durability=MAJORITY. Bucket stats after this step:

        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | GleamBookUsers | membase | 1        | 0   | 185   | 79846965248 | 73724568 | 47940963  |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+

      7. Swap rebalance 1 node (IN=172.23.106.82, OUT=172.23.106.83) while performing another 40 creates, 20 updates, and 40 deletes in parallel, using transactions with durability=MAJORITY. Bucket stats after this step:

        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+
        | GleamBookUsers | membase | 1        | 0   | 193   | 79846965248 | 74178992 | 53671743  |
        +----------------+---------+----------+-----+-------+-------------+----------+-----------+

      8. Increase the bucket replica count from 1 to 2.
      9. Rebalance in 1 node (172.23.106.83) while performing another 40 creates, 20 updates, and 40 deletes in parallel, using transactions with durability=MAJORITY. Bucket stats after this step:

        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used  | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | GleamBookUsers | membase | 2        | 0   | 201   | 99808706560 | 105600664 | 89580007  |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+

      10. Rebalance the cluster. After it completes, perform 40 creates, 20 updates, and 40 deletes using transactions with durability=MAJORITY.
      11. While the load from step 10 is in progress, stop the memcached process and restart it after 20 seconds. Bucket stats after steps 10-11:

        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used  | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | GleamBookUsers | membase | 2        | 0   | 209   | 99808706560 | 105947400 | 96040934  |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+ 

      12. Perform 40 creates, 20 updates, and 40 deletes using transactions with durability=MAJORITY.
      13. While step 12 is in progress, fail over a node (172.23.106.83).
      14. While step 12 is still in progress, rebalance out the node failed over in step 13, then wait for step 12 to finish.
      15. Rebalance in 1 node (172.23.106.83).
      16. Bucket stats after steps 12-15:

        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used  | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | GleamBookUsers | membase | 2        | 0   | 217   | 99808706560 | 106897488 | 101330796 |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+

      17. Perform 40 creates, 20 updates, and 40 deletes using transactions with durability=MAJORITY.
      18. While step 17 is in progress, fail over a node (172.23.106.83).
      19. Recover the node failed over in step 18 using full recovery. Bucket stats after steps 17-19:

        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used  | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | GleamBookUsers | membase | 2        | 0   | 225   | 99808706560 | 108080464 | 114445902 |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+ 

      20. Perform 40 creates, 20 updates, and 40 deletes using transactions with durability=MAJORITY.
      21. While step 20 is in progress, fail over a node (172.23.106.83).
      22. Recover the node failed over in step 21 using delta recovery. Bucket stats after steps 20-22:

        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | Bucket         | Type    | Replicas | TTL | Items | RAM Quota   | RAM Used  | Disk Used |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+
        | GleamBookUsers | membase | 2        | 0   | 229   | 99808706560 | 105156400 | 133176197 |
        +----------------+---------+----------+-----+-------+-------------+-----------+-----------+

      23. Decrease the bucket replica count from 2 to 1.
      24. Rebalance the cluster.
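      The RAM Quota values in the tables are consistent with a cluster-wide total in bytes: each one is a multiple of 19961741312 bytes (exactly 19037 MiB), and the multiplier matches the number of nodes in the cluster at that step (3, 4, or 5). The sketch below replays the node additions and removals from the steps above to check that reading. It assumes every node runs the kv service once rebalanced in and that the column is the sum of per-node quotas; `replay`, `STEPS`, and the other names are hypothetical, not part of any Couchbase tool.

```python
# Hypothetical check: replay node membership through the reproduction steps
# and compare (per-node quota x node count) with the "RAM Quota" column.
# Assumptions: every node runs kv once rebalanced in, and the column is the
# cluster-wide sum of per-node quotas in bytes.

PER_NODE_QUOTA_BYTES = 19961741312  # 19037 MiB; derived from the step 3 table

INITIAL_NODES = {"172.23.105.168", "172.23.106.82", "172.23.106.83"}

# (step label, nodes rebalanced in, nodes rebalanced out, reported RAM Quota)
STEPS = [
    ("3", set(), set(), 59885223936),
    ("4", {"172.23.106.86"}, set(), 79846965248),
    ("5", set(), {"172.23.106.83"}, 59885223936),
    ("6", {"172.23.106.85", "172.23.106.83"}, {"172.23.106.82"}, 79846965248),
    ("7", {"172.23.106.82"}, {"172.23.106.83"}, 79846965248),
    ("9", {"172.23.106.83"}, set(), 99808706560),
    # Steps 12-15 fail over, remove, and re-add .83: no net change.
    ("12-15", set(), set(), 99808706560),
    # Steps 17-22 fail over .83 and recover it (full, then delta).
    ("17-19", set(), set(), 99808706560),
    ("20-22", set(), set(), 99808706560),
]


def replay(initial, steps, per_node):
    """Apply each step's membership change; report node count and match."""
    nodes = set(initial)
    results = []
    for label, added, removed, reported in steps:
        nodes = (nodes - removed) | added
        results.append((label, len(nodes), per_node * len(nodes) == reported))
    return nodes, results


if __name__ == "__main__":
    final_nodes, results = replay(INITIAL_NODES, STEPS, PER_NODE_QUOTA_BYTES)
    for label, count, matches in results:
        print(f"step {label}: {count} nodes, quota matches: {matches}")
    print("nodes before step 23:", sorted(final_nodes))
```

      With the values copied from the tables, every step reports a match, and five nodes are in the cluster when the replica count is decreased in step 23.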

      Result: after step 24, the rebalance hangs at 87%.
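      A hang like this is usually spotted by polling rebalance progress and noticing that it has stopped moving. The sketch below shows one way to make that check explicit, assuming progress samples (in percent) have already been collected from the cluster, for example by polling its REST API; the polling itself is not shown, and `detect_stall`, `window`, and `epsilon` are hypothetical names with arbitrary defaults, not part of any Couchbase tool.

```python
from typing import Optional, Sequence


def detect_stall(samples: Sequence[float], window: int = 5,
                 epsilon: float = 0.01) -> Optional[int]:
    """Return the index of the sample at which rebalance is judged stalled.

    samples: rebalance progress in percent, in polling order.
    window:  how many consecutive samples may pass without progress
             advancing by more than epsilon (relative to the last sample
             that did advance) before the rebalance counts as stalled.
    Returns None if no stall is detected or progress reached 100%.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    last_advance = 0  # index of the most recent sample that moved forward
    for i in range(1, len(samples)):
        if samples[i] - samples[last_advance] > epsilon:
            last_advance = i
        elif i - last_advance >= window and samples[i] < 100.0:
            return i
    return None


if __name__ == "__main__":
    # Illustrative samples only: progress climbs, then sits at 87%.
    polled = [12.0, 35.5, 61.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0]
    print(detect_stall(polled, window=5))  # prints 8
```

      Comparing against the last sample that advanced, rather than the immediately preceding one, means a slow creep of many tiny increments still counts as progress once it adds up to more than `epsilon`.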

       


          Activity

            Couchbase Build Team (build-team) added a comment:
            Build couchbase-server-6.5.0-4901 contains kv_engine commit 0861963 with commit message:
            MB-36971: Return KEY_EEXISTS if stream opaque incorrect

            Couchbase Build Team (build-team) added a comment:
            Build couchbase-server-7.0.0-1099 contains kv_engine commit 19210da with commit message:
            MB-36971: Ensure that DCP Producer handles KeyEnoent correctly

            Couchbase Build Team (build-team) added a comment:
            Build couchbase-server-7.0.0-1099 contains kv_engine commit 8088d25 with commit message:
            MB-36971: Never skip op::checkpoint_start in CM::getItemsForCursor

            Couchbase Build Team (build-team) added a comment:
            Build couchbase-server-7.0.0-1100 contains kv_engine commit 0861963 with commit message:
            MB-36971: Return KEY_EEXISTS if stream opaque incorrect

            Couchbase Build Team (build-team) added a comment:
            Build couchbase-server-7.0.0-1100 contains kv_engine commit f17fdd7 with commit message:
            MB-36971: Send the HCS when streaming a Disk-Checkpoint

            People

              prateek.kumar Prateek Kumar (Inactive)
              prateek.kumar Prateek Kumar (Inactive)