
[Collections] Incorrect number of items after deleting collections during failover


Details

    Description

      Summary:
      Incorrect/inconsistent item count after dropping collections during failover and rebalance-out operations.

      Script to Repro:

      ./testrunner -i /tmp/durability_volume.ini sdk_client_pool=True,rerun=False,skip_validations=False,log_level=debug -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_rebalance_out,nodes_init=5,nodes_failover=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud 

      Steps to Reproduce
      1. Create 5 node cluster
      2020-08-23 03:44:30,214 | test | INFO | pool-1-thread-7 | [table_view:display:72] Rebalance Overview

      +----------------+----------+--------------+
      | Nodes          | Services | Status       |
      +----------------+----------+--------------+
      | 172.23.105.211 | kv       | Cluster node |
      | 172.23.105.212 | None     | <--- IN ---  |
      | 172.23.105.213 | None     | <--- IN ---  |
      | 172.23.105.215 | None     | <--- IN ---  |
      | 172.23.105.217 | None     | <--- IN ---  |
      +----------------+----------+--------------+
      

      2. Create 1 bucket with 61 collections and load data (2500 items per collection; total 61 x 2500 = 152,500).

      2020-08-23 03:49:23,144 | test | INFO | MainThread | [table_view:display:72] Bucket statistics

      +---------+-----------+----------+------------+-----+--------+-------------+-----------+-----------+
      | Bucket  | Type      | Replicas | Durability | TTL | Items  | RAM Quota   | RAM Used  | Disk Used |
      +---------+-----------+----------+------------+-----+--------+-------------+-----------+-----------+
      | default | couchbase | 3        | none       | 0   | 152500 | 10485760000 | 398529632 | 542554790 |
      +---------+-----------+----------+------------+-----+--------+-------------+-----------+-----------+
      

      3. Fail over .215 and .217 one after the other and rebalance them out, while dropping 60 collections in parallel.
      4. Validate the data.

      Expected Results
      1 collection remains, with 2500 items.

      Actual Results
      1 collection remains, with 2512 items (12 additional items). Refer to the screenshots and the observations below.

      Observations
      Command 1:

      /opt/couchbase/bin/cbstats localhost:11210 -u Administrator -p password all -a | grep curr_items | grep vb_active_curr_items

      Command 2:

      /opt/couchbase/bin/cbstats localhost:11210 -u Administrator -p password collections -a | grep items: | tr -s ' ' | cut -d ':' -f 4 | awk '{sum+=$1} END {print sum}'
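To make clear what Command 2 computes, here is a minimal sketch that runs the same filter chain over a few sample lines instead of a live node. The sample lines and their key layout (scope-id:collection-id:stat) are assumptions inferred from the `cut -d ':' -f 4` step, not captured output; `sum_collection_items` is a hypothetical helper name.

```shell
# Hypothetical sample lines standing in for `cbstats ... collections -a` output.
# The scope-id:collection-id:stat key layout is an assumption, not captured output.
sample_output=' 0x0:0x0:items:                 300
 0x0:0x0:mem_used:              12345
 0x0:0x8:items:                 544'

# Same filter chain as Command 2: keep the per-collection "items:" lines,
# squeeze repeated spaces, take the 4th colon-separated field, and sum it.
sum_collection_items() {
  grep items: | tr -s ' ' | cut -d ':' -f 4 | awk '{sum+=$1} END {print sum}'
}

printf '%s\n' "$sample_output" | sum_collection_items
```

On a live node the function would be fed by the cbstats invocation from Command 2 in place of the `printf`.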

      Command 1 on node .211 - 851 (7 items extra)
      Command 2 on node .211 - 844

      Command 1 on node .212 - 830 (5 items extra)
      Command 2 on node .212 - 825

      Command 1 on node .213 - 831
      Command 2 on node .213 - 831 (matches on this node)

      i.e., the per-collection item counts (Command 2) sum to 2500 across the three nodes (844 + 825 + 831), which is the expected number,
      but the bucket-level item count (Command 1) sums to 2512 (851 + 830 + 831), which is incorrect.

      • This issue is reproducible most of the time, but not always.
      • The issue was not observed with fewer items.
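The per-node numbers above can be cross-checked with a short sketch like the one below. It assumes the second .212 figure (825) is the Command 2 result, which is what makes the per-collection counts add up to 2500; the `observations` table and the `summarize` helper are illustrative names, not part of the test framework.

```shell
# Columns: node, Command 1 (bucket-level count), Command 2 (per-collection sum).
# Assumption: 825 on .212 is the Command 2 value.
observations='.211 851 844
.212 830 825
.213 831 831'

# Print the per-node difference, then the cluster-wide totals and the surplus.
summarize() {
  awk '{
    printf "%s delta=%d\n", $1, $2 - $3
    bucket += $2; coll += $3
  } END {
    printf "bucket_total=%d collections_total=%d extra=%d\n", bucket, coll, bucket - coll
  }'
}

printf '%s\n' "$observations" | summarize
```

The final line reports the 12 surplus items described under Actual Results, split as 7 on .211 and 5 on .212.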

      Attachments

        1. after-compaction.png
          42 kB
        2. before-compaction.png
          42 kB
        3. collection_UI.png
          98 kB
        4. couch-dbdump-859.couch.31.txt
          81 kB
        5. hash_dump.txt
          146 kB
        6. hash_table_dump.zip
          423 kB
        7. image-2020-10-12-13-50-13-195.png
          147 kB
        8. image-2020-10-20-13-17-39-052.png
          43 kB
        9. items_UI.png
          194 kB
        10. mem-vb-709.txt
          392 kB
        11. node-220-hash-table-709.txt
          50 kB
        12. rollback.txt
          314 kB
        13. screenshot-1.png
          16 kB
        14. screenshot-2.png
          17 kB
        15. Screenshot 2020-10-19 at 12.33.22 PM.png
          382 kB
        16. screenshot-3.png
          22 kB
        17. screenshot-4.png
          45 kB
        18. screenshot-5.png
          44 kB
        19. screenshot-6.png
          47 kB


            People

              sumedh.basarkod Sumedh Basarkod (Inactive)


                Gerrit Reviews

                  There are 2 open Gerrit changes
