  Couchbase Server
  MB-5534

keys failing to persist to disk with state "ram_but_not_disk" during rebalance

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0
    • Security Level: Public
    • Labels:
      None
    • Environment:
      2.0.0-1314-rel

      Description

      in viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in

      Load 200k docs
      Add+Rebalance 6 nodes to cluster while running queries

      It looks like the keys being moved to new servers are not always persisted to disk.

      ...from the test logs we report when a key did not persist and cannot be indexed by the view engine.

      ["query doc_id: admin0150-2008_12_27 doesn\'t exist in bucket: default", "Error expected in results for key with invalid state

      {\'key_vb_state\': \'active\', \'key_last_modification_time\': \'1339537159\', \'key_data_age\': \'0\', \'key_cas\': \'7680066560307527\', \'key_exptime\': \'0\', \'key_is_dirty\': \'0\', \'key_flags\': \'0\', \'key_valid\': \'ram_but_not_disk\'}

      ",

      There should be plenty of time for persistence, as we retry the query several times and no new docs are being loaded or updated during the rebalance.

      diags attached.

      1. 10.2.2.108-8091-diag.txt.gz
        555 kB
        Tommie McAfee
      2. 10.2.2.109-8091-diag.txt.gz
        1.69 MB
        Tommie McAfee
      3. 10.2.2.60-8091-diag.txt.gz
        1.43 MB
        Tommie McAfee
      4. 10.2.2.63-8091-diag.txt.gz
        1.68 MB
        Tommie McAfee
      5. 10.2.2.64-8091-diag.txt.gz
        562 kB
        Tommie McAfee
      6. 10.2.2.65-8091-diag.txt.gz
        559 kB
        Tommie McAfee
      7. 10.2.2.67-8091-diag.txt.gz
        567 kB
        Tommie McAfee
      8. finddoc.pl
        0.9 kB
        Tommie McAfee
      9. get_key_meta.py
        0.8 kB
        Tommie McAfee
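
      For reference, here is a minimal sketch of the kind of per-key probe behind the key_valid output in the description, assuming it drives ep-engine's "stats vkey <key> <vbid>" command over the memcached text protocol on port 11210 against the default bucket without SASL auth (the attached get_key_meta.py is not reproduced here and may work differently):

      import socket

      def key_stats(host, key, vbucket, port=11210, timeout=5.0):
          """Ask ep-engine for a key's RAM/disk state via 'stats vkey'.

          Assumption: 'stats vkey' verifies the key against disk and reports
          key_valid values such as 'valid', 'dirty', 'ram_but_not_disk' or
          'item_deleted'; the attached script may gather this differently.
          """
          stats = {}
          sock = socket.create_connection((host, port), timeout=timeout)
          try:
              sock.sendall("stats vkey {0} {1}\r\n".format(key, vbucket).encode())
              buf = b""
              while b"END\r\n" not in buf:
                  try:
                      chunk = sock.recv(4096)
                  except socket.timeout:
                      break
                  if not chunk:
                      break
                  buf += chunk
          finally:
              sock.close()
          for line in buf.decode("utf-8", "replace").splitlines():
              parts = line.split(None, 2)
              if len(parts) == 3 and parts[0] == "STAT":
                  stats[parts[1]] = parts[2]
          return stats

      if __name__ == "__main__":
          # The vbucket id must be the one that owns the key; 56 and this key
          # are the pair discussed later in this ticket.
          print(key_stats("10.2.2.63", "arch0107-2008_11_14", 56))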

        Activity

        tommie Tommie McAfee added a comment -

        Hi Jin, trying to repro for you. I did not see anything in the logs about errors while persisting keys.

        tommie Tommie McAfee added a comment -

        Seeing now that the servers are in the same condition as MB-5511, with rebalance hanging and a deadlock in the indexer. Probably a side effect of that bug.

        jin Jin Lim added a comment -

        MB-5511 is tracking the root cause of this issue.

        tommie Tommie McAfee added a comment -

        MB-5511 is partially fixed (index deadlock), but rebalance is still hanging and this bug is still reproducible in the latest 2.0 build.

        tommie Tommie McAfee added a comment -

        Hey Jin, I currently have a cluster in this state where keys are not persisted, and am using a script to help get the following info:

        Here the state of key admin0110-2008_10_06 on host 10.2.2.64 says 'ram but not disk':

        {'key_vb_state': 'replica', 'key_last_modification_time': '1339724247', 'key_data_age': '0', 'key_cas': '7866802812969421', 'key_exptime': '0', 'key_is_dirty': '0', 'key_flags': '0', 'key_valid': 'item_deleted'}

        If I query the database, notice this key is skipped in the results:

        startkey = [2008,10,6] , startkey_docid = admin0109-2008_10_06
        …..

        {"id":"admin0109-2008_10_06","key":[2008,10,6],"value":["employee-109-0522e76","109@couchbase.com"]}

        ,

        {"id":"admin0111-2008_10_06","key":[2008,10,6],"value":["employee-111-0522e76","111@couchbase.com"]}

        ,

        {"id":"admin0112-2008_10_06","key":[2008,10,6],"value":["employee-112-0522e76","112@couchbase.com"]}

        ,

        {"id":"admin0113-2008_10_06","key":[2008,10,6],"value":["employee-113-0522e76","113@couchbase.com"]}

        ,

        This is post rebalance in.
        master node is 10.2.2.60
        Would be great if someone could take a look.
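
        For reference, a query like the one above can be issued directly against the view engine on port 8092; a minimal sketch in Python, assuming the design document named later in this ticket (dev_test_view-6ffa498) and a placeholder view name, since the actual view name is not shown here:

        import json

        try:                                      # Python 3
            from urllib.request import urlopen
            from urllib.parse import urlencode
        except ImportError:                       # Python 2
            from urllib import urlencode
            from urllib2 import urlopen

        # Parameters the view engine expects as JSON-encoded values.
        JSON_PARAMS = ("key", "keys", "startkey", "endkey")

        def query_view(host, ddoc, view, bucket="default", **params):
            """Run a view query against the CAPI/view port (8092)."""
            encoded = dict((k, json.dumps(v)) if k in JSON_PARAMS else (k, v)
                           for k, v in params.items())
            url = "http://{0}:8092/{1}/_design/{2}/_view/{3}?{4}".format(
                host, bucket, ddoc, view, urlencode(encoded))
            return json.load(urlopen(url))

        # "test_view" is a placeholder view name; the ddoc name appears later
        # in this ticket.
        result = query_view("10.2.2.60", "dev_test_view-6ffa498", "test_view",
                            startkey=[2008, 10, 6],
                            startkey_docid="admin0109-2008_10_06",
                            stale="false")
        for row in result.get("rows", []):
            print(row["id"], row["key"])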

        tommie Tommie McAfee added a comment -

        build 1334

        FilipeManana Filipe Manana (Inactive) added a comment -

        Unlikely to be because of MB-5511 or MB-5541.

        We've seen this error "ram_but_not_disk" before the changes that added the occasional deadlock were merged. Also, if the deadlock was triggered, all subsequent operations and tests would just time out or never finish, and this wasn't the case in the results of MB-5331.

        FilipeManana Filipe Manana (Inactive) added a comment -

        With build 1338, which includes the view-engine deadlock fixes (unrelated to the problem, btw: MB-5541, MB-5511), we're still seeing this issue:

        http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/

        The test detects, via memcached, that some keys are in RAM but didn't get persisted to disk.

        FilipeManana Filipe Manana (Inactive) added a comment -

        Also, looking at one of the tests' output:

        [2012-06-15 09:47:29,432] - [data_helper] [140513919293184] - INFO - key arch0008-2008_09_22 doesn't exist in memcached
        [2012-06-15 09:47:29,511] - [data_helper] [140513919293184] - INFO - key arch0000-2008_10_09 doesn't exist in memcached
        [2012-06-15 09:47:29,581] - [data_helper] [140513919293184] - INFO - key arch0191-2008_01_23 doesn't exist in memcached
        [2012-06-15 09:47:29,659] - [data_helper] [140513919293184] - INFO - key arch0187-2008_05_14 doesn't exist in memcached
        [2012-06-15 09:47:29,739] - [data_helper] [140513919293184] - INFO - key arch0070-2008_09_19 doesn't exist in memcached
        [2012-06-15 09:47:29,815] - [data_helper] [140513919293184] - INFO - key arch0040-2008_07_27 doesn't exist in memcached
        [2012-06-15 09:47:29,894] - [data_helper] [140513919293184] - INFO - key arch0095-2008_10_05 doesn't exist in memcached
        [2012-06-15 09:47:29,979] - [data_helper] [140513919293184] - INFO - key arch0087-2008_03_13 doesn't exist in memcached
        [2012-06-15 09:47:30,058] - [data_helper] [140513919293184] - INFO - key arch0034-2008_04_13 doesn't exist in memcached
        [2012-06-15 09:47:30,139] - [data_helper] [140513919293184] - INFO - key arch0084-2008_04_08 doesn't exist in memcached

        From:

        http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/artifact/logs/testrunner-12-Jun-15_05-28-10/test_employee_dataset_alldocs_queries_rebalance_in.log

        FilipeManana Filipe Manana (Inactive) added a comment -

        Farshid, can you give this high priority, like a trunk-green blocker?

        jin Jin Lim added a comment -

        The db files used for this failed test are gone by now. Otherwise, we could have first verified whether the missing document was never persisted, or was actually persisted on disk but somehow not known to the upper layers. Anyway, while looking at the remaining logs, I found a bunch of compactor crashes just like the one below. I think this needs to be looked at by the engineering team, as a compactor crash could have left a db file in an unknown state, including missing docs.

        [ns_server:error] [2012-06-15 10:02:37] [ns_1@10.2.2.60:compaction_daemon:compaction_daemon:handle_info:211]
        Compactor <0.10399.2> exited unexpectedly:
          {timeout, {gen_server,call, ['capi_set_view_manager-default', fetch_ddocs]}}.
        Moving to the next bucket.

        [error_logger:error] [2012-06-15 10:02:37] [ns_1@10.2.2.60:error_logger:ale_error_logger_handler:log_report:72]
        =========================CRASH REPORT=========================
        crasher:
          initial call: compaction_daemon:-spawn_bucket_compactor/3-fun-2-/0
          pid: <0.10399.2>
          registered_name: []
          exception exit: {timeout,
                            {gen_server,call,
                              ['capi_set_view_manager-default',fetch_ddocs]}}
            in function gen_server:call/2
            in call from compaction_daemon:ddoc_names/1
            in call from compaction_daemon:'-spawn_bucket_compactor/3-fun-2-'/4
          ancestors: [compaction_daemon,<0.431.0>,ns_server_sup,
                      ns_server_cluster_sup,<0.59.0>]
          messages: []
          links: [<0.438.0>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 6765
          stack_size: 24
          reductions: 174901
        neighbours:

        FilipeManana Filipe Manana (Inactive) added a comment -

        Jin, that doesn't cause the data loss (either persistence or memory) issues.
        I'll put my hands on fire if it does.

        FilipeManana Filipe Manana (Inactive) added a comment -

        And btw, this is not the compactor, it's just the compaction scheduler. This failure happened before compaction was triggered. Even if it had failed during compaction, when compaction is restarted it never reuses an existing partially compacted file; it truncates the existing compact file (if any) to 0 bytes. This applies to both database and view compaction.

        jin Jin Lim added a comment -

        Thanks for the clarification, Filipe.

        tommie Tommie McAfee added a comment -

        Using a script from Jin that looks for the document in couchdb, we found that the missing key exists on disk on a node different from the one reporting "ram_but_not_disk". Something may have happened after rebalance to cause this. Below you can see two nodes returning meta_data for the key (probably the original and the new node). However, the first node has state "item_deleted".

        ./get_key_meta.py

        10.2.2.108

        {'key_vb_state': 'replica', 'key_last_modification_time': '1339788618', 'key_data_age': '0', 'key_cas': '7930769912841273', 'key_exptime': '0', 'key_is_dirty': '0', 'key_flags': '0', 'key_valid': 'item_deleted'}

        trying diff memcached....
        trying diff memcached....
        trying diff memcached....
        trying diff memcached....
        10.2.2.63

        {'key_vb_state': 'active', 'key_last_modification_time': '1339788619', 'key_data_age': '0', 'key_cas': '7930769912841273', 'key_exptime': '0', 'key_is_dirty': '0', 'key_flags': '0', 'key_valid': 'ram_but_not_disk'}

        -------COUCH SEARCH ON 10.2.2.108--------
        [root@localhost default]# ./finddoc.pl arch0107-2008_11_14
        Unquoted string "break" may clash with future reserved word at ./finddoc.pl line 28.
        Useless use of a constant in void context at ./finddoc.pl line 28.
        --------------------------------------------------
        finding doc, id: arch0107-2008_11_14, in /opt/couchbase/var/lib/couchbase/data/default, may take a while...
        --------------------------------------------------
        skipping master doc, master.couch.1
        found doc, id: arch0107-2008_11_14, in 56.couch.1
        [root@localhost default]#

        -------COUCH SEARCH ON 10.2.2.63--------
        [root@localhost default]# ./finddoc.pl arch0107-2008_11_14
        Unquoted string "break" may clash with future reserved word at ./finddoc.pl line 28.
        Useless use of a constant in void context at ./finddoc.pl line 28.
        --------------------------------------------------
        finding doc, id: arch0107-2008_11_14, in /opt/couchbase/var/lib/couchbase/data/default, may take a while...
        --------------------------------------------------
        skipping master doc, master.couch.1
        [root@localhost default]#
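
        The attached finddoc.pl is not reproduced in this ticket; a rough Python equivalent, under the assumption that it simply scans each per-vbucket .couch file under the data directory for the raw document id:

        import glob
        import os
        import sys

        DATA_DIR = "/opt/couchbase/var/lib/couchbase/data/default"

        def find_doc(doc_id, data_dir=DATA_DIR):
            """Report which per-vbucket .couch files contain the raw doc id.

            A crude byte search (each file is read whole), not a real
            couchstore parse, so it can miss ids stored in compressed form;
            it is only meant to show whether a document ever reached a
            node's data files.
            """
            needle = doc_id.encode()
            hits = []
            for path in sorted(glob.glob(os.path.join(data_dir, "*.couch.*"))):
                name = os.path.basename(path)
                if name.startswith("master.couch"):
                    print("skipping master doc, %s" % name)
                    continue
                with open(path, "rb") as f:
                    if needle in f.read():
                        print("found doc, id: %s, in %s" % (doc_id, name))
                        hits.append(name)
            return hits

        if __name__ == "__main__":
            find_doc(sys.argv[1] if len(sys.argv) > 1 else "arch0107-2008_11_14")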

        FilipeManana Filipe Manana (Inactive) added a comment -

        Tommie, what's the vbucket ID?

        As a side question, where do those scripts live?

        tommie Tommie McAfee added a comment -

        >Tommie what's the vbucket ID?
        56

        >where do those scripts live?
        They were local. I will attach them here.

        FilipeManana Filipe Manana (Inactive) added a comment -

        Thanks.

        So it really seems like a persistence issue in the first place.
        The key should be persisted on node 10.2.2.63, but it is not.

        Hard to tell when exactly the failing queries happened, but I see that at the end of the logs for those 2 nodes the vbucket 56 is not marked as active in the indexes.

        sharon Sharon Barr (Inactive) added a comment -

        Looks like a blocker rather than a major issue.

        tommie Tommie McAfee added a comment -

        More debugging tips from Filipe: discovered update_seq = 0 for db 56, which corresponds to this key on the active node. This db also reports "no documents", so no writes have happened here:

        /opt/couchbase/bin/couch_dbinfo 56.couch.1
        DB Info (56.couch.1)
        file format version: 10
        update_seq: 0
        no documents
        B-tree size: 0 bytes
        total disk size: 4.0 kB

        The same information is reflected in the index info:
        curl -s 'http://10.2.2.63:8092/_set_view/default/_design/dev_test_view-6ffa498/_info' | json_xs
        ….

        "active_partitions" : [
        56,
        57,
        58,
        59,
        60,
        61,
        62,
        63,
        64,
        65,
        66,
        67,
        68,
        69,
        70,
        71,
        72,
        73
        ],
        "pending_transition" : null,
        "update_seqs" :

        { "67" : 1560, "63" : 1571, "71" : 1585, "70" : 1581, "68" : 1558, "72" : 1585, "65" : 1580, "57" : 1573, "64" : 1581, "61" : 1584, "58" : 1585, "59" : 1590, "69" : 1563, "60" : 1125, "56" : 0, "73" : 1581, "66" : 1562, "62" : 1572 }

        ,
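
        A small sketch of the same check done programmatically: fetch the _set_view _info document shown above and flag active partitions whose indexed update_seq is still 0 (a legitimately empty vbucket would also show 0, so this is only a pointer, not proof of the bug):

        import json

        try:
            from urllib.request import urlopen   # Python 3
        except ImportError:
            from urllib2 import urlopen           # Python 2

        def zero_seq_partitions(host, bucket="default",
                                ddoc="_design/dev_test_view-6ffa498"):
            """List active partitions whose update_seq is 0 in the set view index."""
            url = "http://{0}:8092/_set_view/{1}/{2}/_info".format(host, bucket, ddoc)
            info = json.load(urlopen(url))
            seqs = info.get("update_seqs", {})
            return [p for p in info.get("active_partitions", [])
                    if int(seqs.get(str(p), 0)) == 0]

        # On the cluster state shown above this would report [56].
        print(zero_seq_partitions("10.2.2.63"))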

        tommie Tommie McAfee added a comment -

        Here's the test runner command to reproduce:
        python testrunner -i resources/jenkins/centos-64-7node-viewquery.ini -t viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in

        I've disabled the job in Jenkins, so you can run this exact command and it will run the test against our servers if you cannot reproduce in your own VMs.

        A final observation from manually trying to reproduce this is that heavy queries cause the disk write queue to drain slower, and when items are still in the queue before the rebalance (in), we can get into this state.

        Let me know if anything else is needed.

        thanks,
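
        Related to the last observation, the disk write queue can be watched before kicking off the rebalance; a minimal sketch that polls ep-engine's global stats (ep_queue_size plus ep_flusher_todo) over the memcached text protocol, with the same port/bucket assumptions as the earlier sketch:

        import socket
        import time

        def ep_stats(host, port=11210, timeout=5.0):
            """Fetch ep-engine's global stats over the memcached text protocol."""
            sock = socket.create_connection((host, port), timeout=timeout)
            try:
                sock.sendall(b"stats\r\n")
                buf = b""
                while b"END\r\n" not in buf:
                    try:
                        chunk = sock.recv(4096)
                    except socket.timeout:
                        break
                    if not chunk:
                        break
                    buf += chunk
            finally:
                sock.close()
            stats = {}
            for line in buf.decode("utf-8", "replace").splitlines():
                parts = line.split()
                if len(parts) >= 3 and parts[0] == "STAT":
                    stats[parts[1]] = parts[2]
            return stats

        def wait_for_drain(host, max_wait=600, poll=5):
            """Poll until the disk write queue (ep_queue_size + ep_flusher_todo) is empty."""
            deadline = time.time() + max_wait
            while time.time() < deadline:
                s = ep_stats(host)
                queued = int(s.get("ep_queue_size", 0)) + int(s.get("ep_flusher_todo", 0))
                print("disk write queue on %s: %d" % (host, queued))
                if queued == 0:
                    return True
                time.sleep(poll)
            return False

        wait_for_drain("10.2.2.60")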

        jin Jin Lim added a comment - edited

        The test now passes without any failure after locally building the ep-engine with the fix, http://review.couchbase.org/#change,17367.

        Note: the number of items being persisted while rebalancing still fluctuates due to vbucket movements, which is perfectly normal under the current design. After rebalancing completes, one should expect NO data loss at all.

        jin Jin Lim added a comment -

        http://review.couchbase.org/#change,17367
        FilipeManana Filipe Manana (Inactive) added a comment -

        Super, thanks Jin!


          People

          • Assignee:
            jin Jin Lim
            Reporter:
            tommie Tommie McAfee
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes