Couchbase Server / MB-36338

Expelling erroneously allows deduping of SyncWrites in checkpoint


Details

    • Untriaged
    • Unknown
    • KV-Engine Mad-Hatter GA

    Description

      Build : 6.5.0-4471
      Test : -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml
      Scale : 3
      Iteration : 1st

      Step:

      [2019-10-04T09:44:15-07:00, sequoiatools/couchbase-cli:6.5:6a4039] server-add -c 172.23.108.103:8091 --server-add https://172.23.106.100 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2019-10-04T09:44:38-07:00, sequoiatools/couchbase-cli:6.5:b4a767] rebalance -c 172.23.108.103:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:6.5:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
       
      docker logs b4a767
      docker start b4a767
       
      *Unable to display progress bar on this os
      JERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2019-10-04T09:55:07-07:00, sequoiatools/cmd:5a4414] 60
      [2019-10-04T09:56:16-07:00, sequoiatools/cmd:348ec6] 300
      

      Error: (diag.log in 172.23.108.103)

      2019-10-04T09:54:52.724-07:00, compaction_daemon:0:warning:message(ns_1@172.23.97.119) - Compactor for view `mapreduce_view/default/_design/scale/replica` (pid [{type,
                                                                               view},
                                                                              {important,
                                                                               true},
                                                                              {name,
                                                                               <<"mapreduce_view/default/_design/scale/replica">>},
                                                                              {fa,
                                                                               {#Fun<compaction_daemon.23.62703019>,
                                                                                [<<"default">>,
                                                                                 <<"_design/scale">>,
                                                                                 mapreduce_view,
                                                                                 replica,
                                                                                 {config,
                                                                                  {30,
                                                                                   undefined},
                                                                                  {30,
                                                                                   undefined},
                                                                                  undefined,
                                                                                  false,
                                                                                  false,
                                                                                  {daemon_config,
                                                                                   30,
                                                                                   131072,
                                                                                   20971520}},
                                                                                 true,
                                                                                 {[{type,
                                                                                    bucket}]}]}}]) terminated unexpectedly: {updater_died,
                                                                                                                             {updater_error,
                                                                                                                              timeout}}
      2019-10-04T09:54:52.930-07:00, ns_rebalancer:2:info:message(ns_1@172.23.108.103) - Bad replicators after rebalance:
      Missing = [{'ns_1@172.23.108.103','ns_1@172.23.97.119',113},
                 {'ns_1@172.23.108.103','ns_1@172.23.97.119',114},
                 {'ns_1@172.23.108.103','ns_1@172.23.97.119',115},
                 {'ns_1@172.23.108.103','ns_1@172.23.97.119',116},
                 {'ns_1@172.23.108.103','ns_1@172.23.97.119',117},
                 {'ns_1@172.23.108.103','ns_1@172.23.97.119',118},
                 {'ns_1@172.23.108.103','ns_1@172.23.97.119',119},
                 {'ns_1@172.23.108.103','ns_1@172.23.97.119',120},
                 {'ns_1@172.23.108.103','ns_1@172.23.97.119',121}]
      Extras = []
      2019-10-04T09:54:52.933-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason bad_replicas.
      Rebalance Operation Id = 0b67a052f94895e5418ff35de8a74354
      -------------------------------
       
       
      per_node_processes('ns_1@172.23.108.103') =
           {<0.32074.207>,
            [{backtrace,
                 [<<"Program counter: 0x00007f550eb437c8 (gen_server:loop/7 + 288)">>,
                  <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,<<>>,
                  <<"0x00007f538ccd6128 Return addr 0x00007f55553e6ff0 (proc_lib:init_p_do_apply/3 + 72)">>,
                  <<"y(0)     []">>,<<"y(1)     infinity">>,
                  <<"y(2)     dcp_proxy">>,
                  <<"(3)     {state,#Port<0.236464>,{producer,\"replication:ns_1@172.23.106.100->ns_1@172.23.108.103:NEW_ORDER\",'ns_1@172.23.1">>,
                  <<"y(4)     <0.32074.207>">>,<<"y(5)     <0.7577.208>">>,<<>>,
                  <<"0x00007f538ccd6160 Return addr 0x0000000000942608 (<terminate process normally>)">>,
                  <<"y(0)     []">>,
                  <<"y(1)     Catch 0x00007f55553e7010 (proc_lib:init_p_do_apply/3 + 104)">>,
                  <<>>]},
             {messages,[]},
             {dictionary,
                 [{'$initial_call',{dcp_proxy,init,1}},
                  {'$ancestors',
                      ['dcp_replicator-NEW_ORDER-ns_1@172.23.106.100',
                       'dcp_sup-NEW_ORDER','single_bucket_kv_sup-NEW_ORDER',
                       ns_bucket_sup,ns_bucket_worker_sup,ns_server_sup,
                       ns_server_nodes_sup,<0.205.0>,ns_server_cluster_sup,root_sup,
                       <0.117.0>]}]},
             {registered_name,[]},
             {status,waiting},
             {initial_call,{proc_lib,init_p,5}},
             {error_handler,error_handler},
             {garbage_collection,
                 [{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                  {min_bin_vheap_size,46422},
                  {min_heap_size,233},
                  {fullsweep_after,512},
                  {minor_gcs,98}]},
             {garbage_collection_info,
                 [{old_heap_block_size,6772},
                  {heap_block_size,1598},
                  {mbuf_size,0},
                  {recent_size,24},
                  {stack_size,10},
                  {old_heap_size,2339},
                  {heap_size,445},
      
      

      logs:

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.156.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.157.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.164.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.61.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.67.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.69.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.70.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.87.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.106.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.106.188.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.108.103.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.148.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.251.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.252.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.253.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.56.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.95.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.119.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.121.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.239.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.242.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.98.135.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.99.20.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.99.21.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.99.25.zip
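
      For context on the summary line, here is a minimal, hypothetical model of the suspected failure mode. This is not KV-Engine's actual code; `Item`, `Checkpoint`, `queue_dirty`, and `expel_up_to` are illustrative names only. It sketches how expelling already-persisted items from a checkpoint could defeat the guard that normally prevents a pending SyncWrite from being de-duplicated:

      {code:python}
      class Item:
          def __init__(self, key, seqno, sync_write=False):
              self.key, self.seqno, self.sync_write = key, seqno, sync_write

      class Checkpoint:
          def __init__(self):
              self.items = []       # queued mutations in seqno order; None = expelled/deduped
              self.key_index = {}   # key -> index of that key's latest queued mutation

          def queue_dirty(self, item):
              pos = self.key_index.get(item.key)
              if pos is not None:
                  existing = self.items[pos]
                  # Guard: a pending SyncWrite must not be de-duplicated away;
                  # the caller has to open a new checkpoint instead.
                  if existing is not None and existing.sync_write:
                      return False
                  self.items[pos] = None  # normal mutation: dedup the older revision
              self.items.append(item)
              self.key_index[item.key] = len(self.items) - 1
              return True

          def expel_up_to(self, seqno):
              # Expelling frees memory for items already persisted, but in this
              # model it leaves key_index pointing at a slot that no longer holds
              # the item, so the SyncWrite guard above can no longer fire -- the
              # erroneous dedup the summary describes.
              for i, it in enumerate(self.items):
                  if it is not None and it.seqno <= seqno:
                      self.items[i] = None
      {code}

      In this model, a second mutation for the same key is rejected while the SyncWrite is still resident, but is wrongly accepted (deduped) once the SyncWrite has been expelled.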

      Attachments

        Issue Links


          Activity

            girish.benakappa Girish Benakappa created issue -
            girish.benakappa Girish Benakappa made changes -
            Field Original Value New Value
            Link This issue Clones MB-36260 [ MB-36260 ]
            girish.benakappa Girish Benakappa made changes -
            Assignee: Varun Velamuri [ varun.velamuri ] → Girish Benakappa [ girish.benakappa ]
            girish.benakappa Girish Benakappa made changes -
            Description Build : 6.5.0-4380
            Test : -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml
            Scale : 3
            Iteration : 1st

            Step:
            {code:java}
            [2019-10-01T14:58:30-07:00, sequoiatools/couchbase-cli:6.5:722f32] rebalance -c 172.23.108.103:8091 --server-remove 172.23.99.11,172.23.104.88 -u Administrator -p password


            Error occurred on container - sequoiatools/couchbase-cli:6.5:[rebalance -c 172.23.108.103:8091 --server-remove 172.23.99.11,172.23.104.88 -u Administrator -p password]

            docker logs 722f32
            docker start 722f32

            *Unable to display progress bar on this os
            JERROR: Rebalance failed. See logs for detailed reason. You can try again.
            [2019-10-01T14:59:21-07:00, sequoiatools/cmd:cdbba4] 60
            {code}


            Error: (ns_server.error.log in 172.23.108.103)

            {code:java}
            [ns_server:error,2019-10-01T11:20:43.534-07:00,ns_1@172.23.108.103:<0.12891.32>:ns_rebalance_observer:generic_get_call:110]Unexpected exception {exit,
                                     {killed,
                                         {gen_server,call,
                                             [{via,leader_registry,ns_rebalance_observer},
                                              {get_rebalance_info,[]},
                                              10000]}}}
            [ns_server:error,2019-10-01T14:59:10.483-07:00,ns_1@172.23.108.103:service_rebalancer-index<0.31005.140>:service_rebalancer:run_rebalance_worker:125]Worker terminated abnormally: {'EXIT',<0.30948.140>,
                                           {rebalance_failed,
                                            {service_error,
                                             <<"indexer rebalance failure - index build is in progress for indexes: [default:default_claims].">>}}}
            [user:error,2019-10-01T14:59:10.487-07:00,ns_1@172.23.108.103:<0.10788.0>:ns_orchestrator:log_rebalance_completion:1466]Rebalance exited with reason {service_rebalance_failed,index,
                                          {worker_died,
                                           {'EXIT',<0.30948.140>,
                                            {rebalance_failed,
                                             {service_error,
                                              <<"indexer rebalance failure - index build is in progress for indexes: [default:default_claims].">>}}}}}.
            Rebalance Operation Id = 5c049484b167e342d4229c1bf9517fd1
            {code}
            girish.benakappa Girish Benakappa made changes -
            Link This issue Clones MB-36260 [ MB-36260 ]
                                                                                       {[{type,
                                                                                          bucket}]}]}}]) terminated unexpectedly: {updater_died,
                                                                                                                                   {updater_error,
                                                                                                                                    timeout}}
            2019-10-04T09:54:52.930-07:00, ns_rebalancer:2:info:message(ns_1@172.23.108.103) - Bad replicators after rebalance:
            Missing = [{'ns_1@172.23.108.103','ns_1@172.23.97.119',113},
                       {'ns_1@172.23.108.103','ns_1@172.23.97.119',114},
                       {'ns_1@172.23.108.103','ns_1@172.23.97.119',115},
                       {'ns_1@172.23.108.103','ns_1@172.23.97.119',116},
                       {'ns_1@172.23.108.103','ns_1@172.23.97.119',117},
                       {'ns_1@172.23.108.103','ns_1@172.23.97.119',118},
                       {'ns_1@172.23.108.103','ns_1@172.23.97.119',119},
                       {'ns_1@172.23.108.103','ns_1@172.23.97.119',120},
                       {'ns_1@172.23.108.103','ns_1@172.23.97.119',121}]
            Extras = []
            2019-10-04T09:54:52.933-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason bad_replicas.
            Rebalance Operation Id = 0b67a052f94895e5418ff35de8a74354
            -------------------------------


            per_node_processes('ns_1@172.23.108.103') =
                 {<0.32074.207>,
                  [{backtrace,
                       [<<"Program counter: 0x00007f550eb437c8 (gen_server:loop/7 + 288)">>,
                        <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,<<>>,
                        <<"0x00007f538ccd6128 Return addr 0x00007f55553e6ff0 (proc_lib:init_p_do_apply/3 + 72)">>,
                        <<"y(0) []">>,<<"y(1) infinity">>,
                        <<"y(2) dcp_proxy">>,
                        <<"(3) {state,#Port<0.236464>,{producer,\"replication:ns_1@172.23.106.100->ns_1@172.23.108.103:NEW_ORDER\",'ns_1@172.23.1">>,
                        <<"y(4) <0.32074.207>">>,<<"y(5) <0.7577.208>">>,<<>>,
                        <<"0x00007f538ccd6160 Return addr 0x0000000000942608 (<terminate process normally>)">>,
                        <<"y(0) []">>,
                        <<"y(1) Catch 0x00007f55553e7010 (proc_lib:init_p_do_apply/3 + 104)">>,
                        <<>>]},
                   {messages,[]},
                   {dictionary,
                       [{'$initial_call',{dcp_proxy,init,1}},
                        {'$ancestors',
                            ['dcp_replicator-NEW_ORDER-ns_1@172.23.106.100',
                             'dcp_sup-NEW_ORDER','single_bucket_kv_sup-NEW_ORDER',
                             ns_bucket_sup,ns_bucket_worker_sup,ns_server_sup,
                             ns_server_nodes_sup,<0.205.0>,ns_server_cluster_sup,root_sup,
                             <0.117.0>]}]},
                   {registered_name,[]},
                   {status,waiting},
                   {initial_call,{proc_lib,init_p,5}},
                   {error_handler,error_handler},
                   {garbage_collection,
                       [{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                        {min_bin_vheap_size,46422},
                        {min_heap_size,233},
                        {fullsweep_after,512},
                        {minor_gcs,98}]},
                   {garbage_collection_info,
                       [{old_heap_block_size,6772},
                        {heap_block_size,1598},
                        {mbuf_size,0},
                        {recent_size,24},
                        {stack_size,10},
                        {old_heap_size,2339},
                        {heap_size,445},

            {code}

            logs:

            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.156.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.157.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.164.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.61.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.67.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.69.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.70.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.104.87.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.106.100.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.106.188.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.108.103.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.148.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.251.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.252.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.253.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.56.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.96.95.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.119.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.121.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.122.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.239.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.97.242.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.98.135.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.99.20.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.99.21.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1570208407/collectinfo-2019-10-04T170009-ns_1%40172.23.99.25.zip
            girish.benakappa Girish Benakappa made changes -
            Labels system-test eagle-eye system-test
            girish.benakappa Girish Benakappa made changes -
            Assignee Girish Benakappa [ girish.benakappa ] Jeelan Poola [ jeelan.poola ]
            girish.benakappa Girish Benakappa made changes -
            Component/s ns_server [ 10019 ]
            Component/s secondary-index [ 11211 ]
            girish.benakappa Girish Benakappa made changes -
            Assignee Jeelan Poola [ jeelan.poola ] Ajit Yagaty [ ajit.yagaty ]
            girish.benakappa Girish Benakappa made changes -
            Link This issue is cloned by MB-36340 [ MB-36340 ]
            girish.benakappa Girish Benakappa made changes -
            Link This issue is cloned by MB-36341 [ MB-36341 ]

            Aliaksey Artamonau (Inactive) added a comment -

            Memcached repeatedly drops the replication connection on node 172.23.97.119:

            [ns_server:error,2019-10-04T09:49:44.542-07:00,ns_1@172.23.97.119:<0.31735.117>:dcp_proxy:handle_info:117]Socket #Port<0.216600> was closed. Closing myself. State = {state,
                                                                        #Port<0.216600>,
                                                                        {consumer,
                                                                         "replication:ns_1@172.23.108.103->ns_1@172.23.97.119:default",
                                                                         'ns_1@172.23.97.119',
                                                                         "default"},
                                                                        undefined,<<>>,
                                                                        dcp_consumer_conn,
                                                                        {state,idle,
                                                                         "789qrstuvwxy"},
                                                                        #Port<0.216601>,
                                                                        <0.31738.117>,
                                                                        true}
            [error_logger:error,2019-10-04T09:49:44.542-07:00,ns_1@172.23.97.119:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]** Generic server <0.31735.117> terminating 
            ** Last message in was {tcp_closed,#Port<0.216600>}
            ** When Server state == {state,#Port<0.216600>,
                                        {consumer,
                                            "replication:ns_1@172.23.108.103->ns_1@172.23.97.119:default",
                                            'ns_1@172.23.97.119',"default"},
                                        undefined,<<>>,dcp_consumer_conn,
                                        {state,idle,"789qrstuvwxy"},
                                        #Port<0.216601>,<0.31738.117>,true}
            ** Reason for termination == 
            ** socket_closed
            

            There are lots of errors of the following kind in memcached log:

            2019-10-04T09:49:44.383606-07:00 ERROR 427: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 127.0.0.1:48279 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007ff0d557e310","ewouldblock":false,"packet":{"bodylen":41,"cas":1570207457743667200,"datatype":"raw","extlen":21,"key":"<ud>.9A95B082-102_634200</ud>","keylen":20,"magic":"ClientRequest","opaque":7,"opcode":"DCP_DELETION","vbucket":114},"refcount":1}] - closing connection ([ 127.0.0.1:48279 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]): PassiveDurabilityMonitor::completeSyncWrite vb:114 No tracked, but received commit for key <ud>cid:0x0:9A95B082-102_634200</ud>
            

            Aliaksey Artamonau Aliaksey Artamonau (Inactive) made changes -
            Component/s couchbase-bucket [ 10173 ]
            Component/s ns_server [ 10019 ]
            Aliaksey Artamonau Aliaksey Artamonau (Inactive) made changes -
            Assignee Ajit Yagaty [ ajit.yagaty ] Daniel Owen [ owend ]
            owend Daniel Owen made changes -
            Assignee Daniel Owen [ owend ] James Harrison [ james.harrison ]
            james.harrison James Harrison made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            james.harrison James Harrison added a comment -

            Caused by checkpoint expelling.
            As of http://review.couchbase.org/#/c/115757/ (which fixed another issue), entries in the checkpoint keyIndex are invalidated rather than removed, and tagged with whether the item was a SyncWrite, so that SyncWrites are not deduped after the queued_item has been freed. However, expelling splices items out from the start of the list to just before the earliest cursor, then swaps the value of the dummy item with the value of the item the cursor points to.
            As I understand it, this allows that last item to be expelled without invalidating the cursor iterators, while still ensuring the checkpoint starts with a dummy item as required.

            When invalidating the keyIndex entry, the entry's iterator is dereferenced, expecting to find the correct item (residing in the list the items were spliced out into; splicing does not invalidate iterators) so we can check whether the contained value is a SyncWrite. In the failing case seen in this ticket, the last item to be expelled is a SyncWrite. As noted above, that list node was not spliced out; instead its value was swapped with that of the dummy item (which was spliced out). The dummy value is decidedly not a SyncWrite.
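            The claim that splicing does not invalidate iterators is standard std::list behaviour; a minimal sketch (container and values here are illustrative, not kv_engine code):

            ```cpp
            #include <cassert>
            #include <iterator>
            #include <list>

            int main() {
                // Iterators into a std::list remain valid across splice(): they simply
                // follow their elements into the destination list, which is why the
                // checkpoint keyIndex can keep iterators to expelled items.
                std::list<int> queue{1, 2, 3, 4};
                std::list<int> expelled;

                auto it = std::next(queue.begin());  // points at 2

                // Move elements 1, 2, 3 into 'expelled'; 'it' follows its element.
                expelled.splice(expelled.begin(), queue, queue.begin(),
                                std::prev(queue.end()));

                assert(*it == 2);  // still valid, still dereferences to 2
                assert(queue.size() == 1 && expelled.size() == 3);
                return 0;
            }
            ```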

            The checkpoint then later allows a duplicate SyncWrite to be (erroneously) queued, because the keyIndex indicates the existing item was not a SyncWrite. Eventually this is streamed to a replica which has not performed expelling, and which therefore correctly identifies that a second SyncWrite for the same key must not appear in the same checkpoint.
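            The failure mode can be sketched as follows (struct and variable names are illustrative, not kv_engine's):

            ```cpp
            #include <cassert>
            #include <iterator>
            #include <list>
            #include <string>
            #include <utility>

            // Illustrative stand-in for a checkpoint entry.
            struct Entry {
                std::string key;
                bool isSyncWrite;
            };

            int main() {
                // Checkpoint: dummy item first, then a completed SyncWrite.
                std::list<Entry> checkpoint{{"dummy", false}, {"k1", true}};
                auto keyIndexIter = std::next(checkpoint.begin());  // keyIndex entry for "k1"
                auto cursor = keyIndexIter;                         // earliest cursor sits on "k1"

                // Expel: splice [begin, cursor) out, then swap values so the cursor's
                // node becomes the new dummy while the old dummy node carries the
                // SyncWrite's value away with the expelled items.
                std::list<Entry> expelled;
                expelled.splice(expelled.begin(), checkpoint, checkpoint.begin(), cursor);
                std::swap(*cursor, expelled.front());

                // Bug: the keyIndex iterator now dereferences the dummy value, so
                // invalidation records "not a SyncWrite" and a duplicate SyncWrite
                // for "k1" would later be allowed into the same checkpoint.
                assert(keyIndexIter->isSyncWrite == false);   // wrong: "k1" WAS a SyncWrite
                assert(expelled.front().isSyncWrite == true); // the real value went here
                return 0;
            }
            ```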

            Fix in progress.

            ritam.sharma Ritam Sharma made changes -
            Labels eagle-eye system-test durability eagle-eye system-test
            drigby Dave Rigby made changes -
            Sprint KV-Engine Mad-Hatter GA [ 910 ]
            drigby Dave Rigby made changes -
            Rank Ranked higher
            james.harrison James Harrison made changes -
            Summary [System Test] Rebalance exited with reason bad_replicas Expelling erroneously allows deduping of SyncWrites in checkpoint
            lynn.straus Lynn Straus made changes -
            Labels durability eagle-eye system-test approved-for-mad-hatter durability eagle-eye system-test
            drigby Dave Rigby made changes -
            Assignee James Harrison [ james.harrison ] Girish Benakappa [ girish.benakappa ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.0-4625 contains kv_engine commit aa36b57 with commit message:
            MB-36338: Ensure keyIndex entries are invalidated correctly

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-1017 contains kv_engine commit aa36b57 with commit message:
            MB-36338: Ensure keyIndex entries are invalidated correctly

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.0-4644 contains kv_engine commit 9004066 with commit message:
            MB-36338: deinline Item::isAnySyncWriteOp()

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-1018 contains kv_engine commit 9004066 with commit message:
            MB-36338: deinline Item::isAnySyncWriteOp()
            ben.huddleston Ben Huddleston made changes -
            Link This issue is duplicated by MB-36584 [ MB-36584 ]
            drigby Dave Rigby made changes -
            Epic Link MB-30048 [ 86039 ]
            ritam.sharma Ritam Sharma made changes -
            VERIFICATION STEPS 6.5.0-4908 - Closing based on a 4-day longevity test run on 6.5.0-4908; the issue above was not observed.
            Status Resolved [ 5 ] Closed [ 6 ]
