Couchbase Server / MB-31258

[FTS - System test] Rebalance after failover fails with err: "nodes: sample, res: (*http.Response)(nil), urlUUID: monitor.UrlUUID{Url:\"http://172.23.96.219:8094\", UUID:\"a930699699909ebf12c8fd01d5d4e574\"}"


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version: 6.0.0
    • Fix Version: 6.5.0
    • Component: fts

    Description

      Build
      6.0.0-1614

      Testcase
      ./sequoia -scope tests/fts/scope_component_fts.yml -test tests/fts/test_fts_alice_component.yml -provider file:centos_second_cluster.yml @ scale =1

      Steps:
      1. Create a single-node kv+fts cluster.
      2. Create 2 buckets on it and load 10M docs.
      3. Create 2 default indexes on the cluster - one scorch and one upside_down.
      4. While indexing is in progress, add kv, kv+fts, fts, and kv+fts nodes, then rebalance - this goes through fine.
      5. Create 2 more indexes with custom mappings - one scorch and one upside_down.
      6. Add fts, kv+fts, and kv nodes, remove 2 of the nodes added in step 4, and rebalance.
      7. Fail over node .78 (172.23.104.78), then rebalance again (a rough REST sketch of this failover-then-rebalance sequence follows below).
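
      As a rough reference for steps 6-7, here is a minimal Go sketch of the failover-then-rebalance calls against ns_server's standard REST endpoints (/controller/failOver and /controller/rebalance). The host, credentials, and otpNode names are placeholders taken from the event log below; this only illustrates the flow and is not the sequoia test's code.

      package main

      // Hedged sketch of the failover-then-rebalance sequence from steps 6-7,
      // driven through ns_server's REST API. Host, credentials, and node list
      // are placeholders lifted from the event log below.
      import (
          "fmt"
          "net/http"
          "net/url"
          "strings"
      )

      func post(base, path string, form url.Values) error {
          req, err := http.NewRequest("POST", base+path, strings.NewReader(form.Encode()))
          if err != nil {
              return err
          }
          req.SetBasicAuth("Administrator", "password") // placeholder credentials
          req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
          resp, err := http.DefaultClient.Do(req)
          if err != nil {
              return err
          }
          defer resp.Body.Close()
          if resp.StatusCode != http.StatusOK {
              return fmt.Errorf("POST %s: %s", path, resp.Status)
          }
          return nil
      }

      func main() {
          base := "http://172.23.96.219:8091" // the orchestrator node in the log below

          // Step 7a: hard failover of the .78 node.
          if err := post(base, "/controller/failOver",
              url.Values{"otpNode": {"ns_1@172.23.104.78"}}); err != nil {
              fmt.Println("failover:", err)
              return
          }

          // Step 7b: rebalance. knownNodes lists every node, including the
          // failed-over one, which is ejected as part of the rebalance (see the
          // "Starting rebalance" event below); ejectedNodes stays empty.
          known := []string{
              "ns_1@172.23.96.219", "ns_1@172.23.96.220", "ns_1@172.23.96.221",
              "ns_1@172.23.96.223", "ns_1@172.23.104.68", "ns_1@172.23.104.114",
              "ns_1@172.23.104.148", "ns_1@172.23.104.78",
          }
          if err := post(base, "/controller/rebalance", url.Values{
              "knownNodes":   {strings.Join(known, ",")},
              "ejectedNodes": {""},
          }); err != nil {
              fmt.Println("rebalance:", err)
          }
      }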

      Cluster event log from the UI (most recent first):

      2:55:45 AM Wed Sep 12, 2018 | ns_1@172.23.96.219 | ns_orchestrator 000
      Rebalance exited with reason {service_rebalance_failed,fts,
        {rebalance_failed,
         {service_error,
          <<"nodes: sample, res: (*http.Response)(nil), urlUUID: monitor.UrlUUID{Url:\"http://172.23.96.219:8094\", UUID:\"a930699699909ebf12c8fd01d5d4e574\"}, kind: /api/stats, err: Get http://%40fts-cbauth:127688ce084d0544fa5ca3db80102158@172.23.96.219:8094/api/stats: EOF">>}}}

      2:39:23 AM Wed Sep 12, 2018 | ns_1@172.23.96.219 | ns_vbucket_mover 000
      Bucket "default" rebalance does not seem to be swap rebalance

      2:39:22 AM Wed Sep 12, 2018 | ns_1@172.23.96.221 | ns_memcached 000
      Bucket "default" loaded on node 'ns_1@172.23.96.221' in 0 seconds.

      2:39:21 AM Wed Sep 12, 2018 | ns_1@172.23.96.219 | ns_rebalancer 000
      Started rebalancing bucket default

      2:37:11 AM Wed Sep 12, 2018 | ns_1@172.23.104.114 | ns_node_disco 005
      Node 'ns_1@172.23.104.114' saw that node 'ns_1@172.23.104.78' went down. Details: [{nodedown_reason, connection_closed}]

      2:37:11 AM Wed Sep 12, 2018 | ns_1@172.23.104.148 | ns_node_disco 005
      Node 'ns_1@172.23.104.148' saw that node 'ns_1@172.23.104.78' went down. Details: [{nodedown_reason, connection_closed}]

      2:37:11 AM Wed Sep 12, 2018 | ns_1@172.23.104.68 | ns_node_disco 005
      Node 'ns_1@172.23.104.68' saw that node 'ns_1@172.23.104.78' went down. Details: [{nodedown_reason, connection_closed}]

      2:37:11 AM Wed Sep 12, 2018 | ns_1@172.23.96.220 | ns_node_disco 005
      Node 'ns_1@172.23.96.220' saw that node 'ns_1@172.23.104.78' went down. Details: [{nodedown_reason, connection_closed}]

      2:37:11 AM Wed Sep 12, 2018 | ns_1@172.23.96.221 | ns_node_disco 005
      Node 'ns_1@172.23.96.221' saw that node 'ns_1@172.23.104.78' went down. Details: [{nodedown_reason, connection_closed}]

      2:37:11 AM Wed Sep 12, 2018 | ns_1@172.23.96.219 | ns_node_disco 005
      Node 'ns_1@172.23.96.219' saw that node 'ns_1@172.23.104.78' went down. Details: [{nodedown_reason, connection_closed}]

      2:37:11 AM Wed Sep 12, 2018 | ns_1@172.23.96.223 | ns_node_disco 005
      Node 'ns_1@172.23.96.223' saw that node 'ns_1@172.23.104.78' went down. Details: [{nodedown_reason, connection_closed}]

      2:37:10 AM Wed Sep 12, 2018 | ns_1@172.23.96.219 | ns_vbucket_mover 000
      Bucket "other" rebalance does not seem to be swap rebalance

      2:37:09 AM Wed Sep 12, 2018 | ns_1@172.23.96.221 | ns_memcached 000
      Bucket "other" loaded on node 'ns_1@172.23.96.221' in 0 seconds.

      2:37:08 AM Wed Sep 12, 2018 | ns_1@172.23.96.219 | ns_rebalancer 000
      Started rebalancing bucket other

      2:37:08 AM Wed Sep 12, 2018 | ns_1@172.23.96.221 | ns_storage_conf 000
      Deleting old data files of bucket "default"

      2:37:08 AM Wed Sep 12, 2018 | ns_1@172.23.96.221 | ns_storage_conf 000
      Deleting old data files of bucket "other"

      2:37:08 AM Wed Sep 12, 2018 | ns_1@172.23.104.78 | ns_cluster 001
      Node 'ns_1@172.23.104.78' is leaving cluster.

      Starting rebalance, KeepNodes = ['ns_1@172.23.104.114','ns_1@172.23.104.148',
        'ns_1@172.23.104.68','ns_1@172.23.96.219','ns_1@172.23.96.220',
        'ns_1@172.23.96.221','ns_1@172.23.96.223'], EjectNodes = [],
        Failed over and being ejected nodes = ['ns_1@172.23.104.78']; no delta recovery nodes

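      The failing piece is cbgt's rebalance-time stats monitor: it issues GET /api/stats against each FTS node's REST endpoint (monitor.UrlUUID identifies a node by its URL and UUID), and here the request to http://172.23.96.219:8094/api/stats came back with a nil *http.Response and an EOF error, so the rebalance aborted with a service_error. A minimal, self-contained Go sketch of how that error shape arises, assuming the node simply dropped the connection without answering (illustrative only, not cbgt's monitor code):

      package main

      // Minimal sketch (not cbgt code) of the error shape above: if the polled
      // node drops the connection without answering, Go's HTTP client hands back
      // a nil *http.Response plus an EOF error, matching
      // "res: (*http.Response)(nil) ... err: Get .../api/stats: EOF".
      import (
          "bufio"
          "fmt"
          "net"
          "net/http"
      )

      func main() {
          // Stand-in for an unresponsive FTS node: read the request, then close
          // the connection without writing a single response byte.
          ln, err := net.Listen("tcp", "127.0.0.1:0")
          if err != nil {
              panic(err)
          }
          defer ln.Close()
          go func() {
              for {
                  conn, err := ln.Accept()
                  if err != nil {
                      return
                  }
                  http.ReadRequest(bufio.NewReader(conn)) // consume the request
                  conn.Close()                            // ...and reply with nothing
              }
          }()

          res, err := http.Get(fmt.Sprintf("http://%s/api/stats", ln.Addr()))

          // Prints "res: (*http.Response)(nil)" and an error ending in "EOF".
          fmt.Printf("res: %#v\n", res)
          fmt.Printf("err: %v\n", err)
      }
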

          Activity

            Sreekanth Sivasankaran added a comment - Agreed, Steve Yen. I was referring to the effort/resource wastage from the test script's perspective, not from the rebalance design aspect. I had wrongly assumed that the test might simply attempt a few rebalances of a random set of nodes each time, ignoring the outcome of any given rebalance, since system tests usually focus on performing certain tasks rather than on the correctness of an operation (e.g. query result validation).

            Aruna Piravi, if the failed rebalance attempts are retried, then yes - the effort put in and the rebalance progress already made would not go to waste.

            Aruna Piravi (Inactive) added a comment - Ok, let me clarify. We do not retry failed rebalances. Sreekanth Sivasankaran is right that we ignore the outcome of a rebalance and keep moving on to the next steps. By subsequent rebalances, I mean the next steps where we fail over some other node and rebalance again, remove some nodes and rebalance, and so on.

            Also, looking at https://issues.couchbase.com/browse/MB-31258, it seems this is the same problem that the changes Steve is talking about address?
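
            Regarding ignoring the outcome of a rebalance: a test could check the outcome between steps by polling ns_server's /pools/default/tasks endpoint. A minimal Go sketch of such a check follows; the host and credentials are placeholders and the task fields shown are a simplified subset, so treat it as an illustration rather than the sequoia test's code.

            package main

            // Hedged sketch: checking whether a rebalance is still running via
            // ns_server's /pools/default/tasks endpoint. Host and credentials are
            // placeholders; the struct keeps only a simplified subset of fields.
            import (
                "encoding/json"
                "fmt"
                "net/http"
            )

            type task struct {
                Type     string  `json:"type"`
                Status   string  `json:"status"`
                Progress float64 `json:"progress"`
            }

            func rebalanceStatus(base, user, pass string) (string, error) {
                req, err := http.NewRequest("GET", base+"/pools/default/tasks", nil)
                if err != nil {
                    return "", err
                }
                req.SetBasicAuth(user, pass)
                resp, err := http.DefaultClient.Do(req)
                if err != nil {
                    return "", err
                }
                defer resp.Body.Close()

                var tasks []task
                if err := json.NewDecoder(resp.Body).Decode(&tasks); err != nil {
                    return "", err
                }
                for _, t := range tasks {
                    if t.Type == "rebalance" {
                        return fmt.Sprintf("%s (%.1f%%)", t.Status, t.Progress), nil
                    }
                }
                return "no rebalance task reported", nil
            }

            func main() {
                status, err := rebalanceStatus("http://172.23.96.219:8091", "Administrator", "password")
                if err != nil {
                    fmt.Println("error:", err)
                    return
                }
                fmt.Println("rebalance:", status)
            }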

            Couchbase Build Team added a comment - Build couchbase-server-6.0.0-1695 contains cbgt commit 7c439c6 with commit message:
            MB-31258 - Rebalance fails over stats monitor errs

            Couchbase Build Team added a comment - Build couchbase-server-6.5.0-1460 contains cbgt commit 7c439c6 with commit message:
            MB-31258 - Rebalance fails over stats monitor errs
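
            The commit message suggests the problem was rebalance failing whenever the stats monitor errored. As a purely illustrative sketch of the general mitigation, and not the actual cbgt change in commit 7c439c6, a monitor can retry a node's /api/stats poll a few times with backoff before treating the node as unreachable:

            package main

            // Purely illustrative sketch, not the actual cbgt fix from commit
            // 7c439c6: retry a node's /api/stats poll before treating transient
            // failures (nil response, EOF, connection errors) as fatal.
            import (
                "fmt"
                "io"
                "net/http"
                "time"
            )

            func pollStats(client *http.Client, nodeURL string, maxRetries int) ([]byte, error) {
                var lastErr error
                for attempt := 0; attempt <= maxRetries; attempt++ {
                    resp, err := client.Get(nodeURL + "/api/stats")
                    if err != nil || resp == nil {
                        lastErr = err
                    } else {
                        body, readErr := io.ReadAll(resp.Body)
                        resp.Body.Close()
                        if readErr == nil && resp.StatusCode == http.StatusOK {
                            return body, nil
                        }
                        lastErr = fmt.Errorf("status %s, read error: %v", resp.Status, readErr)
                    }
                    time.Sleep(time.Duration(attempt+1) * 500 * time.Millisecond) // simple linear backoff
                }
                return nil, fmt.Errorf("stats poll failed after %d retries: %w", maxRetries, lastErr)
            }

            func main() {
                client := &http.Client{Timeout: 5 * time.Second}
                // Hypothetical node URL, mirroring the one in the error message above.
                stats, err := pollStats(client, "http://172.23.96.219:8094", 3)
                if err != nil {
                    // A tolerant monitor could log this and let the rebalance
                    // continue rather than aborting on one node's transient failure.
                    fmt.Println("warning:", err)
                    return
                }
                fmt.Printf("got %d bytes of stats\n", len(stats))
            }

            Whether cbgt retries, skips, or defers such errors during rebalance is determined by the actual fix; the sketch only shows the retry-with-backoff pattern.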

            Mihir Kamdar (Inactive) added a comment - Verified on 6.5.0-3748. We don't see this particular failure in rebalance; there are other failures, though, which will be tracked in separate bugs. Closing this one.

            People

              Assignee: Aruna Piravi (Inactive)
              Reporter: Aruna Piravi (Inactive)
              Votes: 0
              Watchers: 7

