Couchbase Server / MB-59048

Stuck rebalance? (failed migrate_storage_mode_via_failover test)

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • 7.6.0
    • 7.6.0
    • ns_server
    • None
    • Untriaged
    • 0
    • Unknown

    Description

      The test appears to have failed because the wait for the rebalance to complete timed out (timeout_s = 600 in the traceback below; the test ran for 606s).

      https://cv.jenkins.couchbase.com/job/ns-server-cluster-tests/5554/console

      14:58:33 Starting testset[26/34]: BucketMigrationTest/edition=Enterprise,min_num_nodes=4,num_connected=2,min_memsize=2048...
      14:58:33   BucketMigrationTest.migrate_storage_mode_test...                            ok [2.2s]
      14:58:36   BucketMigrationTest.migrate_storage_mode_via_failover_test...           failed [606s]
      15:08:41     AssertionError: Expected final rebalance status: None
      15:08:41 Found: timeout
      15:08:41 ================== BucketMigrationTest.migrate_storage_mode_via_failover_test output begin =================
      15:08:41 sending POST http://127.0.0.1:9000/pools/default/buckets {'data': {'name': 'bucket-2', 'storageBackend': 'couchstore', 'ramQuotaMB': '1024'}, 'timeout': 60} (expected code 202)
      15:08:41 result: 202
      15:08:41 sending GET http://127.0.0.1:9000/pools/default/buckets/bucket-2 {'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9000/pools/default/buckets/bucket-2 {'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending POST http://127.0.0.1:9000/pools/default/buckets/bucket-2 {'data': {'name': 'bucket-2', 'storageBackend': 'magma', 'ramQuotaMB': '1024'}, 'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9000/pools/default/buckets/bucket-2 {'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9000/nodeStatuses {'timeout': 60} (expected code None)
      15:08:41 result: 200
      15:08:41 sending POST http://127.0.0.1:9001/controller/startFailover {'data': {'user': 'Administrator', 'password': 'asdasd', 'otpNode': 'n_0@127.0.0.1', 'allowUnsafe': 'false'}, 'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9001/nodeStatuses {'timeout': 60} (expected code None)
      15:08:41 result: 200
      15:08:41 sending POST http://127.0.0.1:9001/controller/setRecoveryType {'data': {'user': 'Administrator', 'password': 'asdasd', 'otpNode': 'n_0@127.0.0.1', 'recoveryType': 'full'}, 'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9001/nodeStatuses {'timeout': 60} (expected code None)
      15:08:41 result: 200
      15:08:41 sending POST http://127.0.0.1:9001/controller/rebalance {'data': {'knownNodes': 'n_0@127.0.0.1,n_1@127.0.0.1', 'ejectedNodes': ''}, 'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9001/pools/default {'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 Got nodes: ['127.0.0.1:9000', '127.0.0.1:9001']
      15:08:41 sending POST http://127.0.0.1:9000/diag/eval {'data': 'ns_memcached:get_config_stats("bucket-2", <<"ep_backend">>).', 'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9001/pools/default/buckets/bucket-2 {'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9001/nodeStatuses {'timeout': 60} (expected code None)
      15:08:41 result: 200
      15:08:41 sending POST http://127.0.0.1:9001/controller/startFailover {'data': {'user': 'Administrator', 'password': 'asdasd', 'otpNode': 'n_0@127.0.0.1', 'allowUnsafe': 'false'}, 'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9001/nodeStatuses {'timeout': 60} (expected code None)
      15:08:41 result: 200
      15:08:41 sending POST http://127.0.0.1:9001/controller/setRecoveryType {'data': {'user': 'Administrator', 'password': 'asdasd', 'otpNode': 'n_0@127.0.0.1', 'recoveryType': 'full'}, 'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 sending GET http://127.0.0.1:9001/nodeStatuses {'timeout': 60} (expected code None)
      15:08:41 result: 200
      15:08:41 sending POST http://127.0.0.1:9001/controller/rebalance {'data': {'knownNodes': 'n_0@127.0.0.1,n_1@127.0.0.1', 'ejectedNodes': ''}, 'timeout': 60} (expected code 200)
      15:08:41 result: 200
      15:08:41 =================== BucketMigrationTest.migrate_storage_mode_via_failover_test output end ==================
      15:08:41 
      15:08:41 Traceback with variables (most recent call last):
      15:08:41   File "/home/couchbase/jenkins/workspace/ns-server-cluster-tests/ns_server/cluster_tests/testlib/testlib.py", line 170, in safe_test_function_call
      15:08:41     res = apply_with_seed(testset, testfunction, args, seed)
      15:08:41       testset = <testsets.bucket_migration_test.BucketMigrationTest object at 0x7fef3ed21150>
      15:08:41       testfunction = 'migrate_storage_mode_via_failover_test'
      15:08:41       args = []
      15:08:41       testiter = 0
      15:08:41       verbose = True
      15:08:41       intercept_output = True
      15:08:41       seed = b'\x11\xc7\xc1\x14M\x13\xc4\xf1eR\r\x02\x8b\x84\x8a\xcd'
      15:08:41       dry_run = False
      15:08:41       res = None
      15:08:41       error = None
      15:08:41       iter_str = ''
      15:08:41       testname = 'BucketMigrationTest.migrate_storage_mode_via_failover_test'
      15:08:41       report_call = <contextlib._GeneratorContextManager object at 0x7fef3ed23410>
      15:08:41       e = AssertionError('Expected final rebalance status: None\nFound: timeout')
      15:08:41       cscheme = <traceback_with_variables.color.ColorScheme object at 0x7fef3f594a90>
      15:08:41   File "/home/couchbase/jenkins/workspace/ns-server-cluster-tests/ns_server/cluster_tests/testlib/testlib.py", line 183, in apply_with_seed
      15:08:41     return getattr(obj, func)(*args)
      15:08:41       obj = <testsets.bucket_migration_test.BucketMigrationTest object at 0x7fef3ed21150>
      15:08:41       func = 'migrate_storage_mode_via_failover_test'
      15:08:41       args = []
      15:08:41       seed = b'\x11\xc7\xc1\x14M\x13\xc4\xf1eR\r\x02\x8b\x84\x8a\xcd'
      15:08:41       rand_state = (3, (2147483648, 3067226838, 539379446, 44504032, 2213556331, 1325318855, 341994103, 4065373004, 1736662814, 3321746883, 3832268596, 3216720752, 4215105593, 1485327632, 1347570338, 1711938777, 1763985259, 2517909544, 4065884858, 4289671405, 529153101, 3492725787, 2053740149, 3277390837, 2998983952, 3904050061, 110352404, 3371274982, 808479785, 1286042762, 626425255, 3339002176, 876716764, 592503662, 306333652, 3360562903, 1343977587, 4135536696, 212862081, 129997618, 1372807274, 413165004, 2093228232, 233160164, 2737101126, 3321827753, 1112372051, 2322863528, 2715114708, 2205896794, 1351325333, 2012125272, 176344618, 739569206, 3153957693, 2218156581, 576512850, 2244048564, 3040762061, 1592232966, 1066000556, 2220392945, 2828735643, 317550735, 1293434066, 2877516611, 2939175493, 1866800830, 2839098004, 1321036427, 2690016546, 1484619403, 1501851484, 939202463, 2516789375, 2346421129, 4256672416, 485193953, 1920406153, 4036873163, 3242787900, 1172389080, 3109024816, 3734292927, 11104334...
      15:08:41   File "/home/couchbase/jenkins/workspace/ns-server-cluster-tests/ns_server/cluster_tests/testsets/bucket_migration_test.py", line 170, in migrate_storage_mode_via_failover_test
      15:08:41     self.cluster.recover_node(node, do_rebalance=True)
      15:08:41       self = <testsets.bucket_migration_test.BucketMigrationTest object at 0x7fef3ed21150>
      15:08:41       nodes = [{'url': 'http://127.0.0.1:9001', 'hostname_cached': '127.0.0.1:9001', 'host': '127.0.0.1', 'port': 9001, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}, {'url': 'http://127.0.0.1:9000', 'hostname_cached': '127.0.0.1:9000', 'host': '127.0.0.1', 'port': 9000, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}]
      15:08:41       node = {'url': 'http://127.0.0.1:9000', 'hostname_cached': '127.0.0.1:9000', 'host': '127.0.0.1', 'port': 9000, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}
      15:08:41   File "/home/couchbase/jenkins/workspace/ns-server-cluster-tests/ns_server/cluster_tests/testlib/cluster.py", line 343, in recover_node
      15:08:41     self.rebalance(wait=True, verbose=verbose)
      15:08:41       self = {'nodes': [{'url': 'http://127.0.0.1:9000', 'hostname_cached': '127.0.0.1:9000', 'host': '127.0.0.1', 'port': 9000, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}, {'url': 'http://127.0.0.1:9001', 'hostname_cached': '127.0.0.1:9001', 'host': '127.0.0.1', 'port': 9001, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}, {'url': 'http://127.0.0.1:9002', 'hostname_cached': None, 'host': '127.0.0.1', 'port': 9002, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}, {'url': 'http://127.0.0.1:9003', 'hostname_cached': None, 'host': '127.0.0.1', 'port': 9003, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}], 'connected_nodes': [{'url': 'http://127.0.0.1:9001', 'hostname_cached': '127.0.0.1:9001', 'host': '127.0.0.1', 'port': 9001, 'auth': ('Administr...
      15:08:41       node = {'url': 'http://127.0.0.1:9000', 'hostname_cached': '127.0.0.1:9000', 'host': '127.0.0.1', 'port': 9000, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}
      15:08:41       recovery_type = 'full'
      15:08:41       do_rebalance = True
      15:08:41       verbose = False
      15:08:41       otp_nodes = {'127.0.0.1:9000': 'n_0@127.0.0.1', '127.0.0.1:9001': 'n_1@127.0.0.1'}
      15:08:41       otp_node = 'n_0@127.0.0.1'
      15:08:41       data = {'user': 'Administrator', 'password': 'asdasd', 'otpNode': 'n_0@127.0.0.1', 'recoveryType': 'full'}
      15:08:41       r = <Response [200]>
      15:08:41   File "/home/couchbase/jenkins/workspace/ns-server-cluster-tests/ns_server/cluster_tests/testlib/cluster.py", line 204, in rebalance
      15:08:41     assert error is expected_error, \
      15:08:41       self = {'nodes': [{'url': 'http://127.0.0.1:9000', 'hostname_cached': '127.0.0.1:9000', 'host': '127.0.0.1', 'port': 9000, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}, {'url': 'http://127.0.0.1:9001', 'hostname_cached': '127.0.0.1:9001', 'host': '127.0.0.1', 'port': 9001, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}, {'url': 'http://127.0.0.1:9002', 'hostname_cached': None, 'host': '127.0.0.1', 'port': 9002, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}, {'url': 'http://127.0.0.1:9003', 'hostname_cached': None, 'host': '127.0.0.1', 'port': 9003, 'auth': ('Administrator', 'asdasd'), 'data_path_cache': None, 'tls_port_cache': None, 'otp_node_cached': None}], 'connected_nodes': [{'url': 'http://127.0.0.1:9001', 'hostname_cached': '127.0.0.1:9001', 'host': '127.0.0.1', 'port': 9001, 'auth': ('Administr...
      15:08:41       ejected_nodes = None
      15:08:41       wait = True
      15:08:41       timeout_s = 600
      15:08:41       verbose = False
      15:08:41       expected_error = None
      15:08:41       initial_code = 200
      15:08:41       initial_expected_error = None
      15:08:41       known_nodes_string = 'n_0@127.0.0.1,n_1@127.0.0.1'
      15:08:41       ejected_nodes_string = ''
      15:08:41       data = {'knownNodes': 'n_0@127.0.0.1,n_1@127.0.0.1', 'ejectedNodes': ''}
      15:08:41       error = 'timeout'
      15:08:41       otp_nodes = {'127.0.0.1:9000': 'n_0@127.0.0.1', '127.0.0.1:9001': 'n_1@127.0.0.1'}
      15:08:41 builtins.AssertionError: Expected final rebalance status: None
      15:08:41 Found: timeout
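
      For readers unfamiliar with the failure mode, the following is a minimal sketch of the kind of wait loop that yields the 'timeout' status asserted on above. It is not the actual code in testlib/cluster.py: the names wait_for_rebalance, check_final_status, is_running, interval_s, clock and sleep are hypothetical, and only timeout_s, error, expected_error and the assertion message are taken from the traceback. The sketch compares with == where the traceback shows `is`.

```python
import time
from typing import Callable, Optional


def wait_for_rebalance(
    is_running: Callable[[], bool],
    timeout_s: float = 600,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the rebalance stops.

    Returns None if the rebalance finished before the deadline,
    or the string 'timeout' if it was still running at the deadline.
    `is_running` is a hypothetical callback that reports whether the
    cluster still considers a rebalance to be in progress.
    """
    deadline = clock() + timeout_s
    while is_running():
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)
    return None


def check_final_status(error: Optional[str],
                       expected_error: Optional[str] = None) -> None:
    """Fail with the same message shape as the assertion in the traceback."""
    assert error == expected_error, (
        f"Expected final rebalance status: {expected_error}\n"
        f"Found: {error}"
    )
```

      With a rebalance that never reports completion, wait_for_rebalance returns 'timeout' once timeout_s has elapsed, and check_final_status then raises the AssertionError seen in the log. A rebalance that is genuinely stuck and one that is merely slower than 600s look identical from this loop, which is why the title asks the question.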
      

            People

              peter.searby Peter Searby
              timofey.barmin Timofey Barmin