  1. Couchbase Server
  2. MB-30907

Improve logging for errors processing cluster manager REST API response


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 6.5.0
    • 6.0.0
    • secondary-index

    Description

      Script to Repro

      ./testrunner -i /tmp/testexec.15579.ini -p get-cbcollect-info=True,GROUP=bucket_op -t eventing.eventing_rebalance.EventingRebalance.test_rebalance_out_all_eventing_nodes_and_rebalance_in_eventing_node_and_functions_should_be_restored,nodes_init=6,services_init=kv-kv-eventing-eventing-eventing-index:n1ql,dataset=default,groups=simple,reset_services=True,doc-per-day=20,GROUP=bucket_op

      We saw rebalance failures while running an eventing rebalance. Five more scripts failed with the same error; let me know if you need them.

      Logs attached.

       From Automation Log

      [2018-08-13 07:11:25,474] - [rest_client:1598] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed
      [2018-08-13 07:11:25,506] - [rest_client:3134] INFO - Latest logs from UI on 172.23.107.94:
      [2018-08-13 07:11:25,506] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u'Rebalance exited with reason {service_rebalance_failed,index,\n {linked_process_died,<22839.7271.0>,\n {timeout,\n {gen_server,call,\n [<22839.3251.0>,\n {call,"ServiceAPI.GetTaskList",\n #Fun<json_rpc_connection.0.125340786>},\n 60000]}}}}', u'shortText': u'message', u'serverTime': u'2018-08-13T07:11:15.690Z', u'module': u'ns_orchestrator', u'tstamp': 1534169475690, u'type': u'critical'}
      [2018-08-13 07:11:25,506] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u'Bucket "src_bucket" rebalance appears to be swap rebalance', u'shortText': u'message', u'serverTime': u'2018-08-13T07:10:15.643Z', u'module': u'ns_vbucket_mover', u'tstamp': 1534169415643, u'type': u'info'}
      [2018-08-13 07:11:25,506] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u'Started rebalancing bucket src_bucket', u'shortText': u'message', u'serverTime': u'2018-08-13T07:10:15.528Z', u'module': u'ns_rebalancer', u'tstamp': 1534169415528, u'type': u'info'}
      [2018-08-13 07:11:25,506] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u'Bucket "dst_bucket" rebalance appears to be swap rebalance', u'shortText': u'message', u'serverTime': u'2018-08-13T07:10:15.502Z', u'module': u'ns_vbucket_mover', u'tstamp': 1534169415502, u'type': u'info'}
      [2018-08-13 07:11:25,506] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u'Started rebalancing bucket dst_bucket', u'shortText': u'message', u'serverTime': u'2018-08-13T07:10:15.393Z', u'module': u'ns_rebalancer', u'tstamp': 1534169415393, u'type': u'info'}
      [2018-08-13 07:11:25,507] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u'Bucket "metadata" rebalance appears to be swap rebalance', u'shortText': u'message', u'serverTime': u'2018-08-13T07:10:15.369Z', u'module': u'ns_vbucket_mover', u'tstamp': 1534169415369, u'type': u'info'}
      [2018-08-13 07:11:25,507] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u'Started rebalancing bucket metadata', u'shortText': u'message', u'serverTime': u'2018-08-13T07:10:15.193Z', u'module': u'ns_rebalancer', u'tstamp': 1534169415193, u'type': u'info'}
      [2018-08-13 07:11:25,507] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.107.94','ns_1@172.23.108.109',\n 'ns_1@172.23.108.17','ns_1@172.23.108.192'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes\n", u'shortText': u'message', u'serverTime': u'2018-08-13T07:10:15.044Z', u'module': u'ns_orchestrator', u'tstamp': 1534169415044, u'type': u'info'}
      [2018-08-13 07:11:25,507] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u'Enabled auto-failover with timeout 120 and max count 1', u'shortText': u'message', u'serverTime': u'2018-08-13T07:10:14.865Z', u'module': u'auto_failover', u'tstamp': 1534169414865, u'type': u'info'}
      [2018-08-13 07:11:25,507] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.108.192', u'code': 0, u'text': u"Haven't heard from a higher priority node or a master, so I'm taking over.", u'shortText': u'message', u'serverTime': u'2018-08-13T07:10:14.857Z', u'module': u'mb_master', u'tstamp': 1534169414857, u'type': u'info'}
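      To pull the failure reason out of an automation log like the one above, a triage script can parse each rest_client UI-log line back into a dict and keep only the critical entries. This is a minimal Python sketch, assuming one entry per line in the "[rest_client:NNNN] ERROR - {...}" format shown; parse_ui_entries and critical_reasons are hypothetical helpers, not part of testrunner.

```python
import ast
import re

# Hypothetical triage helper (not part of testrunner). Assumes each UI log
# entry is printed on its own line as a Python dict repr after "ERROR - ",
# matching the rest_client lines in the log above.
ENTRY_RE = re.compile(r"\[rest_client:\d+\] ERROR - (\{.*\})$")


def parse_ui_entries(lines):
    """Return the UI log dicts found in testrunner output lines."""
    entries = []
    for line in lines:
        match = ENTRY_RE.search(line.rstrip())
        if match:
            # literal_eval safely parses the repr, including u'' prefixes.
            entries.append(ast.literal_eval(match.group(1)))
    return entries


def critical_reasons(entries):
    """Return (module, text) for critical entries, oldest first by tstamp."""
    critical = [e for e in entries if e.get("type") == "critical"]
    return [(e["module"], e["text"]) for e in sorted(critical, key=lambda e: e["tstamp"])]
```

      Applied to this log, the only critical entry is the ns_orchestrator message, which shows the gen_server call for "ServiceAPI.GetTaskList" to the index service timing out with a 60000 ms limit.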
      



          People

            vikas.chaudhary Vikas Chaudhary
            Balakumaran.Gopal Balakumaran Gopal
